Theses

Learning to Rank Relevant Files for Bug Reports Using Domain knowledge, Replication and Extension of a Learning-to-Rank Approach

Abstract

Bug localization is one of the most important stages of the bug fixing process. Bad practices make debugging a tedious task, and investigating bugs can account for a large portion of the aggregate cost of a software project. An automated strategy that provides a ranked list of source code files, ordered by how likely each is to contain the root cause of the problem, would help development teams reduce the search space and increase productivity. In this work, I have replicated the bug localization approach presented in \cite{ye2014learning}, which applies the learning-to-rank technique to rank the relevant files for each bug. This technique applies domain knowledge by evaluating the textual similarity between bug reports and both source code files and API specification documents, together with bug-fixing and code-change history. For a given bug report, the ranking function is a linear combination of weighted features, where the weights are trained on previously solved bug reports. In addition to replicating this technique, I have extended the study by evaluating how different text preprocessing techniques, such as stemming and lemmatization, and a randomized selection of training folds affect the overall performance of the ranking model. I found that lemmatization and randomized selection of the training folds both have an adverse effect on the performance of the ranking model, lowering the accuracy and precision of the results.
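The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the feature names, weight values, and file paths below are hypothetical, and the weights are fixed by hand here, whereas in the approach they are learned from previously solved bug reports. The sketch only shows how a linear combination of weighted features turns per-file feature values into a ranked list.

```python
from typing import Dict, List, Tuple


def score(features: List[float], weights: List[float]) -> float:
    """Linear combination of weighted features for one (bug report, file) pair."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def rank_files(
    file_features: Dict[str, List[float]], weights: List[float]
) -> List[Tuple[str, float]]:
    """Return (file, score) pairs sorted from most to least likely relevant."""
    scored = [(path, score(feats, weights)) for path, feats in file_features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical feature order (an assumption for illustration):
#   [similarity to source code, similarity to API documentation, bug-fixing history]
# Hypothetical weights; in the approach these would be learned, not hand-set.
weights = [0.5, 0.3, 0.2]

# Hypothetical candidate files with made-up feature values for one bug report.
candidates = {
    "src/Parser.java": [0.8, 0.1, 0.0],
    "src/Lexer.java": [0.2, 0.9, 1.0],
    "src/Util.java": [0.1, 0.0, 0.0],
}

ranking = rank_files(candidates, weights)
for path, value in ranking:
    print(f"{path}: {value:.2f}")
```

With these made-up values, the file with strong API-documentation similarity and fixing history is ranked above the file that only matches the source text, which shows how the weights trade the features off against each other.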

Library of Congress Subject Headings

Ranking and selection (Statistics); Debugging in computer science; Pattern perception; Natural language processing (Computer science); Information retrieval

Publication Date

5-2018

Document Type

Thesis

Student Type

Graduate

Degree Name

Software Engineering (MS)

Department, Program, or Center

Software Engineering (GCCIS)

Advisor

Mohamed Wiem Mkaouer

Advisor/Committee Member

Christian Newman

Advisor/Committee Member

Yasmin El-Glaly

Recommended Citation

Safdari, Nasir, "Learning to Rank Relevant Files for Bug Reports Using Domain knowledge, Replication and Extension of a Learning-to-Rank Approach" (2018). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9770

Campus

RIT – Main Campus

Plan Codes

SOFTENG-MS
