Abstract

Datasets are crucial to advancing automated software traceability research. Acquiring such datasets comes at a high cost and requires expert knowledge to collect and validate them manually. Obtaining software development datasets has been one of the most frequently reported barriers for researchers in the software engineering domain in general. This problem is even more acute in the field of requirements traceability, which plays a crucial role in safety-critical and highly regulated systems. Therefore, the main motivation behind this work is to analyze the current state of the art of datasets used in the field of software traceability.

This work presents a first-of-its-kind literature study that reviews and assesses the datasets used in software traceability research over the last fifteen years. It articulates several attributes of these datasets, such as their characteristics, threats, and diversity.

First, 202 primary studies (see Appendix A) were identified for the purpose of this study and used to derive 73 unique datasets. These 73 datasets were studied in depth, and several attributes (size, type, domain, availability, artifacts) were extracted (see Appendix B). Based on an analysis of the primary studies, a threats-to-validity reference model tailored to software traceability datasets was derived (see Figure 4.4). Furthermore, to shed light on the dataset diversity trend in the software traceability community, a metric called the Dataset Diversity Ratio was derived for 38 authors (see Figure 4.5) who have published more than one publication in the field of software traceability.

Library of Congress Subject Headings

Requirements engineering--Data processing; Software engineering--Data processing; Database management

Publication Date

12-2017

Document Type

Thesis

Student Type

Graduate

Degree Name

Software Engineering (MS)

Department, Program, or Center

Software Engineering (GCCIS)

Advisor

Mehdi Mirakhorli

Advisor/Committee Member

J. Scott Hawker

Recommended Citation

Sharma, Palak, "Datasets Used in Fifteen Years of Automated Requirements Traceability Research" (2017). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9692

Campus

RIT – Main Campus

Plan Codes

SOFTENG-MS

Datasets Used in Fifteen Years of Automated Requirements Traceability Research
