Automating Requirements Traceability Using Natural Language Processing: A Comparison of Information Retrieval Techniques
Abstract
This thesis compares histogram distance and cosine similarity measures used as information retrieval (IR) techniques in automated requirements tracing. We first build a software application that computes a Term Frequency-Inverse Document Frequency (TF-IDF) matrix of a National Aeronautics and Space Administration (NASA) public requirements dataset. We then classify requirement pairs using each similarity measure across a variety of similarity thresholds; derive the performance achieved by each IR-based similarity measure in terms of precision, recall, and F-score; and compare the measures for real-world effectiveness when used for requirements tracing. On the analyzed dataset, cosine similarity outperformed histogram distance with respect to overall precision and recall. Further research is needed to yield higher levels of precision and recall for automated tracing methods, to simplify the use of automated tracing, and ultimately to instill enough confidence in systems engineers to supplant time-consuming and error-prone conventional requirements tracing methods.
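The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis software: the whitespace tokenizer, the TF-IDF weighting variant (relative term frequency times log(N/df)), the balanced F1 form of the F-score, and all function names are assumptions made for illustration. Only cosine similarity is shown, because the abstract does not define the histogram distance measure; any other pairwise similarity function could be substituted for `cosine_similarity`.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Assumed weighting: relative term frequency * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty requirements
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_links(rows, threshold):
    """Classify every pair (i, j), i < j, as a trace link when similarity >= threshold."""
    return {
        (i, j)
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
        if cosine_similarity(rows[i], rows[j]) >= threshold
    }


def precision_recall_f1(predicted, actual):
    """Score predicted links against a ground-truth set of links."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy requirements and ground truth, not taken from the NASA dataset.
requirements = [
    "the system shall log all errors",
    "errors shall be written to the log file",
    "the display shall show battery level",
]
true_links = {(0, 1)}

_, rows = tfidf_matrix(requirements)
for threshold in (0.05, 0.1, 0.2, 0.4):
    links = predict_links(rows, threshold)
    print(threshold, sorted(links), precision_recall_f1(links, true_links))
```

Sweeping the threshold, as in the loop above, is what exposes the trade-off between precision and recall: a low threshold accepts more candidate links and tends to raise recall at the cost of precision, while a high threshold does the opposite.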
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2021
- Accession Number: AD1164341
Entities
People
- Christopher D Laliberte
Organizations
- Naval Postgraduate School