CodeFault: Analyzing Human Dimensions of Software Engineering Processes
Abstract
We propose CodeFault, a system that automatically gathers software development behavior and issue/vulnerability data from a number of heterogeneous sources and mines that data with machine learning techniques to generate models that help predict a range of software bugs, including potential vulnerabilities. The sources include the code itself (as a document), source control systems, social coding sites, social media sources, discussion forums, and issue repositories. To learn predictors, we will identify raw or derived features from the combined set of sources and apply ensemble-style machine learning techniques. By predicting these faults, teams can respond proactively to potential flaws and gradually learn which human behavioral trends are conducive to them. The result not only saves organizations time and money but, more importantly, helps ensure better security. CodeFault builds on technology we previously built in both the vulnerability aggregation and code analysis spaces.
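As a rough illustration of the approach described in the abstract, the sketch below joins per-file features from two sources and trains a small bagged ensemble of threshold rules that vote on whether a file is fault-prone. It is not the CodeFault implementation: the feature names (`commits`, `authors`, `open_issues`), the file names, the labels, and every function here are hypothetical, and bagged decision stumps are simply one stand-in for "ensemble-style" learning.

```python
import random
from collections import Counter

# Hypothetical per-file features from two sources; all values are invented.
vcs_features = {
    "auth.py": {"commits": 40, "authors": 6},
    "parser.py": {"commits": 35, "authors": 5},
    "net.py": {"commits": 28, "authors": 4},
    "util.py": {"commits": 3, "authors": 1},
    "cli.py": {"commits": 5, "authors": 2},
    "docs.py": {"commits": 2, "authors": 1},
}
issue_features = {
    "auth.py": {"open_issues": 9},
    "parser.py": {"open_issues": 7},
    "net.py": {"open_issues": 6},
    "util.py": {"open_issues": 1},
    "cli.py": {"open_issues": 2},
    "docs.py": {"open_issues": 1},
}
# Hypothetical labels: 1 = a fault was later reported, 0 = none.
labels = {"auth.py": 1, "parser.py": 1, "net.py": 1,
          "util.py": 0, "cli.py": 0, "docs.py": 0}


def merge_features(*sources):
    """Join per-file feature dicts from several sources into one dict per file."""
    merged = {}
    for source in sources:
        for path, feats in source.items():
            merged.setdefault(path, {}).update(feats)
    return merged


def fit_stump(rows, targets):
    """Return the (feature, threshold) rule with the fewest training errors.

    The rule predicts 1 (fault-prone) when row[feature] > threshold.
    """
    best = None
    for name in rows[0]:
        for threshold in sorted({r[name] for r in rows}):
            errors = sum((r[name] > threshold) != bool(t)
                         for r, t in zip(rows, targets))
            if best is None or errors < best[0]:
                best = (errors, name, threshold)
    return best[1], best[2]


def fit_bagged_stumps(rows, targets, n_models=15, seed=0):
    """Train one stump per bootstrap resample of the training data (bagging)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in rows]
        models.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote over all stumps; n_models is odd, so there are no ties."""
    votes = Counter(int(row[name] > threshold) for name, threshold in models)
    return votes.most_common(1)[0][0]


merged = merge_features(vcs_features, issue_features)
paths = sorted(merged)
rows = [merged[p] for p in paths]
targets = [labels[p] for p in paths]
models = fit_bagged_stumps(rows, targets)
```

Calling `predict(models, {"commits": 30, "authors": 5, "open_issues": 8})` then returns the ensemble's vote for a new file. A real system would need far richer features, much more data, and held-out evaluation before any such prediction could be trusted.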
Document Details
- Document Type: Technical Report
- Publication Date: Apr 10, 2022
- Accession Number: AD1170934