Mining Developer Communications to Create a Web-Scale Repository of Documented and Analyzable Snippets
Abstract
Typical software developer communications, such as chat conversations or forum posts, contain both code snippets and natural language text that describes those snippets. Developer communications on the web are therefore an important resource for large-scale mining of information about the functionality, quality, and other properties of code snippets. Consistent with the goals of the DARPA Mining and Understanding of Software Enclaves (MUSE) program, the goal of this research project was to enable targeted access to the software development knowledge captured in the code snippets embedded within developer communications and in the natural language text describing them. Key research accomplishments of this project include: an understanding of what information about code snippets is available in different kinds of developer communications; an in-depth analysis of two kinds of developer communication to compare their efficacy in supporting software engineering tools; a new technique to extract the available information from a specific kind of developer communication; and a technique to enable web-scale code clone detection and search and, ultimately, curation of documented, analyzable code snippets extracted from developer communications.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 01, 2018
- Accession Number: AD1059446
Entities
People
- David Shepherd