Reducing the Burden of Massive Training Data for Deep Learning
Abstract
Over the past decade, research effort in deep learning (machine learning with many-layered neural networks) has increased exponentially due to the impressive performance of these networks on historically difficult problems, such as computer vision, natural language understanding, and decision-making. Correspondingly, the importance of deep learning to the Navy has become increasingly clear in the past several years. This performance is generally achieved by supervised learning, in which the model trains on large, balanced, labeled datasets. Studies have shown that more training data allows networks to reach higher accuracy and generalize better, which has led to labeled datasets that typically contain thousands to millions of images. Consequently, research in deep learning has exploded following the creation of large benchmark datasets, such as ImageNet and the Enron dataset. While raw data is often plentiful for real-world applications, labeling it is hard. Manually labeling thousands of data samples is labor-intensive and can be a barrier in practice. Furthermore, in numerous fields (such as medicine, defense, and other scientific fields), often only a limited number of domain experts can classify the data correctly. Therefore, the goal of this project was to achieve high performance without manually labeling massive amounts of data (i.e., learning with limited labeled data).
Document Details
- Document Type: Technical Report
- Publication Date: Aug 30, 2021
- Accession Number: AD1149304
Entities
People
- Leslie N. Smith
Organizations
- United States Naval Research Laboratory