Reducing the Burden of Massive Training Data for Deep Learning

Abstract

Over the past decade, the amount of effort in deep learning (machine learning with many-layered neural networks) has increased exponentially due to the impressive performance of these networks on historically difficult problems, such as computer vision, natural language understanding, and decision-making. Correspondingly, the importance of deep learning to the Navy has become increasingly clear in the past several years. This performance is generally achieved by supervised learning, in which the model trains on large, balanced, labeled datasets. Studies have shown that more training data allows networks to reach higher accuracy and generalize better, which has led to labeled datasets that typically contain thousands to millions of labeled images. Consequently, research in deep learning has exploded following the creation of large benchmark datasets, such as ImageNet and the Enron dataset. While raw data is often plentiful in real-world applications, labeling it is difficult: manually labeling thousands of data samples is labor-intensive and can be a barrier in practice. Furthermore, in numerous fields (such as medicine, defense, and other sciences), often only a limited number of domain experts can correctly classify the data. Therefore, the goal of this project was to achieve high performance without having to manually label massive amounts of data (i.e., learning with limited labeled data).

Document Details

Document Type: Technical Report
Publication Date: Aug 30, 2021
Accession Number: AD1149304

Entities

People

Leslie N. Smith

Organizations

United States Naval Research Laboratory
