The Rate of Progress in Natural Language Processing

Abstract

The rate of progress in natural language processing has been disappointing to many, including myself. It is not just that the popular press has had overblown expectations, but that we at this meeting have had them too. The consequences of these errors could be severe. Hopefully, this short note will give an accurate evaluation of our rate of progress, identify what some of the problems have been, and present some reasonable suggestions on what can be done to improve the situation.

Given that we want to take our ideas down the chain from theoretical research to empirical study and beyond, and that natural language processing is an extremely difficult task, how can we proceed? There is only one answer: work within our current limits. Let's treat our work as that of successive approximations. Let us forget about the unexplored problems for the time being. Let us see what we can really do with the proposals we have that seem to work. Basically, let us emphasize building systems and full-scale components for a while. For example, why don't a group of us take the best parser, the best semantic interpreter, the best generator, the best inference system, etc., and tie them together? Then let's pick a domain of discourse and make them work for more than a few sentences. Let's beat on them until they work for as much of language as they appear capable of handling. While we are at it, let's make the system as fast, as robust, as portable, as maintainable, etc., as we possibly can. Similarly, let's beat on individual components in the same way.

I know there is no guarantee this approach will produce a useful system or component. But even if we fail to produce something worth going further with, we will have learned a lot about what works and what doesn't. If those results are not allowed to be lost, the next effort can do better. Of course, a problem with this approach lies in the source of our funds.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1987
Accession Number
ADA460356

Entities

People

  • Norman K. Sondheimer

Organizations

  • University of Southern California

Tags

DTIC Thesaurus Topics

  • Applied Computer Science
  • California
  • Case Studies
  • Computer Languages
  • Computing-Related Activities
  • Formal Languages
  • Information Operations
  • Information Science
  • Language
  • Linguistics
  • Machine Translation
  • Mathematics
  • Natural Language Processing
  • Natural Languages
  • Scientific Research
  • Social Sciences

Readers

  • Computational Linguistics
  • Educational Psychology

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation