Understanding of Navy Technical Language via Statistical Parsing

Abstract

A key problem in indexing technical information is the interpretation of technical words and word senses, expressions not used in everyday language. This is important for captions on technical images, whose often pithy descriptions can be valuable to decipher. We describe the natural-language processing for MARIE-2, a natural-language information retrieval system for multimedia captions. Our approach is to provide general tools for lexicon enhancement with the specialized words and word senses, and to learn word usage information (both on word senses and word-sense pairs) from a training corpus with a statistical parser. Innovations of our approach are in statistical inheritance of binary co-occurrence probabilities and in weighting of sentence subsequences. MARIE-2 was trained and tested on 616 captions (with 1009 distinct sentences) from the photograph library of a Navy laboratory. The captions had extensive nominal compounds, code phrases, abbreviations, and acronyms, but few verbs, abstract nouns, conjunctions, and pronouns. Experimental results fit a processing time in seconds of $0.0858\,n^{2.876}$ and a number of tries before finding the best interpretation of $1.809\,n^{1.668}$, where $n$ is the number of words in the sentence. Use of statistics from previous parses definitely helped in reparsing the same sentences, helped accuracy in parsing of new sentences, and did not hurt time to parse new sentences. Word-sense statistics helped dramatically; statistics on word-sense pairs generally helped but not always.
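To give a feel for how the two fitted curves in the abstract scale with sentence length, the short sketch below evaluates them for a few values of n. It assumes the fits are read as 0.0858 n^2.876 (seconds) and 1.809 n^1.668 (tries); the function names are hypothetical and not part of MARIE-2, and the values are only what the fitted formulas predict, not measurements from the report.

```python
# Illustrative only: evaluates the power-law fits quoted in the abstract.
# Coefficients and exponents are assumed readings of the abstract's formulas;
# the function names are hypothetical and not taken from MARIE-2.

def fitted_parse_time_seconds(n: int) -> float:
    """Fitted processing time in seconds for a sentence of n words."""
    return 0.0858 * n ** 2.876


def fitted_tries_before_best(n: int) -> float:
    """Fitted number of tries before the best interpretation is found."""
    return 1.809 * n ** 1.668


if __name__ == "__main__":
    for n in (5, 10, 20):
        t = fitted_parse_time_seconds(n)
        k = fitted_tries_before_best(n)
        print(f"n={n:2d}  time={t:9.1f} s  tries={k:7.1f}")
```

Because both exponents exceed 1, doubling the sentence length multiplies the fitted time by about 2^2.876 and the fitted number of tries by about 2^1.668, so long captions dominate the total cost.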

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2004
Accession Number
ADA465750

Entities

People

  • Neil C. Rowe

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Air Platforms
  • Space
  • Weapons Technologies

DTIC Thesaurus Topics

  • Aircrafts
  • Airframes
  • Artificial Intelligence
  • Computational Linguistics
  • Computer Programs
  • Computer Science
  • Fuel Air Explosives
  • Grammars
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Language
  • Linguistics
  • Natural Language Processing
  • Natural Languages
  • Statistics

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation