Normalization of Natural Language for Information Retrieval

Abstract

Work on the first component of the Linguistic Research System, a general language data processing system, is described. This 'normalization' component analyzes sentences in natural language and assigns to each sentence its semantic reading or readings. Further, it reduces to one representation paraphrases of sentences which result from syntactic transformations, substitution of synonyms with identical parts of speech, substitution of synonyms with different parts of speech, and substitution of terms of length one by synonymous lexical collocations. The normalization component processes text in natural language by means of a 'subscript grammar', a system of rule schemata which permits the recognition algorithms to construct context-free grammar rules. (Author)
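
As a rough illustration of how a rule schema might give rise to context-free rules, the sketch below expands a schema whose symbols carry subscript variables into one plain rule per combination of subscript values. The abstract does not give the report's notation or algorithm, so everything here is an assumption made for illustration: the bracket notation `NP[num]`, the function name `expand_schema`, the `domains` table, and the idea that subscripts range over small finite value sets.

```python
from itertools import product
import re

# Hypothetical notation (not from the report): a symbol may carry a
# subscript variable in brackets, e.g. "NP[num]".
SUBSCRIPT = re.compile(r"\[(\w+)\]")


def expand_schema(lhs, rhs, domains):
    """Expand one rule schema into plain context-free rules.

    lhs     -- left-hand symbol, possibly with a subscript variable
    rhs     -- list of right-hand symbols, possibly with subscript variables
    domains -- assumed mapping from variable name to its finite list of values
    """
    # Collect subscript variables in order of first appearance.
    variables = []
    for symbol in [lhs] + rhs:
        for var in SUBSCRIPT.findall(symbol):
            if var not in variables:
                variables.append(var)

    rules = []
    # One context-free rule per consistent assignment of values to variables.
    for values in product(*(domains[v] for v in variables)):
        binding = dict(zip(variables, values))

        def instantiate(symbol):
            # Replace "[var]" with "_value", e.g. "NP[num]" -> "NP_sg".
            return SUBSCRIPT.sub(lambda m: "_" + binding[m.group(1)], symbol)

        rules.append((instantiate(lhs), [instantiate(s) for s in rhs]))
    return rules


# Assumed example: number agreement between subject and verb phrase.
domains = {"num": ["sg", "pl"]}
rules = expand_schema("S", ["NP[num]", "VP[num]"], domains)
for left, right in rules:
    print(left, "->", " ".join(right))
```

Because the same variable appears on both right-hand symbols, the expansion yields only the agreeing rules (`NP_sg VP_sg` and `NP_pl VP_pl`), never the mismatched pairs. A schema with no subscripts expands to itself.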

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 1972
Accession Number: AD0737740

Entities

People

  • R. A. Stachowitz
  • W. P. Lehmann

Organizations

  • University of Texas at Austin

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Context Free Grammars
  • Data Processing
  • Grammars
  • Information Retrieval
  • Language
  • Linguistics
  • Natural Languages
  • Recognition

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation