RAND Corporation Data in Systran. Volume 2.

Abstract

Presents some empirical linguistic findings based on a million-word Russian corpus with syntactic annotations. The corpus, consisting of Russian texts in mathematics, physics, cybernetics, astrobotany and physiology, was produced by the Rand Corp., Santa Monica, California, and converted for use by SYSTRAN language-analysis processing procedures. Since all syntagmas are explicitly marked in the Rand data base, little or no contextual reference is necessary in order to establish semosyntactic relationships that may be utilized as the most essential components of an automatic parser for S+T text. Volume II deals with text statistics, the bulk of which consists of high-frequency wordlists in descending frequency order as well as alphabetical order, for both individual and combined subject matters. (Modified author abstract)

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 1973
Accession Number
AD0769560

Entities

People

  • Ludek A. Kozlik
  • Peter P. Toma

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Automatic
  • California
  • Corporations
  • Cybernetics
  • Databases
  • Frequency
  • Information Science
  • Language
  • Mathematics
  • Physiology
  • Statistics

Readers

  • Academic Conference Management
  • Computational Linguistics