A Preliminary Statistical Investigation into the Impact of an N-Gram Analysis Approach Based on Word Syntactic Categories Toward Text Author Classification

Abstract

Quantitative analysis of literary style has heretofore utilized semantic elements (word counts). This research attempts to identify quantifiable syntactic elements of style that can be used for author identification. The measurement of syntactic elements utilizes a dictionary with one part of speech per word and looks at phrases delimited by punctuation marks. Different-sized permutations of words, referred to as grams, are counted within each text. Correlations are measured among the gram frequencies of eight texts by four authors, both contemporary and non-contemporary. The correlations are computed across different gram sizes. The same treatment is applied to a target text, the Funeral Elegy. The approach succeeds in classifying texts temporally, and does so consistently across the various gram sizes. However, a finer-grained investigation is required to establish the authorship of the Funeral Elegy text.
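
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the toy dictionary POS, the tag names, the treatment of grams as contiguous tag sequences, and the use of a Pearson coefficient are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
import math
import re

# Hypothetical toy dictionary: exactly one part of speech per word.
# The report's actual dictionary and tag set are not given in the abstract.
POS = {
    "the": "DET", "a": "DET", "poet": "NOUN", "king": "NOUN", "song": "NOUN",
    "sings": "VERB", "hears": "VERB", "old": "ADJ", "sad": "ADJ",
}


def phrases(text):
    """Yield each punctuation-delimited phrase as a list of lowercase words."""
    for chunk in re.split(r"[.,;:!?]+", text.lower()):
        words = chunk.split()
        if words:
            yield words


def gram_counts(text, n):
    """Count grams of n part-of-speech tags; grams never cross a phrase boundary."""
    counts = Counter()
    for words in phrases(text):
        tags = [POS.get(w, "UNK") for w in words]  # unknown words get a catch-all tag
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def correlation(a, b):
    """Pearson correlation between two relative gram-frequency profiles (assumed measure)."""
    if not a or not b:
        return 0.0
    keys = sorted(set(a) | set(b))
    total_a, total_b = sum(a.values()), sum(b.values())
    xs = [a[k] / total_a for k in keys]
    ys = [b[k] / total_b for k in keys]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


if __name__ == "__main__":
    text_a = "The old poet sings, the king hears a sad song."
    text_b = "A sad king sings; the poet hears the old song."
    for n in (1, 2, 3):  # repeat the comparison across gram sizes
        print(n, correlation(gram_counts(text_a, n), gram_counts(text_b, n)))
```

Under these assumptions, each text is reduced to one frequency profile per gram size, and a target text would be compared against every known text in the same way.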

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2000
Accession Number
ADA455142

Entities

People

  • John Schuster
  • Mona Diab
  • Peter Bock

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Artificial Intelligence
  • Classification
  • Coefficients
  • Computer Science
  • Computers
  • Dictionaries
  • English Language
  • Frequency
  • Hypotheses
  • Language
  • Linguistics
  • Natural Languages
  • Permutations
  • Standards
  • Statistics
  • Universities

Readers

  • Aerosol Science/Aerosol Physics
  • Computational Linguistics
  • Systems Analysis and Design