A Preliminary Statistical Investigation into the Impact of an N-Gram Analysis Approach Based on Word Syntactic Categories Toward Text Author Classification
Abstract
Quantitative analysis of literary style has heretofore relied on semantic elements, namely word counts. This research attempts to identify quantifiable syntactic elements of style that can be used for author identification. Syntactic elements are measured using a dictionary that assigns exactly one part of speech to each word, and the analysis considers phrases delimited by punctuation marks. Word sequences of different lengths, represented by their parts of speech and referred to as grams, are counted within each text. Correlations are measured among the gram frequencies of eight texts by four authors, both contemporary and non-contemporary, and these correlations are computed separately for each gram size. The same treatment is applied to a target text, the Funeral Elegy. The approach consistently classifies texts by period across the various gram sizes. However, a finer-grained investigation is required to establish the authorship of the Funeral Elegy.
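The counting and comparison steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicon `LEXICON`, its tag names, the punctuation set used to delimit phrases, the `UNK` tag for words missing from the dictionary, and the use of Pearson's coefficient as the correlation measure are all hypothetical choices, since the abstract does not specify them. Grams are counted only within a single phrase, so no gram spans a punctuation mark.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon: each word maps to exactly one part of speech,
# mirroring the one-tag-per-word dictionary described in the abstract.
# The report's actual dictionary and tag set are not reproduced here.
LEXICON = {
    "the": "DET", "a": "DET",
    "old": "ADJ", "big": "ADJ",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "man": "NOUN",
    "sat": "VERB", "ran": "VERB", "saw": "VERB",
    "on": "PREP",
}


def phrases(text):
    """Split text at punctuation marks (assumed set) into phrases of lowercase words."""
    chunks = re.split(r"[.,;:!?]+", text.lower())
    return [re.findall(r"[a-z']+", c) for c in chunks if re.search(r"[a-z]", c)]


def pos_ngrams(text, n, lexicon):
    """Count n-grams of part-of-speech tags that lie entirely within one phrase."""
    counts = Counter()
    for words in phrases(text):
        # Words absent from the lexicon get a catch-all tag (an assumption).
        tags = [lexicon.get(w, "UNK") for w in words]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def pearson(a, b):
    """Pearson correlation of two gram-frequency profiles over the union of their grams."""
    keys = sorted(set(a) | set(b))
    if not keys:
        return 0.0
    x = [a.get(k, 0) for k in keys]
    y = [b.get(k, 0) for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant profile; 0.0 is an arbitrary choice
    return cov / math.sqrt(vx * vy)


def correlation_matrix(texts, n, lexicon):
    """Pairwise correlations between all named texts for one gram size n."""
    profiles = {name: pos_ngrams(t, n, lexicon) for name, t in texts.items()}
    return {(p, q): pearson(profiles[p], profiles[q]) for p in profiles for q in profiles}


# Hypothetical sample sentences, not drawn from the report's corpus.
TEXT_A = "The old cat sat on the mat, the dog ran."
TEXT_B = "A big dog saw the man."

if __name__ == "__main__":
    texts = {"A": TEXT_A, "B": TEXT_B}
    for n in (1, 2, 3):  # repeat the comparison across gram sizes
        print(n, round(correlation_matrix(texts, n, LEXICON)[("A", "B")], 3))
```

In `TEXT_A`, the comma after "mat" ends a phrase, so the tag pair spanning it (`NOUN`, `DET`) is never counted; this is how punctuation delimiting keeps grams inside syntactic units. Pearson's coefficient is scale-invariant, so raw counts and relative frequencies give the same result here.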
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 2000
- Accession Number: ADA455142
Entities
People
- John Schuster
- Mona Diab
- Peter Bock
Organizations
- University of Maryland