A Language-Independent Approach to Automatic Text Difficulty Assessment for Second-Language Learners

Abstract

In this paper, we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best suited to regression. Our baseline uses z-normalized shallow length features and TF-LOG weighted bag-of-words vectors for Arabic, Dari, English, and Pashto. We compare Support Vector Machines and the Margin-Infused Relaxed Algorithm, evaluating both by mean squared error. We provide an analysis of which features are most predictive of a given level.
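
The feature pipeline named in the abstract can be illustrated with a minimal sketch. The abstract does not define its features or weighting, so everything concrete here is an assumption: the three length features, whitespace tokenization, the `1 + log(tf)` form of TF-LOG weighting, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def length_features(text):
    """Shallow length features (assumed set): word count,
    mean word length in characters, mean sentence length in words."""
    # Assumption: sentences end at '.', '!' or '?'; words are whitespace tokens.
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    words = text.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    avg_sent_len = n_words / len(sentences) if sentences else 0.0
    return [float(n_words), avg_word_len, avg_sent_len]


def z_normalize(rows):
    """Z-normalize each feature column: subtract the column mean and
    divide by the population standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(columns, means)
    ]
    # A constant column has zero deviation; map it to 0.0 instead of dividing.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def tf_log(text):
    """Log-scaled term frequencies over a bag of words.
    Assumption: weight = 1 + ln(tf) for every term that occurs."""
    counts = Counter(text.lower().split())
    return {term: 1.0 + math.log(tf) for term, tf in counts.items()}


def mean_squared_error(y_true, y_pred):
    """Mean squared error between gold and predicted difficulty levels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

In a full system, the normalized length features and the weighted term vector for each document would be passed to a regressor such as a Support Vector Machine or MIRA, and the predicted ILR levels scored with `mean_squared_error`. The regressors themselves are not sketched here.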

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2013
Accession Number
ADA595522

Entities

People

  • Elizabeth Salesky
  • Jennifer Williams
  • Tamas Marius
  • Wade Shen

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Applied Psychology
  • Automatic
  • Computational Linguistics
  • Computer Languages
  • Foreign Languages
  • Language
  • Linguistics
  • Machine Learning
  • Natural Language Processing
  • Supervised Machine Learning
  • United States

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Instructional Design and Training Evaluation

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Translation