Detecting the Difficulty Level of Foreign Language Texts

Abstract

This report describes experiments on automatically determining the difficulty level of foreign language materials, with the aim of helping teachers, students, and DoD linguists find suitable materials for language learning and sustainment. The indicator of difficulty is based on the Interagency Language Roundtable (ILR) proficiency scale, which is used to measure the proficiency of DoD linguists in listening, reading, speaking, writing, translating, and interpreting. The experiments were conducted with a corpus of authentic Arabic and Mandarin Chinese materials from several genres that were hand-labeled for ILR level. The corpus contained materials at the 2, 2+, and 3 levels. ILR level detectors were built for these levels for the original Arabic and Mandarin sources and for human-produced English translations of these sources. The detectors were based on statistical language modeling techniques. The equal error rates (EERs) obtained ranged from 12.4% to 49.4%, depending on the language, ILR level, language model order, and other factors related to the experimental design. In general, performance was best for discriminating level 3 materials from level 2 and 2+ materials, with EERs ranging from 12.4% to 33.3% across the languages (and translations), language model orders, and experimental designs. Performance was worst for discriminating level 2+ materials from level 2 and 3 materials, with EERs ranging from 31.2% to 49.4%.
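The abstract reports results as equal error rates. As a rough illustration of that metric only, the sketch below finds the threshold at which a detector's miss rate and false-alarm rate are closest and reports their average. The function name, the score lists, and the convention that a higher score means "more likely the target ILR level" are assumptions made for this example; the abstract does not state how the report's detector scores were produced or how its EERs were computed.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER of a detector that accepts when score >= threshold.

    target_scores: scores for documents that truly are the target level.
    nontarget_scores: scores for documents at any other level.
    Returns (eer, threshold). Hypothetical helper, not taken from the report.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # Miss: a target document scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target document scored at or above it.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]


# Made-up scores for a hypothetical "level 3 vs. level 2 and 2+" detector.
level3_scores = [0.9, 0.8, 0.7, 0.4]
other_scores = [0.6, 0.5, 0.3, 0.2]
eer, threshold = equal_error_rate(level3_scores, other_scores)
print(f"EER = {eer:.1%} at threshold {threshold}")
```

With these made-up scores, one of four target documents is missed and one of four non-target documents is falsely accepted at the chosen threshold, so both error rates are 25%. An EER near 50%, like the upper end reported for level 2+, is close to chance for a two-way decision.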

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 2010
Accession Number: ADA516000

Entities

People

  • Eric G. Hansen
  • Raymond E. Slyh

Organizations

  • Air Force Research Laboratory

Tags

Communities of Interest

  • Biomedical
  • Human Systems

DTIC Thesaurus Topics

  • Air Force Research Laboratories
  • Applied Psychology
  • Computational Linguistics
  • Computational Science
  • Detection
  • Detectors
  • Experimental Design
  • Foreign Languages
  • Grammars
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Materials
  • Natural Language Processing
  • Supervised Machine Learning
  • Warning Systems

Fields of Study

  • Education

Readers

  • Computational Linguistics
  • Mathematics or Statistics
  • Speech Processing/Speech Recognition