Accurate Arabic Script Language/Dialect Classification

Abstract

Correctly identifying the language or dialect of a text is a critical first step for many natural language processing systems, including machine translation systems. To date, most language identification efforts have focused on distinguishing between European languages. Increasingly, historically unwritten Arabic dialects are appearing online in social media. This report describes state-of-the-art classifiers for automatically distinguishing between Arabic script languages and between Arabic dialects.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2014
Accession Number
ADA597898

Entities

People

  • Stephen C. Tratz

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Identification
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Machine Translation
  • Military Research
  • Natural Language Processing
  • Natural Languages
  • Online Communications
  • Social Media
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks