Accurate Arabic Script Language/Dialect Classification
Abstract
Correctly identifying the language/dialect of a text is a critical first step for many natural language processing systems, including machine translation systems. To date, most language identification efforts have focused on distinguishing between European languages. Increasingly, historically unwritten Arabic dialects are appearing online in social media. This report describes state-of-the-art classifiers for automatically distinguishing between Arabic-script languages and between Arabic dialects.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2014
- Accession Number: ADA597898
Entities
People
- Stephen C. Tratz
Organizations
- United States Army Research Laboratory