Arabic Natural Language Processing System Code Library

Abstract

This technical note briefly describes a Java library for Arabic natural language processing (NLP). The library contains code for training and applying the Arabic NLP system described in the paper "A Cross-Task Flexible Transition Model for Arabic Tokenization, Affix Detection, Affix Labeling, POS Tagging, and Dependency Parsing" by Stephen Tratz, presented at the Statistical Parsing of Morphologically Rich Languages (SPMRL) workshop held in Seattle in conjunction with the October 2013 Empirical Methods in Natural Language Processing (EMNLP) conference. For Arabic, the system performs clitic separation, inflectional affix identification and labeling, part-of-speech tagging, and dependency parsing. The code, which extends previously released graduate student code, also supports English part-of-speech tagging, dependency parsing, and semantic disambiguation tasks. The library is expected to be of most value to NLP researchers.

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2014
Accession Number: ADA603814

Entities

People

  • Stephen C. Tratz

Organizations

  • United States Army Research Laboratory

Tags

DTIC Thesaurus Topics

  • Computational Linguistics
  • Computer Programs
  • Converters
  • Department Of Defense
  • Directories
  • Governments
  • Information Science
  • Instructions
  • Language
  • Linguistics
  • Military Research
  • Natural Language Processing
  • Natural Language Understanding
  • Natural Languages
  • Students
  • Training
  • United States Government

Fields of Study

  • Computer science

Readers

  • Academic Conference Management
  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation