PRIS at 2012 Microblog Track

Abstract

Most hashtags are keyword-rich and directly indicate the topic of a tweet, but they contain no spaces between words. We therefore used word segmentation to split each tag into space-separated words. This time we used the forward maximum matching algorithm. The difficulty is that no single dictionary is appropriate: the Common words dictionary is incomplete, and the Oxford Dictionary doesn't distinguish plural forms. We therefore combined the Common words dictionary, the Oxford Dictionary and the D.A.B. (Dictionary of American Biography). Because of abbreviations and unknown words, some segmentation mistakes still remain. To limit the undesirable influence of these mistakes, we retained both the original tags and the segmented tags.
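The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy dictionary, and the fallback of emitting single characters for unknown text are all assumptions made for illustration. The sketch shows greedy left-to-right maximum matching and the idea of keeping both the original and the segmented tag.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Greedy left-to-right segmentation: at each position, take the
    longest dictionary word that starts there."""
    text = text.lower()
    if max_word_len is None:
        max_word_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # Assumed fallback: no dictionary word starts here, so emit
            # one character as an unknown token and move on.
            words.append(text[i])
            i += 1
    return words


def expand_hashtag(tag, dictionary):
    """Return both the original tag text and its segmented form, so that
    a segmentation mistake cannot erase the original term."""
    body = tag.lstrip("#")
    segmented = " ".join(forward_max_match(body, dictionary))
    return [body, segmented]


# Hypothetical toy dictionary; the report combines three real dictionaries.
toy_dictionary = {"super", "bowl", "sunday"}
expanded = expand_hashtag("#SuperBowlSunday", toy_dictionary)
```

With this toy dictionary, `expanded` holds the unsegmented tag alongside "super bowl sunday". An abbreviation such as "nba" that is missing from the dictionary falls apart into single characters, which is the kind of mistake that motivates keeping the original tag as well.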

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581499

Entities

People

  • Jiayue Zhang
  • Jie Yin
  • Jun Guo
  • Qianqian Wang
  • Sijia Chen
  • Weiran Xu
  • Yue Liu

Organizations

  • Beijing University of Posts and Telecommunications

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Biographies
  • Communication Systems
  • Computer Vision
  • Dictionaries
  • Engineering
  • Governments
  • Information Operations
  • Information Retrieval
  • Learning
  • Preprocessing
  • Resistance
  • Schools
  • Singapore
  • Standards
  • Word Lists

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Computational Linguistics
  • Information Retrieval

Technology Areas

  • Space