PRIS at 2012 Microblog Track
Abstract
Most tags are keyword-rich and directly indicate the topic of a tweet, but they contain no spaces between words. We therefore applied word segmentation to split each tag into space-separated words, using the forward maximum matching algorithm. The difficulty is that no single dictionary is appropriate: the common-words dictionary is incomplete, and the Oxford Dictionary does not distinguish plural forms. We therefore combined the common-words dictionary, the Oxford Dictionary, and the D.A.B. (Dictionary of American Biography). Because of abbreviations and unknown words, some segmentation mistakes remain. To avoid undesirable influence from these mistakes, we kept both the original tags and the segmented tags.
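The segmentation step described in the abstract can be illustrated with a minimal sketch of greedy forward maximum matching. This is not the authors' implementation. The function names, the toy dictionary, the lowercasing, and the single-character fallback for unknown text are all assumptions made for illustration; the report's actual dictionary was a combination of three sources.

```python
def forward_max_match(tag, dictionary, max_len=None):
    """Split a tag into words by greedy forward maximum matching.

    `dictionary` is any set of lowercase words (hypothetical here).
    Text that matches no dictionary word is emitted one character at
    a time, which is an assumed fallback, not the report's rule.
    """
    text = tag.lstrip("#").lower()
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(w) for w in dictionary), default=1)

    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # no match: emit a single character
        words.append(text[i:j])
        i = j
    return words


def expand_tag(tag, dictionary):
    """Keep both the original tag and its segmented form."""
    segmented = " ".join(forward_max_match(tag, dictionary))
    return [tag, segmented]


# Toy dictionary, purely illustrative.
TOY_DICTIONARY = {"world", "cup", "final"}

if __name__ == "__main__":
    print(forward_max_match("#WorldCupFinal", TOY_DICTIONARY))
    # ['world', 'cup', 'final']
    print(expand_tag("#WorldCupFinal", TOY_DICTIONARY))
    # ['#WorldCupFinal', 'world cup final']
```

Because the matching is greedy, an abbreviation or out-of-dictionary word falls through to the fallback and produces a poor split, which is one reason keeping the original tag alongside the segmented form is useful.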
Document Details
- Document Type: Technical Report
- Publication Date: Nov 01, 2012
- Accession Number: ADA581499
Entities
People
- Jiayue Zhang
- Jie Yin
- Jun Guo
- Qianqian Wang
- Sijia Chen
- Weiran Xu
- Yue Liu
Organizations
- Beijing University of Posts and Telecommunications