CRL/Brandeis: The DIDEROT System
Abstract
Diderot is an information extraction system built at CRL and Brandeis University over the past two years. It was produced as part of our efforts in the Tipster project. The same overall system architecture has been used for English and Japanese and for the micro-electronics and joint venture domains. The history of the system is discussed and the operation of its major components is described. A summary of scores at the 24-month workshop is given. Because of the emphasis on different languages and different subject areas, the research has focused on the development of general-purpose, reusable techniques. The CRL/Brandeis group have implemented statistical methods for focusing on the relevant parts of texts, and programs that recognize and mark dates and the names of people, places, and organizations. The actual analysis of the critical parts of the texts is carried out by a parser controlled by lexical structures for the 'key' words in the text. To extend the system's coverage of English and Japanese, some of the content of these lexical structures was derived from machine-readable dictionaries. These were then enhanced with information extracted from corpora.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1993
- Accession Number: ADA461001
Entities
People
- J. Wang
- James Pustejovsky
- Jim Cowie
- Louise Guthrie
- Rong Wang
- Scott Waterman
- Takahiro Wakao
- William Ogden
- Yorick Wilks
Organizations
- New Mexico State University