CRL/NMSU and Brandeis: Description of the MucBruce System as Used for MUC-4
Abstract
Through their involvement in the Tipster project, the Computing Research Laboratory at New Mexico State University and the Computer Science Department at Brandeis University are developing a method for identifying articles of interest and for extracting and storing specific kinds of information from large volumes of Japanese and English texts. We intend that the method be general and extensible. The techniques involved are not explicitly tied to these two languages nor to a particular subject area. Development for Tipster has been going on since September 1991. The system we used for the MUC-4 tests implements only some of the features we plan to include in our final Tipster system. It relies heavily on statistics and on context-free text marking to generate templates. Some more detailed parsing has been added for a limited lexicon, but the lack of fuller coverage places an inherent limit on its performance. Most of the information in our MUC templates is obtained by probing the text surrounding words that are 'significant' for the template type being generated, in order to find appropriately tagged fillers for the template fields.
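The probing step described in the abstract can be illustrated with a minimal sketch. It is not the MucBruce implementation: the trigger words, tag names, field names, window size, and the `probe` function are all hypothetical, and it assumes the text has already been marked with tags by an earlier stage.

```python
# Hypothetical trigger words per template type (illustrative, not from the report).
TRIGGERS = {"BOMBING": {"bombing", "bombed", "exploded"}}

# Hypothetical mapping from template field to the tag an acceptable filler carries.
FIELD_TAGS = {"perpetrator": "ORG", "target": "FACILITY", "location": "PLACE"}


def probe(tokens, template_type, window=5):
    """Fill template fields from tagged tokens near trigger words.

    tokens is a list of (word, tag) pairs; tag is None for unmarked words.
    For each trigger word, the nearest token within `window` positions that
    carries the tag required by a field becomes that field's filler.
    """
    triggers = TRIGGERS[template_type]
    template = {"type": template_type}
    for i, (word, _tag) in enumerate(tokens):
        if word.lower() not in triggers:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Visit positions in the window from nearest to farthest.
        neighbours = sorted(range(lo, hi), key=lambda j: abs(j - i))
        for field, wanted_tag in FIELD_TAGS.items():
            if field in template:
                continue  # keep the first filler found for this field
            for j in neighbours:
                if tokens[j][1] == wanted_tag:
                    template[field] = tokens[j][0]
                    break
    return template


if __name__ == "__main__":
    sentence = [
        ("FMLN", "ORG"), ("guerrillas", None), ("bombed", None),
        ("the", None), ("embassy", "FACILITY"), ("in", None),
        ("San Salvador", "PLACE"),
    ]
    print(probe(sentence, "BOMBING"))
```

A fixed window with nearest-first selection is only one plausible reading of "probing the text which surrounds significant words"; the report itself does not specify the search strategy at this level of detail.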
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1992
- Accession Number: ADA461003
Entities
People
- James Pustejovsky
- Jim Cowie
- Louise Guthrie
- Scott Waterman
- Yorick Wilks
Organizations
- New Mexico State University