SAWUS: Siena's Automatic Wikipedia Update System

Abstract

The National Institute of Standards and Technology (NIST) has run the annual Text REtrieval Conference (TREC) since 1992. TREC is a premier conference that gives researchers in the field of Computational Linguistics the opportunity to showcase their work and compare their results against those of other leading researchers. Our Siena research team participated in the TREC Knowledge Base Acceleration (KBA) Track, which was offered for the first time in 2012. The objective of this track is to drive research into the automatic acquisition of knowledge, such as automatically updating Wikipedia using online news. Specifically, our team developed a system that filters a stream of content for information that should be included on a given Wikipedia page. Because it was not yet clear how well traditional Information Retrieval (IR) techniques perform on this task, we began with a baseline test using current state-of-the-art IR techniques. We then experimented with query expansion, building a module that uses Wikipedia infoboxes to add terms to our queries. This module was combined with our IR component to create SAWUS. Four submissions were sent to NIST for formal evaluation.
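The idea of filtering a content stream with an infobox-expanded query can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAWUS implementation: the abstract does not specify the retrieval model, term weights, or acceptance threshold, so the binary term-matching score, the 0.5 weight for infobox-derived terms, the 1.5 threshold, the stopword list, and the example entity "Jane Doe" with its infobox are all hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stopword list; the source does not describe its preprocessing.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "was"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on runs of non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def expand_query(entity: str, infobox: Dict[str, str],
                 expansion_weight: float = 0.5) -> Dict[str, float]:
    """Build a weighted term query from the entity name plus infobox values.

    Infobox-derived terms get a reduced (assumed) weight; entity-name terms
    always get weight 1.0, overriding any infobox weight for the same term.
    """
    weights: Dict[str, float] = {}
    for value in infobox.values():
        for term in tokenize(value):
            weights[term] = max(weights.get(term, 0.0), expansion_weight)
    for term in tokenize(entity):
        weights[term] = 1.0
    return weights


def score(text: str, query: Dict[str, float]) -> float:
    """Sum the weights of query terms that occur at least once in the text."""
    present = set(tokenize(text))
    return sum(w for term, w in query.items() if term in present)


def filter_stream(stream: Iterable[Tuple[str, str]], entity: str,
                  infobox: Dict[str, str],
                  threshold: float = 1.5) -> Iterator[Tuple[str, float]]:
    """Yield (doc_id, score) for stream documents whose score meets the (assumed) threshold."""
    query = expand_query(entity, infobox)
    for doc_id, text in stream:
        s = score(text, query)
        if s >= threshold:
            yield doc_id, s


if __name__ == "__main__":
    # Hypothetical entity and infobox, for illustration only.
    infobox = {"occupation": "chemist", "employer": "Acme Labs"}
    stream = [
        ("d1", "Jane Doe joins Acme Labs as lead chemist"),
        ("d2", "Acme Labs reports quarterly earnings"),
        ("d3", "John Doe wins the race"),
    ]
    for doc_id, s in filter_stream(stream, "Jane Doe", infobox):
        print(doc_id, s)  # prints: d1 3.5
```

Only d1 passes: d2 matches only infobox-derived terms (score 1.0) and d3 matches a single name term (score 1.0). Giving expansion terms a lower weight than name terms is one way to keep query expansion from admitting documents that mention related concepts but not the entity itself; the abstract does not say whether SAWUS weighted terms this way.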

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581303

Entities

People

  • Carl Tompkins
  • Sharon G. Small
  • Zachary Witter

Organizations

  • Siena College

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Artificial Intelligence
  • Automatic
  • Computational Linguistics
  • Computational Processes
  • Computational Science
  • Computing-Related Activities
  • Directories
  • Governments
  • Information Operations
  • Information Retrieval
  • Social Media
  • Standards
  • Test And Evaluation
  • United States
  • Universities

Fields of Study

  • Computer science

Readers

  • Academic Conference Management
  • Computational Linguistics
  • Information Retrieval

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval