Entity Resolution Workflow Installation Process and User Guide
Abstract
In text processing and information extraction, entity resolution is the process of determining which specific person or object a reference in a text denotes. For instance, if "John Smith" appears in a document, entity resolution seeks to identify which John Smith is meant from the available choices in a database. This report describes the setup and configuration of the U.S. Army Research Laboratory's (ARL) software implementation of an entity resolution algorithm called Relationship-based Data Cleaning (RelDC), which systematically exploits not only features but also relationships among entities for the purpose of disambiguation. RelDC views the database as a graph of entities that are linked to each other via relationships. It first uses a feature-based method to identify a set of candidate entities (choices) for the reference to be disambiguated. Graph-theoretic techniques are then used to discover and analyze the relationships that exist between the entity containing the reference and the set of candidates. To demonstrate the RelDC entity resolution algorithm in an intuitive and seamless way, ARL developed an Entity Resolution Workflow (ERW).
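The two-stage idea in the abstract (feature-based candidate selection, then graph analysis of relationships) can be illustrated with a minimal sketch. This is not ARL's implementation or the RelDC algorithm itself: the toy data, the function names (`build_graph`, `candidates`, `shortest_path_length`, `resolve`), and the scoring rule are all assumptions made for illustration. In particular, the sketch substitutes a plain shortest-path distance for whatever relationship analysis RelDC performs.

```python
from collections import deque

# Hypothetical toy database: each pair is a relationship (paper <-> author).
EDGES = [
    ("paper1", "John Smith (MIT)"),
    ("paper1", "Alice Chen"),
    ("paper2", "Alice Chen"),
    ("paper2", "John Smith (MIT)"),
    ("paper3", "John Smith (UCLA)"),
    ("paper3", "Bob Jones"),
    ("paper4", "Alice Chen"),  # paper4 also mentions an ambiguous "John Smith"
]
PEOPLE = ["John Smith (MIT)", "John Smith (UCLA)", "Alice Chen", "Bob Jones"]


def build_graph(edges):
    """Build an undirected adjacency map from relationship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def candidates(reference, people):
    """Feature-based step (assumed): match on the name, ignoring affiliation."""
    ref = reference.lower()
    return sorted(p for p in people if p.split(" (")[0].lower() == ref)


def shortest_path_length(graph, src, dst):
    """Breadth-first search; returns hop count, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def resolve(graph, context, reference, people):
    """Relationship step (assumed): pick the candidate closest to the context."""
    best, best_dist = None, None
    for cand in candidates(reference, people):
        dist = shortest_path_length(graph, context, cand)
        if dist is not None and (best_dist is None or dist < best_dist):
            best, best_dist = cand, dist
    return best


GRAPH = build_graph(EDGES)
print(resolve(GRAPH, "paper4", "John Smith", PEOPLE))
```

In this toy data, the feature-based step alone cannot separate the two John Smiths. The relationship step prefers the MIT candidate because paper4 is connected to him through the shared co-author Alice Chen, while the UCLA candidate is not reachable from paper4 at all.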
Document Details
- Document Type: Technical Report
- Publication Date: Jul 01, 2013
- Accession Number: ADA586761
Entities
People
- Michael H. Lee
Organizations
- United States Army Research Laboratory