Monitoring Business Activity

Abstract

Under this project, the authors studied and developed technologies for "scoring" entities: building models that estimate the likelihood that an entity exhibits some characteristic. For example, a social network may include malicious individuals. Suspicion scoring assigns a numeric value to each entity in the network, representing the estimated likelihood that the entity is malicious. The authors focused on scoring entities that are interconnected in some sort of network, and on techniques for building and using scoring models when important information is unknown but may be acquired at a cost. This project built an integrated toolkit, called NetKit, of methods for scoring networked entities, relaxing the standard assumption that entities to be scored are independent. NetKit has been applied to various benchmark networked data sets, showing that simple methods alone can produce remarkably good scores. Additional development and experimentation were conducted with NetKit's Relational Neighbor (RN) algorithms, which combine a form of guilt-by-association with collective inferencing, in which the entire network is scored simultaneously so that the scores of related entities can affect each other. The RN algorithms were applied to the terrorist-world simulation data produced under another project within this program. The Automated Construction of Relational Attributes (ACORA) system addresses a particular characteristic of building and using scoring models with networked data and other relational data. Under this project the authors introduced techniques for automatically constructing attributes from high-dimensional categorical attributes, and showed that they can consistently, and sometimes dramatically, improve modeling and scoring. They also produced a collection of techniques and results focused on the problem of how to use information-gathering resources most cost-effectively when building and using classification/scoring models.
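
The guilt-by-association idea with collective inferencing mentioned in the abstract can be illustrated with a minimal Python sketch. This is not NetKit's code or API, and it may differ from the RN algorithms in the report. It assumes a weighted-average update rule, synchronous iteration, positive edge weights, and a fixed number of passes. The function name `rn_scores` and its parameters (`edges`, `known`, `prior`, `iterations`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rn_scores(edges, known, prior=0.5, iterations=50):
    """Score nodes by repeatedly averaging the scores of their neighbors.

    edges: iterable of (u, v, weight) undirected links (weights assumed > 0).
    known: dict mapping labeled nodes to a score in [0, 1]; these stay fixed.
    prior: starting score for unlabeled nodes (an illustrative assumption).
    """
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Labeled nodes start at their known score, all others at the prior.
    scores = {node: known.get(node, prior) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                updated[node] = known[node]  # labels are never overwritten
                continue
            total = sum(nbrs.values())
            if total == 0:
                updated[node] = scores[node]  # no usable links: keep score
                continue
            # Guilt by association: weighted mean of current neighbor scores.
            updated[node] = sum(w * scores[m] for m, w in nbrs.items()) / total
        # All nodes are updated together, so related scores affect each other.
        scores = updated
    return scores


# Hypothetical chain a - b - c - d with one known-malicious end (a)
# and one known-benign end (d); b and c are unlabeled.
example_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
example_scores = rn_scores(example_edges, known={"a": 1.0, "d": 0.0})
```

In this toy chain the unlabeled node nearer the known-malicious entity ends up with the higher score, which is the behavior the abstract describes: scores propagate through links, and each unlabeled entity's score depends on the current scores of its neighbors.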

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2006
Accession Number
ADA446996

Entities

People

  • Foster Provost
  • Sofus Macskassy

Organizations

  • New York University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Acquisition
  • Air Force
  • Air Force Research Laboratories
  • Birds
  • Case Studies
  • Commerce
  • Data Acquisition
  • Data Mining
  • Data Sets
  • Information Science
  • Knowledge Management
  • Linear Accelerators
  • Machine Learning
  • Network Science
  • Particle Physics
  • Predictive Modeling
  • Social Networks

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Database Systems and Applications
  • Systems Analysis and Design