Towards Locating and Exploring Hard-to-Find Information on the Web

Abstract

This work developed new methods and tools that enable subject matter experts to discover and track information on the Web that is relevant to a given task or domain. Our approach consists of two main components: 1) domain discovery; and 2) crawling and information gathering. For each component, we designed new methods and developed open-source tools that implement them. Notably, we designed a new framework that facilitates domain discovery, organization, and presentation. We also developed a general and extensible crawling infrastructure that substantially extends the open-source ACHE focused crawler to support complex crawling tasks and multiple crawling strategies for discovering new content in a timely manner.

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2018
Accession Number
AD1061874

Entities

People

  • Aécio Santos
  • Juliana Freire
  • Kien Pham
  • Sonia Quispe
  • Yamuna Krishnamurthy

Organizations

  • New York University

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Materials and Manufacturing Processes
  • Weapons Technologies

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Control Panels
  • Dark Web
  • Data Mining
  • Government Procurement
  • Human Trafficking
  • Information Science
  • Infrastructure
  • Machine Learning
  • Network Science
  • New York
  • Simulations
  • Supervised Machine Learning
  • User Interface
  • Web Browsers
  • Websites

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Robotics and Automation
  • Systems Analysis and Design