Towards Locating and Exploring Hard-to-Find Information on the Web

Abstract

This work developed new methods and tools that empower subject matter experts to discover and track information on the Web that is relevant to a given task or domain. Our approach consists of two main components: 1) domain discovery; and 2) crawling and information gathering. For each component, we designed new methods and developed open-source tools that implement them. Notably, we designed a new framework that facilitates domain discovery, organization, and presentation. We also developed a general and extensible crawling infrastructure that substantially extends the open-source ACHE focused crawler to support complex crawling tasks and multiple crawling strategies for discovering new content in a timely manner.

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2018
Accession Number: AD1061874

Entities

People

Aécio Santos
Juliana Freire
Kien Pham
Sonia Quispe
Yamuna Krishnamurthy

Organizations

New York University
