Towards Locating and Exploring Hard-to-Find Information on the Web
Abstract
This work developed new methods and tools that enable subject matter experts to discover and track information on the Web that is relevant to a given task or domain. Our approach consists of two main components: 1) domain discovery; and 2) crawling and information gathering. For each component, we designed new methods and developed open-source tools that implement them. Notably, we designed a new framework that facilitates domain discovery, organization, and presentation. We also developed a general and extensible crawling infrastructure that substantially extends the ACHE open-source focused crawler to support complex crawling tasks and multiple crawling strategies for discovering new content in a timely manner.
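As a rough illustration of what a focused crawler does, the sketch below keeps a priority frontier and expands links found on relevant pages first. It is not ACHE's API or implementation: the function names, the in-memory toy web, and the keyword-based relevance score are all assumptions made for this example.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Page = Tuple[str, List[str]]  # (page text, outgoing links)


def focused_crawl(
    seeds: List[str],
    fetch: Callable[[str], Page],
    relevance: Callable[[str], float],
    max_pages: int = 100,
    threshold: float = 0.5,
) -> List[str]:
    """Best-first crawl: links inherit the relevance score of the page
    they were found on, so links from on-topic pages are visited first."""
    frontier: List[Tuple[float, int, str]] = []
    counter = 0  # tie-breaker that keeps insertion order for equal scores
    seen = set(seeds)
    for url in seeds:
        heapq.heappush(frontier, (-1.0, counter, url))
        counter += 1

    relevant: List[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, _, url = heapq.heappop(frontier)  # highest-priority URL
        text, links = fetch(url)
        fetched += 1
        score = relevance(text)
        if score >= threshold:
            relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # heapq is a min-heap, so negate the score
                heapq.heappush(frontier, (-score, counter, link))
                counter += 1
    return relevant


# Hypothetical toy web used in place of real HTTP requests.
TOY_WEB: Dict[str, Page] = {
    "a": ("ebola outbreak report", ["c", "b"]),
    "b": ("ebola vaccine trial", ["d"]),
    "c": ("football scores", ["e"]),
    "d": ("ebola case counts", []),
    "e": ("celebrity gossip", []),
}


def toy_fetch(url: str) -> Page:
    return TOY_WEB.get(url, ("", []))


def keyword_relevance(text: str) -> float:
    # Stand-in for a trained page classifier (assumption).
    return 1.0 if "ebola" in text else 0.0
```

With a budget of four pages, `focused_crawl(["a"], toy_fetch, keyword_relevance, max_pages=4)` fetches `a`, `c`, `b`, and `d`, and never spends budget on `e`, because the link to `e` was found on an off-topic page and therefore sits at the back of the frontier.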
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2018
- Accession Number: AD1061874
Entities
People
- Aécio Santos
- Juliana Freire
- Kien Pham
- Sonia Quispe
- Yamuna Krishnamurthy
Organizations
- New York University