Corpus and Method for Identifying Citations in Non-Academic Text (Open Access, Publisher's Version)

Abstract

We attempt to identify citations in non-academic text such as patents. Unlike academic articles, which often provide bibliographies and follow consistent citation styles, non-academic text cites scientific research in a more ad hoc manner. We manually annotate citations in 50 patents, train a conditional random field (CRF) classifier to find new citations, and apply a reranker to incorporate non-local information. Our best system achieves an F-score of 0.83 under 5-fold cross-validation.
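
To make the reported evaluation concrete, here is a minimal sketch of a balanced F-score and a 5-fold split over 50 documents. It is an illustration under stated assumptions, not the authors' evaluation code: this record does not say whether citations are scored as exact spans or per token, nor how the patents were assigned to folds. The `Span` representation and the helper names `f_score` and `k_fold_splits` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a citation is a (start_token, end_token) pair.
Span = Tuple[int, int]


def f_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Balanced F-score (F1), assuming exact-match span scoring."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k contiguous folds.

    Assumption: folds are contiguous blocks of documents; the report may
    have shuffled or stratified the data differently.
    """
    indices = list(range(n_items))
    fold_size, remainder = divmod(n_items, k)
    splits = []
    start = 0
    for fold in range(k):
        # Spread any remainder over the first folds so sizes differ by at most 1.
        end = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        splits.append((train, test))
        start = end
    return splits


if __name__ == "__main__":
    # Made-up spans, used only to exercise the function.
    gold = {(3, 9), (20, 31), (40, 47)}
    predicted = {(3, 9), (20, 31), (50, 55)}
    print(round(f_score(gold, predicted), 2))

    # 50 documents in 5 folds: each fold holds out 10 and trains on 40.
    for train, test in k_fold_splits(50, 5):
        print(len(train), len(test))
```

With 50 annotated patents, each of the 5 folds holds out 10 documents for testing and trains on the remaining 40. A cross-validated score is typically an aggregate over the per-fold results.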

Document Details

Document Type: Technical Report
Publication Date: May 31, 2014
Accession Number: AD1042311

Entities

People

  • Adam Meyers
  • Yifan He

Organizations

  • New York University

Tags

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Science
  • Computer Vision
  • Data Sets
  • Hidden Markov Models
  • Identification
  • Language
  • Linguistics
  • Machine Learning
  • Markov Models
  • Named Entity Recognition
  • Natural Language Processing
  • Pattern Recognition

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development
  • Library and Information Science