Coverage Adjusted Entropy Estimation

Abstract

Data on "neural coding" have frequently been analyzed using information-theoretic measures. These formulations involve the fundamental and generally difficult statistical problem of estimating entropy. We briefly review several methods that have been advanced to estimate entropy and highlight one, the coverage adjusted entropy estimator (CAE), due to Chao and Shen, which appeared recently in the environmental statistics literature. This method begins with the elementary Horvitz-Thompson estimator, developed for sampling from a finite population, and adjusts for the potential new species that have not yet been observed in the sample; in a spike train, these correspond to the patterns or "words" that have not yet been observed. The adjustment is due to I.J. Good and is called the Good-Turing coverage estimate. We provide a new empirical regularization derivation of the coverage-adjusted probability estimator, which shrinks the MLE. We prove that the CAE is consistent and first-order optimal, with rate $O_P(1/\log n)$, in the class of distributions with finite entropy variance, and that, within the class of distributions with finite $q$th moment of the log-likelihood, the Good-Turing coverage estimate and the total probability of unobserved words converge at rate $O_P(1/(\log n)^q)$. We then provide a simulation study of the estimator with standard distributions and examples from neuronal data, where observations are dependent. The results show that, with a minor modification, the CAE performs much better than the MLE and is better than the Best Upper Bound estimator, due to Paninski, when the number of possible words m is unknown or infinite.
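To make the construction concrete, the following is a minimal Python sketch of the Chao and Shen estimator as the abstract describes it: the Good-Turing coverage estimate shrinks the MLE word probabilities, and a Horvitz-Thompson correction reweights each observed word by its estimated probability of appearing in the sample. The function names and the guard for the all-singletons case are assumptions made for illustration; the "minor modification" mentioned in the abstract is not specified on this page and is not reproduced here. Entropy is reported in nats.

```python
import math
from collections import Counter


def mle_entropy(samples):
    """Plug-in (MLE) entropy estimate in nats."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def coverage_adjusted_entropy(samples):
    """Sketch of the coverage adjusted entropy estimator (CAE), in nats.

    Hypothetical helper, not the report's reference implementation.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(samples)

    # f1: number of words observed exactly once (singletons).
    f1 = sum(1 for c in counts.values() if c == 1)

    # Assumption: if every observation is a singleton, the coverage
    # estimate would be zero, so back off by one to keep it positive.
    if f1 == n:
        f1 = n - 1

    # Good-Turing coverage estimate: 1 - f1 / n.
    coverage = 1.0 - f1 / n

    h = 0.0
    for c in counts.values():
        # Coverage-adjusted probability: the MLE shrunk by the coverage.
        p = coverage * c / n
        # Horvitz-Thompson weight: estimated probability that the word
        # appears at least once in n draws.
        inclusion = 1.0 - (1.0 - p) ** n
        h -= p * math.log(p) / inclusion
    return h
```

For a sample such as `["a", "a", "b", "b"]` there are no singletons, so the coverage estimate is 1 and the only adjustment comes from the Horvitz-Thompson weights, which push the estimate above the plug-in value.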

Document Details

Document Type: Technical Report
Publication Date: Jun 05, 2007
Accession Number: ADA472999

Entities

People

Bin Yu
Robert E. Kass
Vincent Q. Vu

Organizations

University of California, Berkeley
