<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Estimating Entropies and Informations</title>
    <link>http://bactra.org/notebooks/2012/04/27#entropy-estimation</link>
    <description>
&lt;P&gt;The central mathematical objects
in &lt;a href=&quot;information-theory.html&quot;&gt;information theory&lt;/a&gt; are the entropies
of random variables.  These (&quot;Shannon&quot;) entropies are properties of the
probability distributions of the variables, rather than of particular
realizations.  (This is unlike the Boltzmann entropy
of &lt;a href=&quot;stat-mech.html&quot;&gt;statistical mechanics&lt;/a&gt;, which is an objective
property of the microscopic
state, &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0303625&quot;&gt;at least once we fix our
partition of microscopic states into macroscopic states&lt;/a&gt;.  Confusing the two
entropies is &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0410063&quot;&gt;common but
bad&lt;/a&gt;.)  The question which concerns me here is how to estimate the entropy
of the distribution, given a sample of realizations.  The most obvious
approach, if one knows the form of the distribution but not the parameters, is
to estimate the parameters and then plug in.  But it feels like one should be
able to do this more non-parametrically.  The obvious non-parametric estimator
is of course just the entropy of the empirical distribution, what one might
call the empirical entropy.  However, the empirical distribution isn't always
the best estimate of the true distribution (one might prefer, e.g., some kind
of kernel density estimate).  For that matter, we often don't really care about
the distribution, just its entropy, so some more direct estimator would be
nice.
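
&lt;P&gt;A minimal sketch of the empirical (plug-in) entropy described above,
assuming IID draws from a finite alphabet.  The function
name &lt;code&gt;empirical_entropy&lt;/code&gt; is hypothetical, chosen only for
illustration, and the result is in bits.

&lt;pre&gt;
```python
import math
from collections import Counter


def empirical_entropy(sample):
    """Entropy, in bits, of the empirical distribution of `sample`.

    Assumption (not from the original text): the observations are IID
    draws from a finite alphabet and the sample is non-empty.
    """
    counts = Counter(sample)
    n = sum(counts.values())
    # Each observed symbol has empirical probability c / n; the entropy is
    # the sum of p * log2(1 / p), written here as (c / n) * log2(n / c).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    print(empirical_entropy("aabb"))
```
&lt;/pre&gt;

&lt;P&gt;Only the symbols actually observed enter the sum, so symbols missing from
the sample contribute nothing; this is one reason the empirical entropy tends
to run low on small samples.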

&lt;P&gt;What would be really nice would be to not just have point estimates but
also confidence intervals.  Non-parametrically, my guess is that the only
feasible way to do this is bootstrapping.
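
&lt;P&gt;One way the bootstrap idea might look in practice is a naive percentile
interval around the empirical entropy.  This is a sketch under strong
assumptions (IID data, resampling with replacement, no bias correction), not a
recommended procedure; the names &lt;code&gt;bootstrap_entropy_interval&lt;/code&gt;,
&lt;code&gt;n_boot&lt;/code&gt; and &lt;code&gt;level&lt;/code&gt; are hypothetical.

&lt;pre&gt;
```python
import math
import random
from collections import Counter


def empirical_entropy(sample):
    """Plug-in entropy, in bits, of a non-empty sequence of symbols."""
    counts = Counter(sample)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def bootstrap_entropy_interval(sample, n_boot=1000, level=0.95, seed=0):
    """Naive percentile-bootstrap interval for the empirical entropy.

    Assumptions (illustrative only): the data are IID, so resampling
    individual observations with replacement is sensible, and no
    correction is made for the bias of the plug-in estimator.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Re-estimate the entropy on n_boot resamples of the same size as the data.
    estimates = sorted(
        empirical_entropy(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_boot)]
    upper = estimates[min(int((1.0 - tail) * n_boot), n_boot - 1)]
    return lower, upper


if __name__ == "__main__":
    data = list("aabbbcabca") * 5
    print(empirical_entropy(data), bootstrap_entropy_interval(data))
```
&lt;/pre&gt;

&lt;P&gt;Since the plug-in estimate is biased downward, resampled estimates inherit
that bias, so an interval like this should be read with some suspicion for
small samples.  Dependent data would need a resampling scheme that respects
the dependence, which this sketch does not attempt.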

&lt;P&gt;For finite alphabets, one approach would be to use something like variable
length Markov chains, or &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0406011&quot;&gt;causal
state reconstruction&lt;/a&gt;, to reconstruct a machine capable of generating
the sequence.  From the machine, it is easy to calculate the entropy of words
or blocks of any finite length, and even the entropy rate.  My experience with
using &lt;a href=&quot;http://bactra.org/CSSR/&quot;&gt;CSSR&lt;/a&gt; is that the entropy rate
estimates can get very good even when the overall reconstruction of the
structure is very poor, but I don't have any real theory on that.  I suspect
CSSR converges on the true entropy rate faster than do variable length Markov
chains, because the former has greater expressive power, but again I don't know
that for sure.
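
&lt;P&gt;To illustrate how an entropy rate can be read off a generating machine,
here is a sketch for the simplest case, an ordinary first-order Markov chain
with a known transition matrix.  This is not CSSR or a variable length Markov
chain; the function names and the power-iteration shortcut are assumptions
made for illustration, and the chain is assumed irreducible and aperiodic.

&lt;pre&gt;
```python
import math


def stationary_distribution(P, n_iter=1000):
    """Approximate the stationary distribution of P by power iteration.

    Assumption: P is a row-stochastic matrix (list of lists) for an
    irreducible, aperiodic chain, so the iteration converges.
    """
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi


def entropy_rate(P):
    """Entropy rate, in bits per symbol, of the stationary chain.

    Uses the sum over states i of pi_i times the entropy of row i.
    """
    pi = stationary_distribution(P)
    return sum(
        pi[i] * p * math.log2(1.0 / p)
        for i, row in enumerate(P)
        for p in row
        if p != 0.0  # zero-probability transitions contribute nothing
    )


if __name__ == "__main__":
    # A "sticky" two-state chain: each state repeats with probability 0.9.
    print(entropy_rate([[0.9, 0.1], [0.1, 0.9]]))
```
&lt;/pre&gt;

&lt;P&gt;In an estimation setting the transition probabilities would themselves be
estimated from the sequence, and the entropy rate computed from the fitted
machine in the same way.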

&lt;P&gt;Using &lt;a href=&quot;cep-gzip.html&quot;&gt;gzip is a bad idea&lt;/a&gt; (for this purpose; it
works fine for data compression).

&lt;P&gt;See also:
	&lt;a href=&quot;bootstrap-entropy.html&quot;&gt;Bootstrapping Entropy Estimates&lt;/a&gt;

&lt;ul&gt;Recommended:
	&lt;li&gt;D. J. Albers, George Hripcsak, &quot;Estimation of time-delayed mutual information and bias for irregularly and sparsely sampled time-series&quot;, &lt;a href=&quot;http://arxiv.org/abs/1110.1615&quot;&gt;arxiv:1110.1615&lt;/a&gt;
	&lt;li&gt;Jose M. Amigo, Janusz Szczepanski, Elek Wajnryb and Maria
V. Sanchez-Vives, &quot;Estimating the Entropy Rate of Spike Trains via Lempel-Ziv
Complexity&quot;,
&lt;a href=&quot;http://neco.mitpress.org/cgi/content/abstract/16/4/717&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2004): 717--736&lt;/a&gt; [Normally,
I have &lt;a href=&quot;cep-gzip.html&quot;&gt;strong views on using Lempel-Ziv to
measure entropy rates&lt;/a&gt;, but here they are using the 1976 Lempel-Ziv
definitions, not the 1978 ones.  The difference is subtle, but important;
1978 leads to gzip and practical compression algorithms, but very bad
entropy estimates; 1976 leads, as they show numerically, to quite good
entropy rate estimates, at least for some processes.  Thanks to Dr. Szczepanski
for correspondence about this paper.]
	&lt;li&gt;J.-R. Chazottes and D. Gabrielli, &quot;Large deviations for empirical
entropies of Gibbsian sources&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0406083&quot;&gt;math.PR/0406083&lt;/a&gt; = &lt;a
href=&quot;http://dx.doi.org/10.1088/0951-7715/18/6/007&quot;&gt;&lt;cite&gt;Nonlinearity&lt;/cite&gt;
&lt;strong&gt;18&lt;/strong&gt; (2005): 2545--2563&lt;/a&gt; [This is a very cool result which
shows that block entropies, and entropy rates estimated from those blocks, obey
the &lt;a href=&quot;large-deviations.html&quot;&gt;large deviation principle&lt;/a&gt; even as one
lets the length of the blocks grow with the amount of data, provided the
block-length doesn't grow too quickly (only logarithmically).  I wish I could
write papers like this.]
	&lt;li&gt;John W. Fisher III, Alexander T. Ihler and Paul A. Viola,
&quot;Learning Informative Statistics: A Nonparametric Approach&quot;, pp. 900--906 in
NIPS 12 (1999) [&lt;a href=&quot;http://books.nips.cc/papers/files/nips12/0900.pdf&quot;&gt;PDF
reprint&lt;/a&gt;.  I'd call this more of a semi-parametric approach than a fully
non-parametric one; they assume a parametric form for the dependence structure,
but are agnostic about the distributions of innovations, and so try to maximize
non-parametrically estimated mutual informations.]
	&lt;li&gt;Yongmiao Hong and Halbert White, &quot;Asymptotic Distribution Theory
for Nonparametric Entropy Measures of Serial Dependence&quot;,
&lt;cite&gt;Econometrica&lt;/cite&gt; &lt;strong&gt;73&lt;/strong&gt; (2005): 837--901
[&lt;a href=&quot;http://www.jstor.org/stable/3598868&quot;&gt;JSTOR&lt;/a&gt;; &lt;a href=&quot;http://dss.ucsd.edu/~hwhite/pub_files/hwcv-095.pdf&quot;&gt;PDF reprint&lt;/a&gt; via Prof. White]
	&lt;li&gt;Matthew B. Kennel, Jonathon Shlens, Henry D. I. Abarbanel and
E. J. Chichilnisky, &quot;Estimating Entropy Rates with Bayesian Confidence
Intervals&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1162/0899766053723050&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 1531--1576&lt;/a&gt;
	&lt;li&gt;Alexander Kraskov, Harald St&amp;ouml;gbauer and Peter Grassberger,
&quot;Estimating Mutual Information&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0305641&quot;&gt;cond-mat/0305641&lt;/a&gt;
= &lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;69&lt;/strong&gt; (2004): 066138
	&lt;li&gt;D&amp;aacute;vid P&amp;aacute;l, Barnab&amp;aacute;s P&amp;oacute;czos, Csaba Szepesv&amp;aacute;ri, &quot;Estimation of R&amp;eacute;nyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs&quot;, &lt;a href=&quot;http://arxiv.org/abs/1003.1954&quot;&gt;arxiv:1003.1954&lt;/a&gt;
	&lt;li&gt;Liam Paninski, &quot;Estimation of Entropy and Mutual Information&quot;,
&lt;a href=&quot;http://neco.mitpress.org/cgi/content/abstract/15/6/1191&quot;&gt;&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;15&lt;/strong&gt; (2003): 1191--1253&lt;/a&gt; [&lt;a href=&quot;http://www.stat.columbia.edu/~liam/research/abstracts/info_est-nc-abs.html&quot;&gt;Preprint&lt;/a&gt;;
&lt;a href=&quot;http://www.stat.columbia.edu/~liam/research/info_est.html&quot;&gt;code&lt;/a&gt;]
	&lt;li&gt;Barnab&amp;aacute;s P&amp;oacute;czos, Jeff Schneider, &quot;On the Estimation of alpha-Divergences&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/proceedings/papers/v15/poczos11a.html&quot;&gt;AIStats 2011&lt;/a&gt;
	&lt;li&gt;Thomas Sch&amp;uuml;rmann and Peter Grassberger, &quot;Entropy estimation of
symbol sequences&quot;, &lt;cite&gt;Chaos&lt;/cite&gt; &lt;strong&gt;6&lt;/strong&gt; (1996): 414--427 = &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0203436&quot;&gt;cond-mat/0203436&lt;/a&gt;
	&lt;li&gt;Jonathon Shlens, Matthew B. Kennel, Henry D. I. Abarbanel, E. J. Chichilnisky, &quot;Estimating Information Rates with Confidence Intervals in Neural Spike Trains&quot;, &lt;a href=&quot;http://dx.doi.org/10.1162/neco.2007.19.7.1683&quot;&gt;&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;19&lt;/strong&gt; (2007): 1683--1719&lt;/a&gt;
	&lt;li&gt;Jonathan D. Victor, &quot;Asymptotic Bias in Information Estimates and
the Exponential (Bell) Polynomials&quot;, &lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;12&lt;/strong&gt; (2000): 2797--2804 [Calculates the bias
in the empirical entropy, as an estimator of the true entropy, under IID
sampling of a discrete space.  Interestingly, the first-order (1/n) term in the
bias does not depend on the actual distribution, though the higher-order terms
do.]
	&lt;li&gt;Vincent Q. Vu, Bin Yu, Robert E. Kass, &quot;Information In The Non-Stationary Case&quot;, &lt;a href=&quot;http://arxiv.org/abs/0806.3978&quot;&gt;arxiv:0806.3978&lt;/a&gt;
	&lt;li&gt;Benjamin Weiss, &lt;cite&gt;Single Orbit Dynamics&lt;/cite&gt; [Discusses
procedures for non-parametrically estimating entropy of suitably ergodic
sources, using just one realization of the process.]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;M. S. Baptista, E. J. Ngamga, Paulo R. F. Pinto, Margarida Brito, J. Kurths, &quot;Kolmogorov-Sinai entropy from recurrence times&quot;, &lt;a href=&quot;http://arxiv.org/abs/0908.3401&quot;&gt;arxiv:0908.3401&lt;/a&gt;
	&lt;li&gt;D. Benedetto, E. Caglioti, G. Cristadoro, M. Degli Esposti, &quot;Relative entropy via non-sequential recursive pair substitutions&quot;, &lt;a href=&quot;http://arxiv.org/abs/1007.3384&quot;&gt;arxiv:1007.3384&lt;/a&gt; [The first two authors were also the
lead authors of the epic-fail &quot;Language Trees and Zipping&quot; paper, &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0108530&quot;&gt;arxiv:cond-mat/0108530&lt;/a&gt;, but perhaps
they've improved.]
	&lt;li&gt;Juan A. Bonachela, Haye Hinrichsen, Miguel A. Munoz, &quot;Entropy
estimates of small data
sets&quot;, &lt;a href=&quot;http://arxiv.org/abs/0804.4561&quot;&gt;arxiv:0804.4561&lt;/a&gt;
	&lt;li&gt;Salim Bouzebda and Issam Elhattab, &quot;Uniform-in-bandwidth consistency for kernel-type estimators of Shannon's entropy&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.ejs/1305034910&quot;&gt;&lt;cite&gt;Electronic Journal of Statistics&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (2011): 440--459&lt;/a&gt;
	&lt;li&gt;H. Cai, S. R. Kulkarni and S. Verdu, &quot;Universal Entropy Estimation
via Block Sorting&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.830771&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt;
(2004): 1551--1561&lt;/a&gt;
	&lt;li&gt;C. J. Cellucci, A. M. Albano and P. E. Rapp, &quot;Statistical
validation of mutual information calculations: Comparison of alternative
numerical algorithms&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevE.71.066208&quot;&gt;&lt;cite&gt;Physical Review
E&lt;/cite&gt; &lt;strong&gt;71&lt;/strong&gt; (2005): 066208&lt;/a&gt; [From the abstract: &quot;[A]
minimum description length argument is used to determine the optimal number of
elements to use when characterizing the distributions of X and Y. However, even
when using partitions of the X and Y axis indicated by minimum description
length, mutual information calculations performed with a uniform partition of
the XY plane can give misleading results. This motivated the construction of an
algorithm for calculating mutual information that uses an adaptive
partition. This algorithm also incorporates an explicit test of the statistical
independence of X and Y in a calculation that returns an assessment of the
corresponding null hypothesis. The previously published Fraser-Swinney
algorithm for calculating mutual information includes a sophisticated procedure
for local adaptive control of the partitioning process. When the Fraser and
Swinney algorithm and the algorithm constructed here are compared, they give
very similar numerical results (less than 4% difference in a typical
application). Detailed comparisons are possible when X and Y are correlated
jointly Gaussian distributed because an analytic expression for I(X,Y) can be
derived for that case. Based on these tests, three conclusions can be
drawn. First, the algorithm constructed here has an advantage over the
Fraser-Swinney algorithm in providing an explicit calculation of the
probability of the null hypothesis that X and Y are independent. Second, the
Fraser-Swinney algorithm is marginally the more accurate of the two algorithms
when large data sets are used. With smaller data sets, however, the
Fraser-Swinney algorithm reports structures that disappear when more data are
available. Third, the algorithm constructed here requires about 0.5% of the
computation time required by the Fraser-Swinney algorithm.&quot;]
	&lt;li&gt;J.-R. Chazottes and E. Ugalde, &quot;Entropy estimation and fluctuations
of Hitting and Recurrence Times for Gibbsian sources&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.DS/0401093&quot;&gt;math.DS/0401093&lt;/a&gt;
	&lt;li&gt;Gabriela Ciuperca and Valerie Girardin, &quot;Estimation of the
Entropy Rate of a Countable Markov Chain&quot;, &lt;a href=&quot;http://dx.doi.org/10.1080/03610920701270964&quot;&gt;&lt;cite&gt;Communications in Statistics: Theory and Methods&lt;/cite&gt; &lt;strong&gt;36&lt;/strong&gt; (2007): 2543--2557&lt;/a&gt;
	&lt;li&gt;G. Ciuperca, V. Girardin and L. Lhote, &quot;Computation and Estimation of Generalized Entropy Rates for Denumerable Markov Chains&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2011.2133710&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;57&lt;/strong&gt; (2011): 4026--4034&lt;/a&gt; [The estimation is just plugging in the MLE of the parameters, for
finitely-parametrized chains, but they claim to show that works well]
	&lt;li&gt;J.-R. Chazottes, C. Maldonado, &quot;Concentration bounds for entropy estimation of one-dimensional Gibbs measures&quot;, &lt;a href=&quot;http://arxiv.org/abs/1102.1816&quot;&gt;arxiv:1102.1816&lt;/a&gt;
	&lt;li&gt;Tommy W. S. Chow and D. Huang, &quot;Estimating Optimal Feature Subsets
Using Efficient Estimation of High-Dimensional Mutual Information&quot;, &lt;cite&gt;IEEE
Transactions on Neural Networks&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2005): 213--224
	&lt;li&gt;Peter Clifford and Ioana Ada Cosma, &quot;A simple sketching algorithm
for entropy estimation&quot;, &lt;a href=&quot;http://arxiv.org/abs/0908.3961&quot;&gt;arxiv:0908.3961&lt;/a&gt;
	&lt;li&gt;Marshall Crumiller, Bruce Knight, Yunguo Yu and Ehud Kaplan,
&quot;Estimating the amount of information conveyed by a population of neurons&quot;
[&lt;a href=&quot;http://camelot.mssm.edu/~kaplane/Frontiers-review-2011.pdf&quot;&gt;PDF preprint via Dr. Kaplan&lt;/a&gt;]
	&lt;li&gt;J. M. Finn, J. D. Goettee, Z. Toroczkai, M. Anghel and B. P.  Wood,
&quot;Estimation of entropies and dimensions by nonlinear symbolic time series
analysis&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1063/1.1555471&quot;&gt;&lt;cite&gt;Chaos&lt;/cite&gt; &lt;strong&gt;13&lt;/strong&gt;
(2003): 444--456&lt;/a&gt;
	&lt;li&gt;Yun Gao, Ioannis Kontoyiannis, Elie Bienenstock
		&lt;ul&gt;
		&lt;li&gt;&quot;From the entropy
to the statistical structure of spike
trains&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.4117&quot;&gt;arxiv:0710.4117&lt;/a&gt;
		&lt;li&gt;&quot;Estimating the entropy of binary time series: Methodology, some theory and a simulation study&quot;, &lt;a href=&quot;http://arxiv.org/abs/0802.4363&quot;&gt;arxiv:0802.4363&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;M. N. Goria, N. N. Leonenko, V. V. Mergel and P. L. Novi
Inverardi, &quot;A new class of random vector entropy estimators and its applications in testing statistical hypotheses&quot;, &lt;a href=&quot;http://dx.doi.org/10.1080/104852504200026815&quot;&gt;&lt;cite&gt;Journal of Nonparametric Statistics&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt;
(2005): 277--297&lt;/a&gt;
	&lt;li&gt;Peter Grassberger, &quot;Data Compression and Entropy Estimates by 
Non-sequential Recursive Pair Substitution,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/physics/0207023&quot;&gt;physics/0207023&lt;/a&gt; [On 
Jimenez-Montano, Ebeling and Poeschel] 
	&lt;li&gt;Jean Hausser and Korbinian Strimmer, &quot;Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks&quot;,
&lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v10/hausser09a.html&quot;&gt;&lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;10&lt;/strong&gt;
(2009): 1469--1484&lt;/a&gt;
	&lt;li&gt;Detlef Holstein and Holger Kantz, &quot;Optimal Markov approximations and generalized embeddings&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.79.056202&quot;&gt;&lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;79&lt;/strong&gt; (2009): 056202&lt;/a&gt;
	&lt;li&gt;Marcus Hutter, Marco Zaffalon, &quot;Distribution of Mutual Information from Complete and Incomplete Data&quot;, &lt;cite&gt;Computational Statistics and
Data Analysis&lt;/cite&gt; &lt;strong&gt;48&lt;/strong&gt; (2005): 633--657, &lt;a href=&quot;http://arxiv.org/abs/cs/0403025&quot;&gt;arxiv:cs/0403025&lt;/a&gt;
	&lt;li&gt;Jiantao Jiao, Haim H. Permuter, Lei Zhao, Young-Han Kim, Tsachy Weissman, &quot;Universal Estimation of Directed Information&quot;, &lt;a href=&quot;http://arxiv.org/abs/1201.2334&quot;&gt;arxiv:1201.2334&lt;/a&gt;
	&lt;li&gt;Miguel Angel Jimenez-Montano, Werner Ebeling, and Thorsten 
Poeschel, &quot;SYNTAX: A computer program to compress a sequence and to estimate 
its information content,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0204134&quot;&gt;cond-mat/0204134&lt;/a&gt; 
	&lt;li&gt;David K&amp;auml;llberg, Nikolaj Leonenko, Oleg Seleznjev, &quot;Statistical Inference for R&amp;eacute;nyi Entropy Functionals&quot;, &lt;a href=&quot;http://arxiv.org/abs/1103.4977&quot;&gt;arxiv:1103.4977&lt;/a&gt;
	&lt;li&gt;Alexei Kaltchenko, &quot;Algorithms for Estimating Information Distance 
with Applications to Bioinformatics and Linguistics&quot;, &lt;a 
href=&quot;http://arxiv.org/abs/cs.CC/0404039&quot;&gt;cs.CC/0404039&lt;/a&gt; 
	&lt;li&gt;Shiraj Khan, Sharba Bandyopadhyay, Auroop R. Ganguly, Sunil Saigal,
David J. Erickson, III, Vladimir Protopopescu, and George Ostrouchov, &quot;Relative
performance of mutual information estimation methods for quantifying the
dependence among short and noisy data&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.76.026209&quot;&gt;&lt;cite&gt;Physical Review
E&lt;/cite&gt; &lt;strong&gt;76&lt;/strong&gt; (2007): 026209&lt;/a&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.dam.brown.edu/people/yiannis/&quot;&gt;Ioannis
Kontoyiannis&lt;/a&gt;, P. H. Algoet, Yu. M. Suhov and A. J. Wyner, &quot;Nonparametric
Entropy Estimation for Stationary Processes and Random Fields, with
Applications to English Text&quot;
	&lt;li&gt;Nikolai Leonenko, Luc Pronzato and Vippal Savani, &quot;A class
of R&amp;eacute;nyi information estimators for multidimensional densities&quot;,
&lt;a href=&quot;http://projecteuclid.org/euclid.aos/1223908088&quot;&gt;&lt;cite&gt;Annals of Statistics&lt;/cite&gt; &lt;strong&gt;36&lt;/strong&gt;
(2008): 2153--2182&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/0810.5302&quot;&gt;arxiv:0810.5302&lt;/a&gt;
	&lt;li&gt;Annick Lesne, Jean-Luc Blanc and Laurent Pezard, &quot;Entropy
estimation of very short symbolic sequences&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.79.046208&quot;&gt;&lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;79&lt;/strong&gt; (2009): 046208&lt;/a&gt;
	&lt;li&gt;Christophe Letellier, &quot;Estimating the Shannon Entropy: Recurrence
Plots versus Symbolic Dynamics&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevLett.96.254102&quot;&gt;&lt;cite&gt;Physical Review
Letters&lt;/cite&gt; &lt;strong&gt;96&lt;/strong&gt; (2006): 254102&lt;/a&gt;
	&lt;li&gt;Johan Lim, &quot;Estimation of the Entropy Functional from Dependent
Samples&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1080/03610920601126001&quot;&gt;&lt;cite&gt;Communications in
Statistics: Theory and Methods&lt;/cite&gt; &lt;strong&gt;36&lt;/strong&gt; (2007):
1577--1589&lt;/a&gt;
	&lt;li&gt;Tiger W. Lin and George N. Reeke, &quot;A Continuous Entropy Rate
Estimator for Spike Trains Using a K-Means-Based Context
Tree&quot;, &lt;a href=&quot;http://dx.doi.org/10.1162/neco.2009.11-08-912&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;22&lt;/strong&gt; (2010): 998--1024&lt;/a&gt;
	&lt;li&gt;Ilya Nemenman, &quot;Inference of entropies of discrete random variables
with unknown cardinalities,&quot;
&lt;a href=&quot;http://arxiv.org/abs/physics/0207009&quot;&gt;physics/0207009&lt;/a&gt;
	&lt;li&gt;Ilya Nemenman, William Bialek and Rob de Ruyter van Steveninck,
&quot;Entropy and information in neural spike trains: Progress on the sampling
problem&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0306063&quot;&gt;0306063&lt;/a&gt;
	&lt;li&gt;XuanLong Nguyen, Martin J. Wainwright, Michael I. Jordan, &quot;Estimating divergence functionals and the likelihood ratio by convex risk minimization&quot;, &lt;a href=&quot;http://arxiv.org/abs/0809.0853&quot;&gt;arxiv:0809.0853&lt;/a&gt;
	&lt;li&gt;Leandro Pardo, &lt;cite&gt;Statistical Inference Based on
Divergence Measures&lt;/cite&gt;
	&lt;li&gt;Liam Paninski, &quot;Estimating Entropy on &lt;em&gt;m&lt;/em&gt; Bins Given Fewer
Than &lt;em&gt;m&lt;/em&gt; Samples&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 2200--2203
	&lt;li&gt;Angeliki Papana and Dimitris Kugiumtzis, &quot;Evaluation of Mutual Information Estimators for Time Series&quot;, &lt;a href=&quot;http://arxiv.org/abs/0904.4753&quot;&gt;arxiv:0904.4753&lt;/a&gt;
	&lt;li&gt;Paulo R. F. Pinto, M. S. Baptista, Isabel S. Labouriau, &quot;Density of first Poincar&amp;eacute; returns, periodic orbits, and Kolmogorov-Sinai entropy&quot;, &lt;a href=&quot;http://arxiv.org/abs/0908.4575&quot;&gt;arxiv:0908.4575&lt;/a&gt;
	&lt;li&gt;G. Pola, R. S. Petersen, A. Thiele, M. P. Young and S. Panzeri,
&quot;Data-Robust Tight Lower Bounds to the Information Carried by Spike Times of a
Neuronal Population&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/9/1962&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt;
&lt;strong&gt;17&lt;/strong&gt; (2005): 1962--2005&lt;/a&gt;
	&lt;li&gt;Thomas Sch&amp;uuml;rmann 
		&lt;ul&gt; 
		&lt;li&gt;&quot;Bias Analysis in Entropy Estimates&quot;, &lt;a 
href=&quot;http://arxiv.org/abs/cond-mat/0403192&quot;&gt;cond-mat/0403192&lt;/a&gt; 
		&lt;li&gt;&quot;Scaling behaviour of entropy estimates,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0203409&quot;&gt;cond-mat/0203409&lt;/a&gt; 
		&lt;/ul&gt;
	&lt;li&gt;J. F. Silva and S. Narayanan, &quot;Complexity-Regularized Tree-Structured Partition for Mutual Information Estimation&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2011.2177771&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;58&lt;/strong&gt; (2012): 1940--1952&lt;/a&gt;
	&lt;li&gt;Kumar Sricharan, Alfred O. Hero III, &quot;Ensemble estimators for multivariate entropy estimation&quot;, &lt;a href=&quot;http://arxiv.org/abs/1203.5829&quot;&gt;arxiv:1203.5829&lt;/a&gt;
	&lt;li&gt;Kumar Sricharan, Raviv Raich, Alfred O. Hero III, &quot;Empirical estimation of entropy functionals with confidence&quot;, &lt;a href=&quot;http://arxiv.org/abs/1012.4188&quot;&gt;arxiv:1012.4188&lt;/a&gt;
	&lt;li&gt;Taiji Suzuki, Masashi Sugiyama and Toshiyuki Tanaka, &quot;Mutual information approximation via maximum likelihood estimation of density ratio&quot;, ISIT 2009 [&lt;a href=&quot;http://sugiyama-www.cs.titech.ac.jp/~sugi/2009/ISIT2009.pdf&quot;&gt;PDF preprint&lt;/a&gt; via Prof. Sugiyama]
	&lt;li&gt;Zhiyi Zhang
		&lt;ul&gt;
		&lt;li&gt;&quot;Entropy Estimation in Turing's Perspective&quot;, &lt;a href=&quot;http://dx.doi.org/10.1162/NECO_a_00266&quot;&gt;&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;24&lt;/strong&gt; (2012): 1368--1389&lt;/a&gt;
		&lt;li&gt;&quot;A Normal Law for the Plug-in Estimator of Entropy&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2011.2179702&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;58&lt;/strong&gt; (2012): 2745--2747&lt;/a&gt;
		&lt;/ul&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>
