<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Information Theory</title>
    <link>http://bactra.org/notebooks/2012/04/11#information-theory</link>
    <description>
&lt;P&gt;Imagine that someone hands you a sealed envelope, containing, say, a
telegram.  You want to know what the message is, but you can't just open it up
and read it.  Instead you have to play a game with the messenger: you get to
ask yes-or-no questions about the contents of the envelope, to which he'll
respond truthfully.  Question: assuming this rather contrived and boring
exercise is repeated many times over, and you get as clever at choosing your
questions as possible, what's the smallest number of questions needed, on
average, to get the contents of the message nailed down?

&lt;P&gt;This question actually has an answer.  Suppose there are only a finite
number of messages (&quot;Yes&quot;; &quot;No&quot;; &quot;Marry me?&quot;; &quot;In Reno, divorce final&quot;; &quot;All is
known stop fly at once stop&quot;; or just that there's a limit on the length of the
messages, say a thousand characters).  Then we can number the messages from 1
to &lt;i&gt;N&lt;/i&gt;.  Call the message we get on this trial &lt;em&gt;S&lt;/em&gt;.  Since the game
is repeated many times, it makes sense to say that there's a probability \( p_i
\) of getting message number &lt;em&gt;i&lt;/em&gt; on any given trial, i.e. \( \Pr{(S=i)}
= p_i \).  Now, the number of yes-no questions needed to pick out any given
message is, at most, \( \log{N} \), taking the logarithm to base two.  (If you
were allowed to ask questions with three possible answers, it'd be log to the
base three.  Natural logarithms would seem to imply the idea of there being
2.718... answers per question, but nonetheless make sense mathematically.)  But
one can do better than that: if message &lt;em&gt;i&lt;/em&gt; is more frequent than
message &lt;em&gt;j&lt;/em&gt; (if \( p_i &gt; p_j \) ), it makes sense to ask whether the
message is &lt;em&gt;i&lt;/em&gt; before considering the possibility that it's &lt;em&gt;j&lt;/em&gt;;
you'll save time.  One can in fact show, with a bit of algebra, that the
smallest average number of yes-no questions is \[ -\sum_{i}{p_i\log{p_i}} ~.
\] This gives us \( \log{N} \) when all the &lt;i&gt;p&lt;/i&gt;&lt;sub&gt;&lt;i&gt;i&lt;/i&gt;&lt;/sub&gt; are
equal, which makes sense: then there are no preferred messages, and the order of
asking doesn't make any difference.  The sum is called, variously, the
information, the information content, the self-information, the entropy or the
Shannon entropy of the message, conventionally written &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;].
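
&lt;P&gt;To make the counting concrete, here is a minimal Python sketch, under
stated assumptions: the function names and the two example distributions are
invented for illustration, and the questioning strategy is a Huffman-style one
(repeatedly lump together the two least likely groups of messages), which is
one standard way of choosing questions, not the only one.  For a single message
at a time such a strategy is only guaranteed to come within one question of the
entropy; the examples use probabilities that are powers of one half, where the
two agree exactly.

```python
import heapq
from math import log2

def entropy(probs):
    # Shannon entropy in bits: minus the sum of p * log2(p),
    # skipping messages that never occur.
    return -sum(p * log2(p) for p in probs if p != 0)

def huffman_average_questions(probs):
    # Average number of yes-no questions under a Huffman-style scheme.
    # Merging the two least likely groups costs one extra question for
    # every message in them, so the average grows by their total probability.
    # Assumes at least one message.
    heap = list(probs)
    heapq.heapify(heap)
    average = 0.0
    while len(heap) != 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        average += a + b
        heapq.heappush(heap, a + b)
    return average

# Hypothetical examples: eight equally likely messages, and four unequal ones.
uniform = [1 / 8] * 8
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 8]

print(entropy(uniform), huffman_average_questions(uniform))
print(entropy(skewed), huffman_average_questions(skewed))
```

With eight equally likely messages both numbers come out to log 8 = 3
questions; with the unequal distribution both drop to 1.75, which is the saving
from asking about the likely messages first.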

&lt;P&gt;Now, at this point a natural and sound reaction would be to say &quot;the
mathematicians can call it what they like, but what you've described, this
ridiculous guessing game, has squat-all to do with information.&quot;  Alas, would
that this were so: it &lt;em&gt;is&lt;/em&gt; ridiculous, but it works.  More: it was
arrived at, simultaneously, by several mathematicians and engineers during
World War II (among the Americans, most notably, Claude Shannon and Norbert
Wiener), working on very serious and practical problems of coding,
code-breaking, communication and automatic control.  The real justification for
regarding the entropy as the amount of information is that, unsightly though it
is, though it's abstracted away all the content of the message and almost all
of the context (except for the distribution over messages), it works.  You can
try to design a communication channel which doesn't respect the theorems of
information theory; in fact, people did; you'll fail, as they did.

&lt;P&gt;Of course, nothing really depends on guessing the contents of sealed 
envelopes; any sort of random variable will do. 

&lt;P&gt;The next natural extension is to say, &quot;Well, I've got two envelopes here,
and I want to know what all the messages are in both of them; how many
questions will that take?&quot;  Call the two variables &lt;em&gt;S&lt;/em&gt; and &lt;em&gt;T&lt;/em&gt;.
(The case of more than two is a pretty simple extension, left to the reader's
ingenuity and bored afternoons.)  To find out the value of &lt;em&gt;S&lt;/em&gt;
takes &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;] questions; that
of &lt;em&gt;T&lt;/em&gt;, &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;]; so together we need at
most &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;] + &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;] questions.  But some
combinations of messages may be more likely than others.  If one of them is
&quot;Marry me?&quot;, the odds are good that the other is &quot;Yes&quot; or &quot;No&quot;.  So, by the
same reasoning as before, we figure out the distribution of pairs of messages,
and find its entropy, called the &lt;em&gt;joint entropy,&lt;/em&gt;
written &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;, &lt;em&gt;T&lt;/em&gt;].  Lo and behold, some algebra proves
that &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;, &lt;em&gt;T&lt;/em&gt;] is at most &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;]
+ &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;], and is always lower if the two variables are not
statistically independent.  Now suppose that we've figured out the contents of
one message, &lt;em&gt;S&lt;/em&gt; let us say (i.e. we've learned it's &quot;Marry me?&quot; or
whatever): how many questions will it take us to find out the contents
of &lt;em&gt;T&lt;/em&gt;?  This is the &lt;em&gt;conditional entropy,&lt;/em&gt; the entropy
of &lt;em&gt;T&lt;/em&gt; conditioned on &lt;em&gt;S&lt;/em&gt;, written
&lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;|&lt;em&gt;S&lt;/em&gt;], and a little thought shows it must 
be &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;, &lt;em&gt;S&lt;/em&gt;] - &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;], for 
consistency.  This finally leads us to the idea of the &lt;em&gt;mutual 
information,&lt;/em&gt; written &lt;em&gt;I&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;; &lt;em&gt;T&lt;/em&gt;], which is the 
amount we learn about &lt;em&gt;T&lt;/em&gt; from knowing &lt;em&gt;S&lt;/em&gt;, i.e., the number of 
questions it saves us from having to ask, i.e., &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;] 
- &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;T&lt;/em&gt;|&lt;em&gt;S&lt;/em&gt;], which is, as it happens, always the same 
as &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;] - &lt;em&gt;H&lt;/em&gt;[&lt;em&gt;S&lt;/em&gt;|&lt;em&gt;T&lt;/em&gt;].  (Hence 
&quot;mutual.&quot;)  The mutual information quantifies how much one variable (say, the 
signal picked up by the receiver in the field) can tell us about another (say, 
the signal sent on the other end). 
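
&lt;P&gt;These identities can be checked numerically.  The following is a small
Python sketch, assuming a made-up joint distribution over pairs of messages
(the labels and probabilities are purely illustrative, as are the helper
names).  It computes the two marginal entropies and the joint entropy, and gets
the conditional entropies and the mutual information from them by subtraction,
exactly as in the definitions above.

```python
from collections import defaultdict
from math import log2

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability outcomes.
    return -sum(p * log2(p) for p in probs if p != 0)

def marginal(joint, index):
    # Sum the joint probabilities over the other variable.
    totals = defaultdict(float)
    for pair, p in joint.items():
        totals[pair[index]] += p
    return totals

# Hypothetical joint distribution of (S, T): the second message
# depends strongly on the first.
joint = {
    ("marry me", "yes"): 0.25,
    ("marry me", "no"): 0.25,
    ("all is known", "fly at once"): 0.5,
}

h_s = entropy(marginal(joint, 0).values())
h_t = entropy(marginal(joint, 1).values())
h_st = entropy(joint.values())

h_t_given_s = h_st - h_s          # H[T|S] = H[T,S] - H[S]
h_s_given_t = h_st - h_t          # H[S|T] = H[S,T] - H[T]
info_from_s = h_t - h_t_given_s   # I[S;T] as questions saved about T
info_from_t = h_s - h_s_given_t   # I[S;T] as questions saved about S

print(h_s, h_t, h_st, h_t_given_s, info_from_s, info_from_t)
```

In this toy case the joint entropy (1.5 bits) is lower than the sum of the
marginal entropies (1 + 1.5 bits), and the mutual information is 1 bit
whichever way round it is computed.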

&lt;P&gt;I should now talk about the source and channel coding theorems, and 
error-correcting codes, which are remarkably counter-intuitive beasts, but I 
don't feel up to it. 

&lt;P&gt;I should also talk about the connection to Kolmogorov complexity.
Roughly, the Kolmogorov complexity of a sequence of symbols is the length of
the shortest computer program which will generate that sequence as its output.  For certain
classes of random processes, the Kolmogorov complexity per symbol converges, on
average, to the entropy per symbol, which in that case is the entropy rate, the
entropy of the latest symbol, conditioned on all the previous ones.  This gives
us a pretty profound result: random sequences are incompressible; and,
conversely, an incompressible sequence looks random.  In fact it turns out that
one can write down formal analogs to almost all the usual theorems about
information which talk, not about the entropy, but about the length of the
Kolmogorov program, also for this reason called the &lt;em&gt;algorithmic
information.&lt;/em&gt;
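
&lt;P&gt;The entropy rate is easy to compute in one simple case, a stationary
Markov chain, where conditioning on the whole past reduces to conditioning on
the current state.  The Python sketch below assumes a hypothetical two-state
chain (the transition probabilities are invented for illustration) and says
nothing about Kolmogorov complexity itself; it only shows that the entropy per
symbol, given the past, can be well below the entropy of a symbol taken on its
own.

```python
from math import log2

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability outcomes.
    return -sum(p * log2(p) for p in probs if p != 0)

# Hypothetical two-state chain: rows are the current state,
# columns are the probabilities of the next state.
transition = [[0.5, 0.5],
              [1.0, 0.0]]

# Stationary distribution of a two-state chain, in closed form:
# each state's weight is proportional to the rate of leaving the other one.
leave_0 = transition[0][1]
leave_1 = transition[1][0]
stationary = [leave_1 / (leave_0 + leave_1), leave_0 / (leave_0 + leave_1)]

# Entropy of a single symbol, ignoring the past.
marginal_entropy = entropy(stationary)

# Entropy rate: entropy of the next symbol given the current one,
# averaged over the stationary distribution.
entropy_rate = sum(w * entropy(row) for w, row in zip(stationary, transition))

print(marginal_entropy, entropy_rate)
```

Here the entropy rate is two-thirds of a bit per symbol, against roughly 0.92
bits for a single symbol considered in isolation.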

&lt;P&gt;&lt;a href=&quot;wiener.html&quot;&gt;Norbert Wiener&lt;/a&gt; worked out the continuous case of 
the standard entropy/coding/communication channel part of information theory 
at the same time as Shannon was doing the discrete version; I don't know 
whether anything like this exists for algorithmic information theory. 

&lt;P&gt;In addition to the use in communications and technology, this stuff is also
of some use in &lt;a href=&quot;stat-mech.html&quot;&gt;statistical physics&lt;/a&gt; (we are, after
all, the people who came up with the idea of entropy in the first place!), in
&lt;a href=&quot;chaos.html&quot;&gt;dynamics&lt;/a&gt; (where we use an infinite family of
generalizations of the Shannon entropy, the R&amp;eacute;nyi entropies), and in &lt;a
href=&quot;probability.html&quot;&gt;probability&lt;/a&gt; and &lt;a
href=&quot;statistics.html&quot;&gt;statistics generally&lt;/a&gt;.  There are important
connections to deep issues about &lt;a
href=&quot;learning-inference-induction.html&quot;&gt;learning and induction&lt;/a&gt;, though I
think they're often misconceived.  (Another rant for another time.)  Certainly
the occasional people who say &quot;this isn't a communication channel, so you can't
use information theory&quot; are wrong.

&lt;P&gt;Equally out of it are &lt;a href=&quot;cep-gzip.html&quot;&gt;physicists who try to use 
gzip to measure entropy&lt;/a&gt;. 

&lt;P&gt;Relation to other &lt;a href=&quot;complexity-measures.html&quot;&gt;complexity
measures&lt;/a&gt;, &lt;a href=&quot;computational-mechanics.html&quot;&gt;computational
mechanics&lt;/a&gt;.  What are the appropriate extensions to things other than simple
time-series, e.g., spatially extended systems?

&lt;P&gt;See also:
	&lt;a href=&quot;ergodic-theory.html&quot;&gt;Ergodic Theory&lt;/a&gt;;
	&lt;a href=&quot;entropy-estimation.html&quot;&gt;Estimating Entropies and Informations&lt;/a&gt;;
	&lt;a href=&quot;info-geo.html&quot;&gt;Information Geometry&lt;/a&gt;;
&lt;a href=&quot;mdl.html&quot;&gt;The Minimum Description Length Principle&lt;/a&gt;;
	&lt;a href=&quot;recurrence-times.html&quot;&gt;Recurrence Times of Stochastic Processes&lt;/a&gt;



&lt;ul&gt;Recommended, more general:
	&lt;li&gt;Cover and Thomas, &lt;cite&gt;Elements of Information Theory&lt;/cite&gt; [Is 
and deserves to be the standard text, but is too damn expensive] 
	&lt;li&gt;Ray and Charles
Eames, &lt;cite&gt;&lt;a
href=&quot;http://www.archive.org/details.php?identifier=communications_primer&quot;&gt;A
Communications Primer&lt;/a&gt;&lt;/cite&gt; [Short film from, incredibly, 1953]
	&lt;li&gt;&lt;a href=&quot;http://hornacek.coa.edu/dave/&quot;&gt;Dave Feldman&lt;/a&gt;, &lt;a
href=&quot;http://hornacek.coa.edu/dave/Tutorial/&quot;&gt;Information Theory, Excess
Entropy and Statistical Complexity&lt;/a&gt; [a little log-rolling never hurt anyone]
	&lt;li&gt;Chris Hillman, &lt;a 
href=&quot;http://www.math.washington.edu/~hillman/entropy.html&quot;&gt;Entropy on the 
World Wide Web&lt;/a&gt; 
	&lt;li&gt;Pierce, &lt;cite&gt;Symbols, Signals and Noise&lt;/cite&gt; [The best 
non-technical book, indeed, almost the only one which isn't full of nonsense; 
but I must warn you he does use logarithms in a few places.] 
	&lt;li&gt;Rieke et al., &lt;cite&gt;Spikes: Exploring the Neural Code&lt;/cite&gt; [&lt;a 
href=&quot;../reviews/spikes/&quot;&gt;Review: Cells that Go Ping, or, The Value of the 
Three-Bit Spike&lt;/a&gt;] 
	&lt;li&gt;Thomas Schneider, &lt;a 
href=&quot;http://www-lmmb.ncifcrf.gov/~toms/paper/primer/&quot;&gt;Primer on Information 
Theory&lt;/a&gt; [for &lt;a href=&quot;molecular-biology.html&quot;&gt;molecular biologists&lt;/a&gt;] 
	&lt;li&gt;Claude Shannon and Warren Weaver, &lt;cite&gt;Mathematical Theory of 
Communication&lt;/cite&gt; [The very first work on information theory, highly 
motivated by very practical problems of communication and coding; it's still 
interesting to read.  The first half, Shannon's paper on &quot;&lt;a 
href=&quot;http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html&quot;&gt;A Mathematical 
Theory of Communication&lt;/a&gt;,&quot; is now on-line, courtesy of Bell Labs, where 
Shannon worked.] 
	&lt;/ul&gt; 

&lt;ul&gt;Recommended, more specialized:
	&lt;li&gt;Paul H. Algoet and Thomas M. Cover, &quot;A Sandwich Proof of the
Shannon-McMillan-Breiman
Theorem&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.aop/1176991794&quot;&gt;&lt;cite&gt;Annals
of Probability&lt;/cite&gt;
&lt;strong&gt;16&lt;/strong&gt; (1988): 899--909&lt;/a&gt; [A very slick asymptotic equipartition
result for relative entropy, with the usual equipartition theorem as a special
case]
	&lt;li&gt;Massimiliano Badino, &quot;An Application of Information Theory to the
Problem of the Scientific
Experiment&quot;, &lt;cite&gt;Synthese&lt;/cite&gt; &lt;strong&gt;140&lt;/strong&gt; (2004): 355--389 [An
interesting attempt to formulate experimental hypothesis testing in
information-theoretic terms, with experiments serving as a channel between the
world and the scientist.  Badino makes what seems to me a very nice point, that
if the source is ergodic (because, e.g., experiments are independent
replicates), then almost surely a long enough sequence of experimental results
will be &quot;typical&quot;, in the sense of the asymptotic equipartition property, and
so observing what your theory describes as an atypical sequence is reason to
reject that theory.  Two problems with this, however, are that Badino assumes
the theory completely specifies the probability of observations, i.e., no free
parameters to be estimated from data, and he doesn't seem to be aware of any of
the work relating information theory to hypothesis testing, which goes back at
least to Kullback in the 1950s.  I think something very interesting could be
done here, about testing hypotheses on ergodic (not just IID) sources, but
wonder if it hasn't been done
already...  &lt;a href=&quot;http://philsci-archive.pitt.edu/archive/00001830/&quot;&gt;MS Word
preprint&lt;/a&gt;]
	&lt;li&gt;Andrew Barron and Nicolas Hengartner, &quot;Information theory and superefficiency&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.aos/1024691358&quot;&gt;&lt;cite&gt;Annals of Statistics&lt;/cite&gt; &lt;strong&gt;26&lt;/strong&gt; (1998):
1800--1825&lt;/a&gt;
	&lt;li&gt;M. S. Bartlett, &quot;The Statistical Significance of Odd Bits of 
Information&quot;, &lt;cite&gt;Biometrika&lt;/cite&gt; &lt;strong&gt;39&lt;/strong&gt; (1952): 228--237 [A 
goodness-of-fit test based on fluctuations of the 
entropy.  &lt;a href=&quot;http://www.jstor.org/2334019&quot;&gt;JSTOR&lt;/a&gt;] 
	&lt;li&gt;Carl T. Bergstrom and Michael Lachmann, &quot;The fitness value of
information&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.PE/0510007&quot;&gt;q-bio.PE/0510007&lt;/a&gt;
	&lt;li&gt;Sergey Bobkov, Mokshay Madiman, &quot;Concentration of the information in data with log-concave distributions&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.aop/1312555807&quot;&gt;&lt;cite&gt;Annals of Probability&lt;/cite&gt; &lt;strong&gt;39&lt;/strong&gt; (2011): 1528--1543&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/1012.5457&quot;&gt;arxiv:1012.5457&lt;/a&gt;
	&lt;li&gt;Jochen Brocker, &quot;A Lower Bound on Arbitrary $f$-Divergences in
Terms of the Total Variation&quot; &lt;a href=&quot;http://arxiv.org/abs/0903.1765&quot;&gt;arxiv:0903.1765&lt;/a&gt;
	&lt;li&gt;Gavin Brown, Adam Pocock, Ming-Jie Zhao, Mikel Luj&amp;aacute;n, &quot;Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v13/brown12a.html&quot;&gt;&lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt;
&lt;strong&gt;13&lt;/strong&gt; (2012): 27--66&lt;/a&gt;
	&lt;li&gt;Nicolo Cesa-Bianchi and Gabor Lugosi, &lt;cite&gt;Prediction, Learning,
and Games&lt;/cite&gt; [&lt;a href=&quot;../weblog/algae-2008-07.html#prediction&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;li&gt;J.-R. Chazottes and D. Gabrielli, &quot;Large deviations for empirical
entropies of Gibbsian sources&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0406083&quot;&gt;math.PR/0406083&lt;/a&gt; = &lt;a
href=&quot;http://dx.doi.org/10.1088/0951-7715/18/6/007&quot;&gt;&lt;cite&gt;Nonlinearity&lt;/cite&gt;
&lt;strong&gt;18&lt;/strong&gt; (2005): 2545--2563&lt;/a&gt; [This is a very cool result which
shows that block entropies, and entropy rates estimated from those blocks, obey
the large deviation principle even as one lets the length of the blocks grow
with the amount of data, provided the block-length doesn't grow too quickly
(only logarithmically).  I wish I could write papers like this.]
	&lt;li&gt;Paul Cuff, Haim Permuter and Thomas Cover, &quot;Coordination
Capacity&quot;, &lt;a href=&quot;http://arxiv.org/abs/0909.2408&quot;&gt;arxiv:0909.2408&lt;/a&gt; [&quot;...
elements of a theory of cooperation and coordination in networks. Rather than
considering a communication network as a means of distributing information, or
of reconstructing random processes at remote nodes, we ask what dependence can
be established among the nodes given the communication
constraints. Specifically, in a network with communication rates ... between
the nodes, we ask what is the set of all achievable joint distributions ... of
actions at the nodes of the network. ... Distributed cooperation can be the
solution to many problems such as distributed games, distributed control, and
establishing mutual information bounds on the influence of one part of a
physical system on another.&quot;  But the networks considered are &lt;em&gt;very&lt;/em&gt;
simple, and a lot of them cheat by providing access to a common source of
randomness.  Still, an interesting direction!]
	&lt;li&gt;Stefano Galatolo, Mathieu Hoyrup, and Crist&amp;oacute;bal Rojas,
&quot;Effective symbolic dynamics, random points, statistical behavior, complexity
and entropy&quot;, &lt;a href=&quot;http://arxiv.org/abs/0801.0209&quot;&gt;arxiv:0801.0209&lt;/a&gt;
[&lt;em&gt;All&lt;/em&gt;, not almost all, Martin-Lof points are statistically
typical.]
	&lt;li&gt;R. M. Gray, &lt;cite&gt;Entropy and Information Theory&lt;/cite&gt;
[Mathematically rigorous; many interesting newer developments, for specialists.
Now &lt;a href=&quot;http://ee.stanford.edu/~gray/it.html&quot;&gt;on-line&lt;/a&gt;.]
	&lt;li&gt;Aleks Jakulin and Ivan Bratko, &quot;Quantifying and Visualizing 
Attribute Interactions&quot;, &lt;a 
href=&quot;http://arxiv.org/abs/cs.AI/0308002&quot;&gt;cs.AI/0308002&lt;/a&gt; 
	&lt;li&gt;A. I. Khinchin, &lt;cite&gt;Mathematical Foundations of Information 
Theory&lt;/cite&gt; [An axiomatic approach, for those who like that sort of thing]
	&lt;li&gt;Solomon Kullback, &lt;cite&gt;Information Theory and Statistics&lt;/cite&gt; 
	&lt;li&gt;Katalin Marton and Paul C. Shields, &quot;Entropy and the Consistent Estimation of Joint Distributions&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.aop/1176988736&quot;&gt;&lt;cite&gt;Annals of Probability&lt;/cite&gt; &lt;strong&gt;22&lt;/strong&gt; (1994):
960--977&lt;/a&gt;
	&lt;li&gt;Maxim Raginsky
		&lt;ul&gt;
		&lt;li&gt;&quot;Empirical processes, typical sequences and coordinated actions in standard Borel spaces&quot;, &lt;a href=&quot;http://arxiv.org/abs/1009.0282&quot;&gt;arxiv:1009.0282&lt;/a&gt;
		&lt;li&gt;&quot;Directed information and Pearl's causal calculus&quot;, &lt;a href=&quot;http://arxiv.org/abs/1110.0718&quot;&gt;arxiv:1110.0718&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M.
Willett, &quot;Sequential Probability Assignment via Online Convex Programming Using
Exponential Families&quot; [ISIT
2009, &lt;a href=&quot;http://people.ee.duke.edu/~willett/papers/raginsky_marcia_silva_willett_ISIT09.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Alfr&amp;eacute;d R&amp;eacute;nyi, &quot;On Measures of Entropy and
Information&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.bsmsp/1200512181&quot;&gt;pp. 547--561
in vol. I of &lt;cite&gt;Proceedings of the Fourth Berkeley Symposium on Mathematical
Statistics and Probability&lt;/cite&gt;&lt;/a&gt;
	&lt;li&gt;Jorma Rissanen, &lt;cite&gt;Stochastic Complexity in Statistical
Inquiry&lt;/cite&gt; [Applications of coding ideas to statistical problems.  &lt;a
href=&quot;../reviews/stochastic-complexity-in-statistical-inquiry/&quot;&gt;Review: Less Is
More, or &lt;em&gt;Ecce data!&lt;/em&gt;&lt;/a&gt;]
	&lt;li&gt;Olivier Rivoire and Stanislas Leibler, &quot;The Value of Information
for Populations in Varying Environments&quot;, &lt;a href=&quot;http://arxiv.org/abs/1010.5092&quot;&gt;arxiv:1010.5092&lt;/a&gt;
	&lt;li&gt;Paul C. Shields, &lt;cite&gt;The Ergodic Theory of Discrete Sample
Paths&lt;/cite&gt; [Emphasis on ergodic properties relating to individual sample
paths, as well as coding-theoretic
arguments.  &lt;a
href=&quot;http://www.math.utoledo.edu/~pshields/ergodic.html&quot;&gt;Shields's page on the
book&lt;/a&gt;.]
	&lt;li&gt;Bastian Steudel and Nihat Ay, &quot;Information-theoretic inference of common ancestors&quot;, &lt;a href=&quot;http://arxiv.org/abs/1010.5720&quot;&gt;arxiv:1010.5720&lt;/a&gt;
	&lt;li&gt;Eric E. Thomson and William B. Kristan, &quot;Quantifying Stimulus
Discriminability: A Comparison of Information Theory and Ideal Observer
Analysis&quot;,
&lt;a href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/4/741&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 741--778&lt;/a&gt; [A useful warning
against a too-common abuse of information theory.  Thanks to Eric for providing
me with a pre-print.]
	&lt;li&gt;Hugo Touchette and Seth Lloyd, &quot;Information-Theoretic Approach to
the Study of Control Systems,&quot;
&lt;a href=&quot;http://arxiv.org/abs/physics/0104007&quot;&gt;physics/0104007&lt;/a&gt; [Rediscovery
and generalization of &lt;a href=&quot;ashby.html&quot;&gt;Ashby's&lt;/a&gt; &quot;law of requisite
variety&quot; from the 1950s; more applications than he gave, but a more tortuous
proof]
	&lt;li&gt;Benjamin Weiss, &lt;cite&gt;Single Orbit Dynamics&lt;/cite&gt;
	&lt;/ul&gt;

&lt;ul&gt;Not recommended: 
	&lt;li&gt;Dario Benedetto, Emanuele Caglioti and Vittorio Loreto, &quot;Language
Trees and Zipping,&quot; &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0108530&quot;&gt;cond-mat/0108530&lt;/a&gt; [Though
they're no worse than other people who use gzip as an approximation to the
Kolmogorov complexity, this example, published in PRL, is especially egregious,
and has called forth two separate and conclusive demolitions,
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0205521&quot;&gt;cond-mat/0205521&lt;/a&gt; and &lt;a 
href=&quot;http://arxiv.org/abs/cond-mat/0202383&quot;&gt;cond-mat/0202383&lt;/a&gt;] 
	&lt;li&gt;B. Roy Frieden, &lt;cite&gt;Physics from Fisher Information: A 
Unification&lt;/cite&gt; [&lt;a href=&quot;../reviews/physics-from-fisher-info/&quot;&gt;Review: 
Laboring to Bring Forth a Mouse&lt;/a&gt;] 
	&lt;/ul&gt; 

&lt;ul&gt;To read: 
	&lt;li&gt;Robert Alicki, &quot;Information-theoretical meaning of quantum 
dynamical entropy,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/quant-ph/0201012&quot;&gt;quant-ph/0201012&lt;/a&gt; 
	&lt;li&gt;Armen E. Allahverdyan, &quot;Entropy of Hidden Markov Processes via
Cycle Expansion&quot;, &lt;a href=&quot;http://arxiv.org/abs/0810.4341&quot;&gt;arxiv:0810.4341&lt;/a&gt;
	&lt;li&gt;P. O. Amblard and O. J. J. Michel, &quot;On directed information theory
and Granger causality graphs&quot;, &lt;a href=&quot;http://arxiv.org/abs/1002.1446&quot;&gt;arxiv:1002.1446&lt;/a&gt;
	&lt;li&gt;Jose M. Amigo, Matthew B. Kennel and Ljupco Kocarev, &quot;The
permutation entropy rate equals the metric entropy rate for ergodic information
sources and ergodic dynamical systems&quot;, &lt;a
href=&quot;http://arxiv.org/abs/nlin.CD/0503044&quot;&gt;nlin.CD/0503044&lt;/a&gt;
	&lt;li&gt;John C. Baez, Mike Stay, &quot;Algorithmic Thermodynamics&quot;, &lt;a href=&quot;http://arxiv.org/abs/1010.2067&quot;&gt;arxiv:1010.2067&lt;/a&gt;
	&lt;li&gt;K. Bandyopadhyay, A. K. Bhattacharya, Parthapratim Biswas and
D. A. Drabold, &quot;Maximum entropy and the problem of moments: A stable
algorithm&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0412717&quot;&gt;cond-mat/0412717&lt;/a&gt;
	&lt;li&gt;Richard G. Baraniuk, Patrick Flandrin, Augustus J. E. M. Janssen
and Olivier J. J. Michel, &quot;Measuring Time-Frequency Information Content Using
the Renyi Entropies&quot;, &lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt; (2001): 1391--1409
	&lt;li&gt;Felix Belzunce, Jorge Navarro, Jos&amp;eacute; M. Ruiz and Yolanda del
Aguila, &quot;Some results on residual entropy function&quot; [sic], &lt;a
href=&quot;http://dx.doi.org/doi:10.1007/s001840300276&quot;&gt;&lt;cite&gt;Metrika&lt;/cite&gt; &lt;strong&gt;59&lt;/strong&gt;
(2004): 147--161&lt;/a&gt;
	&lt;li&gt;Fabio Benatti, Tyll Krueger, Markus Mueller, Rainer
Siegmund-Schultze and Arleta Szkola, &quot;Entropy and Algorithmic Complexity in
Quantum Information Theory: a Quantum Brudno's Theorem&quot;, &lt;a
href=&quot;http://arxiv.org/abs/quant-ph/0506080&quot;&gt;quant-ph/0506080&lt;/a&gt;
	&lt;li&gt;C. T. Bergstrom and M. Rosvall, &quot;The transmission sense of information&quot;, &lt;a href=&quot;http://arxiv.org/abs/0810.4168&quot;&gt;arxiv:0810.4168&lt;/a&gt;
	&lt;li&gt;Igor Bjelakovic, Tyll Krueger, Rainer Siegmund-Schultze and Arleta 
Szkola 
		&lt;ul&gt; 
		&lt;li&gt;&quot;The Shannon-McMillan Theorem for Ergodic Quantum Lattice 
Systems,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/math.DS/0207121&quot;&gt;math.DS/0207121&lt;/a&gt; 
		&lt;li&gt;&quot;Chained Typical Subspaces - a Quantum Version of 
Breiman's Theorem,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/quant-ph/0301177&quot;&gt;quant-ph/0301177&lt;/a&gt; 
		&lt;/ul&gt; 
	&lt;li&gt;Claudio Bonanno, &quot;The Manneville map: topological, metric and 
algorithmic entropy,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/math.DS/0107195&quot;&gt;math.DS/0107195&lt;/a&gt; 
	&lt;li&gt;Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, Thomas R. Lynam,
Blaz Zupan, &quot;Spam Filtering Using Statistical Data Compression Models&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/volume7/bratko06a/bratko06a.pdf&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2006): 2673--2698&lt;/a&gt;
	&lt;li&gt;Paul Bohan Broderick, &quot;On Communication and Computation&quot;,
&lt;cite&gt;Minds and Machines&lt;/cite&gt; &lt;strong&gt;14&lt;/strong&gt; (2004): 1--19 [&quot;The most
famous models of computation and communication, Turing Machines and
(Shannon-style) information sources, are considered.  The most significant
difference lies in the types of state-transitions allowed in each sort of
model. This difference does not correspond to the difference that would be
expected after considering the ordinary usage of these terms.&quot;]
	&lt;li&gt;Joachim M. Buhmann, &quot;Information theoretic model validation for clustering&quot;, &lt;a href=&quot;http://arxiv.org/abs/1006.0375&quot;&gt;arxiv:1006.0375&lt;/a&gt;
	&lt;li&gt;Kenneth P. Burnham and David R. Anderson, &lt;cite&gt;Model Selection and
Inference: A Practical Information-Theoretic Approach&lt;/cite&gt;
	&lt;li&gt;Massimo Cencini and Alessandro Torcini, &quot;A nonlinear marginal 
stability criterion for information propagation,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/nlin.CD/0011044&quot;&gt;nlin.CD/0011044&lt;/a&gt; 
	&lt;li&gt;Gregory J. Chaitin 
		&lt;ul&gt; 
		&lt;li&gt;&lt;cite&gt;Algorithmic Information Theory&lt;/cite&gt; 
[&lt;a href=&quot;http://www.umcs.maine.edu/~chaitin/cup.pdf&quot;&gt;online&lt;/a&gt;] 
		&lt;li&gt;&lt;cite&gt;Information, Randomness and Incompleteness&lt;/cite&gt; 
[&lt;a href=&quot;http://www.umcs.maine.edu/~chaitin/ws.ps&quot;&gt;online&lt;/a&gt;] 
		&lt;li&gt;&lt;cite&gt;Information-Theoretic Incompleteness&lt;/cite&gt; 
[&lt;a href=&quot;http://www.umcs.maine.edu/~chaitin/ps3.ps&quot;&gt;online&lt;/a&gt;] 
		&lt;/ul&gt; 
	&lt;li&gt;J. Chen and T. Berger, &quot;The Capacity of Finite-State Markov
Channels with Feedback&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.842697&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt;
(2005): 780--798&lt;/a&gt;
	&lt;li&gt;G. Ciuperca, V. Girardin and L. Lhote, &quot;Computation and Estimation of Generalized Entropy Rates for Denumerable Markov Chains&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2011.2133710&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;57&lt;/strong&gt; (2011): 4026--4034&lt;/a&gt; [The estimation is just plugging in the MLE of the parameters, for
finitely-parametrized chains, but they claim to show that works well]
	&lt;li&gt;Bob Coecke, &quot;Entropic Geometry from Logic,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/quant-ph/0212065&quot;&gt;quant-ph/0212065&lt;/a&gt; 
	&lt;li&gt;Felix Creutzig, Amir Globerson and Naftali Tishby,
&quot;Past-future information bottleneck in dynamical systems&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.79.041925&quot;&gt;&lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;79&lt;/strong&gt; (2009): 041925&lt;/a&gt;
	&lt;li&gt;Imre Csiszar, &quot;The Method of Types&quot;, &lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;44&lt;/strong&gt; (1998): 2505--2523
[free &lt;a
href=&quot;http://www.stanford.edu/class/ee376a/handout/method_of_types&quot;&gt;PDF
copy&lt;/a&gt;]
	&lt;li&gt;Imre Csiszar and Janos Korner, &lt;cite&gt;Information Theory: Coding 
Theorems for Discrete Memoryless Systems&lt;/cite&gt; 
	&lt;li&gt;Imre Csiszar and Paul Shields, &lt;cite&gt;Information Theory and
Statistics: A Tutorial&lt;/cite&gt;
[&lt;a
href=&quot;http://www.renyi.hu/%7Ecsiszar/Publications/Information_Theory_and_Statistics%3A_A_Tutorial.pdf&quot;&gt;Fulltext
PDF&lt;/a&gt;]
	&lt;li&gt;&amp;#x141;ukasz D&amp;#x119;bowski
		&lt;ul&gt;
		&lt;li&gt;&quot;On vocabulary size of grammar-based codes&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.IT/0701047&quot;&gt;cs.IT/0701047&lt;/a&gt;
		&lt;li&gt;&quot;Variable-Length Coding of Two-sided
Asymptotically Mean Stationary Measures&quot;, &lt;a href=&quot;http://dx.doi.org/10.1007/s10959-009-0264-0&quot;&gt;&lt;cite&gt;Journal of Theoretical Probability&lt;/cite&gt; &lt;strong&gt;23&lt;/strong&gt; (2009): 237--256&lt;/a&gt;
		&lt;li&gt;&quot;Mixing, Ergodic, and Nonergodic Processes with Rapidly Growing Information between Blocks&quot;, &lt;a href=&quot;http://arxiv.org/abs/1103.3952&quot;&gt;arxiv:1103.3952&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Gustavo Deco and Bernd Schurmann, &lt;cite&gt;Information Dynamics: 
Foundations and Applications&lt;/cite&gt; 
	&lt;li&gt;Amir Dembo, &quot;Information Inequalities and Concentration of
Measure&quot;, &lt;cite&gt;The Annals of Probability&lt;/cite&gt; &lt;strong&gt;25&lt;/strong&gt; (1997):
927--939 [&quot;We derive inequalities of the form \Delta(P,Q) =&lt; H(P|R) + H(Q|R)
which hold for every choice of probability measures P, Q, R, where H(P|R)
denotes the relative entropy of P with respect to R and \Delta(P,Q) stands for
a coupling type 'distance' between P and Q.&quot;]
	&lt;li&gt;A. Dembo and I. Kontoyiannis, &quot;Source Coding, Large Deviations, 
and Approximate Pattern Matching,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/math.PR/0103007&quot;&gt;math.PR/0103007&lt;/a&gt; 
	&lt;li&gt;Steffen Dereich, &quot;The quantization complexity of diffusion 
processes&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0411597&quot;&gt;math.PR/0411597&lt;/a&gt; 
	&lt;li&gt;Joseph DeStefano and Erik Learned-Miller, &quot;A Probabilistic Upper
Bound on Differential Entropy&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0504091&quot;&gt;cs.IT/0504091&lt;/a&gt; [&quot;A novel,
non-trivial, probabilistic upper bound on the entropy of an unknown
one-dimensional distribution, given the support of the distribution and a
sample from that distribution...&quot;]
	&lt;li&gt;David Doty, &quot;Every sequence is compressible to a random one&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.IT/0511074&quot;&gt;cs.IT/0511074&lt;/a&gt; [&quot;Kucera and
Gacs independently showed that every infinite sequence is Turing reducible to a
Martin-Lof random sequence. We extend this result to show that every infinite
sequence S is Turing reducible to a Martin-Lof random sequence R such that the
asymptotic number of bits of R needed to compute n bits of S, divided by n, is
precisely the constructive dimension of S.&quot;]
	&lt;li&gt;David Doty and Jared Nichols, &quot;Pushdown Dimension&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.IT/0504047&quot;&gt;cs.IT/0504047&lt;/a&gt;
	&lt;li&gt;Dowe, Korb and Oliver (eds.), &lt;cite&gt;Information, Statistics and 
Induction in Science&lt;/cite&gt; 
	&lt;li&gt;Tomasz Downarowicz, &lt;cite&gt;Entropy in Dynamical Systems&lt;/cite&gt;
	&lt;li&gt;M. Drmota and W. Szpankowski, &quot;Precise minimax redundancy and
regret&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.836702&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt;
(2004): 2686--2707&lt;/a&gt;
	&lt;li&gt;Ebanks, Sahoo and Sander, &lt;cite&gt;Characterization of Information 
Measures&lt;/cite&gt; 
	&lt;li&gt;Werner Ebeling and Thorsten Poeschel, &quot;Entropy and Long range 
correlations in literary English,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0204108&quot;&gt;cond-mat/0204108&lt;/a&gt; 
	&lt;li&gt;Karl-Erik Eriksson, Kristian Lindgren, Bengt &amp;Aring;. 
M&amp;aring;nsson, &lt;cite&gt;Structure, Context, Complexity, Organization: Physical 
Aspects of Information and Value&lt;/cite&gt; [The sort of title which usually makes 
me run away, but actually full of content] 
	&lt;li&gt;Roger Filliger and Max-Olivier Hongler, &quot;Relative entropy and
efficiency measure for diffusion-mediated transport processes&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1088/0305-4470/38/6/005&quot;&gt;&lt;cite&gt;Journal of Physics A:
Mathematical and General&lt;/cite&gt; &lt;strong&gt;38&lt;/strong&gt; (2005): 1247--1255&lt;/a&gt; [&quot;We
propose an efficiency measure for diffusion-mediated transport processes
including molecular-scale engines such as Brownian motors.... Ultimately, the
efficiency measure can be directly interpreted as the relative entropy between
two probability distributions, namely: the distribution of the particles in the
presence of the external rectifying force field and a reference distribution
describing the behavior in the absence of the rectifier&quot;.  Interesting for the
link between relative entropy and energetics.]
	&lt;li&gt;Flocchini &lt;em&gt;et al.&lt;/em&gt; (eds.), &lt;cite&gt;Structure, Information and 
Communication Complexity&lt;/cite&gt;
	&lt;li&gt;H. Follmer, &quot;On entropy and information gain
in random fields&quot;, &lt;cite&gt;Z. Wahrsch. verw. Geb.&lt;/cite&gt; &lt;strong&gt;26&lt;/strong&gt;
(1973): 207--217
	&lt;li&gt;David V. Foster and Peter Grassberger, &quot;Lower bounds on mutual
information&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.83.010101&quot;&gt;&lt;cite&gt;Physical
Review E&lt;/cite&gt; &lt;strong&gt;83&lt;/strong&gt; (2011): 010101&lt;/a&gt;
	&lt;li&gt;Travis Gagie, &quot;Compressing Probability Distributions&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0506016&quot;&gt;cs.IT/0506016&lt;/a&gt; [&lt;em&gt;Abstract&lt;/em&gt;
(in full): &quot;We show how to store good approximations of probability
distributions in small space.&quot;]
	&lt;li&gt;Pierre Gaspard, &quot;Time-Reversed Dynamical Entropy and
Irreversibility in Markovian Random Processes&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s10955-004-3455-1&quot;&gt;&lt;cite&gt;Journal of Statistical
Physics&lt;/cite&gt; &lt;strong&gt;117&lt;/strong&gt; (2004): 599--615&lt;/a&gt;
	&lt;li&gt;George M. Gemelos and Tsachy Weissman, &quot;On the Entropy Rate of
Pattern Processes&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0504046&quot;&gt;cs.IT/0504046&lt;/a&gt;
	&lt;li&gt;Josep Ginebra, &quot;On the Measure of the Information in a Statistical
Experiment&quot;, &lt;a
href=&quot;http://ba.stat.cmu.edu/journal/2007/vol02/issue01/ginebra.pdf&quot;&gt;&lt;cite&gt;Bayesian
Analysis&lt;/cite&gt; &lt;strong&gt;2&lt;/strong&gt; (2007): 167--212&lt;/a&gt;
	&lt;li&gt;M. Godavarti and A. Hero, &quot;Convergence of Differential Entropies&quot;, 
&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt;
(2004): 171--176
	&lt;li&gt;Goldman, &lt;cite&gt;Information Theory&lt;/cite&gt; [Old (1965) text, but has 
some interesting time-series stuff which has dropped out of most modern 
presentations] 
	&lt;li&gt;Alexander N. Gorban, Iliya V. Karlin and Hans Christian Ottinger, 
&quot;The additive generalization of the Boltzmann entropy,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0209319&quot;&gt;cond-mat/0209319&lt;/a&gt; [The 
abstract sounds like a rediscovery of Renyi entropies --- &lt;a 
href=&quot;http://bactra.org/weblog/234.html&quot;&gt;there's a lot of that going 
around&lt;/a&gt; --- but presumably there's more] 
	&lt;li&gt;Green and Swets, &lt;cite&gt;Signal Detection Theory and 
Psychophysics&lt;/cite&gt; 
	&lt;li&gt;A. Greven, G. Keller and G. Warnecke (eds.), &lt;cite&gt;Entropy&lt;/cite&gt; 
	&lt;li&gt;Peter Gr&amp;uuml;nwald and Paul Vit&amp;aacute;nyi, &quot;Shannon Information
and Kolmogorov Complexity&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0410002&quot;&gt;cs.IT/0410002&lt;/a&gt;
	&lt;li&gt;Sudipto Guha, Andrew McGregor and Suresh Venkatasubramanian,
&quot;Streaming and Sublinear Approximation of Entropy and Information Distances&quot;,
&lt;cite&gt;17th ACM-SIAM Symposium on Discrete Algorithms, 2006&lt;/cite&gt;
[&lt;a href=&quot;http://www.research.att.com/~suresh/papers/jstest/&quot;&gt;Link via
Suresh&lt;/a&gt;]
	&lt;li&gt;Michael J. W. Hall, &quot;Universal Geometric Approach to Uncertainty, 
Entropy and Information,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/physics/9903045&quot;&gt;physics/9903045&lt;/a&gt; 
	&lt;li&gt;Guangyue Han, &quot;Limit Theorems for the Sample Entropy of Hidden Markov Chains&quot;, &lt;a href=&quot;http://arxiv.org/abs/1102.0365&quot;&gt;arxiv:1102.0365&lt;/a&gt;
	&lt;li&gt;Guangyue Han and Brian Marcus, &quot;Analyticity of Entropy Rate in
Families of Hidden Markov Chains&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0507235&quot;&gt;math.PR/0507235&lt;/a&gt;
	&lt;li&gt;Te Sun Han
		&lt;ul&gt;
		&lt;li&gt;&quot;Hypothesis Testing with the General Source&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1109/18.887854&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;46&lt;/strong&gt; (2000): 2415--2427&lt;/a&gt; =
&lt;a href=&quot;http://arxiv.org/abs/math.PR/0004121&quot;&gt;math.PR/0004121&lt;/a&gt; [&quot;The
asymptotically optimal hypothesis testing problem with the general sources as
the null and alternative hypotheses is studied.... Our fundamental philosophy
in doing so is first to convert all of the hypothesis testing problems
completely to the pertinent computation problems in the large
deviation-probability theory. ... [This] enables us to establish quite compact
general formulas of the optimal exponents of the second kind of error and
correct testing probabilities for the general sources including all
nonstationary and/or nonergodic sources with arbitrary abstract alphabet
(countable or uncountable). Such general formulas are presented from the
information-spectrum point of view.&quot;]
		&lt;li&gt;&quot;Folklore in Source Coding: Information-Spectrum
Approach&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2004.840860&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005):
747--753&lt;/a&gt; [From the abstract: &quot;we verify the validity of the folklore that
the output from any source encoder working at the optimal coding rate with
asymptotically vanishing probability of error looks like almost completely
random.&quot;]
		&lt;li&gt;&quot;An information-spectrum approach to large deviation
theorems&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0606104&quot;&gt;cs.IT/0606104&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Te Sun Han and Kingo Kobayashi, &lt;cite&gt;Mathematics of Information
and Coding&lt;/cite&gt; [I've read about half of this; it's quite
good.  &lt;a href=&quot;http://www.oup.co.uk/isbn/0-8218-0534-7&quot;&gt;Blurb&lt;/a&gt;]
	&lt;li&gt;Masahito Hayashi, &quot;Second order asymptotics in fixed-length source
coding and intrinsic randomness&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0503089&quot;&gt;cs.IT/0503089&lt;/a&gt;
	&lt;li&gt;Nicolai T. A. Haydn, &quot;The Central Limit Theorem for uniformly strong mixing measures&quot;, &lt;a href=&quot;http://arxiv.org/abs/0903.1325&quot;&gt;arxiv:0903.1325&lt;/a&gt;
	&lt;li&gt;Nicolai Haydn and Sandro Vaienti, &quot;Fluctuations of the Metric
Entropy for Mixing Measures&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1142/S021949370400119X&quot;&gt;&lt;cite&gt;Stochastics and
Dynamics&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (2004): 595--627&lt;/a&gt;
	&lt;li&gt;D.-K. He and E.-H. Yang, &quot;The Universality of Grammar-Based Codes
for Sources With Countably Infinite
Alphabets&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2005.856948&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005):
3753--3765&lt;/a&gt;
	&lt;li&gt;Torbjorn Helvik, Kristian Lindgren and Mats G. Nordahl, &quot;Continuity
of Information Transport in Surjective Cellular Automata&quot; [thanks to Mats
for a preprint]
	&lt;li&gt;Yoshito Hirata and Alistair I. Mees, &quot;Estimating topological 
entropy via a symbolic data compression technique,&quot; &lt;cite&gt;Physical Review 
E&lt;/cite&gt; &lt;strong&gt;67&lt;/strong&gt; (2003): 026205 
	&lt;li&gt;S.-W. Ho and S. Verdu, &quot;On the Interplay Between Conditional Entropy and Error Probability&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2010.2080891&quot;&gt;&lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;56&lt;/strong&gt; (2010): 5930--5942&lt;/a&gt;
	&lt;li&gt;S.-W. Ho and R. W. Yeung, &quot;The Interplay Between Entropy and Variational Distance&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2010.2080452&quot;&gt;&lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;56&lt;/strong&gt; (2010): 5906--5929&lt;/a&gt;
	&lt;li&gt;Michael Hochman, &quot;Upcrossing Inequalities for Stationary Sequences and Applications to Entropy and Complexity&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.DS/0608311&quot;&gt;arxiv:math.DS/0608311&lt;/a&gt; [where &quot;complexity&quot; = algorithmic
information content]
	&lt;li&gt;M. Hotta and I. Jochi, &quot;Composability and Generalized Entropy,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/9906377&quot;&gt;cond-mat/9906377&lt;/a&gt; 
	&lt;li&gt;Marcus Hutter, &quot;Distribution of Mutual Information,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cs.AI/0112019&quot;&gt;cs.AI/0112019&lt;/a&gt; 
	&lt;li&gt;Marcus Hutter and Marco Zaffalon, &quot;Distribution of mutual
information from complete and incomplete data&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.csda.2004.03.010&quot;&gt;&lt;cite&gt;Computational
Statistics and Data Analysis&lt;/cite&gt; &lt;strong&gt;48&lt;/strong&gt; (2004): 633--657&lt;/a&gt;;
also in the arxiv someplace
	&lt;li&gt;Shunsuke Ihara, &lt;cite&gt;Information Theory for Continuous 
Systems&lt;/cite&gt; 
	&lt;li&gt;K. Iriyama
		&lt;ul&gt;
		&lt;li&gt;&quot;Error Exponents for Hypothesis Testing of the General
Source&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2004.842774&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005):
1517--1522&lt;/a&gt;
		&lt;li&gt;&quot;Probability of Error for the Fixed-Length Lossy Coding of
General Sources&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.842777&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005): 1498--1507&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;K. Iwata, K. Ikeda and H. Sakai, &quot;A Statistical Property of
Multiagent Learning Based on Markov Decision
Process&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TNN.2006.875990&quot;&gt;&lt;cite&gt;IEEE
Transactions on Neural Networks&lt;/cite&gt;
&lt;strong&gt;17&lt;/strong&gt; (2006): 829--842&lt;/a&gt; [The property is asymptotic
equipartition!]
	&lt;li&gt;Herve Jegou and Christine Guillemot, &quot;Entropy coding with Variable
Length Re-writing
Systems&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0508058&quot;&gt;cs.IT/0508058&lt;/a&gt; [&quot;This
paper describes a new set of block source codes well suited for data
compression. These codes are defined by sets of productions rules of the form
a.l-&gt;b, where a in A represents a value from the source alphabet A and l, b are
-small- sequences of bits.... [A] construction method [is given which] allows
to obtain [&lt;em&gt;sic&lt;/em&gt;] codes such that the marginal bit probability converges
to 0.5 as the sequence length increases and this is achieved even if the
probability distribution function is not known by the encoder.&quot;]
	&lt;li&gt;Petr Jizba and Toshihico Arimitsu, &quot;The world according to Renyi: 
Thermodynamics of multifractal systems,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0207707&quot;&gt;cond-mat/0207707&lt;/a&gt; 
	&lt;li&gt;Oliver Johnson 
		&lt;ul&gt; 
		&lt;li&gt;&quot;A conditional Entropy Power Inequality for dependent 
variables,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/math.PR/0111021&quot;&gt;math.PR/0111021&lt;/a&gt; 
		&lt;li&gt;&quot;Entropy and a generalisation of `Poincare's Observation',&quot;
&lt;a href=&quot;http://arxiv.org/abs/math.PR/0201273&quot;&gt;math.PR/0201273&lt;/a&gt; 
		&lt;/ul&gt; 
	&lt;li&gt;Oliver Johnson and Andrew Barron, &quot;Fisher Information inequalities 
and the Central Limit Theorem,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/math.PR/0111020&quot;&gt;math.PR/0111020&lt;/a&gt; 
= &lt;cite&gt;Probability Theory and Related Fields&lt;/cite&gt; &lt;strong&gt;129&lt;/strong&gt; 
(2004): 391--409 
	&lt;li&gt;Ido Kanter and Hanan Rosemarin, &quot;Communication near the channel 
capacity with an absence of compression: Statistical Mechanical Approach,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0301005&quot;&gt;cond-mat/0301005&lt;/a&gt; 
	&lt;li&gt;Holger Kantz and Thomas Schuermann, &quot;Enlarged scaling ranges for 
the KS-entropy and the information dimension,&quot; &lt;cite&gt;Chaos&lt;/cite&gt; 
&lt;strong&gt;6&lt;/strong&gt; (1996): 167--171 = 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0203439&quot;&gt;cond-mat/0203439&lt;/a&gt; 
	&lt;li&gt;Hillol Kargupta, &quot;Information Transmission in Genetic Algorithm 
and Shannon's Second Theorem&quot; 
	&lt;li&gt;Matthew B. Kennel, &quot;Testing time symmetry in time series 
using data compression dictionaries&quot;, &lt;cite&gt;Physical Review E&lt;/cite&gt; 
&lt;strong&gt;69&lt;/strong&gt; (2004): 056208 
	&lt;li&gt;D. F. Kerridge, &quot;Inaccuracy and Inference&quot;, &lt;cite&gt;Journal of 
the Royal Statistical Society B&lt;/cite&gt; &lt;strong&gt;23&lt;/strong&gt; (1961): 184--194 
	&lt;li&gt;J. C. Kieffer and E.-H. Yang, &quot;Grammar-Based Lossless Universal
Refinement Source Coding&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 1415--1424
	&lt;li&gt;&lt;a href=&quot;http://www.dam.brown.edu/people/yiannis/&quot;&gt;Ioannis 
Kontoyiannis&lt;/a&gt; 
		&lt;ul&gt; 
		&lt;li&gt;&quot;The Complexity and Entropy of Literary Styles&quot; 
		&lt;li&gt;&quot;Model Selection via Rate-Distortion Theory&quot; 
		&lt;li&gt;&quot;Some information-theoretic computations related to the distribution of prime numbers&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.4076&quot;&gt;arxiv:0710.4076&lt;/a&gt;
		&lt;/ul&gt; 
	&lt;li&gt;Bernard H. Lavenda, &quot;Information and coding discrimination of 
pseudo-additive entropies (PAE)&quot;, &lt;a 
href=&quot;http://arxiv.org/abs/cond-mat/0403591&quot;&gt;cond-mat/0403591&lt;/a&gt; 
	&lt;li&gt;Tue Lehn-Schioler, Anant Hegde, Deniz Erdogmus and Jose
C. Principe, &quot;Vector quantization using information theoretic concepts&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s11047-004-9619-8&quot;&gt;&lt;cite&gt;Natural
Computing&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (2005): 39--51&lt;/a&gt; [&quot;it becomes clear
that minimizing the free energy of the system is in fact equivalent to
minimizing a divergence measure between the distribution of the data and the
distribution of the processing elements, hence, the algorithm can be seen as a
density matching method.&quot;]
	&lt;li&gt;F. Liang and A. Barron, &quot;Exact Minimax Strategies for Predictive 
Density Estimation, Data Compression, and Model Selection&quot;, &lt;a 
href=&quot;http://dx.doi.org/10.1109/TIT.2004.836922&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; 
&lt;strong&gt;50&lt;/strong&gt; (2004): 2708--2726&lt;/a&gt;
	&lt;li&gt;Kristian Lindgren, &quot;Information Theory for Complex Systems&quot;
[&lt;a href=&quot;http://frt.fy.chalmers.se/cs/cas/courses/infotheory/LectureNotes.html&quot;&gt;Online lecture notes&lt;/a&gt;, January 2003]
	&lt;li&gt;Niklas L&amp;uuml;dtke, Stefano Panzeri, Martin Brown, David
S. Broomhead, Joshua Knowles, Marcelo A. Montemurro, Douglas B. Kell,
&quot;Information-theoretic sensitivity analysis: a general method for credit assignment in complex networks&quot;, &lt;a href=&quot;http://dx.doi.org/10.1098/rsif.2007.1079&quot;&gt;&lt;cite&gt;Journal of
the Royal Society: Interface&lt;/cite&gt; forthcoming (2007)&lt;/a&gt;
	&lt;li&gt;E. Lutwak, D. Yang and G. Zhang, &quot;Cramer-Rao and Moment-Entropy
Inequalities for Renyi Entropy and Generalized Fisher Information&quot;, &lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005): 473--478
	&lt;li&gt;Christian K. Machens, &quot;Adaptive sampling by information 
maximization,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/physics/0112070&quot;&gt;physics/0112070&lt;/a&gt; 
	&lt;li&gt;David J. C. MacKay 
		&lt;ul&gt; 
		&lt;li&gt;&lt;cite&gt;Information Theory, Inference and 
Learning Algorithms&lt;/cite&gt; [&lt;a 
href=&quot;http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html&quot;&gt;Online 
version&lt;/a&gt;] 
		&lt;li&gt;&quot;Rate of Information Acquisition by a Species subjected to 
Natural Selection&quot; [&lt;a 
href=&quot;http://www.inference.phy.cam.ac.uk/mackay/Evolution.html&quot;&gt;Link&lt;/a&gt;] 
		&lt;/ul&gt; 
	&lt;li&gt;Donald Mackay, &lt;cite&gt;Information, Mechanism and Meaning&lt;/cite&gt; 
[Trying to coax some notion of &quot;meaning&quot; out of information theory; I've not 
read it yet, but Mackay was quite good.] 
	&lt;li&gt;Mokshay Madiman and Prasad Tetali, &quot;Information Inequalities for Joint Distributions, With Interpretations and Applications&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2010.2046253&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;56&lt;/strong&gt; (2010): 2699--2713&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/0901.0044&quot;&gt;arxiv:0901.0044&lt;/a&gt;
	&lt;li&gt;Andrew J. Majda, Rafail V. Abramov and Marcus J. Grote,
&lt;cite&gt;Information Theory and Stochastics for Multiscale Nonlinear Systems&lt;/cite&gt;
[Sounds interesting, to judge from the &lt;a
href=&quot;http://www.oup.co.uk/isbn/0-8218-3843-1&quot;&gt;blurb&lt;/a&gt;.  &lt;a
href=&quot;http://www.cims.nyu.edu/~abramov/paper.php?mag.pdf&quot;&gt;PDF draft?&lt;/a&gt;]
	&lt;li&gt;David Malone and Wayne J. Sullivan, &quot;Guesswork and Entropy&quot;, 
&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; 
(2004): 525--526 
	&lt;li&gt;Brian Marcus, Karl Petersen and Tsachy Weissman (eds.), &lt;cite&gt;Entropy of Hidden Markov Processes and Connections to Dynamical Systems&lt;/cite&gt;
[&lt;a href=&quot;http://cambridge.org/9780521111133&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Emin Martinian, Gregory W. Wornell and Ram Zamir, &quot;Source Coding
with Encoder side Information&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0512112&quot;&gt;cs.IT/0512112&lt;/a&gt;
	&lt;li&gt;Eddy Mayer-Wolf and Moshe Zakai, &quot;Some relations between mutual
information and estimation error on Wiener
space&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0610024&quot;&gt;math.PR/0610024&lt;/a&gt;
	&lt;li&gt;Robert J. McEliece, &lt;cite&gt;The Theory of Information and
Coding&lt;/cite&gt;
	&lt;li&gt;N. Merhav and M. J. Weinberger, &quot;On Universal Simulation of
Information Sources Using Training Data&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2003.821993&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt;
&lt;strong&gt;50&lt;/strong&gt; (2004): 5--20&lt;/a&gt;; with Addendum, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2005.853324&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt;
(2005): 3381--3383&lt;/a&gt;
	&lt;li&gt;E. Meron and M. Feder, &quot;Finite-Memory Universal Prediction of
Individual Sequences&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 1506--1523
	&lt;li&gt;Patrick Mitran, &quot;Typical Sequences for Polish Alphabets&quot;,
&lt;a href=&quot;http://arxiv.org/abs/1005.2321&quot;&gt;arxiv:1005.2321&lt;/a&gt;
	&lt;li&gt;Sanjoy K. Mitter and Nigel J. Newton, &quot;Information and Entropy Flow
in the Kalman-Bucy Filter&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s10955-004-8781-9&quot;&gt;&lt;cite&gt;Journal of Statistical
Physics&lt;/cite&gt; &lt;strong&gt;118&lt;/strong&gt; (2005): 145--176&lt;/a&gt; [This looks rather
strange, from the abstract, but potentially interesting...]
	&lt;li&gt;Roberto Monetti, Wolfram Bunk, Thomas Aschenbrenner and
Ferdinand Jamitzky, &quot;Characterizing synchronization in time series using information measures extracted from symbolic representations&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.79.046207&quot;&gt;&lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;79&lt;/strong&gt; (2009): 046207&lt;/a&gt;
	&lt;li&gt;Andrea Montanari, &quot;The glassy phase of Gallager codes,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0104079&quot;&gt;cond-mat/0104079&lt;/a&gt; 
	&lt;li&gt;G. W. M&amp;uuml;ller, &quot;Randomness and extrapolation&quot;,
&lt;a href=&quot;http://projecteuclid.org/euclid.bsmsp/1200514209&quot;&gt;&lt;cite&gt;Proceedings of
the Sixth Berkeley Symposium on Mathematical Statistics and Probability&lt;/cite&gt;,
Vol. 2 (Univ. of Calif. Press, 1972), 1--31&lt;/a&gt; [On a notion of randomness
supposedly related to, but stronger than, that of Martin-L&amp;ouml;f.]
	&lt;li&gt;Ziad Naja, Florence Alberge and P. Duhamel, &quot;Geometrical interpretation and improvements of the Blahut-Arimoto's algorithm&quot;, &lt;a href=&quot;http://arxiv.org/abs/1001.1915&quot;&gt;arxiv:1001.1915&lt;/a&gt;
	&lt;li&gt;Ilya Nemenman, &quot;Information theory, multivariate dependence, and
genetic network
inference&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.QM/0406015&quot;&gt;q-bio.QM/0406015&lt;/a&gt;
	&lt;li&gt;E. Ordentlich and M. J. Weinberger, &quot;A Distribution Dependent
Refinement of Pinsker's
Inequality&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2005.846407&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005):
1836--1840&lt;/a&gt; [As you know, Bob, Pinsker's inequality uses the total variation
distance between two distributions to put a lower bound on their
Kullback-Leibler divergence.]
	&lt;li&gt;Leandro Pardo, &lt;cite&gt;Statistical Inference Based on
Divergence Measures&lt;/cite&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.cs.cas.cz/~mp&quot;&gt;Milan Palus&lt;/a&gt;, &quot;Coarse-grained
entropy rate for characterization of complex time series&quot;, &lt;cite&gt;Physica
D&lt;/cite&gt; &lt;strong&gt;93&lt;/strong&gt; (1996): 64--77 [Thanks to Prof. Palus for a
reprint]
	&lt;li&gt;Liam Paninski, &quot;Asymptotic Theory of Information-Theoretic
Experimental
Design&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/7/1480&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 1480--1507&lt;/a&gt;
	&lt;li&gt;Hanchuan Peng, Fuhui Long and Chris Ding, &quot;Feature Selection Based
on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and
Min-Redundancy&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TPAMI.2005.159&quot;&gt;&lt;cite&gt;IEEE
Transactions on Pattern Analysis and Machine
Intelligence&lt;/cite&gt; &lt;strong&gt;27&lt;/strong&gt; (2005): 1226--1238&lt;/a&gt; [This sounds
like an idea I had in 2002, and was too dumb/lazy to follow up on.]
	&lt;li&gt;Haim H. Permuter, Young-Han Kim and Tsachy Weissman, &quot;Interpretations
of Directed Information in Portfolio Theory, Data Compression,
and Hypothesis Testing&quot;, &lt;a href=&quot;http://arxiv.org/abs/0912.4872&quot;&gt;arxiv:0912.4872&lt;/a&gt;
	&lt;li&gt;Denes Petz, &quot;Entropy, von Neumann and the von Neumann Entropy,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/math-ph/0102013&quot;&gt;math-ph/0102013&lt;/a&gt; 
	&lt;li&gt;C.-E. Pfister and W. G. Sullivan, &quot;Renyi entropy, guesswork
moments, and large
deviations&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2004.836665&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004):
2794--2800&lt;/a&gt;
	&lt;li&gt;Hong Qian, &quot;Relative Entropy: Free Energy Associated with
Equilibrium Fluctuations and Nonequilibrium
Deviations&quot;, &lt;a href=&quot;http://arxiv.org/abs/math-ph/0007010&quot;&gt;math-ph/0007010&lt;/a&gt;
= &lt;a href=&quot;http://dx.doi.org/10%2E1103/PhysRevE%2E63%2E042103&quot;&gt;&lt;cite&gt;Physical
Review E&lt;/cite&gt; &lt;strong&gt;63&lt;/strong&gt; (2001): 042103&lt;/a&gt;
	&lt;li&gt;Ziad Rached, Fady Alajaji and L. Lorne Campbell
		&lt;ul&gt;
		&lt;li&gt;&quot;R&amp;eacute;nyi's Divergence and Entropy Rates for Finite
Alphabet Markov
Sources&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/18.923736&quot;&gt;&lt;cite&gt;IEEE Transactions
on Information Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt; (2001): 1553--1561&lt;/a&gt;
		&lt;li&gt;&quot;The Kullback-Leibler Divergence Rate Between Markov
Sources&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2004.826687&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 917--921&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Yaron Rachlin, Rohit Negi and Pradeep Khosla, &quot;Sensing Capacity for
Markov Random Fields&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0508054&quot;&gt;cs.IT/0508054&lt;/a&gt;
	&lt;li&gt;Maxim Raginsky
		&lt;ul&gt;
		&lt;li&gt;&quot;Achievability results for statistical learning
under communication
constraints&quot;, &lt;a href=&quot;http://arxiv.org/abs/0901.1905&quot;&gt;arxiv:0901.1905&lt;/a&gt;
		&lt;li&gt;&quot;Joint universal lossy coding and identification of stationary mixing sources with general alphabets&quot;, &lt;a href=&quot;http://arxiv.org/abs/0901.1904&quot;&gt;arxiv:0901.1904&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;M. Rao, Y. Chen, B. C. Vemuri and F. Wang, &quot;Cumulative Residual
Entropy: A New Measure of Information&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 1220--1228
	&lt;li&gt;Juan Ram&amp;oacute;n Rico-Juan, Jorge Calera-Rubio and Rafael
C. Carrasco, &quot;Smoothing and compression with stochastic k-testable tree
languages&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.patcog.2004.03.024&quot;&gt;&lt;cite&gt;Pattern
Recognition&lt;/cite&gt; &lt;strong&gt;38&lt;/strong&gt; (2005): 1420--1430&lt;/a&gt;
	&lt;li&gt;Mohammad Rezaeian, &quot;Hidden Markov Process: A New Representation,
Entropy Rate and Estimation
Entropy&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0606114&quot;&gt;cs.IT/0606114&lt;/a&gt;
	&lt;li&gt;E. Rivals and J.-P. Delahaye, &quot;Optimal Representation in Average 
Using Kolmogorov Complexity,&quot; &lt;cite&gt;Theoretical Computer Science&lt;/cite&gt; 
&lt;strong&gt;200&lt;/strong&gt; (1998): 261--287 
	&lt;li&gt;Reuven Y. Rubinstein, &quot;A Stochastic Minimum Cross-Entropy Method for
Combinatorial Optimization and Rare-event Estimation&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s11009-005-6653-7&quot;&gt;&lt;cite&gt;Methodology and
Computing in Applied Probability&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2005): 5--50&lt;/a&gt;
	&lt;li&gt;Reuven Y. Rubinstein and Dirk P. Kroese, &lt;cite&gt;The Cross-Entropy
Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo
Simulation, and Machine Learning&lt;/cite&gt;
	&lt;li&gt;Boris Ryabko, &quot;Applications of Universal Source Coding to Statistical Analysis of Time Series&quot;, &lt;a href=&quot;http://arxiv.org/abs/0809.1226&quot;&gt;arxiv:0809.1226&lt;/a&gt;
	&lt;li&gt;Boris Ryabko and Jaakko Astola
		&lt;ul&gt;
		&lt;li&gt;&quot;Prediction of Large Alphabet Processes and Its Application
to Adaptive Source
Coding&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0504079&quot;&gt;cs.IT/0504079&lt;/a&gt;
		&lt;li&gt;&quot;Universal Codes as a Basis for Time Series
Testing&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0602084&quot;&gt;cs.IT/0602084&lt;/a&gt;
		&lt;li&gt;&quot;Universal Codes as a Basis for Nonparametric Testing of
Serial Independence for Time Series&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0506094&quot;&gt;cs.IT/0506094&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;B. Ya. Ryabko and V. A. Monarev, &quot;Using information theory approach
to randomness
testing&quot;, &lt;a href=&quot;http://dx.doi.org/10.1016/j.jspi.2004.02.010&quot;&gt;&lt;cite&gt;Journal
of Statistical Planning and Inference&lt;/cite&gt; &lt;strong&gt;133&lt;/strong&gt; (2005):
95--110&lt;/a&gt;
	&lt;li&gt;Daniil Ryabko, &quot;Characterizing predictable classes of processes&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0905.4341&quot;&gt;arxiv:0905.4341&lt;/a&gt;
	&lt;li&gt;Ines Samengo, &quot;Information loss in an optimal maximum likelihood 
decoding,&quot; &lt;a href=&quot;http://arxiv.org/abs/physics/0110074&quot;&gt;physics/0110074&lt;/a&gt; 
	&lt;li&gt;Thomas Sch&amp;uuml;rmann, Peter Grassberger, &quot;The predictability of letters in written English&quot;, &lt;cite&gt;Fractals&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (1996): 1--5, &lt;a href=&quot;http://arxiv.org/abs/0710.4516&quot;&gt;arxiv:0710.4516&lt;/a&gt; [Shades of &lt;a href=&quot;../weblog/algae-2007-09.html#harris&quot;&gt;Zellig Harris&lt;/a&gt;]
	&lt;li&gt;Jacek Serafin, &quot;Finitary Codes, a short survey&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math.DS/0608252&quot;&gt;math.DS/0608252&lt;/a&gt;
	&lt;li&gt;Gadiel Seroussi, &quot;On the number of t-ary trees with a given path
length&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.DM/0509046&quot;&gt;cs.DM/0509046&lt;/a&gt; [&quot;the
number of $t$-ary trees with path length $p$ estimates the number of universal
types, or, equivalently, the number of different possible Lempel-Ziv'78
dictionaries for sequences of length $p$ over an alphabet of size $t$.&quot;]
	&lt;li&gt;Ofer Shayevitz, &quot;A Note on a Characterization of R&amp;eacute;nyi Measures and its Relation to Composite Hypothesis Testing&quot;, &lt;a href=&quot;http://arxiv.org/abs/1012.4401&quot;&gt;arxiv:1012.4401&lt;/a&gt; [&quot;The R&amp;eacute;nyi information measures are characterized in terms of their Shannon counterparts, and properties of the former are recovered from first principle via the associated properties of the latter.&quot;]
	&lt;li&gt;Wojciech Slomczynski, &lt;cite&gt;Dynamical Entropy, Markov Operators, and
Iterated Function Systems&lt;/cite&gt; [Many thanks to Prof. Slomczynski for sending a
copy of his work]
	&lt;li&gt;Jeffrey E. Steif, &quot;Consistent estimation of joint distributions for sufficiently mixing random fields&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.aos/1034276630&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;25&lt;/strong&gt; (1997): 293--304&lt;/a&gt; [Extension
of the Marton-Shields result to random fields in higher dimensions]
	&lt;li&gt;Alexander Stotland, Andrei A. Pomeransky, Eitan Bachmat and Doron 
Cohen, &quot;The information entropy of quantum mechanical states&quot;, &lt;a 
href=&quot;http://arxiv.org/abs/quant-ph/0401021&quot;&gt;quant-ph/0401021&lt;/a&gt; 
	&lt;li&gt;Rajesh Sundaresan, &quot;Guessing under source uncertainty&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.IT/0603064&quot;&gt;cs.IT/0603064&lt;/a&gt;
	&lt;li&gt;Joe Suzuki, &quot;On Strong Consistency of Model Selection in
Classification&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2006.883611&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;52&lt;/strong&gt; (2006):
4767--4774&lt;/a&gt; [Based on information-theoretic criteria]
	&lt;li&gt;H. Takahashi, &quot;Redundancy of Universal Coding, Kolmogorov
Complexity, and Hausdorff Dimension&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2004.836663&quot;&gt;&lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 2727--2736&lt;/a&gt;
	&lt;li&gt;Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong and Alan
S. Willsky, &quot;A Large-Deviation Analysis of the Maximum-Likelihood Learning of
Markov Tree
Structures&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2011.2104513&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;57&lt;/strong&gt; (2011): 1714--1735&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/0905.0940&quot;&gt;arxiv:0905.0940&lt;/a&gt;
[Large deviations for Chow-Liu trees]
	&lt;li&gt;Inder Jeet Taneja
		&lt;ul&gt;
		&lt;li&gt;&lt;cite&gt;Generalized Information Measures and Their
Applications&lt;/cite&gt; [Full text &lt;a
href=&quot;http://www.mtm.ufsc.br/~taneja/book/book.html&quot;&gt;free online&lt;/a&gt;]
		&lt;li&gt;&quot;Inequalities Among Symmetric Divergence
Measures and Their Refinement&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0501303&quot;&gt;math.ST/0501303&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;R. Timo, K. Blackmore and L. Hanlen, &quot;Word-Valued Sources: An Ergodic Theorem, an AEP, and the Conservation of Entropy&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2010.2046251&quot;&gt;&lt;cite&gt;IEEE Transactions
on Information Theory&lt;/cite&gt; &lt;strong&gt;56&lt;/strong&gt; (2010): 3139--3148&lt;/a&gt;
	&lt;li&gt;C. G. Timpson, &quot;On the Supposed Conceptual Inadequacy of the 
Shannon Information,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/quant-ph/0112178&quot;&gt;quant-ph/0112178&lt;/a&gt; 
	&lt;li&gt;Pierre Tisseur, &quot;A bilateral version of the
Shannon-McMillan-Breiman Theorem&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.DS/0312125&quot;&gt;math.DS/0312125&lt;/a&gt;
	&lt;li&gt;Gasper Tkacik, &quot;From statistical mechanics to information theory: understanding biophysical information-processing systems&quot;, &lt;a href=&quot;http://arxiv.org/abs/1006.4291&quot;&gt;arxiv:1006.4291&lt;/a&gt;
	&lt;li&gt;Tim van Erven and Peter Harremo&amp;euml;s, &quot;R&amp;eacute;nyi Divergence and Its Properties&quot;, &lt;a href=&quot;http://arxiv.org/abs/1001.4448&quot;&gt;arxiv:1001.4448&lt;/a&gt;
	&lt;li&gt;Marc M. Van Hulle, &quot;Edgeworth Approximation of Multivariate
Differential
Entropy&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/9/1903&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 1903--1910&lt;/a&gt;
	&lt;li&gt;Nikolai Vereshchagin and Paul Vitanyi, &quot;Kolmogorov's Structure 
Functions with an Application to the Foundations of Model Selection,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cs.CC/0204037&quot;&gt;cs.CC/0204037&lt;/a&gt; 
	&lt;li&gt;Nguyen Xuan Vinh, Julien Epps, James Bailey, &quot;Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v11/vinh10a.html&quot;&gt;&lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (2010): 2837--2854&lt;/a&gt;
	&lt;li&gt;Vladimir V'yugin, &quot;On Instability of the Ergodic Limit Theorems with Respect to Small Violations of Algorithmic Randomness&quot;, &lt;a href=&quot;http://arxiv.org/abs/1105.4274&quot;&gt;arxiv:1105.4274&lt;/a&gt;
	&lt;li&gt;Bin Wang, &quot;&lt;em&gt;Minimum&lt;/em&gt; Entropy Approach to Word Segmentation 
Problems,&quot; &lt;a href=&quot;http://arxiv.org/abs/physics/0008232&quot;&gt;physics/0008232&lt;/a&gt; 
	&lt;li&gt;Q. Wang, S. R. Kulkarni, and S. Verdu, &quot;Divergence Estimation of
Continuous Distributions Based on Data-Dependent Partitions&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2005.853314&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt;
(2005): 3064--3074&lt;/a&gt; [Sounds cool]
	&lt;li&gt;Satosi Watanabe, &lt;cite&gt;Knowing and Guessing&lt;/cite&gt;
	&lt;li&gt;Edward D. Weinberger
		&lt;ul&gt;
		&lt;li&gt;&quot;A Theory of Pragmatic Information and Its 
Application to the Quasispecies Model of Biological Evolution,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/nlin.AO/0105030&quot;&gt;nlin.AO/0105030&lt;/a&gt; 
		&lt;li&gt;&quot;A Generalization of the Shannon-McMillan-Breiman Theorem and the Kelly Criterion Leading to a Definition of Pragmatic Information&quot;, &lt;a href=&quot;http://arxiv.org/abs/0903.2243&quot;&gt;arxiv:0903.2243&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;T. Weissman and N. Merhav, &quot;On Causal Source Codes With Side
Information&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2005.856978&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005):
4003--4013&lt;/a&gt;
	&lt;li&gt;Paul L. Williams and Randall D. Beer, &quot;Nonnegative Decomposition
of Multivariate Information&quot;, &lt;a href=&quot;http://arxiv.org/abs/1004.2515&quot;&gt;arxiv:1004.2515&lt;/a&gt;
	&lt;li&gt;S. Yang, A. Kavcic and S. Tatikonda, &quot;Feedback Capacity of
Finite-State Machine Channels&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.842626&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt;
(2005): 799--810&lt;/a&gt;
	&lt;li&gt;Jiming Yu and Sergio Verdu, &quot;Schemes for Bidirectional Modeling of
Discrete Stationary
Sources&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2006.883626&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;52&lt;/strong&gt; (2006):
4789--4807&lt;/a&gt;
	&lt;li&gt;Tong Zhang, &quot;Information-Theoretic Upper and Lower Bounds for
Statistical
Estimation&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2005.864439&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt;
&lt;strong&gt;52&lt;/strong&gt; (2006): 1307--1321&lt;/a&gt;
	&lt;li&gt;Jacob Ziv, &quot;A Universal Prediction Lemma and Applications to
Universal Data Compression and Prediction&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/18.923732&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt;
(2001): 1528--1532&lt;/a&gt;
	&lt;/ul&gt; 

&lt;ul&gt;Things I'm very skeptical of, but should read before dismissing:
	&lt;li&gt;P. Allegrini, V. Benci, P. Grigolini, P. Hamilton, M. Ignaccolo, 
G. Menconi, L. Palatella, G. Raffaelli, N. Scafetta, M. Virgilio and J. Jang, 
&quot;Compression and diffusion: a joint approach to detect complexity,&quot; 
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0202123&quot;&gt;cond-mat/0202123&lt;/a&gt; 
	&lt;li&gt;Andrea Baronchelli, Emanuele Caglioti and Vittorio Loreto,
&quot;Artificial sequences and complexity measures&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1088/1742-5468/2005/04/P04002&quot;&gt;&lt;cite&gt;Journal of
Statistical Mechanics: Theory and Experiment&lt;/cite&gt; (2005): P04002&lt;/a&gt;
	&lt;li&gt;Hong-Da Chen, Chang-Heng Chang, Li-Ching Hsieh, and Hoong-Chien
Lee, &quot;Divergence and Shannon Information in Genomes&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevLett.94.178103&quot;&gt;&lt;cite&gt;Physical Review
Letters&lt;/cite&gt; &lt;strong&gt;94&lt;/strong&gt; (2005): 178103&lt;/a&gt;
	&lt;li&gt;P. A. Varotsos, N. V. Sarlis, E. S. Skordas and M. S. Lazaridou
		&lt;ul&gt;
		&lt;li&gt;&quot;Entropy in the natural time-domain&quot;, &lt;a
href=&quot;http://arxiv.org/abs/physics/0501117&quot;&gt;physics/0501117&lt;/a&gt;
= &lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;70&lt;/strong&gt; (2004): 011106
		&lt;li&gt;&quot;Natural entropy fluctuations discriminate similar looking
electric signals emitted from systems of different dynamics&quot;, &lt;a
href=&quot;http://arxiv.org/abs/physics/0501118&quot;&gt;physics/0501118&lt;/a&gt;
= &lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;71&lt;/strong&gt; (2005)
		&lt;/ul&gt;
	&lt;li&gt;David H. Wolpert, &quot;Information Theory - The Bridge Connecting 
Bounded Rational Game Theory and Statistical Physics&quot;, &lt;a 
href=&quot;http://arxiv.org/abs/cond-mat/0402508&quot;&gt;cond-mat/0402508&lt;/a&gt; [Frankly, if 
it were anyone other than David saying such stuff, I wouldn't even bother to 
read it.] 
	&lt;/ul&gt;

&lt;ul&gt;To write: 
	&lt;li&gt;CRS, &quot;State Reconstruction and Source Coding&quot; 
	&lt;li&gt;CRS, &quot;Typical Measures of Complexity Grow Like Shannon Entropy&quot; 
	&lt;/ul&gt; 

&lt;hr&gt;

&lt;em&gt;Previous versions&lt;/em&gt;: 2005-11-09 15:33 (but first version was some time in the 1990s --- 1998?)
</description>
  </item>
  </channel>
</rss>
