<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks   </title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Information Geometry</title>
    <link>http://bactra.org/notebooks/2011/12/20#info-geo</link>
    <description>
&lt;P&gt;This is a slightly misleading name for applying &lt;a
href=&quot;../reviews/geometrical-methods/&quot;&gt;differential geometry&lt;/a&gt; to families of
&lt;a href=&quot;probability.html&quot;&gt;probability distributions&lt;/a&gt;, and so to &lt;a
href=&quot;statistics.html&quot;&gt;statistical models&lt;/a&gt;.  Information does however play
two roles in it: Kullback-Leibler information, or &lt;a
href=&quot;information-theory.html&quot;&gt;relative entropy&lt;/a&gt;, features as a measure of
divergence (not quite a metric, because it's asymmetric), and Fisher
information takes the role of curvature.  One very nice thing about information
geometry is that it gives us very strong tools for proving results about
statistical models, simply by considering them as well-behaved geometrical
objects.  Thus, for instance, it's basically a tautology to say that a manifold
is not changing much in the vicinity of points of low curvature, and changing
greatly near points of high curvature.  Stated more precisely, and then
translated back into probabilistic language, this becomes the Cramer-Rao
inequality, that the variance of an unbiased parameter estimator is at least
the reciprocal of the Fisher information.  As someone who likes differential
geometry, and now is interested in statistics, I find this very pleasing.
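
&lt;P&gt;To make the Cramer-Rao statement above concrete, here is a sketch of the
standard formulas, in notation which is my own assumption rather than anything
fixed above: a scalar parameter \theta, densities p_\theta, n independent
observations, and the usual regularity conditions.
&lt;pre&gt;
% Relative entropy (Kullback-Leibler divergence); asymmetric in its arguments
D(p \| q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx

% Fisher information of a single observation
I(\theta) = E_\theta\left[ \left( \frac{\partial}{\partial\theta} \log p_\theta(X) \right)^2 \right]

% Locally, Fisher information is the curvature of the divergence
D(p_\theta \| p_{\theta+\delta}) = \frac{1}{2} I(\theta) \delta^2 + o(\delta^2)

% Cramer-Rao inequality for an unbiased estimator \hat{\theta} from n observations
\mathrm{Var}_\theta(\hat{\theta}) \geq \frac{1}{n I(\theta)}
&lt;/pre&gt;
For a Gaussian with unknown mean \theta and known variance \sigma^2,
I(\theta) = 1/\sigma^2, and the sample mean has variance \sigma^2/n, so it
attains the bound exactly.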

&lt;P&gt;As a physicist, I have always been somewhat bothered by the way
statisticians seem to accept &lt;em&gt;particular&lt;/em&gt; parametrizations of their
models as obvious and natural, and build those parametrizations into their
procedures.  In linear regression, for instance, it's reasonably common for
them to want to find models with only a few non-zero coefficients.  This makes
my thumbs prick, because it seems to me obvious that if I regressed on
arbitrary linear combinations of my covariates, I would have exactly the same
information (provided the transformation is invertible), and so I'm really
looking at exactly the same model --- but in general I'm &lt;em&gt;not&lt;/em&gt; going to
have a small number of non-zero coefficients any more.  In other words, I want
to be able to do &lt;em&gt;coordinate-free&lt;/em&gt; statistics.  Since differential
geometry lets me do coordinate-free physics, information geometry seems like an
appealing way to do this.  There are various information-geometric model
selection criteria, which I want to know more about; I suspect, based purely on
this disciplinary prejudice, that they will out-perform coordinate-dependent
criteria.
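
&lt;P&gt;A sketch of the regression example, under assumptions of my own
choosing (two covariates x_1 and x_2, and one particular invertible
recombination of them):
&lt;pre&gt;
% Original parametrization: one non-zero coefficient
y = \beta_1 x_1 + \beta_2 x_2 + \epsilon, \qquad \beta = (1, 0)^\top

% Invertible recombination of the covariates
z_1 = x_1 + x_2, \qquad z_2 = x_1 - x_2

% The same regression function in the new coordinates: no zero coefficients
\beta_1 x_1 + \beta_2 x_2 = \gamma_1 z_1 + \gamma_2 z_2, \qquad \gamma = (1/2, 1/2)^\top

% In general, with Z = X A for invertible A
X \beta = Z \gamma \quad \text{where} \quad \gamma = A^{-1} \beta
&lt;/pre&gt;
The fitted values and residuals of ordinary least squares are identical in
the two coordinate systems, but sparsity of \beta says nothing about sparsity
of \gamma.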

&lt;P&gt;I should also mention that &lt;a href=&quot;stat-mech.html&quot;&gt;statistical physics&lt;/a&gt;,
while it does no actual &lt;em&gt;statistics&lt;/em&gt;, is also very much concerned with
probability distributions.  Shun-ichi Amari, who is the leader of a large and
impressive Japanese school of information-geometers, has a nice result (in,
e.g., his &quot;Hierarchy of Probability Distributions&quot; paper) showing
that &lt;a href=&quot;max-ent.html&quot;&gt;maximum entropy distributions&lt;/a&gt; are, exactly, the
ones with minimal interaction between their variables --- the ones which
approach most closely to independence.  I think this throws a very interesting
new light on the issue of &lt;em&gt;why&lt;/em&gt; we can assume equilibrium corresponds to
a state of &lt;a href=&quot;max-ent.html&quot;&gt;maximum entropy&lt;/a&gt; (&lt;em&gt;pace&lt;/em&gt; Jaynes,
assuming independence is clearly not an innocent way of saying &quot;I really don't
know anything more&quot;).  I also see, via the arXiv, that people are starting to
think about phase transitions in information-geometric terms, which seems
natural in retrospect, though I can't comment further, not having read the
papers.
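
&lt;P&gt;The simplest case of the link between maximum entropy and independence
can be written down directly.  This is only the first-order case, in my own
notation (joint distribution p of X_1, ..., X_n with one-variable marginals
p_i), not Amari's general result:
&lt;pre&gt;
% Gap between the sum of marginal entropies and the joint entropy
\sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n) = D\left( p \,\Big\|\, \prod_{i=1}^{n} p_i \right) \geq 0
&lt;/pre&gt;
Equality holds only when p is the product of its marginals, so among all
joint distributions with the given one-variable marginals, the independent one
has maximum entropy.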

&lt;P&gt;See also:
	&lt;a href=&quot;exponential-families.html&quot;&gt;Exponential Families of Probability Measures&lt;/a&gt;, where the geometry is especially nice;
	&lt;a href=&quot;filtering.html&quot;&gt;Filtering and State Estimation&lt;/a&gt; for
some papers on differential-geometric ideas in statistical state estimation and
signal processing;
	&lt;a href=&quot;partial-identification.html&quot;&gt;Partial Identification
of Parametric Statistical Models&lt;/a&gt;

&lt;ul&gt;Recommended, big picture:
	&lt;li&gt;S.-I. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen,
and C. R. Rao, &lt;cite&gt;Differential Geometry in Statistical
Inference&lt;/cite&gt; [Now &lt;a href=&quot;http://projecteuclid.org/euclid.lnms/1215467056&quot;&gt;free online&lt;/a&gt;]
	&lt;li&gt;Shun-ichi Amari and Hiroshi Nagaoka, &lt;cite&gt;Methods of Information
Geometry&lt;/cite&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~kass/&quot;&gt;Robert E. Kass&lt;/a&gt; and Paul W. Vos, &lt;cite&gt;Geometrical Foundations of
Asymptotic Inference&lt;/cite&gt;
	&lt;li&gt;Rudolf Kulhav&amp;yacute;, &lt;cite&gt;Recursive Nonlinear Estimation: A
Geometric Approach&lt;/cite&gt;
	&lt;/ul&gt;

&lt;ul&gt;Recommended, close-ups:
	&lt;li&gt;Shun-ichi Amari, &quot;Information Geometry on Hierarchy of Probability
Distributions&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt; (2001): 1701--1711
[&lt;a
href=&quot;http://people.csail.mit.edu/jrennie/trg/papers/amari-ig-hierarchy-01.pdf&quot;&gt;PDF
reprint&lt;/a&gt;]
	&lt;li&gt;Vijay Balasubramanian, &quot;Statistical Inference, Occam's Razor, and
Statistical Mechanics on the Space of Probability Distributions&quot;,
&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (1997): 349--368
	&lt;li&gt;Hwan-sik Choi and Nicholas M. Kiefer, &quot;Differential Geometry and
Bias Correction in Nonnested Hypothesis Testing&quot;
[&lt;a href=&quot;http://www.arts.cornell.edu/econ/kiefer/GeometryMS6.pdf&quot;&gt;PDF preprint
via Kiefer&lt;/a&gt;]
	&lt;li&gt;Tommi S. Jaakkola and David Haussler, &quot;Exploiting generative models
in discriminative classifiers&quot;, &lt;cite&gt;NIPS 11&lt;/cite&gt; (1998)
[&lt;a href=&quot;http://books.nips.cc/papers/files/nips11/0487.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;I. J. Myung, Vijay Balasubramanian and M. A. Pitt, &quot;Counting
probability distributions: Differential geometry and model selection&quot;,
&lt;a
href=&quot;http://dx.doi.org/10.1073/pnas.170283897&quot;&gt;&lt;cite&gt;Proceedings of the National Academy of Sciences&lt;/cite&gt; (USA) &lt;strong&gt;97&lt;/strong&gt;
(2000): 11170--11175&lt;/a&gt;
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Khadiga Arwini and C. T. J. Dodson, &quot;Neighborhoods of Independence
for Random Processes&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.DG/0311087&quot;&gt;math.DG/0311087&lt;/a&gt;
	&lt;li&gt;Nihat Ay
		&lt;ul&gt;
		&lt;li&gt;&quot;Information geometry on complexity and stochastic
interaction&quot; [&lt;a
href=&quot;http://www.mis.mpg.de/preprints/2001/prepr9501-abstr.html&quot;&gt;preprint&lt;/a&gt;]
		&lt;li&gt;&quot;An information-geometric approach to a theory of pragmatic
structuring&quot; [&lt;a
href=&quot;http://www.mis.mpg.de/preprints/2000/prepr5200-abstr.html&quot;&gt;preprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;O. E. Barndorff-Nielsen and Richard D. Gill, &quot;Fisher Information in
Quantum Statistics&quot;, &lt;a
href=&quot;http://arxiv.org/abs/quant-ph/9808009&quot;&gt;quant-ph/9808009&lt;/a&gt;
	&lt;li&gt;Damiano Brigo, &quot;The direct L2 geometric structure on a manifold of probability densities with applications to Filtering&quot;, &lt;a href=&quot;http://arxiv.org/abs/1111.6801&quot;&gt;arxiv:1111.6801&lt;/a&gt;
	&lt;li&gt;Xavier Calmet and Jacques Calmet, &quot;Dynamics of the Fisher
Information Metric&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0410452&quot;&gt;cond-mat/0410452&lt;/a&gt; = &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevE.71.056109&quot;&gt;&lt;cite&gt;Physical Review
E&lt;/cite&gt; &lt;strong&gt;71&lt;/strong&gt; (2005): 056109&lt;/a&gt;
	&lt;li&gt;Gavin E. Crooks, &quot;Measuring Thermodynamic Length&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevLett.99.100602&quot;&gt;&lt;cite&gt;Physical Review
Letters&lt;/cite&gt; &lt;strong&gt;99&lt;/strong&gt; (2007): 100602&lt;/a&gt; [&quot;Thermodynamic length is
a metric distance between equilibrium thermodynamic states. Among other
interesting properties, this metric asymptotically bounds the dissipation
induced by a finite time transformation of a thermodynamic system. It is also
connected to the Jensen-Shannon divergence, Fisher information, and Rao's
entropy differential metric.&quot;]
	&lt;li&gt;Imre Csiszar and Frantisek Matus, &quot;Closures of exponential
families&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1214/009117904000000766&quot;&gt;&lt;cite&gt;Annals of
Probability&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (2005): 582--600&lt;/a&gt; = &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0503653&quot;&gt;math.PR/0503653&lt;/a&gt;
	&lt;li&gt;C. T. J. Dodson and H. Wang, &quot;Iterative Approximation of
Statistical Distributions and Relation to Information Geometry&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1023/A:1012289028897&quot;&gt;&lt;cite&gt;Statistical Inference
for Stochastic Processes&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (2001): 307--318&lt;/a&gt; [&quot;the
optimal control of stochastic processes through sensor estimation of
probability density functions is given a geometric setting via information
theory and the information metric.&quot;]
	&lt;li&gt;Tryphon T. Georgiou, &quot;An intrinsic metric for power spectral
density
functions&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0608486&quot;&gt;math.PR/0608486&lt;/a&gt;
[Leads to a Riemannian geometry on stochastic processes, apparently...]
	&lt;li&gt;Paolo Gibilisco and Tommaso Isola, &quot;Uncertainty Principle and
Quantum Fisher Information&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math-ph/0509046&quot;&gt;math-ph/0509046&lt;/a&gt;
	&lt;li&gt;Paolo Gibilisco, Daniele Imparato and Tommaso Isola,
&quot;Uncertainty Principle and Quantum Fisher Information II&quot;, &lt;a href=&quot;http://arxiv.org/abs/math-ph/0701062&quot;&gt;math-ph/0701062&lt;/a&gt;
	&lt;li&gt;Kazushi Ikeda, &quot;Information Geometry of Interspike Intervals in
Spiking Neurons&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/12/2719&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 2719--2735&lt;/a&gt;
	&lt;li&gt;Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, &quot;Stochastic
Reasoning, Free Energy, and Information Geometry&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/16/9/1779&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2004): 1779--1810&lt;/a&gt;
	&lt;li&gt;W. Janke, D.A. Johnston and R. Kenna, &quot;Information Geometry and
Phase Transitions&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0401092&quot;&gt;cond-mat/0401092&lt;/a&gt;
= &lt;cite&gt;Physica A&lt;/cite&gt; &lt;strong&gt;336&lt;/strong&gt; (2004): 181--186
	&lt;li&gt;G. Lebanon, &quot;Axiomatic Geometry of Conditional Models&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2005.844060&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005): 1283--1294&lt;/a&gt;
	&lt;li&gt;M. K. Murray and J. W. Rice, &lt;cite&gt;Differential Geometry and
Statistics &lt;/cite&gt; [Thanks to &lt;a href=&quot;http://www.ergodicity.net/&quot;&gt;Anand
Sarwate&lt;/a&gt; for the recommendation]
	&lt;li&gt;Hiroyuki Nakahara and Shun-ichi Amari, &quot;Information-Geometric
Measure for Neural Spikes&quot;, &lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;14&lt;/strong&gt;
(2002): 2269--2316
	&lt;li&gt;Frank Nielsen, &quot;Chernoff information of exponential families&quot;,
&lt;a href=&quot;http://arxiv.org/abs/1102.2684&quot;&gt;arxiv:1102.2684&lt;/a&gt;
	&lt;li&gt;J. Peltonen and S. Kaski, &quot;Discriminative Components of Data&quot;,
&lt;cite&gt;IEEE Transactions on Neural Networks&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2005):
68--83
	&lt;li&gt;Steven T. Smith, &quot;Covariance, Subspace, and Intrinsic Cramer-Rao
Bounds&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TSP.2005.845428&quot;&gt;&lt;cite&gt;IEEE
Transactions on Signal Processing&lt;/cite&gt; &lt;strong&gt;53&lt;/strong&gt; (2005):
1610--1630&lt;/a&gt; [Thanks to Dr. Smith for a reprint]
	&lt;li&gt;R. F. Streater, &quot;Quantum Orlicz spaces in information geometry&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math-ph/0407046&quot;&gt;math-ph/0407046&lt;/a&gt;
	&lt;li&gt;Masanobu Taniguchi and Yoshihide Kakizawa, &lt;cite&gt;Asymptotic Theory
of Statistical Inference for Time Series&lt;/cite&gt; [The first few chapters are
quite nice, but I haven't gotten to the parts where they actually use much
information geometry]
	&lt;li&gt;Marc Toussaint, &quot;Notes on information geometry and evolutionary
processes&quot;, &lt;a href=&quot;http://arxiv.org/abs/nlin.AO/0408040&quot;&gt;nlin.AO/0408040&lt;/a&gt;
	&lt;li&gt;Mark K. Transtrum, Benjamin B. Machta, James P. Sethna, &quot;The
geometry of nonlinear least squares with applications to sloppy models and
optimization&quot;, &lt;a href=&quot;http://arxiv.org/abs/1010.1449&quot;&gt;arxiv:1010.1449&lt;/a&gt;
[From the abstract, this sounds like a rediscovery of Amari's 1967 paper, but
Sethna is someone who usually knows what he's doing so I reserve judgement]
	&lt;li&gt;Paolo Zanardi, Paolo Giorda, and Marco Cozzini,
&quot;Information-Theoretic Differential Geometry of Quantum Phase
Transitions&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevLett.99.100603&quot;&gt;&lt;cite&gt;Physical Review
Letters&lt;/cite&gt; &lt;strong&gt;99&lt;/strong&gt; (2007): 100603&lt;/a&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>
