<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Information Geometry</title>
    <link>http://bactra.org/notebooks/2009/04/10#info-geo</link>
    <description>
&lt;P&gt;This is a slightly misleading name for applying &lt;a
href=&quot;../reviews/geometrical-methods/&quot;&gt;differential geometry&lt;/a&gt; to families of
&lt;a href=&quot;probability.html&quot;&gt;probability distributions&lt;/a&gt;, and so to &lt;a
href=&quot;statistics.html&quot;&gt;statistical models&lt;/a&gt;.  Information does however play
two roles in it: Kullback-Leibler information, or &lt;a
href=&quot;information-theory.html&quot;&gt;relative entropy&lt;/a&gt;, features as a measure of
divergence (not quite a metric, because it's asymmetric), and Fisher
information takes the role of curvature.  One very nice thing about information
geometry is that it gives us very strong tools for proving results about
statistical models, simply by considering them as well-behaved geometrical
objects.  Thus, for instance, it's basically a tautology to say that a manifold
is not changing much in the vicinity of points of low curvature, and changing
greatly near points of high curvature.  Stated more precisely, and then
translated back into probabilistic language, this becomes the Cram&amp;eacute;r-Rao
inequality, that the variance of an unbiased parameter estimator is at least the
reciprocal of the Fisher information.  As someone who likes differential
geometry, and now is interested in statistics, I find this very pleasing.
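
&lt;P&gt;As a concrete check on the inequality, here is a minimal Python sketch for n independent Bernoulli(p) draws.  It is an illustration only, not taken from any of the references below; the function names and the values p = 0.3, n = 20 are assumptions made for the example.  It computes the Fisher information as the expected squared score by direct enumeration, and compares its reciprocal with the exact variance of the sample mean, which is unbiased and, in this particular model, happens to attain the bound.

&lt;pre&gt;
```python
from math import comb

def fisher_information(p, n):
    """Fisher information about p carried by n IID Bernoulli(p) draws.

    Computed as the expected squared score of the success count k,
    summing over every possible outcome (no simulation)."""
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        score = k / p - (n - k) / (1 - p)   # d/dp of the log-likelihood
        total += prob * score**2
    return total

def variance_of_sample_mean(p, n):
    """Exact variance of the unbiased estimator k/n, by enumeration."""
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(q * k / n for k, q in enumerate(probs))
    return sum(q * (k / n - mean)**2 for k, q in enumerate(probs))

p, n = 0.3, 20                      # illustrative values, not from the text
info = fisher_information(p, n)
bound = 1.0 / info                  # Cramer-Rao lower bound for unbiased estimators
var = variance_of_sample_mean(p, n)
print("Fisher information:", info)
print("Cramer-Rao bound:  ", bound)
print("Var(sample mean):  ", var)
```
&lt;/pre&gt;

&lt;P&gt;The last two printed numbers should agree, up to floating-point rounding, with p(1-p)/n.  Any other unbiased estimator of p would have a variance at least this large.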

&lt;P&gt;As a physicist, I have always been somewhat bothered by the way
statisticians seem to accept &lt;em&gt;particular&lt;/em&gt; parameterizations of their
models as obvious and natural, and build those parameterizations into their
procedures.  In linear regression, for instance, it's reasonably common for
them to want to find models with only a few non-zero coefficients.  This makes
my thumbs prick, because it seems to me obvious that if I regressed on
arbitrary linear combinations of my covariates, I would have exactly the same
information (provided the transformation is invertible), and so I'm really
looking at exactly the same model --- but in general I'm &lt;em&gt;not&lt;/em&gt; going to
have a small number of non-zero coefficients any more.  In other words, I want
to be able to do &lt;em&gt;coordinate-free&lt;/em&gt; statistics.  Since differential
geometry lets me do coordinate-free physics, information geometry seems like an
appealing way to do this.  There are various information-geometric model
selection criteria, which I want to know more about; I suspect, based purely on
this disciplinary prejudice, that they will out-perform coordinate-dependent
criteria.
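
&lt;P&gt;The point about sparsity can be seen in a toy calculation.  The Python sketch below is an illustration under stated assumptions: the data are made up and noiseless, the helper names are hypothetical, and the change of coordinates (sum and difference of the two covariates) is just one convenient invertible map.  It fits the same response by least squares in both coordinate systems and reports the coefficients and fitted values.

&lt;pre&gt;
```python
def fit_two_covariates(u, v, y):
    """Least-squares coefficients for y ~ a*u + b*v (no intercept),
    from the 2x2 normal equations solved by Cramer's rule."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    suy = sum(a * c for a, c in zip(u, y))
    svy = sum(b * c for b, c in zip(v, y))
    det = suu * svv - suv * suv
    return ((suy * svv - svy * suv) / det, (svy * suu - suy * suv) / det)

def nonzero(coefs):
    """Number of coefficients that are not (numerically) zero."""
    return sum(1 for c in coefs if round(c, 9) != 0)

# Made-up covariates and a noiseless response that depends on x1 only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, -1.0, 0.0, 3.0, 1.0]
y = [2.0 * a for a in x1]

# An invertible linear change of coordinates on the covariates.
z1 = [a + b for a, b in zip(x1, x2)]
z2 = [a - b for a, b in zip(x1, x2)]

coef_x = fit_two_covariates(x1, x2, y)
coef_z = fit_two_covariates(z1, z2, y)

fit_x = [coef_x[0] * a + coef_x[1] * b for a, b in zip(x1, x2)]
fit_z = [coef_z[0] * a + coef_z[1] * b for a, b in zip(z1, z2)]

print("original coordinates:   ", coef_x, "non-zero:", nonzero(coef_x))
print("transformed coordinates:", coef_z, "non-zero:", nonzero(coef_z))
print("largest gap between fitted values:",
      max(abs(p - q) for p, q in zip(fit_x, fit_z)))
```
&lt;/pre&gt;

&lt;P&gt;In the original coordinates only one coefficient is non-zero; in the transformed coordinates both are, even though the fitted values, and hence the model, are the same.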

&lt;P&gt;I should also mention that &lt;a href=&quot;stat-mech.html&quot;&gt;statistical physics&lt;/a&gt;,
while it does no actual &lt;em&gt;statistics&lt;/em&gt;, is also very much concerned with
probability distributions.  Shun-ichi Amari, who is the leader of a very large
and impressive Japanese school of information-geometers, has a nice result (in,
e.g., his &quot;Hierarchy of Probability Distributions&quot; paper) showing that maximum
entropy distributions are, exactly, the ones with minimal interaction between
their variables --- the ones which approach most closely to independence.  I
think this throws a very interesting new light on the issue of &lt;em&gt;why&lt;/em&gt; we
can assume equilibrium corresponds to a state of maximum entropy (&lt;em&gt;pace&lt;/em&gt;
Jaynes, assuming independence is clearly not an innocent way of saying &quot;I
really don't know anything more&quot;).  I also see, via the arXiv, that people are
starting to think about phase transitions in information-geometric terms, which
seems natural in retrospect, though I can't comment further, not having read
the papers.
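
&lt;P&gt;The smallest instance of the link between maximum entropy and independence can be checked numerically.  The Python sketch below is not Amari's general construction, only a two-variable illustration with assumed marginals (0.3 and 0.6) and hypothetical helper names: among all joint distributions of two binary variables with those marginals, it searches a grid for the one with the largest entropy and compares it with the independent (product) distribution.

&lt;pre&gt;
```python
from math import log

def entropy(cells):
    """Shannon entropy (in nats) of a list of probabilities."""
    return -sum(q * log(q) for q in cells if q)

def joint(t, a, b):
    """2x2 joint distribution with P(X=1)=a, P(Y=1)=b and P(X=1,Y=1)=t."""
    return [t, a - t, b - t, 1 - a - b + t]

a, b = 0.3, 0.6                           # fixed marginals (illustrative values)
lo, hi = max(0.0, a + b - 1), min(a, b)   # feasible range for t
steps = 300
# Interior grid points only, so every cell probability is strictly positive.
grid = [lo + (hi - lo) * i / steps for i in range(1, steps)]

best_t = max(grid, key=lambda t: entropy(joint(t, a, b)))
print("entropy-maximizing P(X=1,Y=1):", best_t)
print("independent value a*b:        ", a * b)
```
&lt;/pre&gt;

&lt;P&gt;The two printed values should agree up to rounding: with only the marginals constrained, the maximum-entropy joint distribution is the one in which the variables are independent.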

&lt;P&gt;See also:
	&lt;a href=&quot;filtering.html&quot;&gt;Filtering and State Estimation&lt;/a&gt; for
some papers on differential-geometric ideas in statistical state estimation and
signal processing;
	&lt;a href=&quot;partial-identification.html&quot;&gt;Partial Identification
of Parametric Statistical Models&lt;/a&gt;.

&lt;ul&gt;Recommended, general:
	&lt;li&gt;Shun-ichi Amari, &quot;Information Geometry on Hierarchy of Probability
Distributions&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt; (2001): 1701--1711
[&lt;a
href=&quot;http://people.csail.mit.edu/jrennie/trg/papers/amari-ig-hierarchy-01.pdf&quot;&gt;PDF
reprint&lt;/a&gt;]
	&lt;li&gt;Shun-ichi Amari and Hiroshi Nagaoka, &lt;cite&gt;Methods of Information
Geometry&lt;/cite&gt;
	&lt;li&gt;Rudolf Kulhav&amp;yacute;, &lt;cite&gt;Recursive Nonlinear Estimation: A
Geometric Approach&lt;/cite&gt;
	&lt;/ul&gt;

&lt;ul&gt;Recommended, special topics:
	&lt;li&gt;Hwan-sik Choi and Nicholas M. Kiefer, &quot;Differential Geometry and
Bias Correction in Nonnested Hypothesis Testing&quot;
[&lt;a href=&quot;http://www.arts.cornell.edu/econ/kiefer/GeometryMS6.pdf&quot;&gt;PDF preprint
via Kiefer&lt;/a&gt;]
	&lt;li&gt;Tommi S. Jaakkola and David Haussler, &quot;Exploiting generative models
in discriminative classifiers&quot;, &lt;cite&gt;NIPS 11&lt;/cite&gt; (1998)
[&lt;a href=&quot;http://books.nips.cc/papers/files/nips11/0487.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;/ul&gt;

	&lt;ul&gt;To read:
	&lt;li&gt;Khadiga Arwini and C. T. J. Dodson, &quot;Neighborhoods of Independence
for Random Processes&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.DG/0311087&quot;&gt;math.DG/0311087&lt;/a&gt;
	&lt;li&gt;Nihat Ay
		&lt;ul&gt;
		&lt;li&gt;&quot;Information geometry on complexity and stochastic
interaction&quot; [&lt;a
href=&quot;http://www.mis.mpg.de/preprints/2001/prepr9501-abstr.html&quot;&gt;preprint&lt;/a&gt;]
		&lt;li&gt;&quot;An information-geometric approach to a theory of pragmatic
structuring&quot; [&lt;a
href=&quot;http://www.mis.mpg.de/preprints/2000/prepr5200-abstr.html&quot;&gt;preprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Vijay Balasubramanian, &quot;Statistical Inference, Occam's Razor, and
Statistical Mechanics on the Space of Probability Distributions&quot;,
&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (1997): 349--368
	&lt;li&gt;O. E. Barndorff-Nielsen and Richard D. Gill, &quot;Fisher Information in
Quantum Statistics&quot;, &lt;a
href=&quot;http://arxiv.org/abs/quant-ph/9808009&quot;&gt;quant-ph/9808009&lt;/a&gt;
	&lt;li&gt;Xavier Calmet and Jacques Calmet, &quot;Dynamics of the Fisher
Information Metric&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0410452&quot;&gt;cond-mat/0410452&lt;/a&gt; = &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevE.71.056109&quot;&gt;&lt;cite&gt;Physical Review
E&lt;/cite&gt; &lt;strong&gt;71&lt;/strong&gt; (2005): 056109&lt;/a&gt;
	&lt;li&gt;Gavin E. Crooks, &quot;Measuring Thermodynamic Length&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevLett.99.100602&quot;&gt;&lt;cite&gt;Physical Review
Letters&lt;/cite&gt; &lt;strong&gt;99&lt;/strong&gt; (2007): 100602&lt;/a&gt; [&quot;Thermodynamic length is
a metric distance between equilibrium thermodynamic states. Among other
interesting properties, this metric asymptotically bounds the dissipation
induced by a finite time transformation of a thermodynamic system. It is also
connected to the Jensen-Shannon divergence, Fisher information, and Rao's
entropy differential metric.&quot;]
	&lt;li&gt;Imre Csiszar and Frantisek Matus, &quot;Closures of exponential
families&quot;, &lt;a
href=&quot;http://dx.doi.org/10%2E1214/009117904000000766&quot;&gt;&lt;cite&gt;Annals of
Probability&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (2005): 582--600&lt;/a&gt; = &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0503653&quot;&gt;math.PR/0503653&lt;/a&gt;
	&lt;li&gt;C. T. J. Dodson and H. Wang, &quot;Iterative Approximation of
Statistical Distributions and Relation to Information Geometry&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1023/A:1012289028897&quot;&gt;&lt;cite&gt;Statistical Inference
for Stochastic Processes&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (2001): 307--318&lt;/a&gt; [&quot;the
optimal control of stochastic processes through sensor estimation of
probability density functions is given a geometric setting via information
theory and the information metric.&quot;]
	&lt;li&gt;Tryphon T. Georgiou, &quot;An intrinsic metric for power spectral
density functions&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0608486&quot;&gt;math.PR/0608486&lt;/a&gt;
[Leads to a Riemannian geometry on stochastic processes, apparently...]
	&lt;li&gt;Paolo Gibilisco and Tommaso Isola, &quot;Uncertainty Principle and
Quantum Fisher Information&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math-ph/0509046&quot;&gt;math-ph/0509046&lt;/a&gt;
	&lt;li&gt;Paolo Gibilisco, Daniele Imparato and Tommaso Isola,
&quot;Uncertainty Principle and Quantum Fisher Information II&quot;, &lt;a href=&quot;http://arxiv.org/abs/math-ph/0701062&quot;&gt;math-ph/0701062&lt;/a&gt;
	&lt;li&gt;Kazushi Ikeda, &quot;Information Geometry of Interspike Intervals in
Spiking Neurons&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/12/2719&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 2719--2735&lt;/a&gt;
	&lt;li&gt;Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, &quot;Stochastic
Reasoning, Free Energy, and Information Geometry&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/16/9/1779&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2004): 1779--1810&lt;/a&gt;
	&lt;li&gt;W. Janke, D.A. Johnston and R. Kenna, &quot;Information Geometry and
Phase Transitions&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0401092&quot;&gt;cond-mat/0401092&lt;/a&gt;
= &lt;cite&gt;Physica A&lt;/cite&gt; &lt;strong&gt;336&lt;/strong&gt; (2004): 181--186
	&lt;li&gt;Robert E. Kass and Paul W. Vos, &lt;cite&gt;Geometrical Foundations of
Asymptotic Inference&lt;/cite&gt;
	&lt;li&gt;G. Lebanon, &quot;Axiomatic Geometry of Conditional Models&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2005.844060&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005): 1283--1294&lt;/a&gt;
	&lt;li&gt;M. K. Murray and J. W. Rice, &lt;cite&gt;Differential Geometry and
Statistics &lt;/cite&gt; [Thanks to &lt;a href=&quot;http://www.ergodicity.net/&quot;&gt;Anand
Sarwate&lt;/a&gt; for the recommendation]
	&lt;li&gt;I. J. Myung, Vijay Balasubramanian and M. A. Pitt, &quot;Counting
probability distributions: Differential geometry and model selection&quot;,
&lt;a
href=&quot;http://dx.doi.org/10.1073/pnas.170283897&quot;&gt;&lt;cite&gt;PNAS&lt;/cite&gt; &lt;strong&gt;97&lt;/strong&gt;
(2000): 11170--11175&lt;/a&gt;
	&lt;li&gt;Hiroyuki Nakahara and Shun-ichi Amari, &quot;Information-Geometric
Measure for Neural Spikes&quot;, &lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;14&lt;/strong&gt;
(2002): 2269--2316
	&lt;li&gt;J. Peltonen and S. Kaski, &quot;Discriminative Components of Data&quot;,
&lt;cite&gt;IEEE Transactions on Neural Networks&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2005):
68--83
	&lt;li&gt;&lt;a href=&quot;http://omega.albany.edu:8008/&quot;&gt;Carlos C. Rodr&amp;iacute;guez&lt;/a&gt;
		&lt;ul&gt;
		&lt;li&gt;&quot;Are We Cruising a Hypothesis
Space?&quot;, &lt;a href=&quot;http://arxiv.org/abs/physics/9808009&quot;&gt;physics/9808009&lt;/a&gt;
		&lt;li&gt;&quot;The ABC of Model Selection: AIC, BIC, and the New CIC&quot;
[&lt;a href=&quot;http://omega.albany.edu:8008/CIC/me05.pdf&quot;&gt;PDF preprint&lt;/a&gt;]
		&lt;li&gt;&quot;Raping the Likelihood Principle&quot; [&lt;em&gt;Abstract&lt;/em&gt;:
&quot;Information Geometry brings a new level of objectivity to bayesian inference
and resolves the paradoxes related to the so called Likelihood
Principle.&quot; &lt;a href=&quot;http://omega.albany.edu:8008/lp.pdf&quot;&gt;PDF preprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Steven T. Smith, &quot;Covariance, Subspace, and Intrinsic Cramer-Rao
Bounds&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TSP.2005.845428&quot;&gt;&lt;cite&gt;IEEE
Transactions on Signal Processing&lt;/cite&gt; &lt;strong&gt;53&lt;/strong&gt; (2005):
1610--1630&lt;/a&gt; [Thanks to Dr. Smith for a reprint]
	&lt;li&gt;R. F. Streater, &quot;Quantum Orlicz spaces in information geometry&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math-ph/0407046&quot;&gt;math-ph/0407046&lt;/a&gt;
	&lt;li&gt;Masanobu Taniguchi and Yoshihide Kakizawa, &lt;cite&gt;Asymptotic Theory
of Statistical Inference for Time Series&lt;/cite&gt; [The first few chapters are
quite nice, but I haven't gotten to the parts where they actually use much
information geometry]
	&lt;li&gt;Marc Toussaint, &quot;Notes on information geometry and evolutionary
processes&quot;, &lt;a href=&quot;http://arxiv.org/abs/nlin.AO/0408040&quot;&gt;nlin.AO/0408040&lt;/a&gt;
	&lt;li&gt;Paolo Zanardi, Paolo Giorda, and Marco Cozzini,
&quot;Information-Theoretic Differential Geometry of Quantum Phase
Transitions&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1103/PhysRevLett.99.100603&quot;&gt;&lt;cite&gt;Physical Review
Letters&lt;/cite&gt; &lt;strong&gt;99&lt;/strong&gt; (2007): 100603&lt;/a&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>