<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Sufficient Statistics</title>
    <link>http://bactra.org/notebooks/2009/04/10#sufficient-statistics</link>
    <description>
&lt;P&gt;In statistical theory, a &quot;statistic&quot; is a well-behaved function of the data,
which is what's actually used in calculations or inferences, rather than the
full data set.  E.g., the sample mean, the sample median, the sample variance,
etc.  A statistic is &lt;em&gt;sufficient&lt;/em&gt; if it is just as informative as the
full data.  The concept was introduced by R. A. Fisher in the 1920s, and
refined by Jerzy Neyman in the 1930s.  Parametric sufficiency means that the
statistic contains just as much information about the parameter as the full
data.  The actual data has a certain probability distribution conditional on
the statistic, which in general will also involve the parameter.  The statistic is
sufficient if this conditional distribution is the &lt;em&gt;same&lt;/em&gt; for all
parameter values.  (That's actually clearer in algebra but I don't feel up to
writing it in HTML now.)  Once we've controlled for the sufficient statistic,
nothing else --- not even the original data --- can tell us anything more about
the parameter.  Predictive sufficiency is similar: given the predictively
sufficient statistic, future observations can be predicted as well as if the
whole past was available.  This can be expressed concisely in terms
of &lt;a href=&quot;information-theory.html&quot;&gt;mutual information&lt;/a&gt;.
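
&lt;P&gt;A minimal sketch of that algebra, in LaTeX, under assumed notation that
appears nowhere above: &lt;em&gt;X&lt;/em&gt; is the data, &lt;em&gt;T&lt;/em&gt; the statistic,
theta the parameter, and &lt;em&gt;X&lt;/em&gt;&lt;sup&gt;-&lt;/sup&gt; and &lt;em&gt;X&lt;/em&gt;&lt;sup&gt;+&lt;/sup&gt;
the past and the future of a process.  The factorization form is the usual
Fisher-Neyman criterion and needs the standard regularity conditions; the
mutual-information forms assume a prior distribution on the parameter.

```latex
% Parametric sufficiency: the conditional law of the data given T
% is the same for all parameter values.
\[
  P_{\theta}\left( X = x \mid T(X) = t \right)
  = P_{\theta'}\left( X = x \mid T(X) = t \right)
  \quad \text{for all } \theta, \theta' .
\]
% Factorization form, for densities f(x; \theta):
\[
  f(x; \theta) = g\left( T(x); \theta \right) \, h(x) .
\]
% Mutual-information form of parametric sufficiency:
\[
  I\left[ \theta ; T(X) \right] = I\left[ \theta ; X \right] .
\]
% Predictive sufficiency of a statistic of the past:
\[
  I\left[ X^{+} ; T(X^{-}) \right] = I\left[ X^{+} ; X^{-} \right] .
\]
```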

&lt;P&gt;A &lt;em&gt;necessary&lt;/em&gt; statistic is one which can be computed from any
sufficient statistic, without reference to the original data.  (It's
&quot;necessary&quot; in the sense that any optimal inference implicitly involves knowing
the necessary statistic.)  Under pretty general conditions, maximum likelihood
estimates are necessary statistics, though they are not always sufficient.  A
&lt;em&gt;minimal sufficient&lt;/em&gt; statistic is one which is both necessary and
sufficient --- i.e., it's just as informative as the original data, but it can
be computed from any other sufficient statistic; no further compression of the
data is possible, without losing some information.
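
&lt;P&gt;A small numerical illustration may help here; it is a sketch under
assumptions, not part of the theory above.  The example (IID Bernoulli trials)
and the function names &lt;code&gt;conditional_given_total&lt;/code&gt;,
&lt;code&gt;half_sums&lt;/code&gt; and &lt;code&gt;total_from_half_sums&lt;/code&gt; are all
hypothetical choices made for illustration.  It checks, by exact enumeration,
that the distribution of the data given the total number of successes is the
same for two parameter values, and that the total can be computed from a finer
sufficient statistic without going back to the data.

```python
from fractions import Fraction
from itertools import product


def conditional_given_total(p, n):
    """Exact P(x | sum(x)) for every binary sequence x of n IID Bernoulli(p) trials."""
    joint = {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        joint[x] = p ** t * (1 - p) ** (n - t)
    # Marginal probability of each value of the total.
    total_prob = {}
    for x, pr in joint.items():
        total_prob[sum(x)] = total_prob.get(sum(x), 0) + pr
    return {x: pr / total_prob[sum(x)] for x, pr in joint.items()}


def half_sums(x):
    """A finer sufficient statistic: successes in the first and second halves."""
    mid = len(x) // 2
    return (sum(x[:mid]), sum(x[mid:]))


def total_from_half_sums(s):
    """The total count, computed from the half sums alone."""
    return s[0] + s[1]


n = 4
cond_a = conditional_given_total(Fraction(1, 3), n)
cond_b = conditional_given_total(Fraction(3, 4), n)

# Sufficiency: the conditional distribution does not depend on p.
same_for_all_p = cond_a == cond_b

# The total is a function of the finer statistic, not just of the raw data.
recoverable = all(
    total_from_half_sums(half_sums(x)) == sum(x)
    for x in product((0, 1), repeat=n)
)

print(same_for_all_p, recoverable)
```

Run as written, this prints &lt;code&gt;True True&lt;/code&gt;.  It only checks one
alternative sufficient statistic, so it illustrates the definition of a
necessary statistic rather than proving that the total is minimal sufficient.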

&lt;P&gt;A lot of my work has involved describing and finding predictively sufficient
statistics for time series and spatio-temporal processes.  It turns out that
the statistical sufficiency property gives rise to a Markov property for the
statistics.  (So,
basically, &lt;a href=&quot;computational-mechanics.html&quot;&gt;computational mechanics&lt;/a&gt;
turns out to be about constructing predictively sufficient statistics.)  So I'm
very interested in sufficiency in general, and especially how it relates to
Markovian representations of non-Markovian processes.

&lt;P&gt;Topics of particular interest: Necessary and sufficient conditions for the
existence of non-trivial sufficient statistics; dimensionality of sufficient
statistics; geometric and probabilistic characterizations; decision-theoretic
properties; necessary statistics; minimal sufficient statistics
for &lt;a href=&quot;transducers.html&quot;&gt;transducers&lt;/a&gt;; connections
to &lt;a href=&quot;causality.html&quot;&gt;causal inference&lt;/a&gt;.

&lt;ul&gt;Recommended:
	&lt;li&gt;Sufficiency is a very important topic in statistical inference, and
any good book on theoretical statistics will cover it in depth.  I like E. L.
Lehmann's two-volume set on &lt;cite&gt;Theory of Point Estimation&lt;/cite&gt;
and &lt;cite&gt;Testing Statistical Hypotheses&lt;/cite&gt;, but really any one will do.
	&lt;li&gt;David Blackwell and M. A. Girshick, &lt;cite&gt;Theory of Games and
Statistical Decisions&lt;/cite&gt; [Blackwell was a pioneer in exploring the
decision-theoretic properties of sufficiency, and this excellent old book
contains many deep theorems in this area]
	&lt;li&gt;E. B. Dynkin, &quot;Sufficient statistics and extreme
points&quot;, &lt;cite&gt;Annals of Probability&lt;/cite&gt; &lt;strong&gt;6&lt;/strong&gt; (1978): 705--730
[&quot;The connection between
&lt;a href=&quot;ergodic-theory.html&quot;&gt;ergodic decompositions&lt;/a&gt; and sufficient
statistics is explored in an elegant paper by DYNKIN&quot; ---
Kallenberg, &lt;cite&gt;Foundations of Modern Probability&lt;/cite&gt;, p. 577.
&lt;a
href=&quot;http://links.jstor.org/sici?sici=0091-1798%28197810%296%3A5%3C705%3ASSAEP%3E2.0.CO%3B2-D&quot;&gt;Link
to JSTOR&lt;/a&gt;]
	&lt;li&gt;John W. Fisher III, Alexander T. Ihler and Paul A. Viola,
&quot;Learning Informative Statistics: A Nonparametric Approach&quot;, pp. 900--906 in
NIPS 12 (1999) [&lt;a href=&quot;http://books.nips.cc/papers/files/nips12/0900.pdf&quot;&gt;PDF
reprint&lt;/a&gt;.  I'd call this more of a semi-parametric approach than a fully
non-parametric one; they assume a parametric form for the dependence structure,
but are agnostic about the distributions of innovations, and so try to maximize
non-parametrically estimated mutual informations.  In the limit, this will give
them sufficient statistics.]
	&lt;li&gt;R. A. Fisher
		&lt;ul&gt;
		&lt;li&gt;&quot;A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error&quot;,
&lt;cite&gt;Monthly Notices of the Royal Astronomical
Society&lt;/cite&gt; &lt;strong&gt;80&lt;/strong&gt; (1920): 758--770 [Apparently the first time
the sufficiency property was noted, though Fisher does not use that term
here.  &lt;a
href=&quot;http://digital.library.adelaide.edu.au/coll/special/fisher/12.pdf&quot;&gt;PDF&lt;/a&gt;]
		&lt;li&gt;&quot;On the Mathematical Foundations of Theoretical Statistics&quot;,
&lt;cite&gt;Philosophical Transactions of the Royal
Society A&lt;/cite&gt; &lt;strong&gt;222&lt;/strong&gt; (1922): 309--368 [Formal introduction of
the concept, and the name, of sufficiency, along with much else that has proved
fundamental to statistics, such as the likelihood function and the method of
maximum likelihood.  PDF in two
parts, &lt;a
href=&quot;http://digital.library.adelaide.edu.au/coll/special/fisher/18pt1.pdf&quot;&gt;1&lt;/A&gt;, &lt;a
href=&quot;http://digital.library.adelaide.edu.au/coll/special/fisher/18pt2.pdf&quot;&gt;2&lt;/a&gt;]
		&lt;li&gt;&quot;Theory of Statistical Estimation&quot;, &lt;cite&gt;Proceedings of
the Cambridge Philosophical Society&lt;/cite&gt; &lt;strong&gt;22&lt;/strong&gt; (1925): 700--725
[Often, but mistakenly, cited in place of the 1922 paper; admittedly, clearer.
&lt;a href=&quot;http://digital.library.adelaide.edu.au/coll/special/fisher/42.pdf&quot;&gt;PDF&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Solomon Kullback, &lt;cite&gt;Information Theory and Statistics&lt;/cite&gt;
	&lt;li&gt;Rudolf Kulhavy, &lt;cite&gt;Recursive Nonlinear Estimation: A Geometric
Approach&lt;/cite&gt;
	&lt;li&gt;Benoit Mandelbrot, &quot;The Role of Sufficiency and of Estimation in
Thermodynamics&quot;, &lt;cite&gt;Annals of Mathematical
Statistics&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (1962): 1021--1038
[&lt;a
href=&quot;http://links.jstor.org/sici?sici=0003-4851%28196209%2933%3A3%3C1021%3ATROSAO%3E2.0.CO%3B2-N&quot;&gt;JSTOR&lt;/a&gt;; &lt;a
href=&quot;http://math.yale.edu/mandelbrot/web_pdfs/029sufficiencyandestimation.pdf&quot;&gt;free
PDF reprint&lt;/a&gt;.  Extensive thermodynamic variables as sufficient statistics
for the conjugate intensive variables; Gibbs canonical form arising from
natural requirements on finite-dimensional sufficient statistics, which can
only be achieved for exponential families of probability distributions.  Very
clever, and IMHO a real contribution to
the &lt;a href=&quot;stat-mech-foundations.html&quot;&gt;foundations of statistical mechanics
and thermodynamics&lt;/a&gt;.]
	&lt;li&gt;Giorgio Picci, &quot;Some Connections Between the Theory of Sufficient
Statistics and the Identifiability Problem&quot;, &lt;cite&gt;SIAM Journal on Applied
Mathematics&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (1977): 383--398 [Introduces the idea of
a &quot;maximal identifiable statistic&quot; --- the coarsest partition of hypothesis
space where each equivalence class/cell of the partition gives rise to
a &lt;em&gt;distinct&lt;/em&gt; distribution of observables.  (I would prefer &quot;parameter&quot;
or &quot;functional&quot;, rather than &quot;statistic&quot;, since it's a function of the
distribution, not the observables, but that's a quibble.)  It might be
interesting to try to define &lt;a href=&quot;emergent-properties.html&quot;&gt;emergence&lt;/a&gt;
in these terms --- perhaps as a restriction on the observable sigma-field such
that the equivalence classes of the maximal identifiable parameter become
infinite-dimensional, or something like
that.  &lt;a href=&quot;http://www.jstor.org/stable/2100699&quot;&gt;JSTOR&lt;/a&gt;.  Thanks to
Rhiannon Weaver for the pointer.]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;R. R. Bahadur, &quot;Sufficiency and statistical decision functions,&quot;
&lt;cite&gt;Annals of Mathematical Statistics&lt;/cite&gt; &lt;strong&gt;25&lt;/strong&gt; (1954):
423--462
	&lt;li&gt;T. Bohlin, &quot;Information pattern for linear discrete-time models
with stochastic coefficients,&quot; &lt;cite&gt;IEEE Transactions on Automatic
Control&lt;/cite&gt; &lt;strong&gt;15&lt;/strong&gt; (1970): 104--106 [On recursively-computable
sufficient statistics]
	&lt;li&gt;J. L. Denny, &quot;Sufficient Conditions for a Family of Probabilities
to be
Exponential&quot;, &lt;a
href=&quot;http://www.pnas.org/cgi/reprint/57/5/1184&quot;&gt;&lt;cite&gt;Proceedings of the
National Academy of Sciences&lt;/cite&gt; &lt;strong&gt;57&lt;/strong&gt; (1967): 1184--&lt;/a&gt; [&quot;We
make the following statement precise under fairly weak conditions: in an
experiment, if we summarize &lt;em&gt;n&lt;/em&gt; statistically independent observations
(&lt;em&gt;x&lt;/em&gt;&lt;sub&gt;1&lt;/sub&gt;,...&lt;em&gt;x&lt;sub&gt;n&lt;/sub&gt;&lt;/em&gt;) in &lt;em&gt;m&lt;/em&gt;
&amp;lt; &lt;em&gt;n&lt;/em&gt; real numbers
(&lt;em&gt;y&lt;/em&gt;&lt;sub&gt;1&lt;/sub&gt;,...&lt;em&gt;y&lt;sub&gt;m&lt;/sub&gt;&lt;/em&gt;),
where &lt;em&gt;y&lt;sub&gt;j&lt;/sub&gt;&lt;/em&gt; = &amp;sum;&lt;sub&gt;i=1&lt;/sub&gt;&lt;sup&gt;n&lt;/sup&gt; &lt;em&gt;f&lt;sub&gt;j&lt;/sub&gt;&lt;/em&gt;(&lt;em&gt;x&lt;sub&gt;i&lt;/sub&gt;&lt;/em&gt;)
and the &lt;em&gt;f&lt;sub&gt;j&lt;/sub&gt;&lt;/em&gt; are given functions, and if we assume we have lost no
information by the summary, then the family of probabilities associated with
the experiment must be an exponential family.&quot;]
	&lt;li&gt;E. B. Dynkin, &quot;Necessary and sufficient statistics for a family of
probability distributions,&quot; &lt;cite&gt;Uspekhi matem. nauk&lt;/cite&gt;
&lt;strong&gt;6&lt;/strong&gt; (1951): 68--90 [Apparently translated
in &lt;cite&gt;Select. Trans. Math. Statist. Prob.&lt;/cite&gt; &lt;strong&gt;1&lt;/strong&gt; (1951):
23--41.  Zacks, below, is supposed to follow closely]
	&lt;li&gt;V. S. Huzurbazar, &lt;cite&gt;Sufficient Statistics: Selected
Contributions&lt;/cite&gt;
	&lt;li&gt;Anna Jencova and Denes Petz, &quot;Sufficiency in quantum statistical
inference&quot;, &lt;a href=&quot;http://arxiv.org/abs/math-ph/0412093&quot;&gt;math-ph/0412093&lt;/a&gt;
[Sounds cool!]
	&lt;li&gt;S. L. Lauritzen, &lt;cite&gt;Extremal Families and Systems of Sufficient
Statistics&lt;/cite&gt;
	&lt;li&gt;W. J. Runggaldier and F. Spizzichino, &quot;Sufficient conditions for
finite dimensionality of filters in discrete time: A Laplace transform-based
approach,&quot; &lt;cite&gt;Bernoulli&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2001): 211--221
	&lt;li&gt;S. Zacks, &lt;cite&gt;The Theory of Statistical Inference&lt;/cite&gt; [For
material on necessary and sufficient statistics]
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>