<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks   </title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Deviation Inequalities in Probability Theory</title>
    <link>http://bactra.org/notebooks/2012/04/09#deviation-inequalities</link>
    <description>

&lt;P&gt;The laws of large numbers say that averages taken over large samples
converge on expectation values.  But these are asymptotic statements which say
nothing about what happens for samples of any particular size.  A deviation
inequality, by contrast, is a result which says that, for realizations of
such-and-such a stochastic process, the sample value of this functional
deviates by so much from its typical value with no more than a certain
probability: \( \Pr{\left(\left|f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f]\right|
&gt; h \right)} \leq r(n,h,f) \), where the rate function \( r \) has to be given
explicitly, and may depend on the true joint distribution of the \( X_i \)
(though it's more useful if it doesn't depend on that very much).  (And of
course one could compare to the median rather than the mean, or just look at
fluctuations above the typical value rather than on both sides, or get within a
certain factor of the typical value rather than a certain distance, etc.)  The
rate should be decreasing in \( h \)  and in \( n \).
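&lt;P&gt;To make the definition concrete, here is a rough Python sketch that estimates the left-hand side by simulation for one process and one functional. The function names, the choice of IID Uniform(0, 1) draws, and taking \( f \) to be the sample mean (so \( \mathbb{E}[f] = 1/2 \)) are illustrative assumptions. A simulation estimates the deviation probability for a particular case, but it does not supply a rate function \( r \).

```python
import random
import statistics


def deviation_probability(sample, f, typical_value, n, h, reps=2000, seed=0):
    """Monte Carlo estimate of Pr(|f(X_1, ..., X_n) - typical_value| > h).

    sample(rng) draws one X_i (here IID); f maps a list of n draws to a number.
    This only estimates the probability; it is not a bound.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        xs = [sample(rng) for _ in range(n)]
        if abs(f(xs) - typical_value) > h:
            exceed += 1
    return exceed / reps


def uniform_draw(rng):
    # Illustrative assumption: X_i ~ Uniform(0, 1), so E[sample mean] = 1/2.
    return rng.random()


for n in (10, 100):
    for h in (0.05, 0.1):
        est = deviation_probability(uniform_draw, statistics.fmean, 0.5, n, h)
        print(f"n={n:3d}  h={h:.2f}  estimated probability {est:.4f}")
```

Under these assumptions the estimates should shrink as either \( n \) or \( h \) grows, which is the behavior a useful rate function has to capture.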

&lt;P&gt;An elementary example is &quot;Markov's inequality&quot;: If \( X \) is a
non-negative random variable with a finite mean, then for every \( h &gt; 0 \),

\[
\Pr{\left(X &gt; h \right)} \leq \frac{\mathbb{E}[X]}{h} ~.
\]

One can derive many other deviation inequalities from Markov's inequality by
taking \( X = g(Y) \), where \( Y \) is another random variable and \( g \) is
some suitable non-negative-valued function.
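&lt;P&gt;Here is a small numerical check, assuming \( Y \) is exponentially distributed with mean 1, so that \( \Pr{(Y &gt; h)} = e^{-h} \) exactly, \( \mathbb{E}[Y] = 1 \) and \( \mathbb{E}[Y^2] = 2 \). It compares Markov's inequality applied to \( Y \) itself with Markov's inequality applied to \( g(Y) = Y^2 \). The exponential distribution and the helper names are only for illustration.

```python
import math

# Illustrative assumption: Y ~ Exponential(1), so Pr(Y > h) = exp(-h) exactly,
# E[Y] = 1 and E[Y**2] = 2.
MEAN = 1.0
SECOND_MOMENT = 2.0


def exact_tail(h):
    return math.exp(-h)


def markov_bound(h):
    # Markov's inequality with X = Y.
    return MEAN / h


def markov_squared_bound(h):
    # Markov's inequality with X = g(Y) = Y**2; for Y >= 0 and h > 0,
    # Y > h exactly when Y**2 > h**2.
    return SECOND_MOMENT / h**2


for h in (0.5, 1, 2, 4, 8):
    print(f"h={h}: exact {exact_tail(h):.4f}, E[Y]/h {markov_bound(h):.4f}, "
          f"E[Y^2]/h^2 {markov_squared_bound(h):.4f}")
```

For \( h \) below 2 the plain bound \( \mathbb{E}[Y]/h \) is the better of the two, while for \( h \) above 2 the bound from \( Y^2 \) is tighter, which is the point of choosing \( g \) well.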

&lt;P&gt;For instance if \( Y \) has a finite variance \( v \), then

\[
\Pr{\left(|Y-\mathbb{E}[Y]| &gt; h \right)} \leq \frac{v}{h^2} ~.
\]

This is
known as &quot;Chebyshev's inequality&quot;.  (Exercise: derive Chebyshev's inequality
from Markov's inequality.  Since Markov was in fact Chebyshev's student, it
would seem that the logical order here reverses the historical one, though
guessing at priority from eponyms is always hazardous.)  Suppose
that \( X_1, X_2, \ldots \) are random variables with
a common mean \( m \) and variance \( v \), and \( Y \) is the average of
the first \( n \) of these.  Then Chebyshev's inequality tells us that

\[
\Pr{\left(|Y - m| &gt; h\right)} \leq \frac{\mathrm{Var}[Y]}{h^2} ~.
\]

If the \( X_i \) are uncorrelated (e.g., independent), then
\( \mathrm{Var}[Y] = v/n \), so the probability that the sample average
differs from the expectation by more than \( h \) goes to zero as \( n \) grows, no matter how
small we make \( h \).  This is precisely the weak law of large numbers.  If
the \( X_i \) are correlated, but nonetheless \( \mathrm{Var}[Y] \)
goes to zero as \( n \) grows (generally because correlations decay), then we
get an &lt;a href=&quot;ergodic-theorem.html&quot;&gt;ergodic theorem&lt;/a&gt;.  The rate of
convergence here, however, is not very good: the bound shrinks only like \( O(n^{-1}) \).
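&lt;P&gt;For a concrete case, take the \( X_i \) to be independent fair coin flips (Bernoulli with \( p = 1/2 \)), so \( v = 1/4 \) and \( \mathrm{Var}[Y] = 1/(4n) \). The following Python sketch, with illustrative helper names, computes the exact probability \( \Pr{(|Y - 1/2| &gt; h)} \) from the binomial distribution and sets it beside the Chebyshev bound.

```python
import math


def exact_two_sided(n, h, p=0.5):
    """Exact Pr(|Y - p| > h), Y the average of n independent Bernoulli(p) variables."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) > h
    )


def chebyshev_bound(n, h, p=0.5):
    # v = p(1-p) and Var[Y] = v/n for uncorrelated summands; the bound is Var[Y]/h^2.
    return p * (1 - p) / (n * h**2)


for n in (10, 100, 1000):
    print(f"n={n:4d}  exact {exact_two_sided(n, 0.1):.2e}  "
          f"Chebyshev {chebyshev_bound(n, 0.1):.2e}")
```

With \( h = 0.1 \), the Chebyshev bound falls only by a factor of ten each time \( n \) grows tenfold, while the exact probability falls much faster. The direct sum over \( k \) is fine at these sizes, but for much larger \( n \) it would overflow floating point.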

&lt;P&gt;Since \( e^u \) is a monotonically increasing function of
\( u \), for any positive \( t \), \( X &gt; h \) if and only
if \( e^{tX} &gt; e^{th} \).  Applying Markov's inequality to the non-negative
variable \( e^{tX} \) then gives an exponential inequality,

\[
\Pr{\left(X &gt; h\right)} \leq e^{-th} \mathbb{E}\left[e^{tX}\right] ~.
\]

Notice that the first factor in the bound, \( e^{-th} \), does not depend on the
distribution of \( X \), while the second factor does not depend on the size of
the deviation \( h \).  We are in fact free to pick whichever \( t &gt; 0 \) gives
us the tightest bound.  The quantity \( \mathbb{E}\left[e^{tX}\right] \) is
called the &quot;moment generating function&quot; of \( X \); let's abbreviate it
\( M_X(t) \).  It can fail to exist if some moments are infinite.  (Write the
power series for \( e^u \) and take expectations term by term to see all this.)
It does, however, have the very nice property that when \( X_1 \) and \( X_2 \)
are independent, \( M_{X_1+X_2}(t) = M_{X_1}(t) M_{X_2}(t) \), since
\( e^{t(X_1+X_2)} = e^{tX_1} e^{tX_2} \) and the expectation of a product of
independent variables is the product of their expectations.  From this it
follows that if \( Y \) is the sum of \( n \) independent and identically
distributed copies of \( X \), \( M_Y(t) = {(M_X(t))}^{n} \).  Thus

\[
\Pr{\left(Y &gt; h\right)} \leq e^{-th} (M_X(t))^{n} ~.
\]

If \( Z = Y/n \), the sample mean, this in turn gives

\[
\Pr{\left(Z &gt; h\right)} = \Pr{\left(Y &gt; hn\right)}
\leq e^{-thn} (M_X(t))^{n} ~.
\]

So we can get exponential rates of convergence for the law of large numbers
from this, as long as \( h &gt; \mathbb{E}[X] \) and \( M_X(t) \) is finite for
some positive \( t \), since then some \( t &gt; 0 \) makes
\( e^{-th} M_X(t) &lt; 1 \).  [Students who took the CMU statistics department's probability
qualifying exam in 2010 now know who wrote problem 9.]  Again, the restriction
to IID random variables is not really essential: allowing dependence just means
that the moment generating functions don't factor exactly, but if they almost
factor then we can get results of the same form.  (Often, we end up with \( n
\) being replaced by \( n/n_0 \), where \( n_0 \) is something like how long it
takes dependence to decay to trivial levels.)
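&lt;P&gt;Continuing the fair-coin example, here is a Python sketch of the bound \( \Pr{(Z &gt; h)} \leq e^{-thn} (M_X(t))^n \) for Bernoulli(\( p \)) variables, where \( M_X(t) = 1 - p + p e^t \). It also checks numerically that the moment generating function of a Binomial(\( n, p \)) sum equals \( (M_X(t))^n \). Several choices are assumptions of this sketch: minimizing over \( t \) on a finite grid rather than exactly, the grid limits, and the function names. Working with logarithms keeps \( (M_X(t))^n \) from overflowing.

```python
import math


def bernoulli_mgf(t, p):
    # M_X(t) = E[exp(tX)] = 1 - p + p*exp(t) for X ~ Bernoulli(p).
    return 1 - p + p * math.exp(t)


def binomial_mgf_direct(t, n, p):
    # E[exp(tY)] for Y ~ Binomial(n, p), summed directly over the distribution of Y.
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) * math.exp(t * k)
               for k in range(n + 1))


def chernoff_bound(n, h, p, grid_size=2000, t_max=5.0):
    """Bound Pr(Z > h), Z the mean of n IID Bernoulli(p) variables, by the smallest
    value of exp(-t*h*n) * M_X(t)**n over a grid of t in (0, t_max]."""
    best_log = 0.0  # t = 0 gives the trivial bound exp(0) = 1
    for i in range(1, grid_size + 1):
        t = t_max * i / grid_size
        # Work with the logarithm so that M_X(t)**n cannot overflow.
        best_log = min(best_log, n * (math.log(bernoulli_mgf(t, p)) - t * h))
    return math.exp(best_log)


def exact_upper_tail(n, h, p):
    # Exact Pr(Z > h) = Pr(Y > h*n) with Y ~ Binomial(n, p).
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if k / n > h)


p, h = 0.5, 0.6
# Independence: the MGF of the sum equals the product of the individual MGFs.
print(binomial_mgf_direct(0.3, 20, p), bernoulli_mgf(0.3, p) ** 20)
for n in (10, 100, 1000):
    chebyshev = p * (1 - p) / (n * (h - p) ** 2)
    print(f"n={n:4d}  exact {exact_upper_tail(n, h, p):.2e}  "
          f"Chernoff-style {chernoff_bound(n, h, p):.2e}  Chebyshev {chebyshev:.2e}")
```

Every \( t &gt; 0 \) gives a valid bound, so the minimum over the grid is still a bound, just possibly a slightly looser one. For \( p = 1/2 \) and \( h = 0.6 \), this bound decays exponentially in \( n \). By \( n = 1000 \) it is many orders of magnitude below the Chebyshev bound \( p(1-p)/(n(h-p)^2) \), which decays only like \( 1/n \).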

&lt;P&gt;I don't feel like going into the reasoning behind the other common deviation
bounds &amp;mdash; Bernstein, Chernoff, Hoeffding, Azuma, McDiarmid, etc. &amp;mdash;
because I feel like I've given enough of the flavor already.  I am using this
notebook as, actually, a notebook, more specifically a place to collect
references on deviation inequalities, especially ones that apply to collections
of dependent random variables.  Results here typically appeal to various
notions of mixing or decay of correlations, as found
in &lt;a href=&quot;ergodic-theory.html&quot;&gt;ergodic theory&lt;/a&gt;.

&lt;P&gt;See also:
	&lt;a href=&quot;concentration-of-measure.html&quot;&gt;Concentration of Measure&lt;/a&gt;;
	&lt;a href=&quot;ergodic-theory.html&quot;&gt;Ergodic Theory&lt;/a&gt;;
	&lt;a href=&quot;large-deviations.html&quot;&gt;Large Deviations&lt;/a&gt;;
	&lt;a href=&quot;learning-theory.html&quot;&gt;Learning Theory&lt;/a&gt;;
	&lt;a href=&quot;probability.html&quot;&gt;Probability&lt;/a&gt;

&lt;ul&gt;Recommended:
	&lt;li&gt;G. G. Bosco, F. P. Machado and Thomas Logan Ritchie, &quot;Exponential
Rates of Convergence in the Ergodic Theorem: A Constructive
Approach&quot;, &lt;a href=&quot;http://dx.doi.org/10.1007/s10955-010-9945-4&quot;&gt;&lt;cite&gt;Journal
of Statistical Physics&lt;/cite&gt;
&lt;strong&gt;139&lt;/strong&gt; (2010): 367--374&lt;/a&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.kyb.mpg.de/~bousquet/&quot;&gt;Olivier Bousquet&lt;/a&gt;,
&lt;a href=&quot;http://www.lri.fr/~bouchero/&quot;&gt;St&amp;eacute;phane Boucheron&lt;/a&gt;
and &lt;a href=&quot;http://www.econ.upf.es/~lugosi/&quot;&gt;G&amp;aacute;bor Lugosi&lt;/a&gt;,
&quot;Introduction to Statistical Learning Theory&quot;
[Gives a very nice review of many deviation inequalities, with references.]
	&lt;li&gt;Nicolo Cesa-Bianchi and Gabor Lugosi, &lt;cite&gt;Prediction, Learning,
and Games&lt;/cite&gt; [Provides exemplary proofs in the appendix.  &lt;a href=&quot;../weblog/algae-2008-07.html#prediction&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;li&gt;J.-R. Chazottes, P. Collet, C. Kuelske and F. Redig, &quot;Deviation
inequalities via coupling for stochastic processes and random fields&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0503483&quot;&gt;math.PR/0503483&lt;/a&gt; [Very cool]
	&lt;li&gt;Iosif Pinelis, &quot;Between Chebyshev and Cantelli&quot;, &lt;a href=&quot;http://arxiv.org/abs/1011.6065&quot;&gt;arxiv:1011.6065&lt;/a&gt; [Cute, with an appealing proof]
	&lt;li&gt;Yongqiang Tang, &quot;A Hoeffding-Type Inequality for Ergodic Time
Series&quot;, &lt;a href=&quot;http://dx.doi.org/10.1007/s10959-007-0057-2&quot;&gt;&lt;cite&gt;Journal of
Theoretical Probability&lt;/cite&gt; &lt;strong&gt;20&lt;/strong&gt; (2007): 167--176&lt;/a&gt;
[&lt;a href=&quot;http://www4.stat.ncsu.edu/~sghosal/papers/Tang.pdf&quot;&gt;PDF preprint&lt;/a&gt;]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Olivier Catoni
		&lt;ul&gt;
		&lt;li&gt;&quot;High confidence estimates of the mean of heavy-tailed real random variables&quot;, &lt;a href=&quot;http://arxiv.org/abs/0909.5366&quot;&gt;arxiv:0909.5366&lt;/a&gt;
		&lt;li&gt;&quot;Challenging the empirical mean and empirical variance: a deviation study&quot;, &lt;a href=&quot;http://arxiv.org/abs/1009.2048&quot;&gt;arxiv:1009.2048&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Patrick Cattiaux and Arnaud Guillin, &quot;Deviation bounds for additive
functionals of Markov process&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0603021&quot;&gt;math.PR/0603021&lt;/a&gt;
[Non-asymptotic bounds for the probability that time averages deviate from
expectations with respect to the invariant measure, when the process is
stationary and ergodic and the invariant measure is reasonably regular.]
	&lt;li&gt;J&amp;eacute;r&amp;ocirc;me Dedecker, Florence Merlev&amp;egrave;de, Magda Peligrad, Sergey Utev, &quot;Moderate deviations for stationary sequences of bounded random variables&quot;, &lt;a href=&quot;http://arxiv.org/abs/0711.3924&quot;&gt;arxiv:0711.3924&lt;/a&gt;
	&lt;li&gt;Marian Grendar Jr. and Marian Grendar, &quot;Chernoff's bound forms&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0306326&quot;&gt;math.PR/0306326&lt;/a&gt;
	&lt;li&gt;A. Guionnet and B. Zegarlinski, &lt;cite&gt;Lectures on Logarithmic Sobolev Inequalities&lt;/cite&gt; [&lt;a href=&quot;http://mathaa.epfl.ch/prst/mourrat/ihpin.pdf&quot;&gt;120 pp. PDF&lt;/a&gt;]
	&lt;li&gt;Vladislav Kargin, &quot;A large deviation inequality for vector functions on finite reversible Markov Chains&quot;, &lt;cite&gt;Annals of Applied Probability&lt;/cite&gt;
&lt;strong&gt;17&lt;/strong&gt; (2007): 1202--1221, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0508538&quot;&gt;math.PR/0508538&lt;/a&gt;
	&lt;li&gt;Carlos A. Leon and Francois Perron, &quot;Optimal Hoeffding bounds for
discrete reversible Markov
chains&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0405296&quot;&gt;math.PR/0405296&lt;/a&gt;
	&lt;li&gt;Dasha Loukianova, Oleg Loukianov, Eva Loecherbach, &quot;Polynomial
bounds in the Ergodic Theorem for positive recurrent one-dimensional diffusions
and integrability of hitting
times&quot;, &lt;a href=&quot;http://arxiv.org/abs/0903.2405&quot;&gt;arxiv:0903.2405&lt;/a&gt;
[&lt;em&gt;non-asymptotic&lt;/em&gt; deviation bounds from bounds on moments of
recurrence times]
	&lt;li&gt;P. Major
		&lt;ul&gt;
		&lt;li&gt;&quot;On a multivariate version of Bernstein's inequality&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math.PR/0411287&quot;&gt;math.PR/0411287&lt;/a&gt;
		&lt;li&gt;&quot;A multivariate generalization of Hoeffding's
inequality&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.PR/0411288&quot;&gt;math.PR/0411288&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Florence Merlev&amp;egrave;de, Magda Peligrad, Emmanuel Rio, &quot;A Bernstein type inequality and moderate deviations for weakly dependent sequences&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1007/s00440-010-0304-9&quot;&gt;&lt;cite&gt;Probability Theory
and Related Fields&lt;/cite&gt; &lt;strong&gt;151&lt;/strong&gt; (2011):
435--474&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/0902.0582&quot;&gt;arxiv:0902.0582&lt;/a&gt;
	&lt;li&gt;Ted Theodosopoulos, &quot;A Reversion of the Chernoff Bound&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.PR/0501360&quot;&gt;math.PR/0501360&lt;/a&gt;
	&lt;/ul&gt;

&lt;P&gt;(Thanks to Michael Kalgalenko for typo-spotting)
</description>
  </item>
  </channel>
</rss>
