<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Statistical Learning Theory with Dependent Data</title>
    <link>http://bactra.org/notebooks/2009/08/26#dependent-learning</link>
    <description>
&lt;P&gt;See &lt;a href=&quot;learning-theory.html&quot;&gt;learning theory&lt;/a&gt; if that title doesn't
make sense.  I am particularly interested in learning
for &lt;a href=&quot;ergodic-theory.html&quot;&gt;ergodic&lt;/a&gt; time series.  A lot of the work
in this area involves strong mixing conditions, especially beta-mixing.  I
suspect that in many cases strong mixing is &lt;em&gt;not&lt;/em&gt; actually needed, but
this is a pure hunch on my part with absolutely no evidence to back it up
(right now).

&lt;P&gt;See also:
	&lt;a href=&quot;empirical-process-theory.html&quot;&gt;Empirical Process Theory&lt;/a&gt;

&lt;ul&gt;Recommended:
	&lt;li&gt;Andrew Nobel and Amir Dembo, &quot;A Note on Uniform Laws of Averages for Dependent Processes&quot;, &lt;cite&gt;Statistics and Probability Letters&lt;/cite&gt;
&lt;strong&gt;17&lt;/strong&gt; (1993): 169--172 [An extremely easy way to extend uniform
laws of large numbers to uniform ergodic theorems for mixing processes.
Actually I suspect that mixing is only necessary to get an explicit rate; I
should re-read it.  &lt;a
href=&quot;http://stat-or.unc.edu/webspace/public_html.stat/faculty/nobel/links/Papers/ULA-wkbern.pdf&quot;&gt;PDF
preprint&lt;/a&gt; via Dr. Nobel.]
	&lt;li&gt;Ron Meir, &quot;Nonparametric Time Series Prediction Through Adaptive
Model Selection&quot;, &lt;cite&gt;Machine Learning&lt;/cite&gt; &lt;strong&gt;39&lt;/strong&gt; (2000):
5--34 [&lt;a
href=&quot;http://www.ee.technion.ac.il/~rmeir/Publications/MeirTimeSeries00.pdf&quot;&gt;PDF&lt;/a&gt;.
Extending the &quot;structural risk minimization&quot; framework due to Vapnik to time
series.  Unfortunately Meir's approach demands knowledge of the mixing rate of
the process, which we don't really know how to estimate, but this is a very
encouraging first step.]
	&lt;li&gt;Mehryar Mohri and Afshin Rostamizadeh, &quot;Stability Bound for
Stationary Phi-mixing and Beta-mixing
Processes&quot;, &lt;a href=&quot;http://arxiv.org/abs/0811.1629&quot;&gt;arxiv:0811.1629&lt;/a&gt;
	&lt;li&gt;Daniil Ryabko, &quot;Pattern Recognition for Conditionally Independent
Data&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0507040&quot;&gt;cs.LG/0507040&lt;/a&gt; [It's a
bit of an odd form of dependence: the sequence of labels can show essentially
any form of dependence you like, but the features of the nth object have to be
independent of everything else, given the label of the nth object.  (Hence
&quot;conditionally independent data&quot;.)  This is like some kinds of hidden Markov
model...]
	&lt;li&gt;Mathukumalli Vidyasagar, &lt;cite&gt;A Theory of Learning and
Generalization: With Applications to Neural Networks and Control Systems&lt;/cite&gt;
[Has a very nice discussion of when the uniform laws of large numbers
of &lt;a href=&quot;learning-theory.html&quot;&gt;statistical learning theory&lt;/a&gt; transfer from
the usual IID setting to dependent processes, becoming uniform ergodic
theorems. (Sufficient conditions include things like beta-mixing, but necessary
and sufficient conditions seem to still be
unknown.)  &lt;a href=&quot;../weblog/algae-209-01.html#vidyasagar&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Herold Dehling (ed.), &lt;cite&gt;Empirical Process Techniques for
Dependent Data&lt;/cite&gt;
	&lt;li&gt;Mehryar Mohri and Afshin Rostamizadeh, &quot;Rademacher Complexity Bounds
for Non-I.I.D. Processes&quot;
[&lt;a href=&quot;http://www.cs.nyu.edu/~mohri/postscript/rad.pdf&quot;&gt;Preprint&lt;/a&gt;.
Mostly for stationary beta-mixing processes, to judge from the abstract.]
	&lt;li&gt;Daniil Ryabko, &quot;Characterizing predictable classes of processes&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0905.4341&quot;&gt;arxiv:0905.4341&lt;/a&gt;
	&lt;li&gt;Bin Zou, Luoqing Li and Zongben Xu, &quot;The generalization performance
of ERM algorithm with strongly mixing
observations&quot;, &lt;a href=&quot;http://dx.doi.org/10.1007/s10994-009-5104-z&quot;&gt;&lt;cite&gt;Machine
Learning&lt;/cite&gt; &lt;strong&gt;75&lt;/strong&gt; (2009): 275--295&lt;/a&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>