<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Universal Prediction Algorithms</title>
    <link>http://bactra.org/notebooks/2009/12/28#universal-prediction</link>
    <description>
&lt;P&gt;&lt;em&gt;Given&lt;/em&gt;: a single &lt;a href=&quot;time-series.html&quot;&gt;time series&lt;/a&gt;, perhaps
a very long one, from a &lt;a href=&quot;stochastic-process.html&quot;&gt;stochastic
process&lt;/a&gt; which is basically unknown; perhaps we know merely that it is stationary
and &lt;a href=&quot;ergodic-theory.html&quot;&gt;ergodic&lt;/a&gt;.

&lt;P&gt;&lt;em&gt;Desired&lt;/em&gt;: a forecast which will converge on the best possible
forecast, as the series becomes longer and longer.  Or: the best possible
forecast from within a fixed class of forecasting algorithms.

&lt;P&gt;A solution is called a &lt;em&gt;universal&lt;/em&gt; prediction algorithm because it
applies equally to all the processes within the class, and is not tailored to any
one of them.

&lt;P&gt;This has connections to &lt;a href=&quot;information-theory.html&quot;&gt;information
theory&lt;/a&gt; (via universal compression algorithms), to the problem of
finding &lt;a href=&quot;markov.html&quot;&gt;Markovian representations&lt;/a&gt;
and &lt;a href=&quot;inference-markov.html&quot;&gt;inference for Markov models&lt;/a&gt;, and to
many other topics.

&lt;P&gt;See also:
	&lt;a href=&quot;ergodic-theory.html&quot;&gt;Ergodic Theory&lt;/a&gt;;
	&lt;a href=&quot;learning-games.html&quot;&gt;Learning in Games&lt;/a&gt;;
	&lt;a href=&quot;learning-theory.html&quot;&gt;Learning Theory&lt;/a&gt;;
	&lt;a href=&quot;learning-inference-induction.html&quot;&gt;Machine Learning, Statistical Inference and Induction&lt;/a&gt;;
	&lt;a href=&quot;sequential-decisions.html&quot;&gt;Sequential Decisions Under
Uncertainty&lt;/a&gt;;
	&lt;a href=&quot;time-series.html&quot;&gt;Time series&lt;/a&gt;

&lt;ul&gt;Recommended:
	&lt;li&gt;Paul H. Algoet
		&lt;ul&gt;
		&lt;li&gt;&quot;Universal Schemes for Prediction, Gambling, and Portfolio
Selection,&quot; &lt;cite&gt;Annals of Probability&lt;/cite&gt; &lt;strong&gt;20&lt;/strong&gt; (1992):
901--941
[&lt;a
href=&quot;http://www.jstor.org/pss/2244620&quot;&gt;JSTOR&lt;/a&gt;]
and an important Correction, &lt;strong&gt;23&lt;/strong&gt; (1995): 474--478
		&lt;li&gt;&quot;Universal Schemes for Learning the Best Nonlinear
Predictor Given the Infinite Past and Side Information,&quot; &lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;45&lt;/strong&gt; (1999): 1165--1185
		&lt;/ul&gt;
	&lt;li&gt;Nicolo Cesa-Bianchi and Gabor Lugosi, &lt;cite&gt;Prediction, Learning,
and Games&lt;/cite&gt; [&lt;a href=&quot;../weblog/algae-2008-07.html#prediction&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;li&gt;Shane Legg, &quot;Is There an Elegant Universal Theory of
Prediction?&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.AI/0606070&quot;&gt;cs.AI/0606070&lt;/a&gt; [A
nice set of diagonalization arguments against the hope of a universal
prediction scheme which has the nice features of Solomonoff-style induction,
but is actually computable.]
	&lt;li&gt;Donald Ornstein and Benjamin Weiss, &quot;How Sampling Reveals a
Process&quot;, &lt;a href=&quot;http://dx.doi.org/10.1214/aop/1176990729&quot;&gt;&lt;cite&gt;Annals of
Probability&lt;/cite&gt; &lt;strong&gt;18&lt;/strong&gt; (1990): 905--930&lt;/a&gt; [Open access.  A
truly beautiful and inspiring paper.  &amp;mdash; The negative results here would
seem to depend on their very strong notion of what it means to reconstruct a
process.  For instance, while in their sense it is not possible to always
discriminate between two processes (unless they are Bernoulli), Ryabko and
Ryabko (&lt;a href=&quot;http://arxiv.org/abs/0804.0510&quot;&gt;arxiv:0804.0510&lt;/a&gt;) give a
consistent test to do just this for ergodic processes, not necessarily
Bernoulli, by employing a weaker notion of inter-process distance.]
	&lt;li&gt;Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M.
Willett, &quot;Sequential Probability Assignment via Online Convex Programming
Using Exponential Families&quot; [ISIT 2009; &lt;a href=&quot;http://people.ee.duke.edu/~willett/papers/raginsky_marcia_silva_willett_ISIT09.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Pierre Alquier and Olivier Wintenberger, &quot;Model selection and
randomization for weakly dependent time series forecasting&quot;, &lt;a href=&quot;http://arxiv.org/abs/0902.2924&quot;&gt;arxiv:0902.2924&lt;/a&gt;
	&lt;li&gt;L. Gyorfi, G. Morvai, S. Yakowitz, &quot;Limits to consistent on-line
forecasting for ergodic time series&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;44&lt;/strong&gt; (1998): 886--892
= &lt;a href=&quot;http://arxiv.org/abs/0712.2430&quot;&gt;arxiv:0712.2430&lt;/a&gt;
	&lt;li&gt;Marcus Hutter
		&lt;ul&gt;
		&lt;li&gt;&quot;Convergence and Error Bounds for Universal Prediction of
Nonbinary Sequences,&quot; &lt;a
href=&quot;http://arXiv.org/abs/cs/0106036&quot;&gt;cs.LG/0106036&lt;/a&gt;
		&lt;li&gt;&quot;Convergence and Loss Bounds for Bayesian Sequence
Prediction,&quot; &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0301014&quot;&gt;cs.LG/0301014&lt;/a&gt;
		&lt;li&gt;&quot;General Loss Bounds for Universal Sequence Prediction,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cs.AI/0101019&quot;&gt;cs.AI/0101019&lt;/a&gt;
		&lt;li&gt;&quot;Optimality of Universal Bayesian Sequence Prediction for
General Loss and Alphabet&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.LG/0311014&quot;&gt;cs.LG/0311014&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Yuri Kalnishkan, Vladimir Vovk and Michael V. Vyugin, &quot;How many
strings are easy to predict?&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.ic.2005.04.001&quot;&gt;&lt;cite&gt;Information and
Computation&lt;/cite&gt; &lt;strong&gt;201&lt;/strong&gt; (2005): 55--71&lt;/a&gt; [&quot;It is well known
in the theory of Kolmogorov complexity that most strings cannot be compressed;
more precisely, only exponentially few (O(2&lt;sup&gt;n-m&lt;/sup&gt;)) binary strings of length n
can be compressed by m bits. This paper extends the 'incompressibility'
property of Kolmogorov complexity to the 'unpredictability' property of
predictive complexity. The 'unpredictability' property states that predictive
complexity (defined as the loss suffered by a universal prediction algorithm
working infinitely long) of most strings is close to a trivial upper bound (the
loss suffered by a trivial minimax constant prediction strategy). We show that
only exponentially few strings can be successfully predicted and find the base
of the exponent.&quot;]
	&lt;li&gt;Gusztav Morvai, &quot;Guessing the output of a stationary binary time series&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.3760&quot;&gt;arxiv:0710.3760&lt;/a&gt;
	&lt;li&gt;Gusztav Morvai, Sanjeev R. Kulkarni, Andrew B. Nobel, &quot;Regression estimation from an individual stable sequence&quot;, &lt;cite&gt;Statistics&lt;/cite&gt;
&lt;strong&gt;33&lt;/strong&gt; (1999): 99--118 = &lt;a href=&quot;http://arxiv.org/abs/0710.2496&quot;&gt;arxiv:0710.2496&lt;/a&gt;
	&lt;li&gt;Guszt&amp;aacute;v Morvai and Benjamin Weiss
		&lt;ul&gt;
		&lt;li&gt;&quot;Forecasting for Stationary Binary Time Series&quot;, &lt;cite&gt;Acta Appl. Math.&lt;/cite&gt; &lt;strong&gt;79&lt;/strong&gt; (2003): 25--34 = &lt;a href=&quot;http://arxiv.org/abs/0710.5144&quot;&gt;arxiv:0710.5144&lt;/a&gt;
		&lt;li&gt;&quot;Forward Estimation for Ergodic Time Series&quot;,
&lt;cite&gt;Ann. Inst. H. Poincare Prob. Statist.&lt;/cite&gt; &lt;strong&gt;41&lt;/strong&gt;
(2005): 859--870 = &lt;a href=&quot;http://arxiv.org/abs/0711.3856&quot;&gt;arxiv:0711.3856&lt;/a&gt;
		&lt;li&gt;&quot;Inferring the conditional mean&quot;, &lt;cite&gt;Theory of
Stochastic Processes&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (2005): 112--120
= &lt;a href=&quot;http://arxiv.org/abs/0710.3757&quot;&gt;arxiv:0710.3757&lt;/a&gt;
		&lt;li&gt;&quot;Intermittent estimation of stationary
time series&quot;, &lt;cite&gt;Test&lt;/cite&gt; &lt;strong&gt;13&lt;/strong&gt; (2004): 525--542
= &lt;a href=&quot;http://arxiv.org/abs/0711.0350&quot;&gt;arxiv:0711.0350&lt;/a&gt;
		&lt;li&gt;&quot;Limitations on intermittent forecasting&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.spl.2004.12.016&quot;&gt;&lt;cite&gt;Statistics and
Probability Letters&lt;/cite&gt; &lt;strong&gt;72&lt;/strong&gt; (2005): 285--290&lt;/a&gt; = &lt;a href=&quot;http://arxiv.org/abs/0710.3773&quot;&gt;arxiv:0710.3773&lt;/a&gt;
		&lt;li&gt;&quot;On Classifying Processes&quot;, &lt;cite&gt;Bernoulli&lt;/cite&gt;
&lt;strong&gt;11&lt;/strong&gt; (2005): 523--532 = &lt;a href=&quot;http://arxiv.org/abs/0710.3775&quot;&gt;arxiv:0710.3775&lt;/a&gt;
		&lt;li&gt;&quot;On sequential estimation and prediction for 
discrete time series&quot;, &lt;cite&gt;Stochastics and Dynamics&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2007): 417--437 = &lt;a href=&quot;http://arxiv.org/abs/0803.4332&quot;&gt;arxiv:0803.4332&lt;/a&gt;
		&lt;li&gt;&quot;Prediction for discrete time series&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s00440-004-0386-3&quot;&gt;&lt;cite&gt;Probability Theory and
Related Fields&lt;/cite&gt; &lt;strong&gt;132&lt;/strong&gt; (2005): 1--12&lt;/a&gt; = &lt;a href=&quot;http://arxiv.org/abs/0711.0471&quot;&gt;arxiv:0711.0471&lt;/a&gt;
		&lt;li&gt;&quot;On universal estimates for binary renewal processes&quot;,
&lt;cite&gt;Annals of Applied Probability&lt;/cite&gt; &lt;strong&gt;18&lt;/strong&gt; (2008):
1970--1992 = &lt;a href=&quot;http://arxiv.org/abs/0811.2076&quot;&gt;arxiv:0811.2076&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;G. Morvai, S. Yakowitz, L. Gyorfi, &quot;Nonparametric inference for ergodic, stationary time series&quot;, &lt;cite&gt;Annals of Statistics&lt;/cite&gt;
&lt;strong&gt;24&lt;/strong&gt; (1996): 370--379
= &lt;a href=&quot;http://arxiv.org/abs/0711.0367&quot;&gt;arxiv:0711.0367&lt;/a&gt;
	&lt;li&gt;Andrew B. Nobel, Gusztav Morvai, Sanjeev R. Kulkarni, &quot;Density
estimation from an individual numerical sequence&quot;, &lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;44&lt;/strong&gt; (1998): 537--541
= &lt;a href=&quot;http://arxiv.org/abs/0710.2500&quot;&gt;arxiv:0710.2500&lt;/a&gt;
	&lt;li&gt;Boris Ryabko and Jaakko Astola
		&lt;ul&gt;
		&lt;li&gt;&quot;Prediction of Large Alphabet Processes and Its Application
to Adaptive Source
Coding&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0504079&quot;&gt;cs.IT/0504079&lt;/a&gt;
		&lt;li&gt;&quot;Universal Codes as a Basis for Time Series
Testing&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.IT/0602084&quot;&gt;cs.IT/0602084&lt;/a&gt;
		&lt;li&gt;&quot;Universal Codes as a Basis for Nonparametric Testing of
Serial Independence for Time Series&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0506094&quot;&gt;cs.IT/0506094&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Daniil Ryabko, &quot;On Finding Predictors for Arbitrary Families of Processes&quot;, &lt;a href=&quot;http://arxiv.org/abs/0912.4883&quot;&gt;arxiv:0912.4883&lt;/a&gt;
	&lt;li&gt;Daniil Ryabko and Marcus Hutter, &quot;On Sequence Prediction for
Arbitrary Measures&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.LG/0606077&quot;&gt;cs.LG/0606077&lt;/a&gt;
	&lt;li&gt;T. Weissman, &quot;How to Filter an `Individual Sequence with Feedback'&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2008.926457&quot;&gt;&lt;cite&gt;IEEE Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;54&lt;/strong&gt; (2008): 3831--3841&lt;/a&gt;
	&lt;li&gt;S. Yakowitz, L. Gyorfi, J. Kieffer, G. Morvai, &quot;Strongly consistent
nonparametric forecasting and regression for stationary ergodic sequences&quot;,
&lt;cite&gt;J. Multivariate Analysis&lt;/cite&gt; &lt;strong&gt;71&lt;/strong&gt; (1999): 24--41
= &lt;a href=&quot;http://arxiv.org/abs/0712.2592&quot;&gt;arxiv:0712.2592&lt;/a&gt;
	&lt;li&gt;Jacob Ziv, &quot;A Universal Prediction Lemma and Applications to
Universal Data Compression and Prediction&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/18.923732&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt; (2001): 1528--1532&lt;/a&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>