Statistical Learning Theory with Dependent Data
26 Aug 2009 13:20
See learning theory if that title doesn't make sense. I am particularly interested in learning for ergodic time series. A lot of the work in this area involves strong mixing conditions, especially beta-mixing. I suspect that in many cases strong mixing is not actually needed, but this is a pure hunch on my part with absolutely no evidence to back it up (right now).
See also: Empirical Process Theory
- Recommended:
- Andrew Nobel and Amir Dembo, "A Note on Uniform Laws of Averages for Dependent Processes", Statistics and Probability Letters 17 (1993): 169--172 [An extremely easy way to extend uniform laws of large numbers to uniform ergodic theorems for mixing processes. Actually I suspect that mixing is only necessary to get an explicit rate; I should re-read it. PDF preprint via Dr. Nobel.]
- Ron Meir, "Nonparametric Time Series Prediction Through Adaptive Model Selection," Machine Learning 39 (2000): 5--34 [PDF. Extending the "structural risk minimization" framework due to Vapnik to time series. Unfortunately Meir's approach demands knowledge of the mixing rate of the process, which we don't really know how to estimate, but this is a very encouraging first step.]
- Mehryar Mohri and Afshin Rostamizadeh, "Stability Bound for Stationary Phi-mixing and Beta-mixing Processes", arxiv:0811.1629
- Daniil Ryabko, "Pattern Recognition for Conditionally Indpendent Data", cs.LG/0507040 [It's a bit of an odd form of dependence: the sequence of labels can show essentially any form of dependence you like, but the features of the nth object have to be independent of everything else, given the label of the nth object. (Hence "conditionally independent data".) This is like some kinds of hidden Markov model...]
- Mathukumalli Vidyasagar, A Theory of Learning and Generalization: With Applications to Neural Networks and Control Systems [Has a very nice discussion of when the uniform laws of large numbers of statistical learning theory transfer from the usual IID setting to dependent processes, becoming uniform ergodic theorems. (Sufficient conditions include things like beta-mixing, but necessary and sufficient conditions seem to still be unknown.) Mini-review]
- To read:
- Herold Dehling (ed.), Empirical Process Techniques for Dependent Data
- Mehryar Mohri, Afshin Rostamizadeh, "Rademacher Complexity Bounds for Non-I.I.D. Processes" [Preprint. Mostly for stationary beta-mixing processes, to judge from the abstract.]
- Daniil Ryabko, "Characterizing predictable classes of processes", arxiv:0905.4341
- Bin Zou, Luoqing Li and Zongben Xu, "The generalization performance of ERM algorithm with strongly mixing observations", Machine Learning 75 (2009): 275--295
