Ensemble Methods in Machine Learning
27 Dec 2009 21:53
Boosting, bagging, binning, stacking, mixtures of experts, ...
Value of diversity.
See also: Collective Cognition; Learning Theory; Model Selection
- Recommended (totally inadequate, what happened to come to mind while cleaning up my files):
- Sanjeev Arora, Elad Hazan and Satyen Kale, "The Multiplicative Weights Update Method: a Meta Algorithm and Applications" [PDF preprint. This is an interesting kind of result, which promises performance that comes close to that achieved by any strategy within a fixed class, no matter what sequence of data is observed --- but it's performance on that sequence, which, as the saying goes, "is no guarantee of future results". Cesa-Bianchi and Lugosi's book has a lot more along these lines.]
- Nicolo Cesa-Bianchi and Gabor Lugosi, Prediction, Learning, and Games [Mini-review]
- Gerda Claeskens and Nils Lid Hjort, Model Selection and Model Averaging
- Pedro Domingos, "The Role of Occam's Razor in Knowledge Discovery," Data Mining and Knowledge Discovery, 3 (1999) [Online. Ensemble methods as an apparent violation of Occam's Razor.]
- A. Juditsky, P. Rigollet, A. B. Tsybakov, "Learning by mirror averaging", arxiv:math/0511468 = Annals of Statistics 36 (2008): 2183--2206
- G. Langer and U. Parlitz, "Modeling parameter dependence from time series", Physical Review E 70 (2004): 056217 [Interesting use of ensemble methods in state space modeling]
- Laurence K. Saul and Michael I. Jordan, "Mixed Memory Markov Models: Decomposing Complex Stochastic Processes as Mixtures of Simpler Ones", Machine Learning 37 (1999): 75--87
- Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee, "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods", Annals of Statistics 26 (1998): 1651--1686
- To read:
- Ran Avnimelech and Nathan Intrator, "Boosted Mixture of Experts: An Ensemble Learning Scheme", Neural Computation 11 (1999): 483--497
- Larry M. Bartels, "Specification Uncertainty and Model Averaging", American Journal of Political Science 41 (1997): 641--674
- Gérard Biau, Luc Devroye and Gábor Lugosi, "Consistency of Random Forests and Other Averaging Classifiers", Journal of Machine Learning Research 9 (2008): 2015--2033 ["In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent."]
- Gavin Brown, Jeremy L. Wyatt and Peter Tino, "Managing Diversity in Regression Ensembles", Journal of Machine Learning Research 6 (2005): 1621--1650
- Bruno Caprile, Cesare Furlanello and Stefano Merler, "The Dynamics of AdaBoost Weights Tells You What's Hard to Classify," cs.LG/0201014
- Zhuo Chen and Yuhong Yan, "Time Series Models for Forecasting: Testing or Combining?", Studies in Nonlinear Dynamics and Econometrics 11:1 (2007): 3
- M. Di Marzio and C. C. Taylor, "Kernel density classification and boosting: an L2 analysis", Statistics and Computing 15 (2005): 113--123
- Yoav Freund, Yishay Mansour and Robert E. Schapire, "Generalization bounds for averaged classifiers", Annals of Statistics 32 (2004): 1698--1722 = math.ST/0410092
- Yoav Freund, Robert E. Schapire, Yoram Singer and Manfred K. Warmuth, "Using and combining predictors that specialize" [PDF preprint]
- Jerome H. Friedman, Bogdan E. Popescu, "Predictive learning via rule ensembles", arxiv:0811.1679
- G. Fumera and F. Roli, "A Theoretical and Experimental Analysis of Linear Combiners for Multiple Classifier Systems", IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005): 942--956
- Nicolas Garcia-Pedrajas, Cesar Garcia-Osorio and Colin Fyfe, "Nonlinear Boosting Projections for Ensemble Construction", Journal of Machine Learning Research 8 (2007): 1--33
- Alexander Goldenshluger, "A universal procedure for aggregating estimators", arxiv:0704.2500 = Annals of Statistics 37 (2009): 542--568
- Etienne Grossmann, "A Theory of Probabilistic Boosting, Decision Trees and Matryoshki", cs.LG/0607110
- Jakob Vogdrup Hansen, Combining Predictors: Meta Machine Learning Methods and Bias/Variance & Ambiguity Decompositions [Ph.D. thesis, University of Aarhus, 2000; on-line]
- Geoffrey E. Hinton, "Training Products of Experts by Minimizing Contrastive Divergence," Neural Computation 14 (2002): 1771--1800
- Marcus Hutter and Jan Poland, "Adaptive Online Prediction by Following the Perturbed Leader", cs.AI/0504078 = Journal of Machine Learning Research 6 (2005): 639--660
- Robert A. Jacobs, "Bias/Variance Analyses of Mixtures-of-Experts Architectures", Neural Computation 9 (1997): 369--383 ["This article investigates the bias and variance of mixtures-of-experts (ME) architectures. The variance of an ME architecture can be expressed as the sum of two terms: the first term is related to the variances of the expert networks that comprise the architecture and the second term is related to the expert networks' covariances. One goal of this article is to study and quantify a number of properties of ME architectures via the metrics of bias and variance. A second goal is to clarify the relationships between this class of systems and other systems that have recently been proposed. It is shown that in contrast to systems that produce unbiased experts whose estimation errors are uncorrelated, ME architectures produce biased experts whose estimates are negatively correlated."]
- Wenxin Jiang, "Boosting with Noisy Data: Some Views from Statistical Theory", Neural Computation 16 (2004): 789--810
- Ludmila I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms
- Nicole Kraemer, "Boosting for Functional Data", math.ST/0605751
- Guillaume Lecué, "Lower Bounds and Aggregation in Density Estimation", Journal of Machine Learning Research 7 (2006): 971--981
- David Mease, Abraham J. Wyner and Andreas Buja, "Boosted Classification Trees and Class Probability/Quantile Estimation", Journal of Machine Learning Research 8 (2007): 409--439
- Nicolai Meinshausen, "Forest Garrote", arxiv:0906.3590
- David J. Miller and Siddharth Pal, "Transductive Methods for the Distributed Ensemble Classification Problem", Neural Computation 19 (2007): 856--884
- Seiji Miyoshi, Kazuyuki Hara, and Masato Okada, "Analysis of ensemble learning using simple perceptrons based on online learning theory", Physical Review E 71 (2005): 036116
- L. Nunes and E. Oliveira, "On Learning by Exchanging Advice," cs.LG/0203010
- Fernando C. Pereira and Yoram Singer, "An Efficient Extension to Mixture Techniques for Prediction and Decision Trees", Machine Learning 36 (1999): 183--199
- Evgueni Petrov, "Constraint-based analysis of composite solvers," cs.AI/0302036
- Philippe Rigollet, "Maximum likelihood aggregation and misspecified generalized linear models", arxiv:0911.2919
- Yoram Singer, "Adaptive Mixtures of Probabilistic Transducers", Neural Computation 9 (1997): 1711--1733 [PS.gz preprint]
- Eiji Takimoto and Akira Maruoka, "Top-down decision tree learning as information based boosting," Theoretical Computer Science 292 (2002): 447--464
- Héla Zouari, Laurent Heutte and Yves Lecourtier, "Controlling the diversity in classifier ensembles through a measure of agreement", Pattern Recognition 38 (2005): 2195--2199
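
A minimal sketch of the kind of guarantee mentioned in the note on Arora, Hazan and Kale above, assuming the simplest setting: N experts, per-round losses in [0, 1], and the exponential-weights ("Hedge") form of the multiplicative update. The function names, the step size eta = sqrt(8 ln N / T), and the random loss sequence are choices made for this illustration, not anything taken from the paper. With that step size the standard bound is regret <= sqrt(T ln N / 2) for any loss sequence; note that this is a statement about the sequence actually observed, not about future data.

```python
import math
import random


def hedge(loss_rows, eta):
    """Run exponential weights over a sequence of per-expert losses in [0, 1].

    Returns (learner's cumulative expected loss, best single expert's cumulative loss).
    """
    n = len(loss_rows[0])
    log_w = [0.0] * n          # log-weights, kept in log space for numerical stability
    cum = [0.0] * n            # cumulative loss of each expert
    learner_loss = 0.0
    for losses in loss_rows:
        # Normalize the weights into a probability distribution over experts.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        p = [wi / z for wi in w]
        # The learner suffers the expected loss under that distribution.
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        # Multiplicative update: each weight is scaled by exp(-eta * loss).
        for i, li in enumerate(losses):
            log_w[i] -= eta * li
            cum[i] += li
    return learner_loss, min(cum)


def regret_bound(n, T):
    """Worst-case regret bound for eta = sqrt(8 ln n / T) and losses in [0, 1]."""
    return math.sqrt(T * math.log(n) / 2)


def demo(n=5, T=2000, seed=0):
    """Hypothetical example: i.i.d. uniform losses, which is only one possible sequence."""
    rng = random.Random(seed)
    rows = [[rng.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)
    learner, best = hedge(rows, eta)
    return learner - best, regret_bound(n, T)


if __name__ == "__main__":
    regret, bound = demo()
    print(f"regret = {regret:.3f}, bound = {bound:.3f}")
```

The bound holds for adversarially chosen losses as well as random ones, so swapping the loss generator in `demo` for any other sequence with values in [0, 1] should still leave the regret below `regret_bound(n, T)`.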
