<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks   </title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Model Selection</title>
    <link>http://bactra.org/notebooks/2009/11/19#model-selection</link>
    <description>
&lt;P&gt;(Reader, please make your own suitably awful pun about the different senses
of &quot;model selection&quot; here, as a discouragement to those finding this page
through prurient searching.  Thank you.)

&lt;P&gt;In &lt;a href=&quot;statistics.html&quot;&gt;statistics&lt;/a&gt;
and &lt;a href=&quot;learning-inference-induction.html&quot;&gt;machine learning&lt;/a&gt;, &quot;model
selection&quot; is the problem of picking among different mathematical models which
all purport to describe the same data set.  This notebook will not (for now)
give advice on it; as usual, it's more of a place to organize my thoughts and
references...

&lt;P&gt;Classification of approaches to model selection (probably not exhaustive,
but I can't think of others right now):
&lt;dl&gt;
&lt;dt&gt;Direct optimization of some measure of goodness of fit or risk on training
data.&lt;/dt&gt;
&lt;dd&gt;Seems implicit in a lot of work which points to marginal improvements in
&quot;the proportion of variance explained&quot;, mis-classification rates, &quot;perplexity&quot;,
etc.  Often, also, a recipe for over-fitting and chasing snarks.  What's wanted
is (almost always) some way of measuring the ability to generalize to new data,
and in-sample performance is a biased, typically over-optimistic, estimate of this.  Still,
with &lt;em&gt;enough&lt;/em&gt; data, if the gods
of &lt;a href=&quot;ergodic-theory.html&quot;&gt;ergodicity&lt;/a&gt; are kind, in-sample performance
is representative of generalization performance, so perhaps this will work
asymptotically, though in many cases the researcher will never even glimpse
Asymptopia across the Jordan.&lt;/dd&gt;

&lt;dt&gt;Optimize fit with model-dependent penalty&lt;/dt&gt;
&lt;dd&gt;Add on a term to each model which supposedly indicates its ability to
over-fit.  (Adjusted R&lt;sup&gt;2&lt;/sup&gt;, AIC, BIC, ..., all do this in terms of the number of
parameters.)  Sounds reasonable, but I wonder how many actually work better, in
practice, than direct optimization.  (See Domingos for some depressing evidence
on this score.)&lt;/dd&gt;
&lt;dd&gt;Classical two-part &lt;a href=&quot;mdl.html&quot;&gt;minimum description length&lt;/a&gt;
methods were penalties; I don't yet understand one-part MDL.&lt;/dd&gt;

&lt;dt&gt;Penalties which depend on the model &lt;em&gt;class&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Measure the capacity of a class of models to over-fit;
penalize &lt;em&gt;all&lt;/em&gt; models in that class accordingly, regardless of their
individual properties.  Outstanding example: Vapnik's &quot;structural risk
minimization&quot; (provably consistent under some circumstances).  Only
sporadically coincides with *IC-type penalties based on the number of
parameters.&lt;/dd&gt;

&lt;dt&gt;Cross-validation&lt;/dt&gt;
&lt;dd&gt;Estimate the ability to generalize to different data by, in fact, using
different data. Maybe the &quot;industry standard&quot; of machine learning.  Query, how
are we to know how much different data to use?&lt;/dd&gt;

&lt;dd&gt;Query, how are we to cross-validate when we have complex, relational data?
That is, I understand how to do it for independent samples, and I even
understand how to do it for &lt;a href=&quot;time-series.html&quot;&gt;time series&lt;/a&gt;, but I
do not understand how to do it
for &lt;a href=&quot;network-data-analysis.html&quot;&gt;networks&lt;/a&gt;, and I don't think I am
alone in this.  (Well, I understand how to do it for Erd&amp;#337;s-R&amp;eacute;nyi networks,
because that's back to independent samples...)&lt;/dd&gt;

&lt;dt&gt;The method of sieves&lt;/dt&gt;
&lt;dd&gt;Directly optimize the fit, but within a constrained
class of models; relax the constraint as the amount of data grows.  If the
constraint is relaxed slowly enough, should converge on the truth.  (Ordinary
parametric inference, within a single model class, is a limiting case where the
constraint is relaxed infinitely slowly, and we converge on the pseudo-truth
within that class [provided we have a consistent estimator].)&lt;/dd&gt;

&lt;dt&gt;Encompassing models&lt;/dt&gt;
&lt;dd&gt;The sampling distribution of any estimator of any model class is a function
of the true distribution.  If the true model class has been well-estimated, it
should be able to predict what other, &lt;em&gt;wrong&lt;/em&gt; model classes will
estimate, but not vice versa.  In this sense the true model class &quot;encompasses
the predictions&quot; of the wrong ones.  (&quot;Truth is the criterion both of itself
and of error.&quot;)&lt;/dd&gt;

&lt;dt&gt;General or covering models&lt;/dt&gt;
&lt;dd&gt;Come up with a single model class which includes all the interesting model
classes as special cases; do ordinary estimation within it.  Getting a
consistent estimator of the additional parameters this introduces is often
non-trivial, and interpretability can be a problem.&lt;/dd&gt;

&lt;dt&gt;Model averaging&lt;/dt&gt;
&lt;dd&gt;Don't try to pick the best or correct model; use them all with different
weights.  Choose the weighting scheme so that if one is best, it will tend to be
more and more influential.  Often I think the improvement is not so much from
using multiple models as from smoothing, since estimates of
&lt;em&gt;the single best model&lt;/em&gt; are going to be more noisy than estimates
of &lt;em&gt;a bunch of models which are all pretty good&lt;/em&gt;.  (This leads
to &lt;a href=&quot;ensemble-ml.html&quot;&gt;ensemble methods&lt;/a&gt;.)&lt;/dd&gt;

&lt;dt&gt;Adequacy testing&lt;/dt&gt;
&lt;dd&gt;The correct model should be able to encode the data as uniform IID noise.
Test whether &quot;residuals&quot;, in the appropriate sense, are IID uniform.  Reject
models which can't hack it.  Possibly none of the models on offer is adequate;
this, too, is informative.  Or: models make specific probabilistic assumptions
(IID Gaussian noise, for example); test those.  Mis-specification testing.&lt;/dd&gt;
&lt;/dl&gt;
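&lt;P&gt;A minimal sketch of why direct optimization of in-sample fit (the first approach above) is a biased guide to generalization.  It uses only the Python standard library; the data-generating process (a quadratic plus Gaussian noise), the sample sizes and the use of k-nearest-neighbor regression as the family of models are my own illustrative assumptions, not anything from the references.  With one neighbor, each training point predicts itself, so the in-sample error is exactly zero, while the error on fresh data from the same source is not.
&lt;pre&gt;
```python
import random

def knn_predict(xs, ys, x0, k):
    # average the responses of the k training points closest to x0
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs, ys, train_x, train_y, k):
    # mean squared error, on the points (xs, ys), of the k-NN fit built from train_*
    return sum((y - knn_predict(train_x, train_y, x, k)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def simulate(n, rng):
    # assumed toy process: x uniform on [0, 1], y = x^2 plus N(0, 0.1^2) noise
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

rng = random.Random(1)
train_x, train_y = simulate(100, rng)
test_x, test_y = simulate(1000, rng)

results = {}
for k in (1, 5, 25):
    in_sample = mse(train_x, train_y, train_x, train_y, k)
    out_of_sample = mse(test_x, test_y, train_x, train_y, k)
    results[k] = (in_sample, out_of_sample)
    print(f'k={k}: in-sample {in_sample:.4f}, fresh data {out_of_sample:.4f}')
```
&lt;/pre&gt;

&lt;P&gt;The k=1 model wins any in-sample contest, which is exactly the over-fitting trap; only the fresh data reveal what it costs.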
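&lt;P&gt;For the second approach, a minimal sketch of the two best-known parameter-counting penalties, AIC = -2 log L + 2k and BIC = -2 log L + k log n, where L is the maximized likelihood, k the number of free parameters and n the sample size.  The two nested Gaussian models (mean fixed at zero, or mean free) and the simulated data are assumptions made purely for illustration.  The larger model always has at least as high a maximized likelihood, so only the penalty can make the smaller one win.
&lt;pre&gt;
```python
import math
import random

def gaussian_max_loglik(data, mean):
    # log-likelihood of IID N(mean, s2) data, with s2 set to its MLE given the mean
    n = len(data)
    s2 = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

rng = random.Random(2)
data = [rng.gauss(0.1, 1.0) for _ in range(200)]  # assumed toy data
n = len(data)
xbar = sum(data) / n

# (maximized log-likelihood, number of free parameters) for two nested models
models = {
    'zero mean': (gaussian_max_loglik(data, 0.0), 1),  # variance only
    'free mean': (gaussian_max_loglik(data, xbar), 2),  # mean and variance
}
for name, (ll, k) in models.items():
    print(f'{name}: AIC {aic(ll, k):.2f}, BIC {bic(ll, k, n):.2f}')
```
&lt;/pre&gt;

&lt;P&gt;Lower is better for both criteria.  Since log n exceeds 2 once n is 8 or more, BIC penalizes extra parameters more heavily than AIC at any realistic sample size.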
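&lt;P&gt;Structural risk minimization can be sketched the same way, except that the penalty is a function of the capacity of the whole class (here its VC dimension h) rather than of any one model's parameter count.  The capacity term below is the textbook form of Vapnik's bound for classification with 0-1 loss, holding with probability at least 1 - &amp;eta;; I am quoting its form from memory, so treat the exact constants as an assumption.  The nested classes, their VC dimensions and their empirical risks are made-up inputs, only there to show the mechanics.
&lt;pre&gt;
```python
import math

def vc_confidence(h, n, eta):
    # capacity term of the (assumed) Vapnik bound for 0-1 loss:
    # sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n); only informative when h is well below n
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def srm_select(empirical_risks, vc_dims, n, eta=0.05):
    # nested classes S_1, S_2, ... of increasing VC dimension: pick the class whose
    # empirical risk plus capacity term (the guaranteed risk) is smallest
    bounds = [r + vc_confidence(h, n, eta) for r, h in zip(empirical_risks, vc_dims)]
    best = min(range(len(bounds)), key=bounds.__getitem__)
    return best, bounds

# hypothetical inputs: richer classes fit the training data better
dims = [2, 10, 50, 200]
risks = [0.30, 0.20, 0.15, 0.14]
n = 10000
best, bounds = srm_select(risks, dims, n)
conf = [vc_confidence(h, n, 0.05) for h in dims]
print(best, [round(b, 3) for b in bounds])
```
&lt;/pre&gt;

&lt;P&gt;Every model in a class gets the same penalty, whatever its own number of parameters, which is the point of the approach.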
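&lt;P&gt;A minimal sketch of V-fold cross-validation, again in standard-library Python, with k-nearest-neighbor regression as an assumed stand-in for the competing models (each value of k is one candidate).  The choice of V = 5 folds is arbitrary, which is the question raised above about how much different data to use.  All candidates are scored on the same folds, so that differences between them are not just differences in the random split.
&lt;pre&gt;
```python
import random

def knn_predict(xs, ys, x0, k):
    # average the responses of the k training points closest to x0
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def make_folds(n, v, rng):
    # shuffle the indices once, then deal them into v roughly equal folds
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::v] for f in range(v)]

def cv_error(xs, ys, folds, k):
    # fit on all folds but one, score on the held-out fold, average over all points
    total = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        total += sum((ys[i] - knn_predict(tx, ty, xs[i], k)) ** 2 for i in fold)
    return total / len(xs)

rng = random.Random(3)
xs = [rng.uniform(0, 1) for _ in range(200)]      # assumed toy data
ys = [x ** 2 + rng.gauss(0, 0.1) for x in xs]
folds = make_folds(len(xs), 5, rng)               # the same folds for every candidate
cv_errors = {k: cv_error(xs, ys, folds, k) for k in (1, 3, 10, 30, 100)}
chosen = min(cv_errors, key=cv_errors.get)
print(cv_errors, chosen)
```
&lt;/pre&gt;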
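&lt;P&gt;For time series, ordinary cross-validation misleads because a held-out point's neighbors in time stay in the training set and leak information about it.  The sketch below is a bare-bones version of the &quot;h-block&quot; idea (compare Burman, Chow and Nolan, and Racine's hv-block variant, both listed below): leave out each time point in turn together with everything within h steps of it, and score one-step-ahead predictions.  The AR(1) data, the two candidate predictors (the training mean, or an AR(1) fit by least squares through the origin) and h = 5 are all illustrative assumptions, and the sketch makes no attempt to reproduce the corrections in the published methods.
&lt;pre&gt;
```python
import random

def h_block_cv(y, h, fit, predict):
    # leave-one-out CV for one-step-ahead prediction, except that every
    # (y[s-1], y[s]) pair within h steps of the held-out time t is dropped too
    errs = []
    for t in range(1, len(y)):
        train = [s for s in range(1, len(y)) if abs(s - t) > h]
        params = fit(y, train)
        errs.append((y[t] - predict(params, y[t - 1])) ** 2)
    return sum(errs) / len(errs)

def fit_mean(y, train):
    return sum(y[s] for s in train) / len(train)

def predict_mean(mu, prev):
    return mu  # ignores the past entirely

def fit_ar1(y, train):
    # least squares through the origin for y[s] = phi * y[s - 1] + noise
    num = sum(y[s - 1] * y[s] for s in train)
    den = sum(y[s - 1] ** 2 for s in train)
    return num / den

def predict_ar1(phi, prev):
    return phi * prev

rng = random.Random(4)
y = [0.0]
for _ in range(299):                  # assumed AR(1) process with phi = 0.7
    y.append(0.7 * y[-1] + rng.gauss(0, 1))

h = 5
mean_err = h_block_cv(y, h, fit_mean, predict_mean)
ar1_err = h_block_cv(y, h, fit_ar1, predict_ar1)
print(round(mean_err, 3), round(ar1_err, 3))
```
&lt;/pre&gt;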
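&lt;P&gt;A toy version of the method of sieves: least squares over piecewise-constant functions (a &quot;regressogram&quot;) on [0, 1], with the number of equal-width cells allowed to grow as the sample grows.  The n&lt;sup&gt;1/3&lt;/sup&gt; growth schedule, the sine-wave truth and the noise level are my assumptions for illustration; roughly, what consistency needs is that the number of cells goes to infinity while the number of points per cell does too.
&lt;pre&gt;
```python
import math
import random

def sieve_size(n):
    # hypothetical schedule: allow about n**(1/3) cells with n observations
    return max(1, math.ceil(n ** (1 / 3)))

def fit_regressogram(xs, ys, bins):
    # least squares within the sieve: a separate constant on each equal cell of [0, 1]
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback level for any empty cell
    levels = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: levels[min(int(x * bins), bins - 1)]

def integrated_sq_error(fhat, f, grid=1000):
    # average squared distance from the true regression function on a grid
    pts = [(i + 0.5) / grid for i in range(grid)]
    return sum((fhat(x) - f(x)) ** 2 for x in pts) / grid

def truth(x):
    return math.sin(2 * math.pi * x)  # assumed true regression function

rng = random.Random(5)
errors = {}
for n in (100, 1000, 10000):
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [truth(x) + rng.gauss(0, 0.3) for x in xs]
    fhat = fit_regressogram(xs, ys, sieve_size(n))
    errors[n] = integrated_sq_error(fhat, truth)
print(errors)
```
&lt;/pre&gt;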
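&lt;P&gt;One common weighting scheme for model averaging, swapped in here as an example rather than as the one any particular reference recommends, is &quot;Akaike weights&quot;: w&lt;sub&gt;i&lt;/sub&gt; is proportional to exp(-&amp;Delta;&lt;sub&gt;i&lt;/sub&gt;/2), where &amp;Delta;&lt;sub&gt;i&lt;/sub&gt; is model i's AIC minus the smallest AIC among the candidates.  If one model's AIC advantage grows with the sample size, its weight goes to one, which is the &quot;more and more influential&quot; property asked for above.
&lt;pre&gt;
```python
import math

def akaike_weights(aics):
    # w_i proportional to exp(-(AIC_i - min AIC) / 2), normalized to sum to one;
    # subtracting the minimum first keeps exp() from underflowing
    best = min(aics)
    raw = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_prediction(predictions, aics):
    # model-averaged forecast: weighted mean of the candidate models' forecasts
    return sum(w * p for w, p in zip(akaike_weights(aics), predictions))

weights = akaike_weights([100.0, 102.0])
print([round(w, 3) for w in weights])
print(averaged_prediction([2.0, 4.0], [100.0, 102.0]))
```
&lt;/pre&gt;

&lt;P&gt;For example, two models whose AICs differ by 2 get weights of about 0.73 and 0.27.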
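&lt;P&gt;A minimal sketch of adequacy testing via the probability integral transform: if a continuous model's CDF F is correct, then F(X) is uniform on [0, 1], so for IID data the transformed values are the IID uniform &quot;residuals&quot; mentioned above, and a Kolmogorov-Smirnov test against the uniform distribution can check them.  Here the data are (by assumption) exponential, and two fitted candidates are checked: a Gaussian and an exponential, each with maximum-likelihood parameters.  The 1.36/&amp;radic;n cutoff is the asymptotic 5% point for a fully specified null; since the parameters are estimated from the same data, it is conservative here.
&lt;pre&gt;
```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def ks_uniform(u):
    # Kolmogorov-Smirnov distance between the empirical CDF of u and Uniform(0, 1)
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    return max(d_plus, d_minus)

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(500)]  # assumed true distribution

# candidate 1: Gaussian with maximum-likelihood mean and standard deviation
gauss = NormalDist(fmean(data), pstdev(data))
pit_gauss = [gauss.cdf(x) for x in data]

# candidate 2: exponential with maximum-likelihood rate 1 / mean
rate = 1 / fmean(data)
pit_exp = [1 - math.exp(-rate * x) for x in data]

crit = 1.36 / math.sqrt(len(data))  # asymptotic 5% point, fully specified null
d_gauss = ks_uniform(pit_gauss)
d_exp = ks_uniform(pit_exp)
for name, d in (('gaussian', d_gauss), ('exponential', d_exp)):
    print(name, round(d, 3), 'reject' if d > crit else 'do not reject')
```
&lt;/pre&gt;

&lt;P&gt;The Gaussian should typically be rejected.  For dependent data, the same idea applies each observation's conditional distribution given the past.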

&lt;P&gt;The machine-learning-ish literature on model selection doesn't seem to ever
talk about setting up experiments to select among models; or do I just not read
the right papers there?  (The statistical literature on experimental design
tends to talk about &quot;model discrimination&quot; rather than &quot;model selection&quot;.)

&lt;ul&gt;Recommended, big-picture:
	&lt;li&gt;Leo Breiman, &quot;Heuristics of Instability and Stabilization in Model
Selection,&quot; &lt;a href=&quot;http://dx.doi.org/10.1214/aos/1032181158&quot;&gt;&lt;cite&gt;Annals of Statistics&lt;/cite&gt; &lt;strong&gt;24&lt;/strong&gt; (1996):
2350--2383&lt;/a&gt;
	&lt;li&gt;Gerda Claeskens and Nils Lid Hjort, &lt;cite&gt;Model Selection
and Model Averaging&lt;/cite&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.cs.washington.edu/homes/pedrod/&quot;&gt;Pedro
Domingos&lt;/a&gt;, &quot;The Role of Occam's Razor in Knowledge Discovery,&quot; &lt;cite&gt;Data
Mining and Knowledge Discovery,&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (1999) [&lt;a
href=&quot;http://www.cs.washington.edu/homes/pedrod/dmkd99.ps.gz&quot;&gt;Online&lt;/a&gt;]
	&lt;li&gt;Trevor Hastie, Robert Tibshirani and Jerome Friedman, &lt;cite&gt;The
Elements of Statistical Learning: Data Mining, Inference, and Prediction&lt;/cite&gt;
	&lt;li&gt;C. R. Rao, Y. Wu, Sadanori Konishi and Rahul Mukerjee, &quot;On Model
Selection&quot;, in P. Lahiri (ed.), &lt;cite&gt;Model Selection&lt;/cite&gt;, pp. 1--64
[Thorough review paper, if from a rather old-school statistical-theory
perspective.  The rest of the volume is too Bayesian to be of interest to
me.  &lt;a href=&quot;http://www.jstor.org/stable/4356163&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Aris Spanos, &quot;Curve-Fitting, the Reliability of Inductive
Inference and the Error-Statistical Approach&quot; [&lt;a
href=&quot;http://www.econ.vt.edu/Faculty/CVs_&amp;_Research/Aris%20Spanos%20-%20Working%20Papers/spanoscurve-fitting.pdf&quot;&gt;PDF
preprint&lt;/a&gt;]
	&lt;li&gt;V. N. (=Vladimir Naumovich) Vapnik, &lt;cite&gt;The Nature of
Statistical Learning Theory&lt;/cite&gt; [&lt;a href=&quot;../reviews/vapnik-nature/&quot;&gt;Review:
A Useful Biased Estimator&lt;/a&gt;]
	&lt;li&gt;Quang H. Vuong, &quot;Likelihood Ratio Tests for Model Selection and
Non-Nested Hypotheses&quot;, &lt;cite&gt;Econometrica&lt;/cite&gt; &lt;strong&gt;57&lt;/strong&gt; (1989):
307--333
	&lt;/ul&gt;

&lt;ul&gt;Recommended, close-ups:
	&lt;li&gt;Sylvain Arlot, &quot;V-fold cross-validation improved: V-fold
penalization&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0802.0566&quot;&gt;arxiv:0802.0566&lt;/a&gt; [Seeing
cross-validation as a penalization method, and improving it accordingly by
strengthening the penalty term]
	&lt;li&gt;A. C. Atkinson and A. N. Donev, &lt;cite&gt;Optimum Experimental
Design&lt;/cite&gt; [&lt;a href=&quot;../reviews/atkinson-donev/&quot;&gt;Review&lt;/a&gt;]
	&lt;li&gt;Leo Breiman and Philip Spector, &quot;Submodel Selection and Evaluation
in Regression: The X-Random Case&quot;, &lt;cite&gt;International
Statistical Review&lt;/cite&gt; &lt;strong&gt;60&lt;/strong&gt; (1992): 291--319
[&lt;a href=&quot;http://www.jstor.org/stable/1403680&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Prabir Burman, Edmond Chow and Deborah Nolan, &quot;A cross-validatory method for dependent data&quot;, &lt;a href=&quot;http://dx.doi.org/10.1093/biomet/81.2.351&quot;&gt;&lt;cite&gt;Biometrika&lt;/cite&gt; &lt;strong&gt;81&lt;/strong&gt; (1994): 351--358&lt;/a&gt; [&lt;a href=&quot;http://www.jstor.org/stable/2336965&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Patrick S. Carmack, William R. Schucany, Jeffrey S. Spence, Richard
F. Gunst, Qihua Lin and Robert W. Haley, &quot;Far Casting Cross Validation&quot;
[Leave-one-out CV, with a constant-radius window skipped around each hold-out
point as well; this is designed to deal with correlations in time or in
space.  &lt;a href=&quot;http://smu.edu/statistics/TechReports/TR352.pdf&quot;&gt;PDF
preprint&lt;/a&gt;]
	&lt;li&gt;Nicolo Cesa-Bianchi and Gabor Lugosi, &lt;cite&gt;Prediction, Learning,
and Games&lt;/cite&gt;
[&lt;a href=&quot;../weblog/algae-2008-07.html#prediction&quot;&gt;Mini-review&lt;/a&gt;.  For
avoiding model selection in favor of adaptively-weighted combinations of
models.]
	&lt;li&gt;Snigdhansu Chatterjee, Nitai D. Mukhopadhyay, &quot;Risk and resampling
under model
uncertainty&quot;, &lt;a href=&quot;http://arxiv.org/abs/0805.3244&quot;&gt;arxiv:0805.3244&lt;/a&gt; [an
interesting approach to model averaging with provably good frequentist
properties, via bootstrapping --- for a trivial linear-Gaussian problem; not
clear to me how to generalize]
	&lt;li&gt;Bruce E. Hansen, &quot;Challenges for Econometric Model
Selection&quot;, &lt;cite&gt;Econometric Theory&lt;/cite&gt; &lt;strong&gt;21&lt;/strong&gt; (2005): 60--68
[&quot;Standard econometric model selection methods are based on four fundamental
errors in approach: parametric vision, the assumption of a true
[data-generating process], evaluation based on fit, and ignoring the impact of
model uncertainty on inference. Instead, econometric model selection methods
should be based on a semiparametric vision, models should be viewed as
approximations, models should be evaluated based on their purpose, and model
uncertainty should be incorporated into inference
methods.&quot;  &lt;a href=&quot;http://www.ssc.wisc.edu/~bhansen/papers/et_05.html&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Marcus Hutter, &quot;The Loss Rank Principle for Model Selection&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math.ST/0702804&quot;&gt;math.ST/0702804&lt;/a&gt; [This is a
simplified form of &lt;a href=&quot;../reviews/mayo-error/&quot;&gt;Deborah Mayo's
&quot;severity&quot;&lt;/a&gt;.]
	&lt;li&gt;Pascal Lavergne and Quang H. Vuong, &quot;Nonparametric Selection of
Regressors: The Nonnested Case&quot;, &lt;cite&gt;Econometrica&lt;/cite&gt; &lt;strong&gt;64&lt;/strong&gt;
(1996): 207--219 [Picking which variables belong in a regression, by looking at
the error of non-parametric kernel
regressions.  &lt;a href=&quot;http://www.jstor.org/stable/2171929&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Charles Mitchell and Sara van de Geer, &quot;General Oracle Inequalities
for Model
Selection&quot;, &lt;a href=&quot;http://dx.doi.org/10.1214/08-EJS254&quot;&gt;&lt;cite&gt;Electronic
Journal of Statistics&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009): 176--204&lt;/a&gt; [Analyzes
a data-set splitting scheme (like cross-validation with only one &quot;fold&quot;)]
	&lt;li&gt;&lt;a href=&quot;http://www.mcmaster.ca/economics/racine/&quot;&gt;Jeffrey S. Racine&lt;/a&gt;
		&lt;ul&gt;
		&lt;li&gt;&quot;Feasible Cross-Validatory Model Selection for General Stationary Processes&quot;, &lt;cite&gt;Journal of Applied Econometrics&lt;/cite&gt;
&lt;strong&gt;12&lt;/strong&gt; (1997): 169--179
[&lt;a href=&quot;http://www.jstor.org/stable/2284910&quot;&gt;JSTOR&lt;/a&gt;.  This is closely
related to (maybe algebraically just a special case of?) the familiar trick
from splines of writing the CV criterion in terms of the
hat/influence/projection matrix.]
		&lt;li&gt;&quot;Consistent cross-validatory model-selection for dependent
data: hv-block
cross-validation&quot;, &lt;a href=&quot;http://dx.doi.org/10.1016/S0304-4076(00)00030-0&quot;&gt;&lt;cite&gt;Journal
of Econometrics&lt;/cite&gt; &lt;strong&gt;99&lt;/strong&gt; (2000): 39--61&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin and
Angelika van der Linde, &quot;Bayesian Measures of Model Complexity and
Fit&quot;, &lt;cite&gt;Journal of the Royal Statistical Society
B&lt;/cite&gt; &lt;strong&gt;64&lt;/strong&gt; (2002): 583--639
[&lt;a href=&quot;http://www.soe.ucsc.edu/~draper/DIC.pdf&quot;&gt;PDF reprint&lt;/a&gt;]
	&lt;li&gt;Ryan J. Tibshirani and Robert Tibshirani, &quot;A bias correction for
the minimum error rate in
cross-validation&quot;, &lt;a href=&quot;http://dx.doi.org/10%2E1214/08-AOAS224&quot;&gt;&lt;cite&gt;Annals
of Applied Statistics&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009): 822--829&lt;/a&gt;
= &lt;a href=&quot;http://arxiv.org/abs/0908.2904&quot;&gt;arxiv:0908.2904&lt;/a&gt;
	&lt;li&gt;Sara van de Geer, &lt;cite&gt;Empirical Process Theory in
&lt;/cite&gt;M&lt;cite&gt;-Estimation&lt;/cite&gt;
	&lt;li&gt;Mark J. van der Laan and Sandrine Dudoit, &quot;Unified Cross-Validation
Methodology for Selection Among Estimators and a General Cross-Validated
Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples&quot;
[&lt;a href=&quot;http://www.bepress.com/ucbbiostat/paper130&quot;&gt;PDF working paper&lt;/a&gt;,
i.e., a 100-page tome.  The first part proves that multi-fold cross-validation
and the like will work for selecting the best estimator out of a finite set of
estimators (provided the loss function is nicely bounded and the data are IID).
The second part ingeniously turns this into a complete estimation procedure, by
effectively creating a discrete sieve and then using CV to say which part of
the sieve to use.  This is a very cool set of results, but (1) the limitations
to bounded loss functions make me nervous, and (2) the formulas appearing in
the finite-sample and even asymptotic bounds are &lt;em&gt;ugly&lt;/em&gt;.  On the other
hand, they &lt;em&gt;have&lt;/em&gt; finite-sample bounds! &amp;mdash; I wonder if the
bounded-and-IID restrictions could be lifted using the techniques in Jiang's
&quot;On Uniform Deviation Bounds&quot; (link and description
under &lt;a href=&quot;learning-theory.html&quot;&gt;Learning Theory&lt;/a&gt;), or those
in &lt;a href=&quot;../weblog/algae-2009-04.html#weak&quot;&gt;Dedecker et al.'s &lt;cite&gt;Weak
Dependence&lt;/cite&gt;&lt;/a&gt;.]
	&lt;li&gt;Aad W. van der Vaart, Sandrine Dudoit and Mark J. van der Laan,
&quot;Oracle inequalities for multi-fold cross
validation&quot;, &lt;a href=&quot;http://dx.doi.org/10.1524/stnd.2006/24.3.351&quot;&gt;&lt;cite&gt;Statistics
and Decisions&lt;/cite&gt; &lt;strong&gt;24&lt;/strong&gt; (2006): 351--371&lt;/a&gt; [Thanks to Prof.
van der Vaart for a reprint]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Sylvain Arlot
		&lt;ul&gt;
		&lt;li&gt;&quot;Model selection by resampling penalization&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0906.3124&quot;&gt;arxiv:0906.3124&lt;/a&gt; =
&lt;cite&gt;Electronic Journal of Statistics&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009):
557--624
		&lt;li&gt;&quot;Suboptimality of penalties proportional to the dimension
for model selection in heteroscedastic regression&quot;, &lt;a href=&quot;http://arxiv.org/abs/0812.3141&quot;&gt;arxiv:0812.3141&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Sylvain Arlot and Pascal Massart, &quot;Data-driven Calibration of
Penalties for Least-Squares
Regression&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v10/arlot09a.html&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;10&lt;/strong&gt; (2009): 245--279&lt;/a&gt;
	&lt;li&gt;Maria Maddalena Barbieri and James O. Berger, &quot;Optimal Predictive
Model Selection&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0406464&quot;&gt;math.ST/0406464&lt;/a&gt; = &lt;cite&gt;Annals
of Statistics&lt;/cite&gt; &lt;strong&gt;32&lt;/strong&gt; (2004): 870--897 [Unfortunately,
Bayesian]
	&lt;li&gt;Andrew Barron, Lucien Birg&amp;eacute;, and Pascal Massart, &quot;Risk
bounds for model selection via penalization&quot;, &lt;cite&gt;Probability Theory and
Related Fields&lt;/cite&gt; &lt;strong&gt;113&lt;/strong&gt; (1999): 301--413
	&lt;li&gt;Lucien Birg&amp;eacute;
		&lt;ul&gt;
		&lt;li&gt;&quot;The Brouwer Lecture 2005: Statistical estimation with
model
selection&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0605187&quot;&gt;math.ST/0605187&lt;/a&gt;
		&lt;li&gt;&quot;Model selection for Poisson processes&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math/0609549&quot;&gt;math/0609549&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Lucien Birg&amp;eacute; and Pascal Massart
		&lt;ul&gt;
		&lt;li&gt;&quot;Minimal Penalties for Gaussian
Model Selection&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s00440-006-0011-8&quot;&gt;&lt;cite&gt;Probability Theory and
Related Fields&lt;/cite&gt; &lt;strong&gt;138&lt;/strong&gt; (2007): 33--73&lt;/a&gt;
		&lt;li&gt;&quot;From model selection to adaptive estimation&quot;, pp. 55--87
in Pollard, Torgersen and Yang (eds.), &lt;cite&gt;Festschrift for Lucien Le Cam:
Research Papers in Probability and Statistics&lt;/cite&gt; (1997)
		&lt;/ul&gt;
	&lt;li&gt;Borowiak, &lt;cite&gt;Model Discrimination for Nonlinear Regression
Models&lt;/cite&gt;
	&lt;li&gt;P. Burman, &quot;A comparative study of ordinary cross-validation,
v-fold cross-validation and the repeated learning-testing methods&quot;,
&lt;cite&gt;Biometrika&lt;/cite&gt; &lt;strong&gt;76&lt;/strong&gt; (1989): 503--514
	&lt;li&gt;Alain Celisse, &quot;Model selection in density estimation via
cross-validation&quot;, &lt;a href=&quot;http://arxiv.org/abs/0811.0802&quot;&gt;arxiv:0811.0802&lt;/a&gt;
	&lt;li&gt;A. E. Clark and C. G. Troskie, &quot;Time Series and Model Selection&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1080/03610910701884153&quot;&gt;&lt;cite&gt;Communications in Statistics: Simulation and Computation&lt;/cite&gt;
&lt;strong&gt;37&lt;/strong&gt; (2008): 766--771&lt;/a&gt; [Simulation study of the accuracy of
different information criteria]
	&lt;li&gt;Kevin A. Clarke, &quot;A Simple Distribution-Free 
Test for Nonnested Hypotheses&quot; [&lt;a href=&quot;http://www.rochester.edu/college/psc/clarke/ClarkePA.pdf&quot;&gt;PDF preprint&lt;/a&gt;]
	&lt;li&gt;Guilhem Coq, Olivier Alata, Marc Arnaudon and Christian Olivier,
&quot;An improved method for model selection based on Information Criteria&quot;, 
&lt;a href=&quot;http://arxiv.org/abs/math.ST/0702540&quot;&gt;math.ST/0702540&lt;/a&gt;
	&lt;li&gt;Pedro Domingos
		&lt;ul&gt;
		&lt;li&gt;&quot;Process-Oriented Estimation of Generalization Error&quot; [&lt;a href=&quot;http://www.cs.washington.edu/homes/pedrod/papers/ijcai99.pdf&quot;&gt;PDF&lt;/a&gt;]
		&lt;li&gt;&quot;A Process-Oriented Heuristic for Model Selection&quot;
[&lt;a
href=&quot;http://www.cs.washington.edu/homes/pedrod/papers/mlc98.pdf&quot;&gt;PDF&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Sandrine Dudoit and Mark J. van der Laan, &quot;Asymptotics of Cross-Validated Risk Estimation in Estimator Selection and Performance Assessment&quot;,
&lt;cite&gt;Statistical Methodology&lt;/cite&gt; &lt;strong&gt;2&lt;/strong&gt; (2005): 131--154
[&lt;a href=&quot;http://www.bepress.com/ucbbiostat/paper126/&quot;&gt;preprint&lt;/a&gt;]
	&lt;li&gt;Hugo Jair Escalante, Manuel Montes, Luis Enrique Sucar, &quot;Particle
Swarm Model Selection&quot;,
&lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v10/escalante09a.html&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;10&lt;/strong&gt; (2009): 405--440&lt;/a&gt;
	&lt;li&gt;Jianqing Fan and Runze Li, &quot;Variable Selection via Nonconcave
Penalized Likelihood and its Oracle Properties&quot;, &lt;cite&gt;Journal of
the American Statistical Association&lt;/cite&gt; &lt;strong&gt;96&lt;/strong&gt; (2001): 1348--1360 [&lt;a href=&quot;http://www.orfe.princeton.edu/~jqfan/papers/01/penlike.pdf&quot;&gt;PDF reprint&lt;/a&gt; via Prof. Fan]
	&lt;li&gt;Magalie Fromont, &quot;Model selection by bootstrap penalization for
classification&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s10994-006-7679-y&quot;&gt;&lt;cite&gt;Machine
Learning&lt;/cite&gt;
&lt;strong&gt;66&lt;/strong&gt; (2007): 165--207&lt;/a&gt;
	&lt;li&gt;Christophe Giraud, &quot;Estimation of Gaussian graphs by model
selection&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.2044&quot;&gt;arxiv:0710.2044&lt;/a&gt;
	&lt;li&gt;Alexander Goldenshluger and Eitan Greenshtein, &quot;Asymptotically
minimax regret procedures in regression model selection and the magnitude of
the dimension
penalty&quot;, &lt;a href=&quot;http://dx.doi.org/10.1214/aos/1015957473&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;28&lt;/strong&gt; (2000): 1620--1637&lt;/a&gt; [Hmmm.  Not sure
how relevant this will be to anything I'd need to do, given the assumptions
they load on.  Via Kevin Kelly.]
	&lt;li&gt;Christian Gourieroux and Alain Monfort, &quot;Testing, Encompassing, and
Simulating Dynamic Econometric Models&quot;, &lt;cite&gt;Econometric Theory&lt;/cite&gt;
&lt;strong&gt;11&lt;/strong&gt; (1995): 195--228 [&lt;a href=&quot;http://www.jstor.org/pss/3532571&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Michael Kearns and Dana Ron, &quot;Algorithmic Stability and
Sanity-Check Bounds for Leave-One-Out Cross-Validation,&quot; &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/11/6/1427&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (1999): 1427--1453&lt;/a&gt;
	&lt;li&gt;Nicholas M. Kiefer and Hwan-Sik Choi, &quot;Robust Model Selection in
Dynamic Models with an Application to Comparing Predictive Accuracy&quot;
[&lt;A href=&quot;http://papers.ssrn.com/sol3/papers.cfm?abstract_id=945144&quot;&gt;SSRN&lt;/a&gt;]
	&lt;li&gt;Sadanori Konishi and Genshiro Kitagawa, &quot;Asymptotic theory for
information criteria in model selection --- functional approach,&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/S0378-3758(02)00462-7&quot;&gt;&lt;cite&gt;Journal of
Statistical Planning and Inference&lt;/cite&gt; &lt;strong&gt;114&lt;/strong&gt; (2003):
45--61&lt;/a&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.stat.yale.edu/~hl284/&quot;&gt;Hannes Leeb&lt;/a&gt;,
&quot;Conditional Predictive Inference Post Model Selection&quot;, &lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;37&lt;/strong&gt; (2009): 2838--2876
= &lt;a href=&quot;http://arxiv.org/abs/0908.3615&quot;&gt;arxiv:0908.3615&lt;/a&gt; [I heard Leeb
give a talk on this, but I should read the paper]
	&lt;li&gt;Hannes Leeb and Benedikt M. Poetscher
		&lt;ul&gt;
		&lt;li&gt;&quot;Can One Estimate The
Unconditional Distribution of Post-Model-Selection Estimators?&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0704.1584&quot;&gt;arxiv:0704.1584&lt;/a&gt; [They claim the
answer is &quot;No&quot;.]
		&lt;li&gt;&quot;Model Selection and Inference: Facts and Fiction&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1017/S0266466605050036&quot;&gt;&lt;cite&gt;Econometric
Theory&lt;/cite&gt; &lt;strong&gt;21&lt;/strong&gt; (2005): 21--59&lt;/a&gt;
[&lt;a href=&quot;http://www.stat.yale.edu/~hl284/ETAnniv.pdf&quot;&gt;PDF reprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;F. Liang and A. Barron, &quot;Exact Minimax Strategies for Predictive
Density Estimation, Data Compression, and Model Selection&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.836922&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 2708--2726&lt;/a&gt;
	&lt;li&gt;Pascal Massart, &lt;cite&gt;Concentration Inequalities and Model
Selection&lt;/cite&gt;
[&lt;a href=&quot;http://www.math.u-psud.fr/~massart/stf2003_massart.pdf&quot;&gt;PDF preprint
version&lt;/a&gt; (large!)]
	&lt;li&gt;Abraham Meidan and Boris Levin, &quot;Choosing from Competing Theories
in Computerised Learning&quot;, &lt;cite&gt;Minds and Machines&lt;/cite&gt; &lt;strong&gt;12&lt;/strong&gt;
(2002): 119--129
	&lt;li&gt;Nicolai Meinshausen and Peter Buehlmann, &quot;Stability Selection&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0809.2932&quot;&gt;arxiv:0809.2932&lt;/a&gt; [&quot;Estimation of
structure, such as in graphical modeling, cluster analysis or variable
selection, is notoriously difficult, especially for high-dimensional data. We
introduce the new method of stability selection.&quot;]
	&lt;li&gt;Grayham E. Mizon and Massimiliano Marcellino (eds.),
&lt;cite&gt;Progressive Modelling:  Non-nested Testing and Encompassing&lt;/cite&gt;
[&lt;a href=&quot;http://www.oup.com/us/catalog/general/subject/Economics/Econometrics/?view=usa&amp;ci=9780199257324&quot;&gt;Blurb, table of contents&lt;/a&gt;]
	&lt;li&gt;Ali Mohammad-Djafari, &quot;Model selection for inverse problems: Best
choice of basis functions and model order selection,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0111020&quot;&gt;physics/0111020&lt;/a&gt;
	&lt;li&gt;M. Pavlic and M. J. van der Laan, &quot;Fitting of mixtures with
unspecified number of components using cross validation distance
estimate&quot;, &lt;cite&gt;Computational Statistics and Data
Analysis&lt;/cite&gt; &lt;strong&gt;41&lt;/strong&gt; (2003): 413--428
	&lt;li&gt;Zacharias Psaradakis, Martin Sola, Fabio Spagnolo and Nicola Spagnolo, &quot;Selecting nonlinear time series models using information criteria&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1111/j.1467-9892.2009.00614.x&quot;&gt;&lt;cite&gt;Journal of
Time Series Analysis&lt;/cite&gt;
&lt;strong&gt;30&lt;/strong&gt; (2009): 369--394&lt;/a&gt;
	&lt;li&gt;Pradeep Ravikumar, Martin J. Wainwright, John D. Lafferty,
&quot;High-Dimensional Graphical Model Selection Using $\ell_1$-Regularized Logistic
Regression&quot;, &lt;a href=&quot;http://arxiv.org/abs/0804.4202&quot;&gt;arxiv:0804.4202&lt;/a&gt;
	&lt;li&gt;Douglas Rivers and Quang H. Vuong, &quot;Model selection tests for
nonlinear dynamic
models&quot;, &lt;a href=&quot;http://dx.doi.org/10.1111/1368-423X.t01-1-00071&quot;&gt;&lt;cite&gt;The
Econometrics Journal&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (2002): 1--39&lt;/a&gt;
	&lt;li&gt;Yiyuan She, &quot;Thresholding-based Iterative Selection
Procedures for Model Selection and Shrinkage&quot;, &lt;a href=&quot;http://arxiv.org/abs/0812.5061&quot;&gt;arxiv:0812.5061&lt;/a&gt;
	&lt;li&gt;David Shilane, Richard H. Liang and Sandrine Dudoit, &quot;Loss-Based
Estimation with Evolutionary Algorithms and Cross-Validation&quot;,
UC Berkeley Biostatistics Working Paper 227 [&lt;a href=&quot;http://www.bepress.com/ucbbiostat/paper227/&quot;&gt;Abstract, PDF&lt;/a&gt;]
	&lt;li&gt;Aris Spanos
		&lt;ul&gt;
		&lt;li&gt;&quot;Statistical Induction, Severe Testing, and Model
Validation&quot; [&lt;a href=&quot;http://www.error06.econ.vt.edu/spanos.pdf&quot;&gt;Preprint&lt;/a&gt;]
		&lt;li&gt;&quot;Statistical Model Specification vs. Model Selection: Akaike-type Criteria and the Reliability of Inference&quot; [preprint kindly
provided by Prof. Spanos]
		&lt;/ul&gt;
	&lt;li&gt;Tina Toni and Michael P. H. Stumpf
		&lt;ul&gt;
		&lt;li&gt;&quot;Parameter Inference and
Model Selection in Signaling Pathway Models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0905.4468&quot;&gt;arxiv:0905.4468&lt;/a&gt;
		&lt;li&gt;&quot;Simulation-based model selection for dynamical systems in systems and population biology&quot;, &lt;a href=&quot;http://arxiv.org/abs/0911.1705&quot;&gt;arxiv:0911.1705&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Masayuki Uchida and Nakahiro Yoshida, &quot;Information Criteria in
Model Selection for Mixing Processes&quot;, &lt;cite&gt;Statistical Inference for
Stochastic Processes&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (2001): 73--98 [&quot;The emphasis is
put on the use of the asymptotic expansion of the distribution of an estimator
based on the conditional Kullback-Leibler divergence for stochastic processes.
Asymptotic properties of information criteria and their improvement are
discussed.&quot;]
	&lt;li&gt;Tim van Erven, Peter Grunwald and Steven de Rooij, &quot;Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma&quot;, &lt;a href=&quot;http://arxiv.org/abs/0807.1005&quot;&gt;arxiv:0807.1005&lt;/a&gt;
	&lt;li&gt;Geert Verbeke, Geert Molenberghs, Caroline Beunckens, &quot;Formal and
Informal Model Selection with Incomplete Data&quot;, &lt;cite&gt;Statistical
Science&lt;/cite&gt; &lt;strong&gt;23&lt;/strong&gt; (2008): 201--218
= &lt;a href=&quot;http://arxiv.org/abs/0808.3587&quot;&gt;arxiv:0808.3587&lt;/a&gt;
	&lt;li&gt;Zijun Wang, &quot;Finite Sample Performances of the Model Selection Approach in Nonparametric Model Specification for Time Series&quot;, &lt;a href=&quot;http://dx.doi.org/10.1080/03610920802531314&quot;&gt;&lt;cite&gt;Communications in Statistics: Theory and Methods&lt;/cite&gt; &lt;strong&gt;38&lt;/strong&gt;
(2009): 2302--2330&lt;/a&gt;
	&lt;li&gt;Hirokazu Yanagihara and Chihiro Ohmoto, &quot;On distribution of AIC in
linear regression models&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.jspi.2004.03.016&quot;&gt;&lt;cite&gt;Journal of
Statistical Planning and Inference&lt;/cite&gt; &lt;strong&gt;133&lt;/strong&gt; (2005):
417--433&lt;/a&gt;
	&lt;li&gt;Peng Zhao and Bin Yu, &quot;On Model Selection Consistency of Lasso&quot;,
&lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/volume7/zhao06a/zhao06a.pdf&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2006): 2541--2563&lt;/a&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>