<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Machine Learning, Statistical Inference and Induction</title>
    <link>http://bactra.org/notebooks/2011/12/26#learning-inference-induction</link>
    <description>

&lt;P&gt;There's a place where &lt;a href=&quot;ai.html&quot;&gt;AI&lt;/a&gt;, &lt;a
href=&quot;statistics.html&quot;&gt;statistics&lt;/a&gt; and epistemology-&lt;a
href=&quot;scientific-method.html&quot;&gt;methodology&lt;/a&gt; converge, or want to anyhow.
&quot;Machine learning&quot; is the AI label: how do we make a machine that can find and
learn the regularities in a data set?  (If the data set is really, really big,
and we care mostly about making practically valuable predictions, this becomes
&lt;a href=&quot;data-mining.html&quot;&gt;data mining&lt;/a&gt;, or &quot;knowledge discovery in
databases,&quot; KDD.)  The statisticians ask very similar questions about
model-fitting and hypothesis-testing.  The epistemologists are mired in the
problem of induction, and &quot;inference to the best explanation&quot; (a phrase, I am
told by Kenny Easwaran, coined by Gilbert Harman; link below).  The fields
overlap in the most crazy-quilt and arbitrary way: I've heard university
librarians arguing over whether specific books should go to the engineering or
the philosophy library, for instance.

&lt;P&gt;The connection to &lt;a href=&quot;neuroscience.html&quot;&gt;neuroscience&lt;/a&gt; and &lt;a
href=&quot;cognitive-science.html&quot;&gt;cognitive science&lt;/a&gt; is plain: how on Earth do
human beings, and other critters, actually learn?  Given that there are many
different strategies, which ones do organisms use, and why, and are they good
ones?  (It's entirely possible that we've gotten locked in to inefficient
learning strategies; then the question becomes whether or not they can be
improved.)  Studying learning by organisms lets us test theories of
learning-in-the-abstract, and vice versa: if we had, say, a good proof that a
certain learning scheme simply would not work, we'd &lt;em&gt;know&lt;/em&gt; that animals
don't use it.

&lt;P&gt;One fairly strong result seems to be that &lt;em&gt;tabulae rasae&lt;/em&gt; don't work:
you've got to give the machine/baby/scientist &lt;em&gt;some&lt;/em&gt; hints, or restrict
the field of possible hypotheses initially, or you'll never get anywhere.  This
was at least implicit in &lt;a href=&quot;hume.html&quot;&gt;Hume&lt;/a&gt;, and I believe the other
classical empiricists as well, but they don't seem to have been restrictive
&lt;em&gt;enough&lt;/em&gt; to account for the way we actually do learn.  &lt;a
href=&quot;evol-psych.html&quot;&gt;Natural selection&lt;/a&gt; is the obvious candidate for
having restricted our hypothesis-set, and for having designed our learning
mechanisms.

&lt;P&gt;My &lt;a href=&quot;positivism.html&quot;&gt;positivist&lt;/a&gt; temperament can hardly help
being pleased by this &quot;attempt to introduce the experimental method of
reasoning into moral subjects,&quot; which, as data mining,
has &lt;a href=&quot;data-mining.html&quot;&gt;massive industrial applications&lt;/a&gt;.  My real
interest in this isn't, for once, philosophical. Instead, I want to be able to
quantify, or at the very least
characterize, &lt;a href=&quot;self-organization.html&quot;&gt;self-organization&lt;/a&gt;, which
means I need a good way of automatically finding patterns or regularities in
data-sets.  For someone who's got
the &lt;a href=&quot;computational-mechanics.html&quot;&gt;computational mechanics&lt;/a&gt; gospel,
this means &quot;inferring statistical complexity,&quot; and that means the automated
construction of abstract-machine or formal-language models of data-sets.
(Alternately: Figuring out how natural things compute.)  And doing that well
means addressing all the issues people in these areas address, so I figure I
ought to just steal from them.


&lt;P&gt;See also:
	&lt;a href=&quot;causality.html&quot;&gt;Causality&lt;/a&gt;;
	&lt;a href=&quot;collective-cognition.html&quot;&gt;collective cognition&lt;/a&gt;;
	&lt;a href=&quot;clustering.html&quot;&gt;clustering&lt;/a&gt;;
	&lt;a href=&quot;ensemble-ml.html&quot;&gt;ensemble methods&lt;/a&gt;;
	&lt;a href=&quot;grammatical-inference.html&quot;&gt;grammatical inference&lt;/a&gt;;
	&lt;a href=&quot;graphical-models.html&quot;&gt;graphical models&lt;/a&gt;;
	&lt;a href=&quot;learning-games.html&quot;&gt;learning in games&lt;/a&gt;;
	&lt;a href=&quot;learning-theory.html&quot;&gt;learning theory&lt;/a&gt;;
	the &lt;a href=&quot;mdl.html&quot;&gt;minimum description length&lt;/a&gt; principle;
	&lt;a href=&quot;model-selection.html&quot;&gt;model selection&lt;/a&gt;;
	&lt;a href=&quot;neural-nets.html&quot;&gt;neural nets&lt;/a&gt;;
	&lt;a href=&quot;scientific-thinking.html&quot;&gt;scientific thinking&lt;/a&gt;;
	&lt;a href=&quot;sequential-decisions.html&quot;&gt;sequential decision-making&lt;/a&gt;;
	&lt;a href=&quot;structured-data.html&quot;&gt;statistics with structured data&lt;/a&gt;;
	&lt;a href=&quot;time-series.html&quot;&gt;time series&lt;/a&gt;; and
	&lt;a href=&quot;universal-prediction.html&quot;&gt;universal prediction algorithms&lt;/a&gt;
	now get their own notebooks; other topics also need to be spun off from
this one.

&lt;ul&gt;Recommended, big picture:
	&lt;li&gt;Leo Breiman, &quot;Statistical Modeling: The Two Cultures&quot;,
&lt;a href=&quot;http://projecteuclid.org/euclid.ss/1009213726&quot;&gt;&lt;cite&gt;Statistical
Science&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2001): 199--231&lt;/a&gt; [Very much including
the discussion by others and the reply by Breiman.  Thanks to
&lt;a href=&quot;http://www.columbia.edu/~chw2/&quot;&gt;Chris Wiggins&lt;/a&gt; for alerting me to
this.]
	&lt;li&gt;Nicolo Cesa-Bianchi and Gabor Lugosi, &lt;cite&gt;Prediction, Learning,
and Games&lt;/cite&gt; [&lt;a href=&quot;../weblog/algae-2008-07.html#prediction&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;li&gt;Ulf Grenander, &lt;cite&gt;Elements of Pattern Theory&lt;/cite&gt;
	&lt;li&gt;David Hand, Heikki Mannila and Padhraic Smyth, &lt;cite&gt;Principles
of Data Mining&lt;/cite&gt;
	&lt;li&gt;Trevor Hastie, Robert Tibshirani and Jerome Friedman, &lt;cite&gt;The
Elements of Statistical Learning: Data Mining, Inference, and Prediction&lt;/cite&gt;
[&lt;a href=&quot;http://www-stat.stanford.edu/~tibs/ElemStatLearn/&quot;&gt;Website&lt;/a&gt;, with full text free in PDF]
	&lt;li&gt;&lt;a href=&quot;john-holland.html&quot;&gt;John H. Holland&lt;/a&gt;, Keith J. Holyoak,
Richard E. Nisbett, and Paul R. Thagard, &lt;cite&gt;Induction: Processes of
Inference, Learning, and Discovery&lt;/cite&gt;
[&lt;a href=&quot;../reviews/hhnt-induction/&quot;&gt;Review: The Best-Laid Schemes o' Mice an'
Men&lt;/a&gt;]
	&lt;li&gt;Michael J. Kearns and Umesh V. Vazirani, &lt;cite&gt;An Introduction to
Computational Learning Theory&lt;/cite&gt;
[&lt;a href=&quot;../reviews/kearns-vazirani/&quot;&gt;Review: How to Build a Better
Guesser&lt;/a&gt;]
	&lt;li&gt;Deborah G. Mayo, &lt;cite&gt;Error and the Growth of Experimental
Knowledge&lt;/cite&gt; [How to use standard statistical tests to learn from
experiment, without Bayesian priors or other a priori folderol.  &lt;a
href=&quot;../reviews/error/&quot;&gt;Review: We Have Ways of Making You Talk, or, Long Live
Peircism-Popperism-Neyman-Pearson Thought!&lt;/a&gt;]
	&lt;li&gt;Deborah G. Mayo and D. R. Cox, &quot;Frequentist statistics as a theory
of inductive
inference&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0610846&quot;&gt;math.ST/0610846&lt;/a&gt;
	&lt;li&gt;John Norton, &quot;A Material Theory of Induction&quot;, &lt;cite&gt;Philosophy of Science&lt;/cite&gt; &lt;strong&gt;70&lt;/strong&gt; (2003): 647--670 [&lt;a href=&quot;http://www.pitt.edu/~jdnorton/papers/material.pdf&quot;&gt;PDF&lt;/a&gt; reprint]
	&lt;li&gt;Jorma Rissanen, &lt;cite&gt;Stochastic Complexity in Statistical
Inquiry&lt;/cite&gt; [&lt;a
href=&quot;../reviews/stochastic-complexity-in-statistical-inquiry/&quot;&gt;Review: Less Is
More, or, &lt;em&gt;Ecce data!&lt;/em&gt;&lt;/a&gt;]
	&lt;li&gt;Sara J. Shettleworth, &lt;cite&gt;Cognition, Evolution and
Behavior&lt;/cite&gt;
	&lt;li&gt;Peter Spirtes, Clark Glymour and Richard Scheines, 
&lt;cite&gt;Causation, Prediction, and Search&lt;/cite&gt;
	&lt;li&gt;Chris Thornton, &lt;cite&gt;Truth from Trash: How Learning Makes
Sense&lt;/cite&gt; [Well, half a recommendation.  Review: &lt;a
href=&quot;../reviews/truth-from-trash/&quot;&gt;Two Cheers for Trash&lt;/a&gt;]
	&lt;li&gt;V. N. (=Vladimir Naumovich) Vapnik, &lt;cite&gt;The Nature of
Statistical Learning Theory&lt;/cite&gt; [&lt;a href=&quot;../reviews/vapnik-nature/&quot;&gt;Review:
A Useful Biased Estimator&lt;/a&gt;]
	&lt;li&gt;H. Peyton Young, &lt;cite&gt;Individual Strategy and Social
Structure&lt;/cite&gt; [Pretty dumb agents nonetheless able to learn in a basic
sense, and what they can accomplish in the way of societies.  &lt;a
href=&quot;../reviews/young-strategy-and-structure/&quot;&gt;Review: A Myopic (and Sometimes
Blind) Eye on the Main Chance, or, the Origins of Custom&lt;/a&gt;]
	&lt;/ul&gt;

&lt;ul&gt;Recommended, close-ups:
	&lt;li&gt;Shun-ichi Amari, &quot;Information Geometry on Hierarchical
Decomposition of Stochastic Interactions,&quot; &lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;47&lt;/strong&gt; (2001): 1701--1711 [A way of finding
&quot;parts&quot; in complex distributions; uses many differential geometry tricks to
do statistics. &lt;a
href=&quot;http://people.csail.mit.edu/jrennie/trg/papers/amari-ig-hierarchy-01.pdf&quot;&gt;PDF
reprint&lt;/a&gt;]
	&lt;li&gt;Massimiliano Badino, &quot;An Application of Information Theory to the
Problem of the Scientific
Experiment&quot;, &lt;cite&gt;Synthese&lt;/cite&gt; &lt;strong&gt;140&lt;/strong&gt; (2004): 355--389 [&lt;a
href=&quot;http://philsci-archive.pitt.edu/archive/00001830/&quot;&gt;MS Word preprint&lt;/a&gt;.
See comments under &lt;a href=&quot;information-theory.html&quot;&gt;Information Theory&lt;/a&gt;.]
	&lt;li&gt;Jonathan Baxter, &quot;A Model of Inductive Bias Learning,&quot;
&lt;cite&gt;Journal of Artificial Intelligence Research&lt;/cite&gt; &lt;strong&gt;12&lt;/strong&gt;
(2000): 149--198 [How to learn what class of hypotheses you should be trying to
use, i.e., your inductive bias.  Assumes independence, again.]
	&lt;li&gt;William Bialek, Ilya Nemenman, and Naftali Tishby,
&quot;Predictability, Complexity and Learning,&quot; &lt;a
href=&quot;http://arXiv.org/abs/physics/0007070&quot;&gt;physics/0007070&lt;/a&gt;
	&lt;li&gt;Ken Binmore, &quot;Making Decisions in Large Worlds&quot; [&quot;This
paper argues that we need to look beyond Bayesian decision theory for an answer
to the general problem of making rational decisions under
uncertainty.&quot;  &lt;a href=&quot;http://www.carloalberto.org/files/binmore.pdf&quot;&gt;PDF
manuscript&lt;/a&gt;; thanks to Nicolas Della Penna for the pointer]
	&lt;li&gt;Margaret Boden, &lt;cite&gt;The Creative Mind: Myths and
Mechanisms&lt;/cite&gt; [How and when to change the kind of representation you're
using, a topic shamefully neglected in the literature.
&lt;a href=&quot;http://www.bbsonline.org/documents/a/00/00/04/34/&quot;&gt;Precis&lt;/a&gt;]
	&lt;li&gt;Josh Bongard and Hod Lipson, &quot;Automated reverse engineering of
nonlinear dynamical
systems&quot;, &lt;a href=&quot;http://dx.doi.org/10.1073/pnas.0609476104&quot;&gt;&lt;cite&gt;Proceedings
of the National Academy of Sciences&lt;/cite&gt; (USA) &lt;strong&gt;104&lt;/strong&gt; (2007):
9943--9948&lt;/a&gt; [Thanks to Chris Weed for pointing me to this.  Interesting, but
basically unaware of the literature
on &lt;a href=&quot;state-space-reconstruction.html&quot;&gt;state-space reconstruction&lt;/a&gt; in
nonlinear dynamics.]
	&lt;li&gt;R. B. Braithwaite, &lt;cite&gt;Scientific Explanation&lt;/cite&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.cs.washington.edu/homes/pedrod/&quot;&gt;Pedro
Domingos&lt;/a&gt;, &quot;The Role of Occam's Razor in Knowledge Discovery,&quot; &lt;cite&gt;Data
Mining and Knowledge Discovery,&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (1999) [&lt;a
href=&quot;http://www.cs.washington.edu/homes/pedrod/dmkd99.ps.gz&quot;&gt;Online&lt;/a&gt;]
	&lt;li&gt;Marco Dorigo and Marco Colombetti, &lt;cite&gt;Robot Shaping: An
Experiment in Behavior Engineering&lt;/cite&gt; [&lt;a
href=&quot;../reviews/robot-shaping/&quot;&gt;Review: Crawling Towards the Light&lt;/a&gt;]
	&lt;li&gt;John W. Fisher III, Alexander T. Ihler and Paul A. Viola,
&quot;Learning Informative Statistics: A Nonparametric Approach&quot;, pp. 900--906 in
NIPS 12 (1999) [&lt;a href=&quot;http://books.nips.cc/papers/files/nips12/0900.pdf&quot;&gt;PDF
reprint&lt;/a&gt;.  I'd call this more of a semi-parametric approach than a fully
non-parametric one; they assume a parametric form for the dependence structure,
but are agnostic about the distributions of innovations, and so try to maximize
non-parametrically estimated mutual informations.]
	&lt;li&gt;Francois Fleuret and Donald Geman, &quot;Stationary Features and Cat
Detection&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v9/fleuret08a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (2008): 2549--2578&lt;/a&gt;
	&lt;li&gt;Peter Godfrey-Smith, &quot;Inductions, Samples, and Kinds&quot;
[&lt;a href=&quot;http://www.people.fas.harvard.edu/~pgs/InductionSamplesKinds_INPC_final.pdf&quot;&gt;PDF preprint&lt;/a&gt;]
	&lt;li&gt;David J. Hand, &quot;Classifier Technology and the Illusion of Progress&quot;,
&lt;a href=&quot;http://dx.doi.org/10%2E1214/088342306000000060&quot;&gt;&lt;cite&gt;Statistical
Science&lt;/cite&gt; &lt;strong&gt;21&lt;/strong&gt; (2006): 1--15&lt;/a&gt;
= &lt;a href=&quot;http://arxiv.org/abs/math.ST/0606441&quot;&gt;math.ST/0606441&lt;/a&gt; [Or: don't
believe everything you read in ICML!  With commentary, available from the
arxiv.org link]
	&lt;li&gt;Hinton and Sejnowski (eds.), &lt;cite&gt;Unsupervised Learning&lt;/cite&gt;
[A sort of &quot;&lt;cite&gt;Neural Computation&lt;/cite&gt;'s Greatest Hits&quot; compilation]
	&lt;li&gt;Tommi S. Jaakkola and David Haussler, &quot;Exploiting generative models
in discriminative classifiers&quot;, &lt;cite&gt;NIPS 11&lt;/cite&gt; (1998)
[&lt;a href=&quot;http://books.nips.cc/papers/files/nips11/0487.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Aleks Jakulin and Ivan Bratko, &quot;Quantifying and Visualizing
Attribute Interactions&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.AI/0308002&quot;&gt;cs.AI/0308002&lt;/a&gt;
	&lt;li&gt;&lt;a
href=&quot;http://www.hss.cmu.edu/philosophy/kelly/research.htm&quot;&gt;Kevin T. Kelly&lt;/a&gt;
[Kelly's work on Occam's Razor is, so far as I know, the only justification for it which doesn't massively beg the question, change the subject, or make
massive assumptions about the nature of the world, Divine Providence, etc.]
		&lt;ul&gt;
		&lt;li&gt;&quot;A New Solution to the Puzzle of Simplicity&quot;,
&lt;a href=&quot;http://philsci-archive.pitt.edu/archive/00002984/&quot;&gt;phil-sci/2984&lt;/a&gt;
[One of his clearest papers]
		&lt;li&gt;&lt;a href=&quot;http://www.andrew.cmu.edu/user/kk3n/ockham/Ockham.htm&quot;&gt;Ockham Project Web Page&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Shane Legg, &quot;Is There an Elegant Universal Theory of
Prediction?&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.AI/0606070&quot;&gt;cs.AI/0606070&lt;/a&gt; [A
nice set of diagonalization arguments against the hope of a universal
prediction scheme which has the nice features of Solomonoff-style induction,
but is actually computable.]
	&lt;li&gt;Jerzy Neyman, &lt;cite&gt;First Course in Probability and
Statistics&lt;/cite&gt; [Fine explanation of his ideas about &quot;rules of inductive
behavior&quot; --- which probably isn't very good methodology, but has the makings
of excellent robotics]
	&lt;li&gt;Leonid Peshkin, &quot;Structure induction by lossless graph compression&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.DS/0703132&quot;&gt;cs.DS/0703132&lt;/a&gt; [Adapting
data-compression ideas to discover hierarchical structures in graphs, e.g., the
4 bases from a tinker-toy model of DNA.]
	&lt;li&gt;Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer and Andrew Y. Ng, &quot;Self-taught learning: Transfer learning from unlabeled data&quot;,
&lt;cite&gt;ICML 2007&lt;/cite&gt;
[&lt;a
href=&quot;http://www.stanford.edu/~rajatr/papers/icml07_SelfTaughtLearning.pdf&quot;&gt;PDF&lt;/a&gt;.
This is a clever idea for semi-supervised learning.  Given a big supply of
unlabeled examples, and a small number of labeled examples, use the unlabeled
ones to learn a high-level/abstract representation or set of features.  Then
use &lt;em&gt;those&lt;/em&gt; features in straightforward classifier learning on the
labeled data.  (They have a specific idea for learning the higher-level
representation,
by &lt;a href=&quot;basis-selection-in-function-decomposition.html&quot;&gt;basis
selection&lt;/a&gt;, but that's a separable issue.)]
	&lt;li&gt;Gerhard Schurz, &quot;Universal vs. Local Prediction Strategies: A
Game-Theoretical Approach to the Problem of
Induction&quot;, &lt;a
href=&quot;http://philsci-archive.pitt.edu/archive/00003720/&quot;&gt;phil-sci/3720&lt;/a&gt;
[Slides only?!?]
	&lt;li&gt;Spyros Skouras, &quot;Decisionmetrics: Towards a Decision-Based
Approach to Econometrics&quot; [Suppose what you really want to do with your model
is to make decisions, e.g., to buy and sell and make money doing so.  Then
fitting the model to minimize a standard error measure, e.g., mean square
error, often gives worse performance than fitting the model to minimize
expected losses.  This applies much more broadly than Spyros's financial
examples may suggest.]
	&lt;li&gt;Aris Spanos, &quot;The Curve-Fitting Problem, Akaike-type Model
Selection, and the Error Statistical Approach&quot;
[&lt;a
href=&quot;http://www.econ.vt.edu/Faculty/CVs_&amp;_Research/Aris%20Spanos%20-%20Working%20Papers/spanoscurve-fitting.pdf&quot;&gt;PDF
preprint&lt;/a&gt;]
	&lt;li&gt;Sara van de Geer, &lt;cite&gt;Applications of Empirical Process
Theory&lt;/cite&gt; [A.k.a. &lt;cite&gt;Empirical Process Theory in
&lt;/cite&gt;M&lt;cite&gt;-Estimation&lt;/cite&gt;]
	&lt;li&gt;Vladimir Vovk, Alex Gammerman and Glenn Shafer, &lt;cite&gt;Algorithmic
Learning in a Random World&lt;/cite&gt; [&lt;a href=&quot;../weblog/algae-2011-08.html#conformal-prediction&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;li&gt;Blaz Zupan, Marko Bohanec, Janez Demsar and Ivan Bratko, &quot;Learning
by discovering concept hierarchies&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/S0004-3702(99)00008-9&quot;&gt;&lt;cite&gt;Artificial
Intelligence&lt;/cite&gt; &lt;strong&gt;109&lt;/strong&gt; (1999): 211--242&lt;/a&gt; [Thanks to Aleks
Jakulin for letting me know about this.  &lt;a
href=&quot;http://magix.fri.uni-lj.si/blaz/papers/aij99.pdf&quot;&gt;PDF preprint&lt;/a&gt;]
	&lt;/ul&gt;

&lt;ul&gt;Not exactly recommended:
	&lt;li&gt;Dana Ballard, &lt;cite&gt;An Introduction to Natural Computation&lt;/cite&gt;
[&lt;a href=&quot;../reviews/ballard-natural/&quot;&gt;Review: Not Natural Enough&lt;/a&gt;]
	&lt;li&gt;Gilbert Harman and Sanjeev Kulkarni, &lt;cite&gt;Reliable Reasoning:
Induction and Statistical Learning Theory&lt;/cite&gt; [Published by MIT Press; 2006
draft &lt;a href=&quot;http://www.princeton.edu/~kulkarni/Papers/b2006_hk.pdf&quot;&gt;free
online&lt;/a&gt; via Prof. Kulkarni (about 100 pages).  The technical material
on &lt;a href=&quot;learning-theory.html&quot;&gt;learning theory&lt;/a&gt; is mostly alright, so far
as it goes, but the &lt;em&gt;philosophy&lt;/em&gt; is irritatingly lack-luster.
Definitely not worth paying what the publisher charges for it. &amp;mdash; There is
now a good &lt;a href=&quot;http://ndpr.nd.edu/review.cfm?id=12684&quot;&gt;review&lt;/a&gt; by &lt;a
href=&quot;http://www.hss.cmu.edu/philosophy/kelly/research.htm&quot;&gt;Kevin
Kelly&lt;/a&gt; and Conor Mayo-Wilson.]
	&lt;/ul&gt;

&lt;ul&gt;Modesty forbids me to recommend:
	&lt;li&gt;CRS, &lt;cite&gt;&lt;a href=&quot;../thesis/&quot;&gt;Causal Architecture, Complexity and
Self-Organization in Time Series and Cellular Automata&lt;/a&gt;&lt;/cite&gt; [Ph.D.
thesis, UW-Madison, 2001]
	&lt;li&gt;CRS, &quot;Dynamics of Bayesian Updating with Dependent Data and
Mis-specified
Models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0901.1342&quot;&gt;arxiv:0901.1342&lt;/a&gt;
= &lt;a href=&quot;http://dx.doi.org/10.1214/09-EJS485&quot;&gt;&lt;cite&gt;Electronic Journal of
Statistics&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009): 1039--1074&lt;/a&gt;
	&lt;li&gt;CRS and Kristina Lisa Klinkner, &quot;Blind Construction of Optimal
Nonlinear Recursive Predictors for Discrete Sequences&quot;, pp. 504--511 in UAI
2004, &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0406011&quot;&gt;cs.LG/0406011&lt;/a&gt;
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Tatsuya Akutsu, Satoru Miyano and Satoru Kuhara, &quot;A simple greedy
algorithm for finding functional relations: efficient implementation and
average case analysis,&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/S0304-3975(02)00183-4&quot;&gt;&lt;cite&gt;Theoretical
Computer Science&lt;/cite&gt; &lt;strong&gt;292&lt;/strong&gt; (2002): 481--495&lt;/a&gt;
	&lt;li&gt;Atocha Aliseda, &lt;cite&gt;Abductive Reasoning: Logical Investigations
into Discovery and Explanation&lt;/cite&gt;
[&lt;a
href=&quot;http://www.springer.com/sgw/cda/frontpage/0,11855,4-0-22-66194772-0,00.html&quot;&gt;Blurb&lt;/a&gt;]
	&lt;li&gt;Umberto Amato, Anestis Antoniadis, Alexander Samarov, Alexander
Tsybakov, &quot;Noisy Independent Factor Analysis Model for Density Estimation and
Classification&quot;, &lt;a href=&quot;http://arxiv.org/abs/0906.2885&quot;&gt;arxiv:0906.2885&lt;/a&gt;
	&lt;li&gt;Andris Ambainis, &quot;Probabilistic inductive inference: a survey&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.LG/9902026&quot;&gt;cs.LG/9902026&lt;/a&gt; [Taking
&quot;inductive inference&quot; exclusively in the sense of learning recursive
functions]
	&lt;li&gt;Rosa I. Arriaga and Santosh Vempala, &quot;An algorithmic theory of learning: Robust concepts and random projection&quot;, &lt;a href=&quot;http://dx.doi.org/10.1007/s10994-006-6265-7&quot;&gt;&lt;cite&gt;Machine Learning&lt;/cite&gt; &lt;strong&gt;63&lt;/strong&gt; (2006): 161--182&lt;/a&gt;
	&lt;li&gt;Nihat Ay
		&lt;ul&gt;
		&lt;li&gt;&quot;Locality of global stochastic interaction in directed
acyclic networks,&quot; preprint, &lt;a
href=&quot;http://www.mis.mpg.de/preprints/2001/prepr5401-abstr.html&quot;&gt;MPI-MIS
54/2001&lt;/a&gt;
		&lt;li&gt;&quot;An information geometric approach to a theory of
pragmatic structuring,&quot; MPI-MIS 52/2000
		&lt;/ul&gt;
	&lt;li&gt;Vijay Balasubramanian, &quot;Statistical Inference, Occam's Razor,
and Statistical Mechanics on the Space of Probability Distributions&quot;,
&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (1997): 349--368
	&lt;li&gt;Pierre Baldi et al., &lt;cite&gt;Modeling the Internet and the Web:
Probabilistic Methods and Algorithms&lt;/cite&gt;
	&lt;li&gt;Jayanta Basak, &quot;Online Adaptive Decision Trees&quot;,
&lt;a href=&quot;http://neco.mitpress.org/cgi/content/abstract/16/9/1959&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2004): 1959--1981&lt;/a&gt;
	&lt;li&gt;William Bechtel and Robert C. Richardson, &lt;cite&gt;Discovering
Complexity: Decomposition and Localization as Strategies in Scientific
Research&lt;/cite&gt; [&lt;a href=&quot;http://pup.princeton.edu/titles/4971.html&quot;&gt;Blurb&lt;/a&gt;]
	&lt;li&gt;Sergey V. Beiden, Marcus A. Maloof and Robert F. Wagner, &quot;A
General Model for Finite-Sample Effects in Training and Testing of Competing
Classifiers&quot;, &lt;cite&gt;IEEE Transactions on Pattern Analysis and Machine
Intelligence&lt;/cite&gt; &lt;strong&gt;25&lt;/strong&gt; (2003): 1561--1569
	&lt;li&gt;Ron Bekkerman, Mikhail Bilenko and John Langford (eds.), &lt;cite&gt;Scaling
up Machine Learning: Parallel and Distributed Approaches&lt;/cite&gt; [&lt;a href=&quot;http://cambridge.org/9780521192248&quot;&gt;Blurb&lt;/a&gt;]
	&lt;li&gt;D. Paul Benjamin (ed.), &lt;cite&gt;Change of Representation and
Inductive Bias&lt;/cite&gt;
	&lt;li&gt;James Blachowicz, &lt;cite&gt;Of Two Minds: The Nature of Inquiry&lt;/cite&gt;
[From the back cover: &quot;The logic of &lt;em&gt;correction&lt;/em&gt; developed here
directly opposes the claim made by &lt;a href=&quot;evol-epistem.html&quot;&gt;evolutionary
epistemologists&lt;/a&gt; such as &lt;a href=&quot;popper.html&quot;&gt;Popper&lt;/a&gt; and Campbell that
there is no such thing as a 'logical method for having new ideas.' ... This
comprehensive and revolutionary theory challenges traditional epistemology's
conception of justification and provides substantial new interpretations of the
nature of ampliative inference, representation and meaning, Platonic and
Hegelian dialectic, Kantian analysis, the heuristic function of models and
metaphors, and the role of inquiry in the constitution of human
consciousness.&quot;  All this in only four hundred pages!  But the stuff on a
logic of correction is very important --- if correct.]
	&lt;li&gt;Gilles Blanchard and Donald Geman, &quot;Hierarchical testing designs
for pattern recognition&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0507421&quot;&gt;math.ST/0507421&lt;/a&gt; = &lt;a
href=&quot;http://dx.doi.org/10%2E1214/009053605000000174&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (2005): 1155--1202&lt;/a&gt;
	&lt;li&gt;Hendrik Blockeel and Jan Struyf, &quot;Efficient algorithms for
decision tree cross-validation,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cs.LG/0110036&quot;&gt;cs.LG/0110036&lt;/a&gt;
	&lt;li&gt;Avrim Blum, Adam Kalai and Hal Wasserman, &quot;Noise-Tolerant
Learning, the Parity Problem, and the Statistical Query Model,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cs.LG/0010022&quot;&gt;cs.LG/0010022&lt;/a&gt;
	&lt;li&gt;Leo Breiman, &quot;Prediction Games and Arcing Algorithms,&quot; &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/11/7/1493&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (1999): 1493--1517&lt;/a&gt;
	&lt;li&gt;Robert Alan Brown, &lt;cite&gt;Machines that Learn: Based on the
Principle of Empirical Control&lt;/cite&gt;
	&lt;li&gt;Christopher J. C. Burges, &quot;Dimension Reduction: A Guided Tour&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1561/2200000002&quot;&gt;&lt;cite&gt;Foundations and Trends in Machine Learning&lt;/cite&gt; &lt;strong&gt;2:4&lt;/strong&gt; (2010)&lt;/a&gt; [&lt;a href=&quot;http://research.microsoft.com/apps/pubs/default.aspx?id=80833&quot;&gt;Preprint version&lt;/a&gt;]
	&lt;li&gt;Meir Buzaglo, &lt;cite&gt;The Logic of Concept Expansion&lt;/cite&gt;
[&lt;a href=&quot;http://cambridge.org/052180762X&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Adam Cannon, J. Mark Ettinger, Don Hush, and Clint Scovel,
&quot;Machine Learning with Data Dependent Hypothesis Classes,&quot; &lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt;
&lt;strong&gt;2&lt;/strong&gt; (2002): 335--358
	&lt;li&gt;Philip Ellery Catton, &quot;The Justification(s) of Induction(s),&quot; &lt;a
href=&quot;http://philsci-archive.pitt.edu/documents/disk0/00/00/09/78/&quot;&gt;online&lt;/a&gt;
	&lt;li&gt;Tommy W. S. Chow and D. Huang, &quot;Estimating Optimal Feature Subsets
Using Efficient Estimation of High-Dimensional Mutual Information&quot;, &lt;cite&gt;IEEE
Transactions on Neural Networks&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2005): 213--224
	&lt;li&gt;Andy Clark and Chris Thornton, &quot;Trading Spaces: Computation,
Representation and the Limits of Uninformed Learning,&quot; &lt;cite&gt;Behavioral and
Brain Sciences&lt;/cite&gt; &lt;strong&gt;20&lt;/strong&gt; (1997): 57--90
[&lt;a href=&quot;http://www.bbsonline.org/documents/a/00/00/04/44/&quot;&gt;Draft&lt;/a&gt;]
	&lt;li&gt;Bertrand Clarke, &quot;Desiderata for a Predictive Theory of Statistics&quot;,
&lt;a href=&quot;http://ba.stat.cmu.edu/abstracts/Clarke.php&quot;&gt;&lt;cite&gt;Bayesian Analysis&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (2010): 1--36&lt;/a&gt;
	&lt;li&gt;David Corfield, &quot;Varieties of Justification in Machine Learning&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1007/s11023-010-9191-1&quot;&gt;&lt;cite&gt;Minds and Machines&lt;/cite&gt; &lt;strong&gt;20&lt;/strong&gt;
(2010): 291--301&lt;/a&gt;
	&lt;li&gt;Mark Culp, George Michailidis and Kjell Johnson, &quot;On multi-view
learning with additive models&quot;, &lt;cite&gt;Annals of Applied
Statistics&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009): 292--318
= &lt;a href=&quot;http://arxiv.org/abs/0906.1117&quot;&gt;arxiv:0906.1117&lt;/a&gt;
	&lt;li&gt;Marco Cuturi and Kenji Fukumizu, &quot;Multiresolution Kernels&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.LG/0507033&quot;&gt;cs.LG/0507033&lt;/a&gt;
	&lt;li&gt;Peter Dayan, &quot;Recurrent Sampling Models for the Helmholtz
Machine,&quot; &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/11/3/653&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (1999): 653--677&lt;/a&gt;
	&lt;li&gt;Carlos R.  de la Mora B., Carlos Gershenson and Angelica
Garcia-Vega, &quot;The role of behavior modifiers in representation development&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.AI/0403006&quot;&gt;cs.AI/0403006&lt;/a&gt;
	&lt;li&gt;Luc Devroye et al., &lt;cite&gt;A Probabilistic Theory of Pattern
Recognition&lt;/cite&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.cs.orst.edu/~tgd&quot;&gt;Thomas G. Dietterich&lt;/a&gt;,
&quot;Machine Learning for Sequential Data&quot;
[&lt;a
href=&quot;http://web.engr.oregonstate.edu/~tgd/publications/mlsd-ssspr.pdf&quot;&gt;PDF&lt;/a&gt;.
Thanks to Gustavo Lacerda for a pointer.]
	&lt;li&gt;Nicola Di Mauro, Teresa M.A. Basile, Stefano Ferilli, Floriana Esposito, &quot;Feature Construction for Relational Sequence Learning&quot;, &lt;a href=&quot;http://arxiv.org/abs/1006.5188&quot;&gt;arxiv:1006.5188&lt;/a&gt;
	&lt;li&gt;Pedro Domingos [All from &lt;a
href=&quot;http://www.cs.washington.edu/homes/pedrod/&quot;&gt;his web-site&lt;/a&gt;]
		&lt;ul&gt;
		&lt;li&gt;A General Method for Scaling Up Machine Learning Algorithms
and its Application to Clustering
		&lt;li&gt;Mining High-Speed Data Streams
		&lt;li&gt;Mining Time-Changing Data Streams
		&lt;/ul&gt;
	&lt;li&gt;Dowe, Korb and Oliver (eds.), &lt;cite&gt;Information, Statistics and
Induction in Science&lt;/cite&gt;
	&lt;li&gt;Deniz Erdogmus, Kenneth E. Hild, II, Yadunandana N. Rao and
Jos&amp;eacute; C. Pr&amp;iacute;ncipe, &quot;Minimax Mutual Information Approach for
Independent Component Analysis&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/16/6/1235&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2004): 1235--1252&lt;/a&gt;
	&lt;li&gt;Oleg V. Favorov and Dan Ryder, &quot;SINBAD: A neocortical mechanism for
discovering environmental variables and regularities hidden in sensory
input&quot;, &lt;a
href=&quot;http://dx.doi.org/doi:10.1007/s00422-004-0464-8&quot;&gt;&lt;cite&gt;Biological
Cybernetics&lt;/cite&gt; &lt;strong&gt;90&lt;/strong&gt; (2004): 191--202&lt;/a&gt;
	&lt;li&gt;Aidan Feeney and Evan Heit (eds.), &lt;cite&gt;Inductive Reasoning:
Experimental, Developmental, and Computational Approaches&lt;/cite&gt;
[&lt;a href=&quot;http://cambridge.org/0521672449&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Jacob Feldman, &quot;How surprising is a simple pattern? Quantifying
'Eureka!',&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/j.cognition.2003.09.013&quot;&gt;&lt;cite&gt;Cognition&lt;/cite&gt;
&lt;strong&gt;93&lt;/strong&gt; (2004): 199--224&lt;/a&gt; [Claims to (a) have a psychologically
valid measure of &lt;em&gt;subjective&lt;/em&gt; complexity, and (b) derive a null
distribution for it!]
	&lt;li&gt;David Finton, &quot;When Do Differences Matter?  On-Line Feature
Extraction Through Cognitive Economy&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.LG/0404032&quot;&gt;cs.LG/0404032&lt;/a&gt;
= &lt;a href=&quot;http://dx.doi.org/10.1016/j.cogsys.2004.06.005&quot;&gt;&lt;cite&gt;Cognitive
Systems Research&lt;/cite&gt; &lt;strong&gt;6&lt;/strong&gt; (2005): 263--281&lt;/a&gt;
	&lt;li&gt;Gary William Flake, &quot;The Calculus of Jacobian Adaptation&quot;
	&lt;li&gt;Francois Fleuret and Eric Brunet, &quot;DEA: An Architecture for Goal
Planning and Classification,&quot; &lt;cite&gt;Neural Computation&lt;/cite&gt;
&lt;strong&gt;12&lt;/strong&gt; (2000): 1987--2008
	&lt;li&gt;Flocchini &lt;em&gt;et al.&lt;/em&gt; (eds.), &lt;cite&gt;Structure, Information and
Communication Complexity&lt;/cite&gt;
	&lt;li&gt;Malcolm R. Forster, &quot;How do Simple Rules 'Fit to Reality' in a
Complex World?&quot;, &lt;cite&gt;Minds and Machines&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (1999):
543--564 [A take on the Gigerenzer et al. idea of fast and frugal heuristics,
especially their ecological adaptation to the environment.  &quot;The main purpose
of this article is to apply these ideas to learning rules --- methods for
constructing, selecting or evaluating competing hypotheses in science, and to
the methodology of machine learning...  The bad news is that ecological
validity is particularly difficult to implement and difficult to understand.
The good news is that it builds an important bridge from normative psychology
and machine learning to recent work in the philosophy of science, which
considers predictive accuracy to be a primary goal of science.&quot;]
	&lt;li&gt;Paul Franceschi, &quot;A Solution to Goodman's Paradox,&quot;
&lt;cite&gt;Dialogue&lt;/cite&gt; &lt;strong&gt;40&lt;/strong&gt; (2001) [&lt;a
href=&quot;http://cogprints.soton.ac.uk/documents/disk0/00/00/21/76/&quot;&gt;online&lt;/a&gt;]
	&lt;li&gt;Vinod Goel and Raymond J. Dolan, &quot;Differential involvement of left
prefrontal cortex in inductive and deductive reasoning&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.cognition.2004.03.001&quot;&gt;&lt;cite&gt;Cognition&lt;/cite&gt;
&lt;strong&gt;93&lt;/strong&gt; (2004): B109--B121&lt;/a&gt;
	&lt;li&gt;Yair Goldberg, Alon Zakai, Dan Kushnir, Ya'acov Ritov, &quot;Manifold Learning: The Price of Normalization&quot;,
&lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v9/goldberg08a.html&quot;&gt;&lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (2008): 1909--1939&lt;/a&gt;
	&lt;li&gt;John C. Gower and J&amp;ouml;rg Blasius, &quot;Multivariate Prediction with
Nonlinear Principal Components Analysis&quot;
		&lt;ul&gt;
		&lt;li&gt;&quot;Theory&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s11135-005-3005-1&quot;&gt;&lt;cite&gt;Quality and
Quantity&lt;/cite&gt; &lt;strong&gt;39&lt;/strong&gt; (2005): 359--372&lt;/a&gt;
		&lt;li&gt;&quot;Application&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s11135-005-3006-0&quot;&gt;&lt;cite&gt;Quality and
Quantity&lt;/cite&gt; &lt;strong&gt;39&lt;/strong&gt; (2005): 373--390&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Ulf Grenander, &lt;cite&gt;Abstract Inference&lt;/cite&gt;
	&lt;li&gt;Ulf Grenander and Michael Miller, &lt;cite&gt;Pattern Theory: From Representation to Inference&lt;/cite&gt;
	&lt;li&gt;Laszlo Gyorfi et al., &lt;cite&gt;A Distribution-Free Theory of
Nonparametric Regression&lt;/cite&gt;
	&lt;li&gt;Stephen Jos&amp;eacute; Hanson et al., eds., &lt;cite&gt;Computational
Learning Theory and Natural Learning Systems&lt;/cite&gt;
		&lt;ul&gt;
		&lt;li&gt;I: &lt;cite&gt;Constraints and Prospects&lt;/cite&gt;
		&lt;li&gt;II: &lt;cite&gt;Interactions between Theory and Experiment&lt;/cite&gt;
		&lt;/ul&gt;
	 &lt;li&gt;Petr Hajek and Martin Holena, &quot;Formal logics of discovery and
hypothesis formation by machine,&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/S0304-3975(02)00175-5&quot;&gt;&lt;cite&gt;Theoretical
Computer Science&lt;/cite&gt; &lt;strong&gt;292&lt;/strong&gt; (2002): 345--357&lt;/a&gt;
	&lt;li&gt;Peter Hall and Qiwei Yao, &quot;Approximating conditional distribution
functions using dimension reduction&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0507432&quot;&gt;math.ST/0507432&lt;/a&gt; = &lt;a
href=&quot;http://dx.doi.org/10%2E1214/009053604000001282&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (2005): 1404--1421&lt;/a&gt;
	&lt;li&gt;Gilbert H. Harman, &quot;The Inference to the Best Explanation&quot;,
&lt;cite&gt;The Philosophical Review&lt;/cite&gt; &lt;strong&gt;74&lt;/strong&gt; (1965):
88--95 [&lt;a href=&quot;http://www.jstor.org/pss/2183532&quot;&gt;JSTOR&lt;/a&gt;; thanks to Kenny
Easwaran for the pointer]
	&lt;li&gt;Patrick Heas and Mihai Datcu, &quot;Supervised learning on graphs of
spatio-temporal similarity in satellite image
sequences&quot;, &lt;a href=&quot;http://arxiv.org/abs/0709.3013&quot;&gt;0709.3013&lt;/a&gt;
	&lt;li&gt;Jaakko Hintikka
		&lt;ul&gt;
		&lt;li&gt;&lt;cite&gt;Socratic Epistemology: Explorations of
Knowledge-Seeking by Questioning&lt;/cite&gt;
[&lt;a href=&quot;http://cambridge.org/0521616514&quot;&gt;blurb&lt;/a&gt;]
		&lt;li&gt;&lt;cite&gt;Inquiry as Inquiry: A Logic of Scientific Discovery&lt;/cite&gt;
		&lt;/ul&gt;
	&lt;li&gt;Yk&amp;auml; Huhtala, Juha K&amp;auml;rkk&amp;auml;inen, Pasi Porkka and Hannu
Toivonen, &quot;TANE: An Efficient Algorithm for Discovering Functional and
Approximate Dependencies,&quot; &lt;cite&gt;The Computer Journal&lt;/cite&gt;
&lt;strong&gt;42&lt;/strong&gt; (1999): 100--111
	&lt;li&gt;Christian Igel and Marc Toussaint, &quot;On Classes of Functions for
which No Free Lunch Results Hold,&quot; &lt;a
href=&quot;http://arXiv.org/abs/cs/0108011&quot;&gt;cs.NE/0108011&lt;/a&gt;
	&lt;li&gt;Lancelot F. James, David J. Marchette and Carey Priebe, &quot;Consistent
estimation of mixture
complexity&quot;, &lt;a href=&quot;http://dx.doi.org/10.1214/aos/1013203454&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;29&lt;/strong&gt; (2001): 1281--1296&lt;/a&gt;
	&lt;li&gt;John R. Josephson and Susan G. Josephson (eds.), &lt;cite&gt;Abductive
Inference: Computation, Philosophy, Technology&lt;/cite&gt;
[&lt;a href=&quot;http://cambridge.org/0521434610&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Yuri Kalnishkan, Vladimir Vovk and Michael V. Vyugin, &quot;How many
strings are easy to predict?&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.ic.2005.04.001&quot;&gt;&lt;cite&gt;Information and
Computation&lt;/cite&gt; &lt;strong&gt;201&lt;/strong&gt; (2005): 55--71&lt;/a&gt; [&quot;It is well known
in the theory of Kolmogorov complexity that most strings cannot be compressed;
more precisely, only exponentially few (O(2^(n-m))) binary strings of length n
can be compressed by m bits. This paper extends the 'incompressibility'
property of Kolmogorov complexity to the 'unpredictability' property of
predictive complexity. The 'unpredictability' property states that predictive
complexity (defined as the loss suffered by a universal prediction algorithm
working infinitely long) of most strings is close to a trivial upper bound (the
loss suffered by a trivial minimax constant prediction strategy). We show that
only exponentially few strings can be successfully predicted and find the base
of the exponent.&quot;]
	&lt;li&gt;Michael Kearns and Dana Ron, &quot;Algorithmic Stability and
Sanity-Check Bounds for Leave-One-Out Cross-Validation,&quot; &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/11/6/1427&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (1999): 1427--1453&lt;/a&gt;
	&lt;li&gt;Kevin T. Kelly
		&lt;ul&gt;
		&lt;li&gt;&lt;cite&gt;The Logic of Reliable Inquiry&lt;/cite&gt;
[Includes cartoons by the author]
		&lt;li&gt;&quot;How Simplicity Helps You Find the Truth without Pointing
at It&quot;
		&lt;li&gt;&quot;Simplicity, Truth, and the Unending Game of Science&quot;
[&lt;a href=&quot;http://www.hss.cmu.edu/philosophy/kelly/papers/bonn5.pdf&quot;&gt;PDF
preprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Eric D. Kolaczyk and Robert D. Nowak, &quot;Multiscale likelihood
analysis and complexity penalized estimation&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0406424&quot;&gt;math.ST/0406424&lt;/a&gt; = &lt;cite&gt;Annals
of Statistics&lt;/cite&gt;
&lt;strong&gt;32&lt;/strong&gt; (2004): 500--527
	&lt;li&gt;Ingo Kreuz and Dieter Roller, &quot;Relevant Knowledge First:
Reinforcement Learning and Forgetting in Knowledge Based Configuration,&quot; &lt;a
href=&quot;http://arXiv.org/abs/cs/0109034&quot;&gt;cs.AI/0109034&lt;/a&gt;
	&lt;li&gt;Henry E. Kyburg Jr. and Choh Man Teng, &quot;Evaluating Defaults,&quot; &lt;a
href=&quot;http://arxiv.org/abs/cs.AI/0207083&quot;&gt;cs.AI/0207083&lt;/a&gt;
	&lt;li&gt;Steffen Lange and Gunter Grieser, &quot;Variants of iterative
learning,&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/S0304-3975(02)00176-7&quot;&gt;&lt;cite&gt;Theoretical
Computer Science&lt;/cite&gt; &lt;strong&gt;292&lt;/strong&gt; (2002): 359--376&lt;/a&gt;
	&lt;li&gt;Nicolas Le Roux and Yoshua Bengio, &quot;Deep Belief Networks Are Compact Universal Approximators&quot;, &lt;a href=&quot;http://dx.doi.org/10.1162/neco.2010.08-09-1081&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;22&lt;/strong&gt; (2010): 2192--2207&lt;/a&gt;
	&lt;li&gt;F. Liang and A. Barron, &quot;Exact Minimax Strategies for Predictive
Density Estimation, Data Compression, and Model Selection&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2004.836922&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;50&lt;/strong&gt; (2004): 2708--2726&lt;/a&gt;
	&lt;li&gt;Stephen Luttrell, &quot;Using Self-Organising Mappings to Learn the
Structure of Data Manifolds&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.NE/0406017&quot;&gt;cs.NE/0406017&lt;/a&gt;
	&lt;li&gt;David J. C. MacKay, &lt;cite&gt;Information Theory, Inference and
Learning Algorithms&lt;/cite&gt; [&lt;a
href=&quot;http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html&quot;&gt;Online
version&lt;/a&gt;]
	&lt;li&gt;Sridhar Mahadevan, &lt;cite&gt;Representation Discovery Using Harmonic
Analysis&lt;/cite&gt;
[&lt;a href=&quot;http://dx.doi.org/10.2200/S00130ED1V01Y200806AIM004&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Gideon S. Mann and Andrew McCallum, &quot;Generalized
expectation criteria for semi-supervised learning with weakly
labeled data&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v11/mann10a.html&quot;&gt;&lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt;
&lt;strong&gt;11&lt;/strong&gt; (2010): 955--984&lt;/a&gt;
	&lt;li&gt;Heikki Mannila and Kari-Jouko R&amp;auml;ih&amp;auml;, &quot;On the complexity
of inferring functional dependencies,&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/0166-218X(92)90031-5&quot;&gt;&lt;cite&gt;Discrete Applied
Mathematics&lt;/cite&gt; &lt;strong&gt;40&lt;/strong&gt; (1992): 237--243&lt;/a&gt;
	&lt;li&gt;Martin and Osherson, &lt;cite&gt;Elements of Scientific Inquiry&lt;/cite&gt;
[A good introduction to the theory of formal learning, especially of recursive
functions in the absence of noise.  Not even hand-waving that this is a
sensible idealization of what scientists do.]
	&lt;li&gt;Geoffrey J. McLachlan, &lt;cite&gt;Discriminant Analysis and Statistical
Pattern Recognition&lt;/cite&gt;
	&lt;li&gt;Geoffrey J. McLachlan and David Peel, &lt;cite&gt;Finite Mixture
Models&lt;/cite&gt;
	&lt;li&gt;Abraham Meidan and Boris Levin, &quot;Choosing from Competing Theories
in Computerised Learning&quot;, &lt;cite&gt;Minds and Machines&lt;/cite&gt; &lt;strong&gt;12&lt;/strong&gt;
(2002): 119--129
	&lt;li&gt;I. J. Myung, Vijay Balasubramanian and M. A. Pitt, &quot;Counting
probability distributions: Differential geometry and model selection&quot;,
&lt;a
href=&quot;http://dx.doi.org/10.1073/pnas.170283897&quot;&gt;&lt;cite&gt;Proceedings of the National Academy of Sciences&lt;/cite&gt; (USA)
&lt;strong&gt;97&lt;/strong&gt; (2000): 11170--11175&lt;/a&gt;
	&lt;li&gt;National Research Council, &lt;cite&gt;Massive Data Sets&lt;/cite&gt;
[&lt;a href=&quot;http://books.nap.edu/html/massdata/&quot;&gt;Online&lt;/a&gt;]
	&lt;li&gt;O. Nelles, &lt;cite&gt;Nonlinear System Identification&lt;/cite&gt;
	&lt;li&gt;Ilya Nemenman, &quot;Fluctuation-Dissipation Theorem and Models of
Learning&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/9/2006&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 2006--2033&lt;/a&gt; [&quot;We analyze how
various abstract Bayesian learners perform on different data and argue that it
is difficult to determine which learning-theoretic computation is performed by
a particular organism using just its performance in learning a stationary
target (learning curve). Based on the fluctuation-dissipation relation in
statistical physics, we then discuss a different experimental setup that might
be able to solve the problem.&quot;]
	&lt;li&gt;Liam Paninski, &quot;Asymptotic Theory of Information-Theoretic
Experimental Design&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/17/7/1480&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;17&lt;/strong&gt; (2005): 1480--1507&lt;/a&gt;
	&lt;li&gt;Hanchuan Peng, Fuhui Long and Chris Ding, &quot;Feature Selection Based
on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and
Min-Redundancy&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TPAMI.2005.159&quot;&gt;&lt;cite&gt;IEEE
Transactions on Pattern Analysis and Machine
Intelligence&lt;/cite&gt; &lt;strong&gt;27&lt;/strong&gt; (2005): 1226--1238&lt;/a&gt; [This sounds
like an idea I had in 2002, and was too dumb/lazy to follow up on.]
	&lt;li&gt;Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau and Leslie Pack
Kaelbling, &quot;Learning to Cooperate via Policy Search,&quot;
&lt;a href=&quot;http://arXiv.org/abs/cs/0105032&quot;&gt;cs.LG/0105032&lt;/a&gt;
	&lt;li&gt;Leonid Peshkin and Christian R. Shelton, &quot;Learning from Scarce
Experience,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cs.AI/0204043&quot;&gt;cs.AI/0204043&lt;/a&gt;
	&lt;li&gt;&lt;a href=&quot;http://www-cs-students.stanford.edu/~kpfleger/&quot;&gt;Karl
Pfleger&lt;/a&gt;
		&lt;ul&gt;
		&lt;li&gt;On-Line Learning of Undirected Sparse n-grams
		&lt;li&gt;Learning Predictive Compositional Hierarchies
[&lt;a href=&quot;http://www-cs-students.stanford.edu/~kpfleger/publications/exploratory.ps.gz&quot;&gt;PS.gz&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Fenna H. Poletiek, &lt;cite&gt;Hypothesis Testing Behaviour&lt;/cite&gt;
[&lt;a href=&quot;http://users.fmg.uva.nl/dborsboom/borsboomPoletiek2001.pdf&quot;&gt;Review by
Denny Borsboom&lt;/a&gt;]
	&lt;li&gt;Joel B. Predd, Sanjeev R. Kulkarni and H. Vincent Poor
		&lt;ul&gt;
		&lt;li&gt;&quot;Consistency in Models for Distributed Learning under
Communication Constraints&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0503071&quot;&gt;cs.IT/0503071&lt;/a&gt;
		&lt;li&gt;&quot;Distributed Learning in Wireless Sensor Networks&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0503072&quot;&gt;cs.IT/0503072&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Detlef Prescher, &quot;A Tutorial on the Expectation-Maximization
Algorithm Including Maximum-Likelihood Estimation and EM Training of
Probabilistic Context-Free Grammars&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.CL/0412015&quot;&gt;cs.CL/0412015&lt;/a&gt;
	&lt;li&gt;Vasin Punyakanok and Dan Roth, &quot;The Use of Classifiers in
Sequential Inference,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cs.LG/0111003&quot;&gt;cs.LG/0111003&lt;/a&gt;
	&lt;li&gt;Joaquin Quinonero-Candela, Masashi Sugiyama, Anton Schwaighofer and Neil D. Lawrence (eds.), &lt;cite&gt;Dataset Shift in Machine Learning&lt;/cite&gt;
[&lt;a href=&quot;http://mitpress.mit.edu/978-0-262-17005-5&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Maxim Raginsky, &quot;A complexity-regularized quantization approach to
nonlinear dimensionality reduction&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.IT/0501091&quot;&gt;cs.IT/0501091&lt;/a&gt;
	&lt;li&gt;Magnus Rattray, &quot;Stochastic trapping in a solvable model of
on-line independent component analysis,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0105057&quot;&gt;cond-mat/0105057&lt;/a&gt;
	&lt;li&gt;Lorenzo Rosasco, Mikhail Belkin, Ernesto De Vito, &quot;On Learning with
Integral
Operators&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v11/rosasco10a.html&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (2010): 905--934&lt;/a&gt;
	&lt;li&gt;Dan Roth, &quot;Learning in Natural Language: Theory and Algorithmic
Approaches&quot; [&lt;a
href=&quot;http://l2r.cs.uiuc.edu/cgi-bin/papers.pl?file=lnlp-conll.html&quot;&gt;online&lt;/a&gt;]
	&lt;li&gt;Hichem Sahbi and Donald Geman, &quot;A Hierarchy of Support Vector
Machines for Pattern Detection&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v7/sahbi06a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2006): 2087--2123&lt;/a&gt;
	&lt;li&gt;Erik Sandewall, &lt;cite&gt;Features and Fluents: The Representation of
Knowledge about Dynamical Systems&lt;/cite&gt;
	&lt;li&gt;Gerhard Schurz
		&lt;ul&gt;
		&lt;li&gt;&quot;Meta-Induction and the Prediction Game: A New View On Hume's Problem&quot; [&lt;a href=&quot;http://thphil.phil-fak.uni-duesseldorf.de/index.php/article/articleview/356/1/53/&quot;&gt;PDF preprint&lt;/a&gt;]
		&lt;li&gt;&quot;Patterns of Abduction&quot; [&lt;a href=&quot;http://thphil.phil-fak.uni-duesseldorf.de/index.php/article/articleview/363/1/53/&quot;&gt;PDF preprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Aris Spanos
		&lt;ul&gt;
		&lt;li&gt;&quot;Statistical Induction, Severe Testing, and Model
Validation&quot; [&lt;a href=&quot;http://www.error06.econ.vt.edu/spanos.pdf&quot;&gt;Preprint&lt;/a&gt;]
		&lt;li&gt;&quot;Revisiting data mining: `hunting' with or without a
license&quot;, &lt;cite&gt;Journal of Economic Methodology&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt;
(2000): 231--264 [&lt;a href=&quot;http://www.error06.econ.vt.edu/Spanosa.pdf&quot;&gt;PDF
reprint&lt;/a&gt;]
		&lt;/ul&gt;
	&lt;li&gt;Peter Sollich and Anason Halees, &quot;Learning curves for Gaussian
process regression: Approximations and bounds,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0105015&quot;&gt;cond-mat/0105015&lt;/a&gt;
	&lt;li&gt;&lt;a href=&quot;http://world.std.com/~rjs/pubs.html&quot;&gt;Ray Solomonoff's
Papers&lt;/a&gt;
	&lt;li&gt;Sonnenburg et al., &quot;The SHOGUN Machine Learning Toolbox&quot;,
&lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v11/sonnenburg10a.html&quot;&gt;&lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;11&lt;/strong&gt; (2010): 1799--1802&lt;/a&gt;
	&lt;li&gt;Eduardo D Sontag, &quot;Adaptation Implies Internal Model,&quot;
&lt;a href=&quot;http://arxiv.org/abs/math.OC/0203228&quot;&gt;math.OC/0203228&lt;/a&gt;
	&lt;li&gt;Daria Sorokina, Rich Caruana and Mirek Riedewald, &quot;Additive
Groves of Regression Trees&quot;, ECML 2007 [&lt;a href=&quot;http://www.cs.cornell.edu/~daria/papers/Groves.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Daria Sorokina, Rich Caruana, Mirek Riedewald and Daniel Fink,
&quot;Detecting Statistical Interactions with Additive Groves of Trees&quot;, ICML 2008
[&lt;a href=&quot;http://www.cs.cornell.edu/~daria/papers/Interactions.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Susanne Still, &quot;Information theoretic approach to interactive learning&quot;, &lt;a href=&quot;http://arxiv.org/abs/0709.1948&quot;&gt;arxiv:0709.1948&lt;/a&gt;
	&lt;li&gt;Ron Sun and C. L. Giles (eds.), &lt;cite&gt;Sequence Learning: Paradigms,
Algorithms, and Applications&lt;/cite&gt;
	&lt;li&gt;Suvrit Sra, Sebastian Nowozin and Stephen J. Wright (eds.), 
&lt;cite&gt;Optimization for Machine Learning&lt;/cite&gt; [&lt;a href=&quot;http://mitpress.mit.edu/9780262016469&quot;&gt;Blurb&lt;/a&gt;]
	&lt;li&gt;Eiji Takimoto and Akira Maruoka, &quot;Top-down decision tree learning
as information based boosting,&quot; &lt;a
href=&quot;http://dx.doi.org/10.1016/S0304-3975(02)00181-0&quot;&gt;&lt;cite&gt;Theoretical
Computer Science&lt;/cite&gt; &lt;strong&gt;292&lt;/strong&gt; (2002): 447--464&lt;/a&gt;
	&lt;li&gt;Sebastian Thrun and Lorien Pratt (eds.), &lt;cite&gt;Learning to
Learn&lt;/cite&gt;
	&lt;li&gt;Robert Tibshirani and Larry Wasserman, &quot;Correlation-sharing for
detection of differential gene
expression&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0608061&quot;&gt;math.ST/0608061&lt;/a&gt;
[&quot;Our proposal averages the univariate scores of each feature with the scores
in correlation neighborhoods. ...  The general idea of correlation-sharing can
be applied to other prediction problems involving a large number of correlated
features.&quot;]
	&lt;li&gt;Nicholas B. Turk-Browne, Brian J. Scholl, Marvin M. Chun, and Marcia K. Johnson, &quot;Neural Evidence of Statistical Learning: Efficient Detection
of Visual Regularities Without Awareness&quot;, &lt;a href=&quot;http://dx.doi.org/10.1162/jocn.2009.21131&quot;&gt;&lt;cite&gt;Journal of Cognitive Neuroscience&lt;/cite&gt; &lt;strong&gt;21&lt;/strong&gt; (2009): 1934--1945&lt;/a&gt;
	&lt;li&gt;Richard Turner, Maneesh Sahani, &quot;A Maximum-Likelihood
Interpretation for Slow Feature Analysis&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/19/4/1022&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt;
&lt;strong&gt;19&lt;/strong&gt; (2007): 1022--1038&lt;/a&gt;
	&lt;li&gt;Peter D. Turney, &quot;How to shift bias: Lessons from the Baldwin
effect,&quot; &lt;cite&gt;Evolutionary Computation&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt;
(1996): 271--295 [&lt;a
href=&quot;http://cogprints.soton.ac.uk/documents/disk0/00/00/18/18/&quot;&gt;online&lt;/a&gt;]
	&lt;li&gt;Laurens van der Maaten and Geoffrey Hinton, &quot;Visualizing Data using
t-SNE&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v9/vandermaaten08a.html&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt;
&lt;strong&gt;9&lt;/strong&gt; (2008): 2579--2605&lt;/a&gt; [SNE = &quot;stochastic neighbor
embedding&quot;, a manifold-learning technique]
	&lt;li&gt;D. Volk and M. G. Stepanov, &quot;Resampling methods for document
clustering,&quot;
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0109006&quot;&gt;cond-mat/0109006&lt;/a&gt;
	&lt;li&gt;Grace Wahba, &quot;An introduction to (smoothing spline) ANOVA models in
RKHS with examples in geographical data, medicine, atmospheric science and
machine learning&quot;, &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0410419&quot;&gt;math.ST/0410419&lt;/a&gt;
	&lt;li&gt;Xiaohui Wang, J. S. Marron, &quot;A scale-based approach to finding effective dimensionality in manifold learning&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.ejs/1205761031&quot;&gt;&lt;cite&gt;Electronic Journal of Statistics&lt;/cite&gt; &lt;strong&gt;2&lt;/strong&gt; (2008): 127--148&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/0710.5349&quot;&gt;arxiv:0710.5349&lt;/a&gt;
	&lt;li&gt;Satoshi Watanabe, &lt;cite&gt;Knowing and Guessing: A Quantitative Study
of Inference and Information&lt;/cite&gt;
	&lt;li&gt;Ying Yang, Xindong Wu and Xingquan Zhu, &quot;Mining in Anticipation for
Concept Change: Proactive-Reactive Prediction in Data
Streams&quot;, &lt;a href=&quot;http://dx.doi.org/10.1007/s10618-006-0050-x&quot;&gt;&lt;cite&gt;Data
Mining and Knowledge Discovery&lt;/cite&gt; &lt;strong&gt;13&lt;/strong&gt; (2006): 261--289&lt;/a&gt;
	&lt;li&gt;H. Zha, X. He, C. Ding, M. Gu and H. Simon, &quot;Bipartite Graph
Partitioning and Data Clustering,&quot;
&lt;a href=&quot;http://arXiv.org/abs/cs/0108018&quot;&gt;cs.IR/0108018&lt;/a&gt;
	&lt;/ul&gt;

	&lt;ul&gt;To write:
	&lt;li&gt;CRS, &lt;cite&gt;Causal Architecture and Model Discovery: Theory,
Algorithms and Examples&lt;/cite&gt;
	&lt;li&gt;CRS, &quot;Three Kinds of Complexity in Prediction: Induction,
Estimation and Calculation&quot;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>
