Self-Centered
Books to Read While the Algae Grow in Your Fur
Books I've read in the last month or so and
feel I can recommend (warning: I have no taste)
- Rick Geary, Trotsky: A Graphic Biography
- Well-told and well-drawn, though nothing in the rest of the art
rises to the level of the hero/monster panels of the opening pages (and
cover!).
- Sarah
Graves, Mallets
Aforethought
- Jonathan
Israel, A
Revolution of the Mind: Radical Enlightenment and the Intellectual Origins of
Modern Democracy
- The story, or part of the story, of how the outlandish and unprecedented
ideology of a network of radical, subversive scribblers became what we all at
least pay lip-service to. Really deserves a detailed discussion; I'll just say
that there's a lot of fascinating material in here, but also many places where
I felt he didn't really prove his point, even or especially when I was very
sympathetic to what he was saying.
- Stephen Budiansky, The Bloody Shirt: Terror After the Civil War
- Once upon a time, the US Army attempted to bring democracy to a backward
part of the world which had long been wracked by ethnic conflict. There were
some promising beginnings, but the defeated, formerly dominant faction refused
to accept their relative demotion, and engaged in a vicious,
well-organized campaign of terrorism, which ultimately proved to
be entirely successful. Those who had trusted enough in the power and
benevolence of the United States to participate in the governments
ultimately overthrown by "violence and fraud" (in the words of one of the
over-throwers) were lucky to escape with their lives (as many did not).
Minimal democratic norms were not re-established for ninety years or more.
- This is of course the story of the failed Reconstruction of the South after
the Civil War, which Budiansky tells by recounting the inter-cut, and
occasionally overlapping, lives of a number of individuals on the
Reconstruction side of the conflict. One of his more effective tactics is to
quote extensively from their letters and journals, as well as from contemporary
books and newspapers. Caveat lector: many of these —
especially the newspapers — are full of vicious racist bile, as well as
the astonishing lies elite white Southerners told to portray themselves as
oppressed victims. (This begins with the story of the "bloody shirt" that
opens the book.) This stuff was hard for me to stomach, and might be too much
for some.
- My biggest complaint with the book is that I wish Budiansky had done more
to tell the stories of black Americans, the way he did with his white subjects
— not that there are none, I hasten to add. I can guess at reasons why
it would be harder to find materials (all of them ultimately having to do with
the fact that Southern blacks were an oppressed people who emerged from slavery
for a few years before being crushed back down to serfdom), but still... That
said, Budiansky's story of crushed hopes, futile bravery and murderous hatred
is wonderfully written and incredibly depressing. I hope that it fills many
Americans with the sort of patriotic shame which helps us be better.
- Luc Devroye and Gabor Lugosi, Combinatorial Methods in Density
Estimation
- The fundamental theorem of statistics, says Pitman, is the Glivenko-Cantelli
theorem: the empirical distribution function $F_n$ of a
large sample of independent, identically-distributed random variables comes
arbitrarily close to their true distribution function $F$: as $n$
goes to infinity, $\max_x |F_n(x) - F(x)|$ goes to 0 almost surely. This means that we can
learn any underlying probability distribution to arbitrary accuracy just by
collecting enough data.* Unfortunately the empirical distribution function is
always discrete, so it doesn't have a density, even if the underlying
distribution does. Or, if you like, it has a density, but it's a mixture of
Dirac delta functions. (The convergence is in the sense of "weak convergence"
or "convergence in distribution".) Density estimation is basically about
taking the empirical distribution function and smoothing it so that it has a
well-behaved density. The oldest way of doing this is to build a histogram,
which gives constant densities to intervals; other methods include fitting
function series (Fourier or wavelet expansions) to the data, or using kernels
(replacing each of the delta function spikes with a smooth density, say a
Gaussian bell-shaped curve). The art here is to pick the manner of smoothing,
and the amount of smoothing, so that (1) the convergence promised by
Glivenko-Cantelli for the unsmoothed distribution is not just maintained but is
(2) strengthened to convergence of the estimated density on the true density,
and ideally (3) the latter convergence happens rapidly.
- Devroye and Lugosi's book is devoted to establishing conditions under which
common density estimators have these three desirable properties (or, more
rarely, when they do not). Throughout, they focus on the "total variation"
or $L_1$ distance between
densities: $d_{TV}(f,g)$ is the integral of
$|f(x) - g(x)|$ over all $x$. They
mention, but generally avoid, other common distances or pseudo-distances such
as $L_2$ (integral of $|f(x) - g(x)|^2$), Hellinger distance (too ugly to write
in HTML), or relative entropy (Kullback-Leibler divergence, expected
log-likelihood ratio). The total variation distance has a very natural
probabilistic interpretation (the maximum amount by which the estimated
probability of any event differs from its true probability), and they can get
very nice finite-sample bounds by minimizing it over various classes of
possible estimates, so this choice is eminently defensible; it does however cut
them off from using a lot of existing theory. (For instance, the optimal
coefficients in a Fourier series, from an $L_1$ point of view,
are not just the empirical Fourier coefficients, since the latter
are $L_2$ optimal.)
- Their general goal is to prove finite-sample upper bounds on
the $L_1$ error of their density estimates; if these go to
zero as $n$ grows, we get (1) and (2) above, and the rate of convergence
tells us how close we are to obtaining (3). Their route to this goal is almost
always through VC theory, and
empirical process
theory more generally. As always, this has two parts: one is deviation
inequalities
(e.g., Hoeffding's)
which bound the probability that any one candidate density will look much
better in sample than it will look out of sample. The other part is
combinatorial arguments that the behavior of an entire space of functions can
be approximated by that of a finite number of key functions. Meshed together
by a union bound, these
give uniform concentration bounds, with rates of convergence depending on the
complexity of the combinatorial construction needed to achieve a given degree
of approximation (i.e., the VC dimension). Devroye and Lugosi's key theorems
bound the error of their density estimates in terms of the VC dimensions of the
sets formed by comparing two densities in the class. (Specifically, they are
interested in the sets where one estimate is higher than another by a given
amount; this is, as they note, extremely similar to the threshold procedure
used to apply VC theory to regression problems.) Finite VC dimension for such
sets implies convergence to within a constant factor of the best available
approximation to the true density. They extend such results to ones where the
amount of smoothing is determined by data-set splitting, i.e., dividing the
data into a training and a testing set, and picking the degree of smoothing
which best generalizes from the training set to the testing set. (They do not
consider other forms of cross-validation, which is a shame because those are a
lot more common than simple data-splitting, but understandable because they're
very ugly to analyze.) They give a lot of attention to kernel density
estimates, including bounds for continuous kernels in terms of how hard it is
to approximate them by simple step-functions, for which the combinatorics are
easy.
- Strictly speaking, the book presupposes measure-theoretic probability, but
readers uncomfortable with sigma-fields and Radon-Nikodym derivatives
could mostly get away with ignoring the former and reading
"probability density functions" for the latter. Similarly, the actual
combinatorics are either elementary, or can be taken on trust. This book is
probably not the best way to first encounter density estimation — I
suspect a less theoretical introduction would not only make the ideas clearer,
but also make readers want theoretical guidance — but no
experience on that score is, strictly, necessary. Neither, really, is prior
knowledge of learning theory or VC theory, though again it would probably help.
The ideal situation for the book is, I'd guess, a second-year graduate-level
course on density estimation (there are many excellent problems), or
self-study.
- *: Well, we have to pretend the data are IID, but
let that slide. Or: assume sufficiently rapid strong mixing and argue, as
in Vidyasagar, that VC results then
hold with tolerable corrections. Kernel density estimates for stochastic
processes are treated at length in
Bosq's Nonparametric
Statistics for Stochastic Processes: Estimation and Prediction, but
the starting point there is ergodic theory, not learning theory.
- George Clark, Science and Social Welfare in the Age of Newton
- Connections between the scientific revolution, economic development and
economic policy (such as it was) in late-17th and early-18th century England,
and to a lesser extent France and the Netherlands. Interesting stuff on the
connections between the activities of scientists and technological development,
including the shrewd observation, contra Marxists claiming that
scientific progress was basically directed to solving the capitalists'
problems, that there were plenty of lucrative problems where scientists got
nowhere, or didn't even try to get anywhere, because it was just
not scientifically feasible. Also some interesting material on the
early history of statistics. The first edition was published in 1937, and
shows both that it was written during the Depression, and that respectable
economists had no idea what was going on. (This does not much harm the
book.)
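To make the Glivenko-Cantelli and L1 claims in the Devroye and Lugosi entry above concrete, here is a minimal sketch in Python. Nothing in it is taken from the book: it assumes standard-normal data, a Gaussian kernel, and a rule-of-thumb bandwidth of 1.06 n^(-1/5), and every function name is made up for illustration. It measures the largest gap between the empirical and true distribution functions at two sample sizes, approximates the L1 error of a kernel density estimate, and checks Scheffé's identity, which says that the largest discrepancy in the probability of any event equals half the integral of |f - g|.

```python
import math
import random

def normal_cdf(x, mu=0.0):
    """CDF of a unit-variance Gaussian with mean mu."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def sup_cdf_gap(sample):
    """Largest |F_n(x) - F(x)| against the standard normal F.

    F_n only changes at data points, so it is enough to check
    both sides of each jump.
    """
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap

def kernel_density(sample, h):
    """Gaussian kernel density estimate with bandwidth h."""
    n = len(sample)
    return lambda x: sum(normal_pdf(x, xi, h) for xi in sample) / n

def l1_distance(f, g, lo=-8.0, hi=8.0, steps=400):
    """Midpoint-rule approximation to the integral of |f - g|."""
    dx = (hi - lo) / steps
    mids = (lo + (k + 0.5) * dx for k in range(steps))
    return sum(abs(f(x) - g(x)) for x in mids) * dx

rng = random.Random(0)
small = [rng.gauss(0.0, 1.0) for _ in range(50)]
large = [rng.gauss(0.0, 1.0) for _ in range(2000)]
gap_small, gap_large = sup_cdf_gap(small), sup_cdf_gap(large)

# Smooth a subsample; the bandwidth rule is an assumption, not the book's.
sample = large[:400]
h = 1.06 * len(sample) ** (-0.2)
kde_error = l1_distance(kernel_density(sample, h), normal_pdf)

# Scheffe's identity for N(0,1) versus N(1,1): the event {x < 0.5} is
# where the first density is larger, and its probability gap is half the L1 distance.
l1_shift = l1_distance(normal_pdf, lambda x: normal_pdf(x, 1.0))
tv_shift = normal_cdf(0.5) - normal_cdf(0.5, 1.0)

print("sup gap, n=50:", round(gap_small, 4), " n=2000:", round(gap_large, 4))
print("L1 error of kernel estimate:", round(kde_error, 4))
print("L1 distance:", round(l1_shift, 4), " twice the event gap:", round(2 * tv_shift, 4))
```

The integration grid and the two sample sizes are arbitrary choices; a finer grid or a different bandwidth changes the numbers but not the qualitative picture.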
November 19, 2009
"Statistical Analysis of Stellar Evolution" (Next Week at the Statistics Seminar)
In which the starry heavens above submit to statistical analysis:
- David van Dyk, "Statistical
Analysis of Stellar Evolution"
- Abstract: Color-Magnitude Diagrams (CMDs) are plots that compare
the magnitudes (luminosities) of stars in different wavelengths of light
(colors). High non-linear correlations among the mass, color and surface
temperature of newly formed stars induce a long narrow curved point cloud in a
CMD known as the main sequence. Aging stars form new CMD groups of red giants
and white dwarfs. The physical processes that govern this evolution can be
described with mathematical models and explored using complex computer
models. These calculations are designed to predict the plotted magnitudes as a
function of parameters of scientific interest such as stellar age, mass, and
metallicity. Here, we describe how we use the computer models as a component of
a complex likelihood function in a Bayesian analysis that requires
sophisticated computing, corrects for contamination of the data by field stars,
accounts for complications caused by unresolved binary-star systems, and aims
to compare competing physics-based computer models of stellar evolution.
- This is joint work with Steven DeGennaro, Nathan Stein, William
H. Jefferys, Ted von Hippel, and Elizabeth Jeffery.
- Place and time: Doherty Hall A310, Monday, 23 November, 4--5 pm.
Enigmas of Chance;
The Eternal Silence of These Infinite Spaces;
Physics
Posted by crshalizi at November 19, 2009 12:02 | permanent link
November 13, 2009
"Some Things Statisticians Do at Google" (Next Week at the Statistics Seminar)
Attention conservation notice: Of no use to you unless
(1) you want to know what statisticians do at search-engine companies
and (2) you are in Pittsburgh.
- Mike Meyer, "Some Things Statisticians Do at Google"
- Abstract: I'll talk about a number of projects at Google where statisticians
have made a large contribution. There will not be a lot of technical
details. In some cases I will just describe the problem.
- The major example will be a description of the statistical and
engineering infrastructure to support live traffic experiments
at Google.
- A common theme of the problems is the importance of understanding
basic statistical principles that can be applied and modified to
handle new data and new circumstances.
- Place and time: Monday, 16 November at 4 pm, in Doherty Hall
A310
As always, the talk is free and open to the public.
Enigmas of Chance
Posted by crshalizi at November 13, 2009 15:09 | permanent link
November 08, 2009
The Shadow Price of Power
Attention conservation notice: Quasi-teaching note giving
an economic interpretation of the Neyman-Pearson lemma on statistical
hypothesis testing.
Suppose we want to pick out some sort of signal from a background of noise.
As every schoolchild knows, any procedure for doing this,
or test, divides the data space into two parts, the one where
it says "noise" and the one where it says "signal".* Tests will make two kinds
of mistakes: they can take noise to be signal, a false
alarm, or can ignore a genuine signal as noise,
a miss. Both the signal and the noise are stochastic, or we
can treat them as such anyway. (Any determinism distinguishable from chance is
just insufficiently complicated.) We want tests where
the probabilities of both types of errors are small. The probability
of a false alarm is called the size of the test; it is the
measure of the "say 'signal'" region under the noise distribution. The
probability of a miss, as opposed to a false alarm, has no short name in the
jargon, but one minus the probability of a miss — the probability of
detecting a signal when it's present — is called power.
Suppose we know the probability density of the noise p and that of
the signal is q. The Neyman-Pearson lemma, as many though not all
schoolchildren know, says that then, among all tests of a given size s,
the one with the smallest miss probability, or highest power, has the form "say
'signal' if q(x)/p(x) > t(s),
otherwise say 'noise'," and that the threshold t varies inversely
with s. The quantity q(x)/p(x) is
the likelihood ratio; the Neyman-Pearson lemma says that to
maximize power, we should say "signal" if it's sufficiently more likely
than noise.
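As a concrete illustration of the likelihood-ratio rule, here is a minimal sketch in Python under assumptions that are not in the post: the noise density p is a standard Gaussian, the signal density q is a Gaussian with mean 1 and the same variance, and the function names are hypothetical. In that special case q(x)/p(x) = exp(x - 1/2), so "ratio above t" is the same as "x above log(t) + 1/2", and size and power have closed forms.

```python
import math

MU = 1.0  # assumed mean of the signal; the noise has mean 0, both have variance 1

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def likelihood_ratio(x):
    """q(x)/p(x) for the two Gaussians assumed above."""
    return math.exp(MU * x - 0.5 * MU ** 2)

def cutoff(t):
    """The x at which the likelihood ratio equals the threshold t."""
    return (math.log(t) + 0.5 * MU ** 2) / MU

def size(t):
    """False-alarm probability: noise lands in the 'say signal' region."""
    return 1.0 - phi(cutoff(t))

def power(t):
    """Detection probability: signal lands in the 'say signal' region."""
    return 1.0 - phi(cutoff(t) - MU)

for t in (0.5, 1.0, 2.0, 4.0):
    print(f"t={t:4.1f}  cutoff={cutoff(t):6.3f}  size={size(t):.4f}  power={power(t):.4f}")
```

Raising the threshold shrinks the "say signal" region, so size and power both fall together, which is the inverse relation between t and s described above.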
The likelihood ratio indicates how different the two distributions —
the two hypotheses — are at x, the data-point we
observed. It makes sense that the outcome of the hypothesis test should depend
on this sort of discrepancy between the hypotheses. But why
the ratio, rather than, say, the difference q(x)
- p(x), or a signed squared difference, etc.? Can we make this
intuitive?
Start with the fact that we have an optimization problem under a constraint.
Call the region where we proclaim "signal" R. We want to maximize its
probability when we are seeing a signal, Q(R), while constraining
the false-alarm probability, P(R)
= s. Lagrange
tells us that the way to do this is to maximize Q(R)
- t[P(R) - s] over R and t jointly.
So far the usual story; the next turn is usually "as you remember from the
calculus of variations..."
Rather than actually doing math, let's think like economists. Picking the
set R gives us a certain benefit, in the form of the
power Q(R), and a cost, tP(R).
(The ts term is the same for all R.) Economists, of course, tell
us to equate marginal costs and benefits. What is the marginal
benefit of expanding R to include a small neighborhood around the point
x? Just, by the definition of "probability
density", q(x). The marginal cost is
likewise tp(x). We should include x in R
if q(x) > tp(x),
or q(x)/p(x) > t. The boundary of R
is where marginal benefit equals marginal cost, and that is why we need the
likelihood ratio and not the likelihood difference, or
anything else. (Except for a monotone transformation of the ratio, e.g. the
log ratio.) The likelihood ratio threshold t is, in fact, the
shadow price of
statistical power.
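The marginal argument can be checked numerically. The following is a minimal sketch in Python, assuming (these are not the post's choices) a standard Gaussian for the noise, a unit-variance Gaussian with mean 1 for the signal, and the data space chopped into cells of width 0.01. It "buys" cells in decreasing order of some score until the false-alarm budget is spent, once scoring by the ratio q/p and once by the difference q - p, and then compares the ratio at the last cell bought with the extra power obtained per unit of extra size.

```python
import math

def normal_pdf(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

DX = 0.01
# Each cell: (midpoint, noise mass p*dx, signal mass q*dx) on [-8, 9].
CELLS = []
for i in range(1700):
    x = -8.0 + (i + 0.5) * DX
    CELLS.append((x, normal_pdf(x) * DX, normal_pdf(x, 1.0) * DX))

def ratio(cell):
    """Marginal benefit per unit of marginal cost."""
    return cell[2] / cell[1]

def difference(cell):
    """A rival score: marginal benefit minus marginal cost."""
    return cell[2] - cell[1]

def fill_region(score, budget):
    """Add cells in decreasing order of score while the size stays within budget.

    Returns (size, power, likelihood ratio of the last cell added).
    """
    size = power = 0.0
    last = None
    for cell in sorted(CELLS, key=score, reverse=True):
        if size + cell[1] > budget:
            break
        size += cell[1]
        power += cell[2]
        last = cell
    return size, power, ratio(last)

size_a, power_a, price_a = fill_region(ratio, 0.05)
size_b, power_b, price_b = fill_region(ratio, 0.06)
_, power_diff, _ = fill_region(difference, 0.05)

# Extra power per unit of extra false-alarm probability between the two budgets.
slope = (power_b - power_a) / (size_b - size_a)

print("power at size 0.05, ranking by ratio:     ", round(power_a, 4))
print("power at size 0.05, ranking by difference:", round(power_diff, 4))
print("threshold at 0.05:", round(price_a, 3), " at 0.06:", round(price_b, 3),
      " power gained per unit size:", round(slope, 3))
```

Ranking by the ratio gives more power for the same budget, and the rate at which power grows as the budget is relaxed sits between the two thresholds, which is what one would expect of a shadow price.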
I am pretty sure I have not seen or heard the Neyman-Pearson lemma explained
marginally before, but in retrospect it seems too simple to be new, so pointers
would be appreciated.
Manual trackback: John Barrdear
*: Yes, you could have a randomized test procedure,
but the situations where those actually help pretty much define "boring,
merely-technical complications."
Enigmas of Chance
Posted by crshalizi at November 08, 2009 03:06 | permanent link
November 05, 2009
36-350, Data Mining: Course Materials (Fall 2009)
My lesson-plan having survived first contact with
the enemy students, it's time to start posting the lecture
handouts &c. This page will be updated as the semester goes on; the RSS
feed for it should be here.
The class homepage has more
information.
- Introduction
to the course (24 August) What is data mining? how is it used? where did it
come from? Some themes.
- Information
retrieval and similarity searching I (26 August) Finding the data you are
looking for. Ideas we will avoid: meta-data and cataloging; meanings. Textual
features. The bag-of-words representation; its vector form. Measuring
similarity and distance for vectors. Example with the New York Times
Annotated Corpus.
- IR continued (28 August). The
trick to searching: queries are documents. Search evaluation: precision,
recall, precision-recall curves; error rates. Classification: nearest
neighbors and prototypes; classifier evaluation by mis-classification rate and
by confusion matrices. Inverse document frequency weighting. Visualizing
high-dimensional data by multi-dimensional scaling. Miscellaneous topics:
stemming, incorporating user feedback.
Homework 1, due 4 September: assignment,
R, data; SOLUTIONS
- Page
Rank (31 August). Links as pre-existing feedback. How to exploit link
information? The random walk on the graph; using the ergodic theorem.
Eigenvector formulation of page-rank. Combining page-rank with textual
features. Other applications. Further reading on information retrieval.
- Image
Search, Abstraction and Invariance (2 September). Similarity search for
images. Back to representation design. The advantages of abstraction:
simplification, recycling. The bag-of-colors representation. Examples.
Invariants. Searching for images by searching text. An example in practice.
Slides for this lecture.
- Information
Theory I (4 September). Good features help us guess what we can't
represent. Good features discriminate between different values of unobserved
variables. Quantifying uncertainty with entropy. Quantifying reduction in
uncertainty/ discrimination with mutual information. Ranking features based on
mutual information. Examples, with code, of informative words for
the Times. Code.
Supplementary reading: David
P. Feldman, Brief Tutorial on
Information Theory, chapter 1
Homework 2, due 11 September: assignment;
SOLUTIONS
TEXT; SOLUTIONS R
- Information Theory II (9
September). Dealing with multiple features. Joint entropy, the chain rule for
entropy. Information in multiple features. Conditional information, chain
rule for information, conditional independence. Interactions, positive and
negative, and redundancy. Greedy feature selection with low redundancy.
Example, with code, of selecting words for the Times. Sufficient
statistics and the information
bottleneck. Code.
Supplementary reading: Aleks Jakulin and Ivan Bratko, "Quantifying and
Visualizing Attribute
Interactions", arxiv:cs.AI/0308002
- Categorization;
Clustering I (11 September). Dividing the world up into categories.
Classification: known categories with labeled examples. Taxonomy of learning
problems (supervised, unsupervised, semi-supervised, feedback, ...).
Clustering: discovering unknown categories from unlabeled data. Benefits of
clustering, with a digression on where official classes come from. Basic
criterion for good clusters: lots of information about features from little
information about cluster. Practical considerations: compactness, separation,
parsimony, balance. Doubts about parsimony and balance. The k-means
clustering algorithm, or unlabeled prototype classification: analysis,
geometry, search. Appendix: geometric aspects of the prototype and
nearest-neighbor method.
Homework 3, due 18
September: assignment; SOLUTIONS
- Clustering II (14 September).
Distances between partitions; variation-of-information distance.
Hierarchical clustering by agglomeration and its varieties. Picking the
number of clusters by merging costs. Performance of different clustering
methods on various doodles. Why we would like to pick the number of clusters
by predictive performance, and why it is hard to do at this stage. Reifying clusters.
- Transformations: Rescaling and
Low-Dimensional Summaries (16 September). Improving on our original
features. Re-scaling, standardization, taking logs, etc., of individual
features. Forcing things to be Gaussian considered harmful. Low-dimensional
summaries by combining features. Exploiting geometry to eliminate redundancy.
Projections on to linear subspaces. Searching for structure-preserving
projections.
- Principal Components I (18
September). Principal components are the directions of maximum variance.
Derivation of principal components as the best approximation to the data in a
linear subspace. Equivalence to variance maximization. Avoiding explicit
optimization by finding eigenvalues and eigenvectors of the covariance matrix.
Example of principal components with cars; how to tell a sports car from a
minivan. The standard recipe for doing PCA. Cautions in interpreting
PCA. Data-set used in the notes.
Homework 4, due 25
September: assignment; SOLUTIONS
- Principal
Components II (21 September). PCA + information retrieval = latent
semantic indexing; why LSI is a Good Idea. PCA and multidimensional scaling.
- Factor
Analysis (23 and 25 September). From PCA to factor analysis by adding
noise. Roots of factor analysis in causal discovery: Spearman's general factor
model and the tetrad equations. Problems with estimating factor models: number
of equations does not equal number of unknowns. Solution 1, "principal
factors", a.k.a. estimation through heroic feats of linear algebra. Solution
2, maximum likelihood, a.k.a. estimation through imposing distributional
assumptions. The rotation problem: the factor model is
unidentifiable; the number of factors may be meaningful, but the individual
factors are not.
- The
Truth about PCA and Factor Analysis (28 September) PCA is data reduction
without any probabilistic assumptions about where the data came from. Picking
number of components. Faking predictions from PCA. Factor analysis makes
stronger, probabilistic assumptions, and delivers stronger, predictive
conclusions --- which could be wrong. Using probabilistic assumptions and/or
predictions to pick how many factors. Factor analysis as a first, toy
instance of a graphical causal model. The rotation problem once more with
feeling. Factor models and mixture models. Factor models and Thomson's
sampling model: an outstanding fit to a model with a few factors is actually
evidence of a huge number of badly measured latent variables.
Final advice: it all depends, but if you can only do one, try PCA.
R
code for the Thomson sampling model.
- Nonlinear
Dimensionality Reduction I: Locally Linear Embedding (5 October). Failure
of PCA and all other linear methods for nonlinear structures in data; spirals,
for example. Approximate success of linear methods on small parts of nonlinear
structures. Manifolds: smoothly curved surfaces embedded in higher-dimensional
Euclidean spaces. Every manifold looks like a linear subspace on a
sufficiently small scale, so we should be able to patch together many small
local linear approximations into a global manifold. Local linear embedding:
approximate each vector in the data as a weighted linear combination of
its k nearest neighbors, then find the low-dimensional vectors best
reconstructed by these weights. Solving the optimization problems by linear
algebra. Coding up LLE. A spiral
rainbow. R.
- Nonlinear
Dimensionality Reduction II: Diffusion Maps (9 October). Making a graph
from the data; random walks on this graph. The diffusion operator,
a.k.a. Laplacian. How the Laplacian encodes the shape of the data.
Eigenvectors of the Laplacian as coordinates. Connection to page-rank.
Advantages when data are not actually on a manifold. Example.
Pre-midterm review (12 October): highlights of the course to date; no
handout.
MIDTERM (14
October): exam, solutions
Homework 5, due 23 October:
assignment;
solutions
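For readers who want to see the eigenvector recipe from the Principal Components I lecture in action, here is a minimal sketch in Python rather than the course's R. It is not the course code: the two-dimensional data are synthetic, the names are hypothetical, and power iteration is swapped in for a library eigen-solver to keep the example dependency-free. It finds the leading eigenvector of the sample covariance matrix and checks that the variance of the data projected onto it equals the leading eigenvalue.

```python
import math
import random

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance_matrix(rows):
    """Sample covariance matrix (denominator n - 1)."""
    c = center(rows)
    n, d = len(c), len(c[0])
    return [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalize.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged direction.
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d)), v

def projected_variance(rows, direction):
    """Sample variance of the data projected onto a unit vector."""
    scores = [sum(a * b for a, b in zip(r, direction)) for r in center(rows)]
    return sum(s * s for s in scores) / (len(scores) - 1)

# Synthetic data: one shared factor t plus a little independent noise per coordinate.
rng = random.Random(2)
rows = []
for _ in range(500):
    t = rng.gauss(0.0, 2.0)
    rows.append([t + rng.gauss(0.0, 0.3), t + rng.gauss(0.0, 0.3)])

cov = covariance_matrix(rows)
eigenvalue, direction = leading_eigenpair(cov)
share = eigenvalue / (cov[0][0] + cov[1][1])
proj_var = projected_variance(rows, direction)

print("first principal direction:", [round(x, 3) for x in direction])
print("share of total variance:", round(share, 3))
```

With data built this way the first component should point roughly along the diagonal and carry almost all of the variance; real data will rarely be so tidy.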
- Regression
I: Basics. Guessing a real-valued random variable; why expectation values
are mean-square optimal point forecasts. The regression function; why its
estimation must involve assumptions beyond the data. The bias-variance
decomposition and the bias-variance trade-off. First example of improving
prediction by introducing variance. Ordinary least squares linear regression
as smoothing. Other linear smoothers: k-nearest-neighbors and kernel
regression. How much should we
smooth? R, data
for running example
- Regression
II: The Truth About Linear Regression (21 October). Linear regression is
optimal linear (mean-square) prediction; we do this because we hope a linear
approximation will work well enough over a small range. What linear regression
does: decorrelate the input features, then correlate them separately with the
response and add up. The extreme weakness of the probabilistic assumptions
needed for this to make sense. Difficulties of linear regression;
collinearity, errors in variables, shifting distributions of inputs, omitted
variables. The usual extra probabilistic assumptions and their implications.
Why you should always look at residuals. Why you generally shouldn't use
regression for causal inference. How to torment angels. Likelihood-ratio
tests for restrictions of nice models.
- Regression III: Extending Linear
Regression (23 October). Weighted least squares. Heteroskedasticity:
variance is not the same everywhere. Going to consult the oracle. Weighted
least squares as a solution to heteroskedasticity. Nonparametric estimation of
the variance function. Local polynomial regression: local constants (= kernel
regression), local linear regression, higher-order local polynomials. Lowess =
locally-linear smoothing for scatter plots. The oracles fall silent.
Homework 6, due Friday, 30 October: assignment, data set; solutions
- Evaluating Predictive Models (26
and 28 October). In-sample, out-of-sample and generalization loss or error;
risk as expected loss on new data. Under-fitting, over-fitting, and examples
with polynomials. Methods of model selection and controlling over-fitting:
empirical risk minimization, penalization, constraints/sieves, formal learning
theory, cross-validation. Limits of
generalization. R for creating figures.
- Smoothing
Methods in Regression (30 October). How much smoothing should we do?
Approximation by local averaging. How much smoothing we should do to
find the unknown curve depends on how smooth the curve really is,
which is unknown. Adaptation as a partial substitute for actual knowledge.
Cross-validation for adapting to unknown smoothness. Application: testing
parametric regression models by comparing them to nonparametric fits. The
bootstrap principle. Why ever bother with parametric
regressions? R
code for some of the examples.
Homework 7, due Friday, 6 November: assignment
- Additive
Models (2 November). A nice feature of linear models: partial responses,
partial residuals, and backfitting estimations. Additive models: regression
curve is a sum of partial response functions; partial residuals and the
backfitting trick generalize. Parametric and non-parametric rates of
convergence. The curse of dimensionality for unstructured nonparametric
models. Additive models as a compromise, introducing bias to reduce variance.
Example with the data from homework 6.
- Classification
and Regression Trees (4 and 6 November). Prediction trees. A
classification tree we can believe in. Prediction trees combine simple local
models with recursive partitioning; adaptive nearest neighbors. Regression
trees: example; a little math; pruning by cross-validation; more R mechanics.
Classification trees: basics; measuring error by mis-classification; weighted
errors; likelihood; Neyman-Pearson classifiers. Uncertainty for trees.
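The idea from the regression lectures above of using cross-validation to decide how much to smooth can be sketched as follows. This is a minimal illustration in Python, not the course's R code: the data are synthetic (a sine curve plus Gaussian noise), the kernel is Gaussian, the bandwidth grid is arbitrary, and all names are hypothetical. Each bandwidth is scored by leave-one-out cross-validation, predicting every point from a kernel regression fitted to all the others.

```python
import math
import random

def kernel_regression(x0, xs, ys, h, skip=None):
    """Gaussian-kernel weighted average of ys at x0, optionally omitting one point."""
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x0 - x) / h) ** 2)
        num += w * y
        den += w
    if den == 0.0:
        # Every weight underflowed; fall back to the overall mean of ys.
        return sum(ys) / len(ys)
    return num / den

def loo_cv_score(xs, ys, h):
    """Mean squared error when each point is predicted from the others."""
    errors = [(ys[i] - kernel_regression(xs[i], xs, ys, h, skip=i)) ** 2
              for i in range(len(xs))]
    return sum(errors) / len(errors)

# Synthetic data: the true curve and the noise level are assumptions for the demo.
rng = random.Random(1)
xs = [rng.random() for _ in range(100)]
ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]

bandwidths = [0.01, 0.03, 0.1, 0.3, 1.0]
cv_scores = {h: loo_cv_score(xs, ys, h) for h in bandwidths}
best_h = min(cv_scores, key=cv_scores.get)

for h in bandwidths:
    print(f"h={h:5.2f}  leave-one-out MSE={cv_scores[h]:.4f}")
print("selected bandwidth:", best_h)
```

Very large bandwidths flatten the curve and very small ones chase the noise, so the cross-validated error is typically lowest somewhere in between; which grid value wins depends on the sample.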
Corrupting the Young;
Enigmas of Chance
Posted by crshalizi at November 05, 2009 22:45 | permanent link
November 04, 2009
Blosxom Fading in November
My old Blosxom installation (v. 2.0.2),
after several years of working nicely, is growing increasingly cranky, and
mulishly refusing to generate or update posts as the whim takes it. (I am not
sure how much kicking and shoving it will need to produce this.) I'd
appreciate a pointer to something which works similarly, but does
work: I write posts in plain HTML in Emacs and drop them in a directory; it
makes them look nice. If it handles tags and/or LaTeX nicely, so much the
better.
Self-Centered
Posted by crshalizi at November 04, 2009 19:34 | permanent link
October 31, 2009
Books to Read While the Algae Grow in Your Fur, October 2009
- Rosemary Kirstein, The Lost Steersman
- Sequel to Steerswoman's Road (below); excellent and perfectly
continuous, despite a long gap in the writing. The trick of celebrating
intelligence while maintaining the tone and color of a good fantasy novel is
not something I have encountered elsewhere, and find deeply addictive.
- Everything else I have to say is a spoiler: This owes
a massive debt to Lovecraft's At the Mountains of Madness. The
plot-hinge mystery here has to do with "demons", amphibious barrel-shaped
creatures with quadrilateral symmetry, very like (though not exactly the same
as) Lovecraft's Antarctic Old Ones. There are scenes of dissecting demons
under the impression that they are just animals, and realizing they belong to
some radically different division of life than familiar terrestrial organisms;
an exploring expedition to an unknown part of the world where the demons are
found; explorations of demons' cities and observations of their customs,
including subterranean chambers used for their rituals, etc.; and the dawning
realization that the creatures are in fact sapient. (HPL: "Radiates,
vegetables, monstrosities, star spawn — whatever they had been, they were
men!" RK does not put such florid outbursts in her characters' mouths; she just
has Rowan come to see that the demons are "people".) Kirstein does a better
job, in my view, of making the creatures actually alien, in particular starting
from giving them a very inhuman sensorium (continual sonar, without
any vision) and means of communication (excreting specially shaped lumps of
organic material, reminiscent of the pieces of carved soapstone Lovecraft
associated with his Old Ones), and building out logically from there. Needless
to say, this complicates the ethics of terraforming the steerswomen's planet
considerably. — Janus's speeches (pp. 43--44 and p. 356) about the
dangers of learning too much about the world also seem drawn from Lovecraft,
though they bring to mind the opening of "The Call of Cthulhu" more
than Mountains.
- Jeffrey
D. Hart, Nonparametric Smoothing and Lack-of-Fit
Tests
- A sound, friendly but reasonably theoretical introduction to nonparametric
regression, giving about equal attention to kernel-based methods and to series
expansions (Fourier series, orthogonal polynomials, etc.). The first half of
the book, through ch. 4, introduces these methods, considers their ability to
predict new data (emphasizing, naturally, the bias-variance trade-off), and
looks at methods for selecting how much smoothing to do based on the data being
smoothed, with a fondness for leave-one-out cross-validation and its variants.
(I can't recall if k-fold CV is even mentioned.) The second half is
about testing parametric regression specifications. Chapter 5 reviews some
classical tests for fully-specified and especially for linear-in-the-parameters
parametric models, including Neyman's smooth tests: the latter involve, roughly
speaking, fitting an orthogonal series to the deviations from the null model,
and checking that all the coefficients are small, and so form a bridge to the
smoothing-based tests used in the rest of the book. Basically, one can either
smooth the parametric residuals, which should have mean zero and constant
variance under the null hypothesis, or compare the parametric estimate to the
nonparametric smooth. Hart prefers the former approach, and develops tests for
regression functions being constants in chapter 7, which in chapter 8 are
turned into tests for departures from arbitrary parametric regression models.
The distribution of these test statistics is too complicated for anything
except bootstrapping, which needs to be done carefully to preserve power. To
simplify the math, up to this point Hart assumes that the input variable takes
values at a deterministic set of points on the unit interval ("fixed-design
univariate regression"); chapter 9 generalizes to random-design and
multivariate regressions, as well as lifting some other restrictions. Chapter
10 contains some case studies of real data.
- This book should be accessible to anyone who understands parametric
inference at the level of,
say, All
of Statistics; no prior exposure to smoothing methods is really
needed. The series-expansion methods will probably go down more easily with
some prior exposure to Fourier analysis. People who are serious about using
parametric regression models in the real world (cough
econometricians cough) owe it to themselves to test them with these
methods.
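For concreteness, here is a minimal sketch of the data-driven smoothing selection described above: leave-one-out cross-validation for the bandwidth of a kernel regression, on a fixed design over the unit interval. It is not taken from Hart's book; the Gaussian kernel, the sine-plus-noise test function, the candidate bandwidths, and every function name are my own illustrative assumptions.

```python
import math
import random

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)

def nw_predict(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson kernel regression estimate at x0 with bandwidth h.
    If `skip` is given, that observation is left out of the average."""
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = gaussian_kernel((x0 - x) / h)
        num += w * y
        den += w
    return num / den if den > 0 else 0.0

def loocv_score(xs, ys, h):
    """Mean squared error when each point is predicted from all the others."""
    n = len(xs)
    return sum((ys[i] - nw_predict(xs[i], xs, ys, h, skip=i)) ** 2
               for i in range(n)) / n

def select_bandwidth(xs, ys, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loocv_score(xs, ys, h))

def make_data(n=100, seed=0):
    """Hypothetical test case: a sine curve plus Gaussian noise, fixed design."""
    rng = random.Random(seed)
    xs = [(i + 0.5) / n for i in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

xs, ys = make_data()
candidates = [0.005, 0.02, 0.05, 0.1, 0.3, 1.0]
best_h = select_bandwidth(xs, ys, candidates)
print("selected bandwidth:", best_h)
```

Too small a bandwidth chases the noise and too large a one flattens the curve; the leave-one-out score is one way of letting the data arbitrate between the two.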
- Rosemary Kirstein, Steerswoman's Road
- First two books in an epic fantasy series about the scientific method,
reprinted in one volume. There are more books, which I now covet powerfully,
but the series is not finished.
- Spoilers: "Epic fantasy" here is, I am pretty sure,
totally misleading. Initially, and from most of the characters'
perspectives, the world looks like a bog-standard medieval fantasyland, only
with the addition of an itinerant semi-monastic order of geographers and
natural philosophers, the eponymous steerswomen. By the end of this volume,
however, I am pretty sure that the setting is actually another planet in this
universe, with no magic at all. The steerswomen's world is being terraformed;
the Guidestars are satellites in geosynchronous orbit. The native ecology
(based on "blackgrass" and "redgrass") is being systematically destroyed (by
microwave heating from the orbiting Guidestars ("the spell Routine Bioform
Clearance"), and by the Outskirters' goats [which may be genetically
modified?]) and replaced by terrestrial flora ("greengrass"), microbes and
fauna. Wizards are simply the inhabitants of the planet who retain the old
technology, such as electricity and explosives. Why most of the
colonists have regressed to medieval technology, and why, having done so, they
have an institution like the steerswomen, I couldn't tell you. (I can tell you
that sailors and steerswomen are immune to some "curses" because they wear
rubber-soled, i.e. electrically insulating, boots.) But I am dying to find
out.
- J. K. Ghosh and R. V. Ramamoorthi, Bayesian Nonparametrics
- I have written extensively about the general subject of Bayesian
nonparametrics and especially of its consistency elsewhere
(here, here,
or, indeed, here), so I'll just
plunge in. This 2003 monograph is the best overview of Bayesian nonparametrics
from the viewpoint of theoretical statistics which I've found, though there has
been a great deal of work since it was written, and I know that a number of new
books are coming out soon.
- The authors begin (ch. 1) by reviewing* results on the consistency of
Bayesian learning on finite sample spaces and Dirichlet prior distributions.
They then carefully (ch. 2) consider the measure-theoretic issues involved in
constructing prior probability distributions over infinite-dimensional spaces,
especially priors over all probability measures (or all probability densities)
on the real line. Chapter 3 describes in detail the properties of
Dirichlet-process priors and of Polya tree priors. Chapter 4 is concerned with
consistency for Bayesian updating with IID data, emphasizing the
"Kullback-Leibler property" (the prior must put sufficient weight on
distributions with small relative entropy from the truth) and the
exponentially-consistent-testing conditions which go back to Schwartz. Chapter
5 specializes to inferring probability densities; this is the only place they
use Gaussian process priors. Chapter 6 considers inferring the location
parameter of distributions of unknown shape, and outlines (without full detail)
the notorious examples, due to Freedman and Diaconis, of how Bayesian learning
can fail to be consistent. Chapter 7 considers linear regression with an
unknown noise distribution; this is the only departure from assuming IID data
made here. The remaining chapters try to construct uniform distributions on
infinite-dimensional spaces, look at some issues in survival analysis, and
treat technical aspects of "neutral to the right" priors, ones whose cumulative
hazard functions have independent increments. It is assumed throughout that
the true, data-generating distribution lies within the support of the
prior.
- Ghosh and Ramamoorthi focus on mathematical issues, to the exclusion of
computational and statistical considerations. (There are no applications to
data, or even to elaborate simulations.) The writing is adequate for a work of
"theorem-proof, theorem-proof" math, but no more. Those proofs, however, are
really clear and clean, without tricks or complications. I recommend the book
for those who want to understand, in depth, the technicalities of constructing
priors on infinite-dimensional spaces, and of establishing their consistency
when updated with IID data. There are a handful of exercises at the end of the
book, but I do not think it would be suitable as a classroom textbook. It
could work as the first part of an advanced graduate seminar, or for self-study
for motivated and mathematically mature readers.
- *: Actually, it's my impression that lots of
introductions to Bayesian statistics, even at the graduate level,
do not cover these results. This is, I think, something of a scandal
for the profession. That goes double if it's due to the attitude which Ghosh
and Ramamoorthi (p. 122) paraphrase as "the prior and the posterior given by
Bayes theorem [sic] are imperatives arising out of axioms of rational behavior
--- and since we are already rational why worry about one more" criterion,
namely convergence to the truth. One does indeed find pernicious relativism
and epistemic nihilism everywhere these days!
- Nadia
Gordon, Lethal
Vintage
- Continuing amateur-sleuthing adventures of a Napa Valley restaurant-owner
and her foodie (and wine-y) friends. No prior acquaintance with the series [1, 2, 3] needed.
- Larry
Gonick, The
Cartoon History of the Modern World, Part II: From the Bastille to
Baghdad
- My parents got the first part of the Cartoon History of the
Universe (in its original, shorter edition) for my brother and me in
1981, when I was seven. We loved it so much they ended up having to get
us two copies. I have thus been reading the History, as
it came out, all my conscious life. (And re-reading it, without any
visitations from
the Suck
Fairy so far.) This latest volume is, as always, a delight, but not a pure
one, because it's also the last. I can understand wanting to be finished with
the work of a lifetime, especially one which in the nature of things could be
spun out indefinitely; but I can't help wishing for more.
- G. Willow Wilson and M. K. Perker, Air,
vol. 2: Flying Machine
- James Sethna, Statistical Mechanics: Entropy, Order Parameters, and
Complexity
- The best introductory statistical mechanics book I have ever seen.
(Meaning: advanced undergraduates, not the graduate level of Landau and
Lifshitz.) The reader is supposed to have some familiarity with classical and
quantum mechanics, a little electromagnetism, and the very barest rudiments of
thermodynamics, the latter not going beyond what's in a good first-year physics
course. Beyond the basics of differential equations and linear algebra, the
only real pieces of math used here are Fourier transforms and elementary
probability (such as one sees in undergraduate quantum mechanics). On this
basis Sethna erects classical (and, in one chapter, quantum) statistical
mechanics, emphasizing the modern applications of the theory and physical
intuition.
- The exposition begins with random walks, including diffusion and the
central limit theorem. The micro-canonical ensemble comes next, along with a
very nice chapter on its ergodic
basis and failures of ergodicity (such
as KAM
theory). The other ensembles are derived from imposing the micro-canonical
ensemble on the whole system, and looking at the marginal distributions of
sub-systems. The elaborate axiomatic structure of pure thermodynamics is
touched on only briefly; thermodynamic quantities are seen, quite properly, as
derivative of statistical-mechanical ones. The question of what macroscopic
variables need to be included in the free energy leads naturally to a superb
chapter on the meaning and identification of order parameters. This in turn is
followed by a really lucid treatment of the connections between spontaneous
fluctuations, the decay of correlations, response to external forces, and the
dissipative approach to equilibrium. The whole is capped off by chapters on
abrupt (e.g., ice-water, water-steam) and continuous (e.g., magnetic) phase
transitions, including a nice hand-waving discussion of the renormalization
group. In addition to the main thread of exposition, each chapter has a large
collection of problems, ranging from mathematical proofs through calculations
to simulation challenges, which contain a lot of neat applications and
special topics, and should at least be read if not attempted.
- There are a few places where I would quibble —
per Lebowitz, surely the
Boltzmann entropy is more useful out of equilibrium than the Gibbs?; couldn't
he have been more explicit about
the probabilistic foundations
of renormalization? — but mostly I just wish this book had been
written sixteen years ago when I was taking stat. mech.
- Disclaimer: Friends of mine used to work for Sethna, and he's
lectured at the SFI summer school (the chapter on order parameters began as a
lecture there in 1991), but I've never met him, and have no stake in the
success of the book.
- Update: Thanks to T.
A. Abinandanan for alerting me to the fact that there's
a free PDF of
the whole book!
- Laura E. Reeve, Vigilante
- Sequel
to Peacekeeper, with
an even more awful and totally-misleading cover. (The synopses at the link are
accurate, however.) Tasty mind-candy.
- C. L. Anderson, Bitter Angels
- Space opera about active struggles to prevent war, and other
morally-compromising endeavors; military science fiction that lets me respect
myself in the morning. The climax, where it becomes clear what is going on,
and why and how, and what the peace-keepers will do about it, with what
consequences, was very fine indeed. Picked up after reading the author's
self-advertisement
on Scalzi's blog, which has more.
- Madeleine
E. Robins, Petty Treason
- More
alternate-history Regency England private-eye detection (romance-free this
time). Very enjoyable; I wish there were more.
Books to Read While the
Algae Grow in Your Fur;
Enigmas of Chance;
Scientifiction and Fantastica;
Pleasures of Detection, Portraits of Crime;
Writing for Antiquity;
The Great Transformation;
Physics;
Complexity;
Cthulhiana
Posted by crshalizi at October 31, 2009 23:59 | permanent link
October 13, 2009
The Professions Considered as Pitchers of Icy Refreshing Lemonade
Attention conservation notice: Idle economic musings of a
non-economist. Sparked
by recent
developments, but if you're interested in that you'd be better off
elsewhere.
The usual libertarian story about professional licensing requirements
— e.g., requiring someone who wants to practice medicine to go to medical
school and pass exams, on pain of fines or jail — is that these are
simply professionals conspiring in restraint of trade. Licensing simply erects
a barrier to entry into the market for medical services, restricting supply and
driving up price. Eliminate it, they say, and supply will expand and prices
fall.
This presumes, however, that the demand for unlicensed professionals will be
equal to the demand for licensed ones. It seems to me very easy to tell a
"market for lemons" story here: someone in the market for professional services
generally knows very little about how skilled various potential providers
actually are. The sellers, however, generally know a lot about their own skill
level, or at least more than the potential clients do. (There are no doubt
exceptions, such as sincere quacks and the
Dunning-Kruger
effect, but I don't think that matters for the story.) This is the classic
asymmetric
information problem from Akerlof, with the usual result: the skilled
providers demand more, but the clients have no way of telling them from the
unskilled ones, so the only equilibrium is for only unskilled providers to be
on the market and for trade to be depressed, or indeed absent. By putting a
floor on the incompetence of professionals, licensing requirements stop the
unraveling of the market and increase demand. They get us out of the market
for lemons.
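A toy version of the unraveling story, to make the arithmetic visible. Everything specific here is an assumption of mine rather than part of the argument above: skill is uniform on [0, 1], a provider works only if the price is at least their own skill, clients value a provider of skill q at 1.5 q but can observe only the average skill of those on offer, and licensing is modeled as simply excluding everyone below a skill floor.

```python
def equilibrium_price(value_multiplier=1.5, skill_floor=0.0, rounds=200):
    """Iterate the clients' willingness to pay until it settles.

    At price p, the providers on the market are those with
    skill_floor <= skill <= p, so (skills being uniform) their average
    skill is (skill_floor + p) / 2. Clients will pay at most
    value_multiplier times that average, which becomes the next price.
    """
    price = 1.0  # start with every provider on the market
    for _ in range(rounds):
        if price <= skill_floor:
            return 0.0  # nobody above the floor is willing to sell
        avg_skill = (skill_floor + price) / 2.0
        price = min(1.0, value_multiplier * avg_skill)
    return price

no_license = equilibrium_price(skill_floor=0.0)
modest_floor = equilibrium_price(skill_floor=0.2)
high_floor = equilibrium_price(skill_floor=0.4)
print(no_license, modest_floor, high_floor)
```

Under these assumptions the unlicensed market shrinks by a quarter every round and trade vanishes; a floor of 0.2 stops the unraveling at a price of 0.6; and a floor of 0.4 is enough to keep every licensed provider in the market at the top price of 1.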
This occurred to me the other day, but it's obvious enough that I'm sure
someone wrote it up long ago; where? (And did I read it and forget about it?)
(After-notes: 1. Of course, having told the story I have no idea if it's
true of actual markets for professional services; learning that would require
rather delicate empirical investigations. Checking the restraint-of-trade
fable from Milton Friedman would, naturally, require those same investigations.
2. This doesn't rationalize why professions should be so largely
self-governing, nor does it rule out the idea that some licensing
requirements are counter-productive barriers to entry. 3. Replacing
professional certification with some sort of market-based entity telling
consumers about the quality of professional service-sellers won't work, for all
the usual reasons that competitive markets are incapable of adequately
providing information — to say nothing of the difficulty of telling
whether the raters know what they're talking about. 4. Universities are
accredited because students and parents would otherwise be in a market for
lemons. Universities themselves, however, can tell how skilled those selling
academic services are — or at least they're supposed to have
that ability. 5. I should re-read Phil Agre
on the
professionalization of everything and see if it holds up.)
The Dismal Science
Posted by crshalizi at October 13, 2009 22:26 | permanent link
October 09, 2009
Twilight of the Market Gods
My review
of Justin
Fox's Myth
of the Rational Market
in American
Scientist is out. (Shorter me: read the book.) Sometime soon I'll
put up a version with links, which alas don't work in print.
Manual
trackback: 3
Quarks Daily
The Dismal Science
Posted by crshalizi at October 09, 2009 00:54 | permanent link
October 08, 2009
Wit and Wisdom of Pittsburgh Bar Patrons (Part 1)
"They [= the Steelers] are like this utterly adorable, totally hot girl next
door, who you suddenly realize is everything you've ever wanted in a football
team — I mean, girlfriend."
Heard About Pittsburgh PA
Posted by crshalizi at October 08, 2009 23:00 | permanent link
"Completely Random Measures for Bayesian Nonparametrics" (This Year at the DeGroot Lecture)
Attention conservation notice: Only of interest if you (1)
care about specifying probability distributions on infinite-dimensional spaces
for use in nonparametric Bayesian inference, and (2) are in
Pittsburgh.
The CMU statistics department sponsors an annual distinguished lecture
series in memory of our sainted
founder, Morris
H. DeGroot. This year, the lecturer
is Michael Jordan. (I
realize that's a common name; I mean the one my peers and I wanted to be when
we grew up.)
- "Completely Random Measures for Bayesian Nonparametrics"
- Abstract: Bayesian nonparametric modeling and inference are based
on using general stochastic processes as prior distributions. Despite the
great generality of this definition, the great majority of the work in Bayesian
nonparametrics is based on only two stochastic processes: the Gaussian process
and the Dirichlet process. Motivated by the needs of applications, I present a
broader approach to Bayesian nonparametrics in which priors are obtained from a
class of stochastic processes known as "completely random measures" (Kingman,
1967). In particular I will present models based on the beta process and the
Bernoulli process, and will discuss an application of these models to the
analysis of motion capture data in computational vision.
- (Joint work with Emily Fox, Erik Sudderth and Romain Thibaux.)
- Time and place: 4:15 pm on Friday, 16 October 2009, in the Giant
Eagle Auditorium in Baker Hall (room A51)
Update: I counted over 210 people in the audience.
Enigmas of Chance
Posted by crshalizi at October 08, 2009 15:02 | permanent link
"High Dimensional Nonlinear Learning using Local Coordinate Coding" (Next Week at the Statistics Seminar)
Attention conservation notice: Only of interest if you (1)
care about statistical learning in high-dimensional spaces and (2) are in
Pittsburgh.
Since manifold learning has been on my mind this week, owing to trying to
teach it in data-mining, I am extra pleased by the
scheduling of this talk:
- "High Dimensional Nonlinear Learning using Local Coordinate Coding"
- Prof. Tong Zhang,
Rutgers University
- Abstract: We present a new method for learning nonlinear functions
in high dimension using semisupervised learning. Our method includes a phase of
unsupervised basis learning and a phase of supervised function learning. The
learned bases provide a set of anchor points to form a local coordinate system,
such that each data point on a high dimensional manifold can be locally
approximated by a linear combination of its nearby anchor points, with the
linear weights offering its local-coordinate coding. We show that a high
dimensional nonlinear function can be approximated by a global linear function
with respect to this coding scheme, and the approximation quality is ensured by
the locality of such coding. The method turns a difficult nonlinear learning
problem into a simple global linear learning problem, which overcomes some
drawbacks of traditional local learning methods. The empirical success of our
approach has been demonstrated in a recent PASCAL image classification
competition, where the top performance was achieved by an NEC system using this
idea.
- (Joint work with Kai Yu at NEC Lab America.)
- Time and place: 4 pm on Monday, 12 October 2009, in Doherty Hall
310
As always, the seminar is free and open to the public.
Enigmas of Chance
Posted by crshalizi at October 08, 2009 15:01 | permanent link
October 05, 2009
In re John Holland
Having vowed two weeks ago to post something positive
at least once a week, I missed last week, with the excuse of being back in Ann
Arbor for
the celebration of
John Holland's 80th birthday at
the Center for the Study of Complex
Systems. There was no time to post, or even to see everyone I wanted to,
but I did actually start writing something about Holland's scientific work,
only to realize yesterday I was merely engaged in self-plagiarism, from
this, this
and this, and probably other
things I'd written too, because reading Holland has quite profoundly shaped my
thinking. So I'll just point you to the back-catalogue, as it were, and get
back to revising a paper I'd never
have written if I hadn't read
Adaptation in Natural and Artificial Systems.
(So long as I'm talking about the workshop, and without any slight to the
other presentations, the neatest work was that
by Stephanie Forrest et al. on
using genetic programming
to evolve bug
fixes.)
Complexity;
Minds, Brains, and Neurons;
Enigmas of Chance
Posted by crshalizi at October 05, 2009 14:30 | permanent link
October 02, 2009
"Analyzing Networks and Learning with Graphs"
See you in Whistler?
a workshop in conjunction with
December 11 or 12, 2009 (exact date TBD) Whistler, BC, Canada
Deadline for Submissions: Friday, October 30, 2009
Notification of Decision: Monday, November 9, 2009
Overview:
Recent research in machine learning and statistics has seen the proliferation of computational methods for analyzing networks and learning with graphs. These methods support progress in many application areas, including the social sciences, biology, medicine, neuroscience, physics, finance, and economics.
The primary goal of the workshop is to actively promote a concerted effort to address statistical, methodological and computational issues that arise when modeling and analyzing large collections of data that are largely represented as static and/or dynamic graphs. To this end, we aim at bringing together researchers from applied disciplines such as sociology, economics, medicine and biology, together with researchers from more theoretical disciplines such as mathematics and physics, within our community of statisticians and computer scientists. Different communities use diverse ideas and mathematical tools; our goal is to foster cross-disciplinary collaborations and intellectual exchange.
Presentations will include novel graph models, the application of established models to new domains, theoretical and computational issues, limitations of current graph methods and directions for future research.
Online Submissions
We welcome the following types of papers:
- Research papers that introduce new models or apply established models to novel domains,
- Research papers that explore theoretical and computational issues, or
- Position papers that discuss shortcomings and desiderata of current approaches, or propose new directions for future research.
All submissions will be peer-reviewed; exceptional work will be considered for oral presentation. We encourage authors to emphasize the role of learning and its relevance to the application domains at hand. In addition, we hope to identify current successes in the area, and will therefore consider papers that apply previously proposed models to novel domains and data sets.
Submissions should be 4-to-8 pages long, and adhere to NIPS format. Please email your
submissions to: nipsgraphs2009 [at] gmail [dot] com
Workshop Format
This is a one-day workshop. The program will feature invited talks, poster sessions, poster spotlights, and a panel discussion. All submissions will be peer-reviewed; exceptional work will be considered for oral presentation. More details about the program will be announced soon.
Organizers
Networks;
Enigmas of Chance;
Incestuous Amplification
Posted by crshalizi at October 02, 2009 10:24 | permanent link
September 30, 2009
Books to Read While the Algae Grow in Your Fur, September 2009
- Alexander
Rosenberg, Economics: Mathematical Politics or Science of
Diminishing Returns?
- Rosenberg is a philosopher of science focused on biology and neo-classical
economics. This is his second attempt at assessing the latter. (I haven't
read his first go-round and it doesn't seem to be necessary.) Ordinarily, he
says, the goal of science is to make increasingly accurate and precise
predictions about the world. (This includes the historical sciences like
geology and paleontology, which predict new evidence rather than new events at
definite future times. [Different branches of astronomy actually make both
kinds of predictions.]) Economics, however, is incredibly bad at prediction
— certainly at the improving-precision part. Rosenberg does not argue
this point very strongly, taking it to be more or less obvious to anyone
familiar with the state of economics, and I'm inclined to agree. This picture
might be complicated by a detailed consideration of applied econometric models,
but even when those work, they are very poorly grounded in economic
theory*. (Incidentally, one of the pleasures of reading this was seeing
Rosenberg assault Friedman's "Methodology of Positive Economics" essay, whose
influence has been profound and utterly malign.) Rosenberg then has two
questions: (1) if economics does share the usual goal of science, what are its
prospects for achieving it? (2) if it does not have that goal, what
is it trying to achieve — or, perhaps, better, what is the kind of thing
economists do and want to keep doing fitted to achieve?
- As to (1) he is intensely skeptical, because he sees microeconomic
explanations as grounded in intentional explanations, a not-too-compelling
formalization of folk psychology (desires mapping to utility functions and
beliefs to subjective probability distributions). He is extremely skeptical, on
strictly philosophical grounds, about intentional explanations being made much
better in the future than they have been through recorded history. He's also
skeptical that we could ever have something like a cognitive-scientific or
neuro-scientific theory which explains behavior and recovers folk psychology as
a useful approximation in certain domains; I completely fail to follow his
argument here. What I seem to understand would imply that he thinks
thermostats and self-guided missiles are impossible, so if I am right he should
really listen to Uncle Norbert,
especially before
the inevitable robot
uprising (I can just see his last words being, a la pp. 142--143, "this
robot can't really be trying to kill me, because if that intention were
represented in one part of its computational system, l1,
who is the interpreter who treats the configuration of memory
registers in l1 as expressing this goal? Surely it must be
some other sub-system, call it l2, which
reads l1, but then we face the same question all over again
for l2 — urk!"); but no doubt I am wrong and he has
some more reasonable idea which he does not, however, convey. The resounding
experimental failure of maximizing-subjectively-expected-utility theory is not
addressed. (Perhaps this was less clear in 1994 than it is now, but I doubt
it.) I suspect that he would feel any of the models of choice proposed in
behavioral economics are subject to the same critique, mutatis mutandis, that
he makes of conventional microeconomics, because they're basically
intentional.
- As to (2), Rosenberg argues as follows. There is a three-way relationship
between a discipline's goals, its theories, and its methods: given the goals
(say, maximizing predictive accuracy), the theories tell us something about how
well different methods will meet the goals. Likewise if you fix the goals and
methods, only certain kinds of theories will be acceptable or reachable. And
if you fix the theories and methods, you constrain the goals you can attain.
(Rosenberg's argument here is very close to that of Larry Laudan in his great
book Science
and Values, and I think it's correct.) If we take neo-classical
methods and theories as given, what might economics be successfully
aiming at? Clearly not, by the previous argument, scientific prediction.
Rather, Rosenberg offers two possibilities, not mutually exclusive. On the one
hand, maybe it's really a species of hyper-formalized social-contract theory
from political philosophy, with (as he says) the Walrasian auctioneer in the
role of Hobbes's sovereign. Or: maybe it's a species of applied mathematics,
interested in the implications of interacting transitive preference orderings.
As he says, applied mathematicians are rarely interested in whether their math
can, in fact, be applied to the real world — that's not their
department.
- Excusing economics's poor track-record as an empirical science by saying
it's really political philosophy and/or applied math may be a defense
worse than the original accusation. As Rosenberg notes, it makes the idea of
attending to what economists have to say about policy matters rather odd; at
best one should listen to them as much as to any other sect of political
philosophers. I would suspect that Rosenberg was proposing this maliciously,
but he seems to be sincere and not just good at writing with a straight face.
I don't think economics is in quite such a plight as he does, but
having just put the book down I admit I'm hard-pressed to articulate why.
- *: For instance, when real business cycle theorists and their kin fit
dynamic stochastic "general equilibrium"** models to empirical time-series of
macro-economic quantities, these time series are first de-trended, i.e., made
stationary. This has no justification in the representative-agent story
underlying the models, but seems, at present, to be essential to actually
getting estimates. Typically the de-trending is done through
the "Hodrick-Prescott"
filter***, again with no theoretical justification, and the business cycle
is operationally defined as "the residuals of the filter". I suspect
that most of the predictive ability of DSGEs comes from the filter, plus
implicitly doing a moving-average smoothing of the residuals. It would be
interesting to pit them against naive nonparametric forecasting (along
say these
lines).
- **: I use the scare-quotes because I don't agree that representative agent
models are general equilibrium models.
- ***: Known in statistics decades
before Hodrick and Prescott
as a
"smoothing
spline". (The word "spline" does not appear in their paper, and they are
entirely innocent of the vast literature on how much smoothing to
do.)
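To make the footnote concrete: the Hodrick-Prescott trend is the series tau minimizing the sum of squared residuals (y_t - tau_t)^2 plus lambda times the sum of squared second differences of tau, which is a discrete penalized-least-squares smoother of the same kind as a smoothing spline. The first-order conditions are the linear system (I + lambda D'D) tau = y, with D the second-difference matrix. Below is a minimal pure-Python sketch of that system; the function names and the toy series are my own, and lambda = 1600 is just the conventional choice for quarterly data, not something estimated.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def hp_filter(y, lam=1600.0):
    """Return (trend, cycle), where trend solves (I + lam * D'D) trend = y
    and cycle is the residual y - trend."""
    n = len(y)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coef = (1.0, -2.0, 1.0)  # one row of the second-difference matrix D
    for r in range(n - 2):
        idx = (r, r + 1, r + 2)
        for a, ca in zip(idx, coef):
            for b, cb in zip(idx, coef):
                A[a][b] += lam * ca * cb  # accumulate lam * D'D
    trend = solve(A, list(y))
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle

# Toy series (an assumption for illustration): linear growth plus a slow wave.
series = [0.5 * t + math.sin(t / 3.0) for t in range(40)]
trend, cycle = hp_filter(series)

# A purely linear series has zero second differences, so it is all "trend".
line = [2.0 + 3.0 * t for t in range(40)]
line_trend, line_cycle = hp_filter(line)
```

The "business cycle" in this operational sense is just `cycle`, the residuals of the smoother, and the only tuning is the choice of lambda, which is exactly the how-much-smoothing question that the spline literature addresses.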
- Sarah Graves, Wicked Fix
- More cozy comfort-reading about
sordid multiple homicide. (But whatever happened to Sam's girlfriend from the
previous book?)
- John Billheimer, Highway Robbery
- Well-written, amusing and absorbing mystery novel about a family of highway
engineers in West Virginia. The only thing keeping it from being
perfect-for-me mind-candy is that part of the plot turns on making fun of
environmentalists; but you can't have everything. This is the second book in a
series; I've not read the others but will look them up.
- Bent Jesper Christensen and Nicholas
M. Kiefer, Economic
Modeling and Inference
- Review: An Optimal Path to a Dead End.
- Joe Hill and Gabriel Rodriguez, Locke and Key, vol. 2: Head Games
- High-grade comic book mind-candy. Definitely needs the earlier
book.
- Chelsea Cain, Evil at Heart
- Great, if somewhat stomach-turning, mind-candy. (Probably
needs the earlier books.) I actually
wish Cain did more with the media-frenzy angle, however.
- Possible continuity error: Isn't Susan awfully
unconcerned about leaving her car alone in really dodgy neighborhoods, after
she broke out one of its windows?
- Thomas
Levenson, Newton and the Counterfeiter: The Unknown Detective Career
of the World's Greatest Scientist
- Or: Newton demands the noose. — A wonderfully readable little
biography of Newton, with the hook of looking at how he tackled his second
career as Warden of the Mint, in charge of actually producing the English
currency, and of catching and punishing counterfeiters. In particular,
Levenson focuses on Newton's pursuit of a counterfeiter of particular skill and
temerity,
one William
Chaloner, providing a great opportunity to explain the criminal underworld
in which such figures lived, and the vast opportunities opening up for them as
a result of the social transformations of which Newton was at once symbol,
beneficiary and further driver. (Any idiot understands stealing a hunk of
metal, and almost any idiot can grasp substituting pewter for silver, but the
higher reaches of monetary crime require numeracy and comfort with
sophisticated abstractions.) This is, in short, a portrait of the foundations
of our world being laid, from the intellectual system of rational scientific
explanation, to states powerful enough to enforce written laws on millions and
raise the funds needed to wage war across the world, to global commerce
and flows of money, and stock-market Ponzi schemes in which geniuses lose
fortunes. Enthusiastically recommended if any of this sounds the least bit
appealing.
- Kat Richardson, Vanished
- Mind-candy: An American shaman in London. Ends in medias res,
though not with a cliff-hanger.
- Halbert
White, Estimation, Inference, and Specification Analysis
- Review: How to Tell That
Your Model Is Wrong; and, What Happens Afterwards.
- House of Mystery: Love Stories for Dead People
- More tales from the bar,
plus a really unfortunate basement.
- Tiziano Sclavi et al., The Dylan Dog Case Files
- No purchase link because I actually dis-recommend it: predictable, tedious,
implausible, not scary, excruciating when it tries to be funny, ultimately
tiresome. (The drawing is, I admit, pretty good, but nowhere near the covers
Mignola provides for the translated edition.) Is this really that
popular in Italy? If so, does the original have virtues which did not survive
translation, or does the old country simply have no taste at all in
comics?
- Phil and Kaja Foglio, Agatha Heterodyne and the Circus of
Dreams and Agatha Heterodyne and the Clockwork Princess
- Volumes 4 and 5 of Girl Genius.
Go read.
- I. J. Parker, The Convict's Sword
- Converging murder cases in Heian-era Japan. Stands alone, but I enjoyed it
more for knowing the back-story. (Previous volumes in the
series: 1 and 2,
3, 4, 5.)
- Madeleine E. Robins, Point of Honour
- Your basic hard-boiled female private-eye detective novel, which also
happens to be a historical mystery and a Regency romance; the charming
love-child of Jane Austen, or perhaps Georgette Heyer, and Dashiell Hammett. I
read it in one sitting from the beginning — "It is a truth universally
acknowledged that a Fallen Woman of good family must, soon or late, descend to
whoredom" — to the end, and really want the sequel.
- (Read following up on an old review by Kate Nepveu.)
- Update: The sequel is as good.
Books to Read While the
Algae Grow in Your Fur;
Enigmas of Chance;
The Dismal Science;
Pleasures of Detection, Portraits of Crime;
Scientifiction and Fantastica;
Philosophy;
Writing for Antiquity;
The Great Transformation
Posted by crshalizi at September 30, 2009 23:59 | permanent link
September 25, 2009
Miniature Pearl
It would be wrong to say that Judea Pearl knows more
about causal inference than anyone
else — I can think of some
rivals very close to
where I'm writing this — but he certainly knows a lot, and has
worked tirelessly to formulate and spread the modern way of thinking about the
subject, centered around
graphical models and their associated structural equations. I remember
spending many happy hours with his book Causality when it came out
in 2000, and look forward to spending more with the new edition, which is
making its way to me through the mail now. In the meanwhile, however, there
is what he describes as "A new survey paper, gently summarizing everything I know about causation (in only 43 pages)":
- "Causal Inference in Statistics: An Overview", forthcoming
in Statistics Surveys 3 (2009): 96--146
[Free PDF]
- Abstract: This review presents empirical researchers with recent
advances in causal inference, and stresses the paradigmatic shifts that must be
undertaken in moving from traditional statistical analysis to causal analysis
of multivariate data. Special emphasis is placed on the assumptions that
underly all causal inferences, the languages used in formulating those
assumptions, the conditional nature of all causal and counterfactual claims,
and the methods that have been developed for the assessment of such
claims. These advances are illustrated using a general theory of causation
based on the Structural Causal Model (SCM) described in Pearl (2000a), which
subsumes and unifies other approaches to causation, and provides a coherent
mathematical foundation for the analysis of causes and counterfactuals. In
particular, the paper surveys the development of mathematical tools for
inferring (from a combination of data and assumptions) answers to three types
of causal queries: (1) queries about the effects of potential interventions,
(also called "causal effects" or "policy evaluation") (2) queries about
probabilities of counterfactuals, (including assessment of "regret,"
"attribution" or "causes of effects") and (3) queries about direct and indirect
effects (also known as "mediation"). Finally, the paper defines the formal and
conceptual relationships between the structural and potential-outcome
frameworks and presents tools for a symbiotic analysis that uses the strong
features of both.
The paper assumes a reader who's reasonably well-grounded in statistics,
though not necessarily in the causal-inference literature. (Of such readers, I
imagine applied economists might have more unlearning to do than most, because
they will keep asking "but when do I start estimating beta?") It's not ideally
calibrated for a reader coming from, say, machine learning.
One theme running through the paper is the futility of trying to define
causality in purely probabilistic terms, and the fact that cases where it looks
like one can do so are really cases where causal assumptions have been smuggled
in. Another is that once you realize counterfactual or mechanistic assumptions
are needed, the graphical-models/structural equation framework makes it
immensely easier to reason about them than does the rival "potential outcomes"
framework. In fact, the objects which the potential outcomes framework takes
as its primitives can be constructed within the structural framework,
so the correct part of the former is a subset of the latter. And by reasoning
on graphical models it is easy to see that confounding can be introduced by
"controlling for" the wrong variables, something explicitly denied by leading
members of the potential-outcomes school. (Pearl quotes them making this
mistake, and manages to pull off a more-in-sorrow-than-in-glee tone while doing
so.) Mostly, however, the paper is about showing off what can be done within
the new framework, which is really pretty impressive, and ought to be part of
the standard tool-kit of data analysis. If you are not already familiar with
it, this is an excellent place to begin, and if you are you will enjoy the
elegant and comprehensive presentation.
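The point about "controlling for" the wrong variables is easy to check by
simulation. What follows is a minimal sketch, not anything taken from Pearl's
paper: the particular graph (an "M-structure" with two unobserved causes U1 and
U2), the linear-Gaussian equations, the coefficients, and all the function
names are assumptions made here for illustration. Z is a pre-treatment variable
caused by both U1 and U2; U1 also causes the treatment X, and U2 also causes
the outcome Y. Nothing confounds X and Y until one conditions on Z.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (cov(a, a) is the variance)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def simulate(n=50_000, beta=1.0, seed=0):
    """Draw n samples from an assumed linear-Gaussian M-structure.

    U1 -> X, U1 -> Z, U2 -> Z, U2 -> Y, and X -> Y with true effect beta.
    U1 and U2 are unobserved; only (X, Y, Z) are returned.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        u1 = rng.gauss(0, 1)
        u2 = rng.gauss(0, 1)
        z = u1 + u2 + rng.gauss(0, 1)         # pre-treatment collider
        x = u1 + rng.gauss(0, 1)              # treatment
        y = beta * x + u2 + rng.gauss(0, 1)   # outcome
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def slope_unadjusted(x, y):
    """Least-squares coefficient of x when regressing y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, y, z):
    """Least-squares coefficient of x when regressing y on both x and z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


if __name__ == "__main__":
    x, y, z = simulate()
    print("true effect of X on Y:  1.0")
    print("regress Y on X:        ", round(slope_unadjusted(x, y), 3))
    print("regress Y on X and Z:  ", round(slope_adjusted(x, y, z), 3))
```

Under these assumed coefficients, the unadjusted slope should come out near
the true effect of 1, while adjusting for Z pulls it towards 0.8 in large
samples, because conditioning on Z opens the path X <- U1 -> Z <- U2 -> Y.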
Looking back over what I write in this blog, I feel like, on the one hand,
there's too little of it lately, and on the other hand, it's too tilted towards
negative, critical stuff. While not regretting at all being negative and
critical about stupid ideas that need to be criticized (or, really,
pulverized), I will try to expand and balance my output by posting at least
once a week on some good science. We'll see how this goes.
Enigmas of Chance
Posted by crshalizi at September 25, 2009 10:12 | permanent link