November 19, 2009

"Statistical Analysis of Stellar Evolution" (Next Week at the Statistics Seminar)

In which the starry heavens above submit to statistical analysis:

David van Dyk, "Statistical Analysis of Stellar Evolution"
Abstract: Color-Magnitude Diagrams (CMDs) are plots that compare the magnitudes (luminosities) of stars in different wavelengths of light (colors). High non-linear correlations among the mass, color and surface temperature of newly formed stars induce a long narrow curved point cloud in a CMD known as the main sequence. Aging stars form new CMD groups of red giants and white dwarfs. The physical processes that govern this evolution can be described with mathematical models and explored using complex computer models. These calculations are designed to predict the plotted magnitudes as a function of parameters of scientific interest such as stellar age, mass, and metallicity. Here, we describe how we use the computer models as a component of a complex likelihood function in a Bayesian analysis that requires sophisticated computing, corrects for contamination of the data by field stars, accounts for complications caused by unresolved binary-star systems, and aims to compare competing physics-based computer models of stellar evolution.
This is joint work with Steven DeGennaro, Nathan Stein, William H. Jefferys, Ted von Hippel, and Elizabeth Jeffery.
Place and time: Doherty Hall A310, Monday, 23 November, 4--5 pm.

Enigmas of Chance; The Eternal Silence of These Infinite Spaces; Physics

Posted by crshalizi at November 19, 2009 12:02 | permanent link

November 13, 2009

"Some Things Statisticians Do at Google" (Next Week at the Statistics Seminar)

Attention conservation notice: Of no use to you unless (1) you want to know what statisticians do at search-engine companies and (2) you are in Pittsburgh.
Mike Meyer, "Some Things Statisticians Do at Google"
Abstract: I'll talk about a number of projects at Google where statisticians have made a large contribution. There will not be a lot of technical details. In some cases I will just describe the problem.
The major example will be a description of the statistical and engineering infrastructure to support live traffic experiments at Google.
A common theme of the problems is the importance of understanding basic statistical principles that can be applied and modified to handle new data and new circumstances.
Place and time: Monday, 16 November at 4 pm, in Doherty Hall A310

As always, the talk is free and open to the public.

Enigmas of Chance

Posted by crshalizi at November 13, 2009 15:09 | permanent link

November 08, 2009

The Shadow Price of Power

Attention conservation notice: Quasi-teaching note giving an economic interpretation of the Neyman-Pearson lemma on statistical hypothesis testing.

Suppose we want to pick out some sort of signal from a background of noise. As every schoolchild knows, any procedure for doing this, or test, divides the data space into two parts, the one where it says "noise" and the one where it says "signal".* Tests will make two kinds of mistakes: they can take noise to be signal, a false alarm, or can ignore a genuine signal as noise, a miss. Both the signal and the noise are stochastic, or we can treat them as such anyway. (Any determinism distinguishable from chance is just insufficiently complicated.) We want tests where the probabilities of both types of errors are small. The probability of a false alarm is called the size of the test; it is the measure of the "say 'signal'" region under the noise distribution. The probability of a miss, as opposed to a false alarm, has no short name in the jargon, but one minus the probability of a miss (the probability of detecting a signal when it's present) is called power.

Suppose we know that the probability density of the noise is p and that of the signal is q. The Neyman-Pearson lemma, as many though not all schoolchildren know, says that then, among all tests of a given size s, the one with the smallest miss probability, or highest power, has the form "say 'signal' if q(x)/p(x) > t(s), otherwise say 'noise'," and that the threshold t varies inversely with s. The quantity q(x)/p(x) is the likelihood ratio; the Neyman-Pearson lemma says that to maximize power, we should say "signal" if it's sufficiently more likely than noise.

The likelihood ratio indicates how different the two distributions — the two hypotheses — are at x, the data-point we observed. It makes sense that the outcome of the hypothesis test should depend on this sort of discrepancy between the hypotheses. But why the ratio, rather than, say, the difference q(x) - p(x), or a signed squared difference, etc.? Can we make this intuitive?

Start with the fact that we have an optimization problem under a constraint. Call the region where we proclaim "signal" R. We want to maximize its probability when we are seeing a signal, Q(R), while constraining the false-alarm probability, P(R) = s. Lagrange tells us that the way to do this is to maximize Q(R) - t[P(R) - s] over R and t jointly. So far the usual story; the next turn is usually "as you remember from the calculus of variations..."

Rather than actually doing math, let's think like economists. Picking the set R gives us a certain benefit, in the form of the power Q(R), and a cost, tP(R). (The ts term is the same for all R.) Economists, of course, tell us to equate marginal costs and benefits. What is the marginal benefit of expanding R to include a small neighborhood around the point x? Just, by the definition of "probability density", q(x). The marginal cost is likewise tp(x). We should include x in R if q(x) > tp(x), or q(x)/p(x) > t. The boundary of R is where marginal benefit equals marginal cost, and that is why we need the likelihood ratio and not the likelihood difference, or anything else. (Except for a monotone transformation of the ratio, e.g. the log ratio.) The likelihood ratio threshold t is, in fact, the shadow price of statistical power.
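The marginal argument can be checked numerically with a rough sketch. The setup is assumed purely for illustration (standard Gaussian noise, Gaussian signal with mean 2, a size budget of 0.05, and hypothetical function names): chop the data space into small cells, treat the noise mass of each cell as its cost and the signal mass as its benefit, and buy cells greedily until the budget is spent. Ranking cells by benefit per unit cost (the ratio) should then yield more power than ranking by the difference.

```python
from statistics import NormalDist

# Assumed distributions, chosen only for illustration.
noise = NormalDist(0.0, 1.0)    # density p
signal = NormalDist(2.0, 1.0)   # density q


def cells(lo=-8.0, hi=10.0, n=3600):
    """Chop [lo, hi] into n small cells; return (P(cell), Q(cell)) pairs,
    each approximated as density at the cell midpoint times cell width."""
    width = (hi - lo) / n
    mids = [lo + (i + 0.5) * width for i in range(n)]
    return [(noise.pdf(x) * width, signal.pdf(x) * width) for x in mids]


def greedy_region(score, size):
    """Add cells to R in decreasing order of score until the next cell
    would exceed the false-alarm budget. Returns (P(R), Q(R))."""
    ranked = sorted(cells(), key=lambda pq: score(*pq), reverse=True)
    used = power = 0.0
    for p_mass, q_mass in ranked:
        if used + p_mass > size:
            break
        used += p_mass
        power += q_mass
    return used, power


def by_ratio(p_mass, q_mass):
    """Marginal benefit per unit of marginal cost."""
    return q_mass / p_mass


def by_difference(p_mass, q_mass):
    """A plausible-looking alternative: the likelihood difference."""
    return q_mass - p_mass


ratio_size, ratio_power = greedy_region(by_ratio, size=0.05)
diff_size, diff_power = greedy_region(by_difference, size=0.05)
print(f"ratio rule:      size={ratio_size:.4f}  power={ratio_power:.4f}")
print(f"difference rule: size={diff_size:.4f}  power={diff_power:.4f}")
```

In this example the difference rule spends its budget on an interval around the peak of q - p and leaves out the far tail, where cells cost almost nothing in false alarms; the ratio rule buys those cheap cells first.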

I am pretty sure I have not seen or heard the Neyman-Pearson lemma explained marginally before, but in retrospect it seems too simple to be new, so pointers would be appreciated.

Manual trackback: John Barrdear

*: Yes, you could have a randomized test procedure, but the situations where those actually help pretty much define "boring, merely-technical complications."

Enigmas of Chance

Posted by crshalizi at November 08, 2009 03:06 | permanent link

November 05, 2009

36-350, Data Mining: Course Materials (Fall 2009)

My lesson-plan having survived first contact with the enemy students, it's time to start posting the lecture handouts & c. This page will be updated as the semester goes on; the RSS feed for it should be here. The class homepage has more information.

  1. Introduction to the course (24 August). What is data mining? How is it used? Where did it come from? Some themes.
  2. Information retrieval and similarity searching I (26 August) Finding the data you are looking for. Ideas we will avoid: meta-data and cataloging; meanings. Textual features. The bag-of-words representation; its vector form. Measuring similarity and distance for vectors. Example with the New York Times Annotated Corpus.
  3. IR continued (28 August). The trick to searching: queries are documents. Search evaluation: precision, recall, precision-recall curves; error rates. Classification: nearest neighbors and prototypes; classifier evaluation by mis-classification rate and by confusion matrices. Inverse document frequency weighting. Visualizing high-dimensional data by multi-dimensional scaling. Miscellaneous topics: stemming, incorporating user feedback.

    Homework 1, due 4 September: assignment, R, data; SOLUTIONS

  4. Page Rank (31 August). Links as pre-existing feedback. How to exploit link information? The random walk on the graph; using the ergodic theorem. Eigenvector formulation of page-rank. Combining page-rank with textual features. Other applications. Further reading on information retrieval.
  5. Image Search, Abstraction and Invariance (2 September). Similarity search for images. Back to representation design. The advantages of abstraction: simplification, recycling. The bag-of-colors representation. Examples. Invariants. Searching for images by searching text. An example in practice. Slides for this lecture.
  6. Information Theory I (4 September). Good features help us guess what we can't represent. Good features discriminate between different values of unobserved variables. Quantifying uncertainty with entropy. Quantifying reduction in uncertainty/ discrimination with mutual information. Ranking features based on mutual information. Examples, with code, of informative words for the Times. Code.
    Supplementary reading: David P. Feldman, Brief Tutorial on Information Theory, chapter 1

    Homework 2, due 11 September: assignment; SOLUTIONS TEXT; SOLUTIONS R

  7. Information Theory II (9 September). Dealing with multiple features. Joint entropy, the chain rule for entropy. Information in multiple features. Conditional information, chain rule for information, conditional independence. Interactions, positive and negative, and redundancy. Greedy feature selection with low redundancy. Example, with code, of selecting words for the Times. Sufficient statistics and the information bottleneck. Code.
    Supplementary reading: Aleks Jakulin and Ivan Bratko, "Quantifying and Visualizing Attribute Interactions", arxiv:cs.AI/0308002
  8. Categorization; Clustering I (11 September). Dividing the world up into categories. Classification: known categories with labeled examples. Taxonomy of learning problems (supervised, unsupervised, semi-supervised, feedback, ...). Clustering: discovering unknown categories from unlabeled data. Benefits of clustering, with a digression on where official classes come from. Basic criterion for good clusters: lots of information about features from little information about cluster. Practical considerations: compactness, separation, parsimony, balance. Doubts about parsimony and balance. The k-means clustering algorithm, or unlabeled prototype classification: analysis, geometry, search. Appendix: geometric aspects of the prototype and nearest-neighbor method.

    Homework 3, due 18 September: assignment; SOLUTIONS

  9. Clustering II (14 September). Distances between partitions; variation-of-information distance. Hierarchical clustering by agglomeration and its varieties. Picking the number of clusters by merging costs. Performance of different clustering methods on various doodles. Why we would like to pick the number of clusters by predictive performance, and why it is hard to do at this stage. Reifying clusters.
  10. Transformations: Rescaling and Low-Dimensional Summaries (16 September). Improving on our original features. Re-scaling, standardization, taking logs, etc., of individual features. Forcing things to be Gaussian considered harmful. Low-dimensional summaries by combining features. Exploiting geometry to eliminate redundancy. Projections on to linear subspaces. Searching for structure-preserving projections.
  11. Principal Components I (18 September). Principal components are the directions of maximum variance. Derivation of principal components as the best approximation to the data in a linear subspace. Equivalence to variance maximization. Avoiding explicit optimization by finding eigenvalues and eigenvectors of the covariance matrix. Example of principal components with cars; how to tell a sports car from a minivan. The standard recipe for doing PCA. Cautions in interpreting PCA. Data-set used in the notes.

    Homework 4, due 25 September: assignment; SOLUTIONS

  12. Principal Components II (21 September). PCA + information retrieval = latent semantic indexing; why LSI is a Good Idea. PCA and multidimensional scaling.
  13. Factor Analysis (23 and 25 September). From PCA to factor analysis by adding noise. Roots of factor analysis in causal discovery: Spearman's general factor model and the tetrad equations. Problems with estimating factor models: number of equations does not equal number of unknowns. Solution 1, "principal factors", a.k.a. estimation through heroic feats of linear algebra. Solution 2, maximum likelihood, a.k.a. estimation through imposing distributional assumptions. The rotation problem: the factor model is unidentifiable; the number of factors may be meaningful, but the individual factors are not.
  14. The Truth about PCA and Factor Analysis (28 September). PCA is data reduction without any probabilistic assumptions about where the data came from. Picking number of components. Faking predictions from PCA. Factor analysis makes stronger, probabilistic assumptions, and delivers stronger, predictive conclusions --- which could be wrong. Using probabilistic assumptions and/or predictions to pick how many factors. Factor analysis as a first, toy instance of a graphical causal model. The rotation problem once more with feeling. Factor models and mixture models. Factor models and Thomson's sampling model: an outstanding fit to a model with a few factors is actually evidence of a huge number of badly measured latent variables. Final advice: it all depends, but if you can only do one, try PCA. R code for the Thomson sampling model.
  15. Nonlinear Dimensionality Reduction I: Locally Linear Embedding (5 October). Failure of PCA and all other linear methods for nonlinear structures in data; spirals, for example. Approximate success of linear methods on small parts of nonlinear structures. Manifolds: smoothly curved surfaces embedded in higher-dimensional Euclidean spaces. Every manifold looks like a linear subspace on a sufficiently small scale, so we should be able to patch together many small local linear approximations into a global manifold. Local linear embedding: approximate each vector in the data as a weighted linear combination of its k nearest neighbors, then find the low-dimensional vectors best reconstructed by these weights. Solving the optimization problems by linear algebra. Coding up LLE. A spiral rainbow. R.
  16. Nonlinear Dimensionality Reduction II: Diffusion Maps (9 October). Making a graph from the data; random walks on this graph. The diffusion operator, a.k.a. Laplacian. How the Laplacian encodes the shape of the data. Eigenvectors of the Laplacian as coordinates. Connection to page-rank. Advantages when data are not actually on a manifold. Example.

    Pre-midterm review (12 October): highlights of the course to date; no handout.
    MIDTERM (14 October): exam, solutions

    Homework 5, due 23 October: assignment; solutions

  17. Regression I: Basics. Guessing a real-valued random variable; why expectation values are mean-square optimal point forecasts. The regression function; why its estimation must involve assumptions beyond the data. The bias-variance decomposition and the bias-variance trade-off. First example of improving prediction by introducing variance. Ordinary least squares linear regression as smoothing. Other linear smoothers: k-nearest-neighbors and kernel regression. How much should we smooth? R, data for running example
  18. Regression II: The Truth About Linear Regression (21 October). Linear regression is optimal linear (mean-square) prediction; we do this because we hope a linear approximation will work well enough over a small range. What linear regression does: decorrelate the input features, then correlate them separately with the response and add up. The extreme weakness of the probabilistic assumptions needed for this to make sense. Difficulties of linear regression: collinearity, errors in variables, shifting distributions of inputs, omitted variables. The usual extra probabilistic assumptions and their implications. Why you should always look at residuals. Why you generally shouldn't use regression for causal inference. How to torment angels. Likelihood-ratio tests for restrictions of nice models.
  19. Regression III: Extending Linear Regression (23 October). Weighted least squares. Heteroskedasticity: variance is not the same everywhere. Going to consult the oracle. Weighted least squares as a solution to heteroskedasticity. Nonparametric estimation of the variance function. Local polynomial regression: local constants (= kernel regression), local linear regression, higher-order local polynomials. Lowess = locally-linear smoothing for scatter plots. The oracles fall silent.

    Homework 6, due Friday, 30 October: assignment, data set; solutions

  20. Evaluating Predictive Models (26 and 28 October). In-sample, out-of-sample and generalization loss or error; risk as expected loss on new data. Under-fitting, over-fitting, and examples with polynomials. Methods of model selection and controlling over-fitting: empirical risk minimization, penalization, constraints/sieves, formal learning theory, cross-validation. Limits of generalization. R for creating figures.
  21. Smoothing Methods in Regression (30 October). How much smoothing should we do? Approximation by local averaging. How much smoothing we should do to find the unknown curve depends on how smooth the curve really is, which is unknown. Adaptation as a partial substitute for actual knowledge. Cross-validation for adapting to unknown smoothness. Application: testing parametric regression models by comparing them to nonparametric fits. The bootstrap principle. Why ever bother with parametric regressions? R code for some of the examples.

    Homework 7, due Friday, 6 November: assignment

  22. Additive Models (2 November). A nice feature of linear models: partial responses, partial residuals, and backfitting estimations. Additive models: regression curve is a sum of partial response functions; partial residuals and the backfitting trick generalize. Parametric and non-parametric rates of convergence. The curse of dimensionality for unstructured nonparametric models. Additive models as a compromise, introducing bias to reduce variance. Example with the data from homework 6.
  23. Classification and Regression Trees (4 and 6 November). Prediction trees. A classification tree we can believe in. Prediction trees combine simple local models with recursive partitioning; adaptive nearest neighbors. Regression trees: example; a little math; pruning by cross-validation; more R mechanics. Classification trees: basics; measuring error by mis-classification; weighted errors; likelihood; Neyman-Pearson classifiers. Uncertainty for trees.

Corrupting the Young; Enigmas of Chance

Posted by crshalizi at November 05, 2009 22:45 | permanent link

November 04, 2009

Blosxom Fading in November

My old Blosxom installation (v. 2.0.2), after several years of working nicely, is growing increasingly cranky, and mulishly refusing to generate or update posts as the whim takes it. (I am not sure how much kicking and shoving it will need to produce this.) I'd appreciate a pointer to something which works similarly, but does work: I write posts in plain HTML in Emacs and drop them in a directory; it makes them look nice. If it handles tags and/or LaTeX nicely, so much the better.

Self-Centered

Posted by crshalizi at November 04, 2009 19:34 | permanent link

October 31, 2009

Books to Read While the Algae Grow in Your Fur, October 2009

Rosemary Kirstein, The Lost Steersman
Sequel to Steerswoman's Road (below); excellent and perfectly continuous, despite a long gap in the writing. The trick of celebrating intelligence while maintaining the tone and color of a good fantasy novel is not something I have encountered elsewhere, and find deeply addictive.
Everything else I have to say is a spoiler: This owes a massive debt to Lovecraft's At the Mountains of Madness. The plot-hinge mystery here has to do with "demons", amphibious barrel-shaped creatures with quadrilateral symmetry, very like (though not exactly the same as) Lovecraft's Antarctic Old Ones. There are scenes of dissecting demons under the impression that they are just animals, and realizing they belong to some radically different division of life than familiar terrestrial organisms; an exploring expedition to an unknown part of the world where the demons are found; explorations of demons' cities and observations of their customs, including subterranean chambers used for their rituals, etc.; and the dawning realization that the creatures are in fact sapient. (HPL: "Radiates, vegetables, monstrosities, star spawn --- whatever they had been, they were men!" RK does not put such florid outbursts in her characters' mouths; she just has Rowan come to see that the demons are "people".) Kirstein does a better job, in my view, of making the creatures actually alien, in particular starting from giving them a very inhuman sensorium (continual sonar, without any vision) and means of communication (excreting specially shaped lumps of organic material, reminiscent of the pieces of carved soapstone Lovecraft associated with his Old Ones), and building out logically from there. Needless to say, this complicates the ethics of terraforming the steerswomen's planet considerably. --- Janus's speeches (pp. 43--44 and p. 356) about the dangers of learning too much about the world also seem drawn from Lovecraft, though they bring to mind the opening of "The Call of Cthulhu" more than Mountains.
Jeffrey D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests
A sound, friendly but reasonably theoretical introduction to nonparametric regression, giving about equal attention to kernel-based methods and to series expansions (Fourier series, orthogonal polynomials, etc.). The first half of the book, through ch. 4, introduces these methods, considers their ability to predict new data (emphasizing, naturally, the bias-variance trade-off), and looks at methods for selecting how much smoothing to do based on the data being smoothed, with a fondness for leave-one-out cross-validation and its variants. (I can't recall if k-fold CV is even mentioned.) The second half is about testing parametric regression specifications. Chapter 5 reviews some classical tests for fully-specified and especially for linear-in-the-parameters parametric models, including Neyman's smooth tests: the latter involve, roughly speaking, fitting an orthogonal series to the deviations from the null model, and checking that all the coefficients are small, and so form a bridge to the smoothing-based tests used in the rest of the book. Basically, one can either smooth the parametric residuals, which should have mean zero and constant variance under the null hypothesis, or compare the parametric estimate to the nonparametric smooth. Hart prefers the former approach, and develops tests for regression functions being constants in chapter 7, which in chapter 8 are turned into tests for departures from arbitrary parametric regression models. The distribution of these test statistics is too complicated for anything except bootstrapping, which needs to be done carefully to preserve power. To simplify the math, up to this point Hart assumes that the input variable takes values at a deterministic set of points on the unit interval ("fixed-design univariate regression"); chapter 9 generalizes to random-design and multivariate regressions, as well as lifting some other restrictions. Chapter 10 contains some case studies of real data.
This book should be accessible to anyone who understands parametric inference at the level of, say, All of Statistics; no prior exposure to smoothing methods is really needed. The series-expansion methods will probably go down more easily with some prior exposure to Fourier analysis. People who are serious about using parametric regression models in the real world (cough econometricians cough) owe it to themselves to test them with these methods.
Rosemary Kirstein, Steerswoman's Road
First two books in an epic fantasy series about the scientific method, reprinted in one volume. There are more books, which I now covet powerfully, but the series is not finished.
Spoilers: "Epic fantasy" here is, I am pretty sure, totally misleading. Initially, and from most of the characters' perspectives, the world looks like a bog-standard medieval fantasyland, only with the addition of an itinerant semi-monastic order of geographers and natural philosophers, the eponymous steerswomen. By the end of this volume, however, I am pretty sure that the setting is actually another planet in this universe, with no magic at all. The steerswomen's world is being terraformed; the Guidestars are satellites in geosynchronous orbit. The native ecology (based on "blackgrass" and "redgrass") is being systematically destroyed (by microwave heating from the orbiting Guidestars ("the spell Routine Bioform Clearance"), and by the Outskirters' goats [which may be genetically modified?]) and replaced by terrestrial flora ("greengrass"), microbes and fauna. Wizards are simply the inhabitants of the planet who retain the old technology, such as electricity and explosives. Why most of the colonists have regressed to medieval technology, and why, having done so, they have an institution like the steerswomen, I couldn't tell you. (I can tell you that sailors and steerswomen are immune to some "curses" because they wear rubber-soled, i.e. electrically insulating, boots.) But I am dying to find out.
J. K. Ghosh and R. V. Ramamoorthi, Bayesian Nonparametrics
I have written extensively about the general subject of Bayesian nonparametrics and especially of its consistency elsewhere (here, here, or, indeed, here), so I'll just plunge in. This 2003 monograph is the best overview of Bayesian nonparametrics from the viewpoint of theoretical statistics which I've found, though there has been a great deal of work since it was written, and I know that a number of new books are coming out soon.
The authors begin (ch. 1) by reviewing* results on the consistency of Bayesian learning on finite sample spaces and Dirichlet prior distributions. They then carefully (ch. 2) consider the measure-theoretic issues involved in constructing prior probability distributions over infinite-dimensional spaces, especially priors over all probability measures (or all probability densities) on the real line. Chapter 3 describes in detail the properties of Dirichlet-process priors and of Polya tree priors. Chapter 4 is concerned with consistency for Bayesian updating with IID data, emphasizing the "Kullback-Leibler property" (the prior must put sufficient weight on distributions with small relative entropy from the truth) and the exponentially-consistent-testing conditions which go back to Schwartz. Chapter 5 specializes to inferring probability densities; this is the only place they use Gaussian process priors. Chapter 6 considers inferring the location parameter of distributions of unknown shape, and outlines (without full detail) the notorious examples, due to Freedman and Diaconis, of how Bayesian learning can fail to be consistent. Chapter 7 considers linear regression with an unknown noise distribution; this is the only departure from assuming IID data made here. The remaining chapters try to construct uniform distributions on infinite-dimensional spaces, look at some issues in survival analysis, and cover technical aspects of "neutral to the right" priors, ones whose cumulative hazard functions have independent increments. It is assumed throughout that the true, data-generating distribution lies within the support of the prior.
Ghosh and Ramamoorthi focus on mathematical issues, to the exclusion of computational and statistical considerations. (There are no applications to data, or even to elaborate simulations.) The writing is adequate for a work of "theorem-proof, theorem-proof" math, but no more. Those proofs, however, are really clear and clean, without tricks or complications. I recommend the book for those who want to understand, in depth, the technicalities of constructing priors on infinite-dimensional spaces, and of establishing their consistency when updated with IID data. There are a handful of exercises at the end of the book, but I do not think it would be suitable as a classroom textbook. It could work as the first part of an advanced graduate seminar, or for self-study for motivated and mathematically mature readers.
*: Actually, it's my impression that lots of introductions to Bayesian statistics, even at the graduate level, do not cover these results. This is, I think, something of a scandal for the profession. That goes double if it's due to the attitude which Ghosh and Ramamoorthi (p. 122) paraphrase as "the prior and the posterior given by Bayes theorem [sic] are imperatives arising out of axioms of rational behavior --- and since we are already rational why worry about one more" criterion, namely convergence to the truth. One does indeed find pernicious relativism and epistemic nihilism everywhere these days!
Nadia Gordon, Lethal Vintage
Continuing amateur-sleuthing adventures of a Napa Valley restaurant-owner and her foodie (and wine-y) friends. No prior acquaintance with the series [1, 2, 3] needed.
Larry Gonick, The Cartoon History of the Modern World, Part II: From the Bastille to Baghdad
My parents got the first part of the Cartoon History of the Universe (in its original, shorter edition) for my brother and me in 1981, when I was seven. We loved it so much they ended up having to get us two copies. I have thus been reading the History, as it came out, all my conscious life. (And re-reading it, without any visitations from the Suck Fairy so far.) This latest volume is, as always, a delight, but not a pure one, because it's also the last. I can understand wanting to be finished with the work of a lifetime, especially one which in the nature of things could be spun out indefinitely; but I can't help wishing for more.
G. Willow Wilson and M. K. Perker, Air, vol. 2: Flying Machine
James Sethna, Statistical Mechanics: Entropy, Order Parameters, and Complexity
The best introductory statistical mechanics book I have ever seen. (Meaning: advanced undergraduates, not the graduate level of Landau and Lifshitz.) The reader is supposed to have some familiarity with classical and quantum mechanics, a little electromagnetism, and the very barest rudiments of thermodynamics, the latter not going beyond what's in a good first-year physics course. Beyond the basics of differential equations and linear algebra, the only real pieces of math used here are Fourier transforms and elementary probability (such as one sees in undergraduate quantum mechanics). On this basis Sethna erects classical (and, in one chapter, quantum) statistical mechanics, emphasizing the modern applications of the theory and physical intuition.
The exposition begins with random walks, including diffusion and the central limit theorem. The micro-canonical ensemble comes next, along with a very nice chapter on its ergodic basis and failures of ergodicity (such as KAM theory). The other ensembles are derived from imposing the micro-canonical ensemble on the whole system, and looking at the marginal distributions of sub-systems. The elaborate axiomatic structure of pure thermodynamics is touched on only briefly; thermodynamic quantities are seen, quite properly, as derivative of statistical-mechanical ones. The question of what macroscopic variables need to be included in the free energy leads naturally to a superb chapter on the meaning and identification of order parameters. This in turn is followed by a really lucid treatment of the connections between spontaneous fluctuations, the decay of correlations, response to external forces, and the dissipative approach to equilibrium. The whole is capped off by chapters on abrupt (e.g., ice-water, water-steam) and continuous (e.g., magnetic) phase transitions, including a nice hand-waving discussion of the renormalization group. In addition to the main thread of exposition, each chapter has a large collection of problems, ranging from mathematical proofs through calculations to simulation challenges, which contain a lot of neat applications and special topics, and should at least be read if not attempted.
There are a few places where I would quibble — per Lebowitz, surely the Boltzmann entropy is more useful out of equilibrium than the Gibbs?; couldn't he have been more explicit about the probabilistic foundations of renormalization? — but mostly I just wish this book had been written sixteen years ago when I was taking stat. mech.
Disclaimer: Friends of mine used to work for Sethna, and he's lectured at the SFI summer school (the chapter on order parameters began as a lecture there in 1991), but I've never met him, and have no stake in the success of the book.
Update: Thanks to T. A. Abinandanan for alerting me to the fact that there's a free PDF of the whole book!
Laura E. Reeve, Vigilante
Sequel to Peacekeeper, with an even more awful and totally-misleading cover. (The synopses at the link are accurate, however.) Tasty mind-candy.
C. L. Anderson, Bitter Angels
Space opera about active struggles to prevent war, and other morally-compromising endeavors; military science fiction that lets me respect myself in the morning. The climax, where it becomes clear what is going on, and why and how, and what the peace-keepers will do about it, with what consequences, was very fine indeed. Picked up after reading the author's self-advertisement on Scalzi's blog, which has more.
Madeleine E. Robins, Petty Treason
More alternate-history Regency England private-eye detection (romance-free this time). Very enjoyable; I wish there were more.

Books to Read While the Algae Grow in Your Fur; Enigmas of Chance; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Writing for Antiquity; The Great Transformation; Physics; Complexity; Cthulhiana

Posted by crshalizi at October 31, 2009 23:59 | permanent link

October 13, 2009

The Professions Considered as Pitchers of Icy Refreshing Lemonade

Attention conservation notice: Idle economic musings of a non-economist. Sparked by recent developments, but if you're interested in that you'd be better off elsewhere.

The usual libertarian story about professional licensing requirements — e.g., requiring someone who wants to practice medicine to go to medical school and pass exams, on pain of fines or jail — is that these are simply professionals conspiring in restraint of trade. Licensing simply erects a barrier to entry into the market for medical services, restricting supply and driving up price. Eliminate it, they say, and supply will expand and prices fall.

This presumes, however, that the demand for unlicensed professionals will be equal to the demand for licensed ones. It seems to me very easy to tell a "market for lemons" story here: someone in the market for professional services generally knows very little about how skilled various potential providers actually are. The sellers, however, generally know a lot about their own skill level, or at least more than the potential clients do. (There are no doubt exceptions, such as sincere quacks and the Dunning-Kruger effect, but I don't think that matters for the story.) This is the classic asymmetric information problem from Akerlof, with the usual result: the skilled providers demand more, but the clients have no way of telling them from the unskilled ones, so the only equilibrium is for only unskilled providers to be on the market and for trade to be depressed, or indeed absent. By putting a floor on the incompetence of professionals, licensing requirements stop the unraveling of the market and increase demand. They get us out of the market for lemons.
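For concreteness, here is a toy version of the unraveling story in Python. Everything specific in it is my assumption for illustration, not something taken from Akerlof or from any data: skills are spread evenly over an interval, a provider of skill q refuses to work for less than q, and clients value skill q at 1.5 q but can only see the average skill of whoever is still on the market. The function name `equilibrium` and its parameters are likewise made up.

```python
def equilibrium(skills, value_multiplier=1.5, max_rounds=1000):
    """Iterate: price -> who stays on the market -> average skill -> new price.

    Illustrative assumptions: a provider with skill q will not accept a price
    below q, and clients will pay value_multiplier times the AVERAGE skill of
    the providers still on offer, since they cannot observe individual skill.
    Returns (final price, number of providers still on the market).
    """
    price = value_multiplier * max(skills)  # start from the most optimistic price
    for _ in range(max_rounds):
        on_market = [q for q in skills if q <= price]
        if not on_market:
            return 0.0, 0  # the market has vanished entirely
        new_price = value_multiplier * sum(on_market) / len(on_market)
        if abs(new_price - price) < 1e-12:
            break  # nobody else wants to leave: a fixed point
        price = new_price
    on_market = [q for q in skills if q <= price]
    return price, len(on_market)


# No licensing: skills run from almost nothing up to 1.
unlicensed = [i / 1000 for i in range(1, 1001)]
# Licensing as a floor on incompetence: nobody below 0.5 may practice.
licensed = [i / 1000 for i in range(500, 1001)]

price_u, n_u = equilibrium(unlicensed)
price_l, n_l = equilibrium(licensed)
print("unlicensed: price %.4f, %d of %d providers remain" % (price_u, n_u, len(unlicensed)))
print("licensed:   price %.4f, %d of %d providers remain" % (price_l, n_l, len(licensed)))
```

Under these assumptions the unlicensed market shrinks to a handful of the least skilled providers, while the market with a skill floor keeps everyone in it. Different distributions or valuations would change the numbers, and could change the conclusion, which is rather the point of after-note 1 below.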

This occurred to me the other day, but it's obvious enough that I'm sure someone wrote it up long ago; where? (And did I read it and forget about it?)

(After-notes: 1. Of course, having told the story I have no idea if it's true of actual markets for professional services; learning that would require rather delicate empirical investigations. Checking the restraint-of-trade fable from Milton Friedman would, naturally, require those same investigations. 2. This doesn't rationalize why professions should be so largely self-governing, nor does it rule out the idea that some licensing requirements are counter-productive barriers to entry. 3. Replacing professional certification with some sort of market-based entity telling consumers about the quality of professional service-sellers won't work, for all the usual reasons that competitive markets are incapable of adequately providing information — to say nothing of the difficulty of telling whether the raters know what they're talking about. 4. Universities are accredited because students and parents would otherwise be in a market for lemons. Universities themselves, however, can tell how skilled those selling academic services are — or at least they're supposed to have that ability. 5. I should re-read Phil Agre on the professionalization of everything and see if it holds up.)

The Dismal Science

Posted by crshalizi at October 13, 2009 22:26 | permanent link

October 09, 2009

Twilight of the Market Gods

My review of Justin Fox's Myth of the Rational Market in American Scientist is out. (Shorter me: read the book.) Sometime soon I'll put up a version with links, which alas don't work in print.

Manual trackback: 3 Quarks Daily

The Dismal Science

Posted by crshalizi at October 09, 2009 00:54 | permanent link

October 08, 2009

Wit and Wisdom of Pittsburgh Bar Patrons (Part 1)

"They [= the Steelers] are like this utterly adorable, totally hot girl next door, who you suddenly realize is everything you've ever wanted in a football team — I mean, girlfriend."

Heard About Pittsburgh PA

Posted by crshalizi at October 08, 2009 23:00 | permanent link

"Completely Random Measures for Bayesian Nonparametrics" (This Year at the DeGroot Lecture)

Attention conservation notice: Only of interest if you (1) care about specifying probability distributions on infinite-dimensional spaces for use in nonparametric Bayesian inference, and (2) are in Pittsburgh.

The CMU statistics department sponsors an annual distinguished lecture series in memory of our sainted founder, Morris H. DeGroot. This year, the lecturer is Michael Jordan. (I realize that's a common name; I mean the one my peers and I wanted to be when we grew up.)

"Completely Random Measures for Bayesian Nonparametrics"
Abstract: Bayesian nonparametric modeling and inference are based on using general stochastic processes as prior distributions. Despite the great generality of this definition, the great majority of the work in Bayesian nonparametrics is based on only two stochastic processes: the Gaussian process and the Dirichlet process. Motivated by the needs of applications, I present a broader approach to Bayesian nonparametrics in which priors are obtained from a class of stochastic processes known as "completely random measures" (Kingman, 1967). In particular I will present models based on the beta process and the Bernoulli process, and will discuss an application of these models to the analysis of motion capture data in computational vision.
(Joint work with Emily Fox, Erik Sudderth and Romain Thibaux.)
Time and place: 4:15 pm on Friday, 16 October 2009, in the Giant Eagle Auditorium in Baker Hall (room A51)

Update: I counted over 210 people in the audience.

Enigmas of Chance

Posted by crshalizi at October 08, 2009 15:02 | permanent link

"High Dimensional Nonlinear Learning using Local Coordinate Coding" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) care about statistical learning in high-dimensional spaces and (2) are in Pittsburgh.

Since manifold learning has been on my mind this week, owing to trying to teach it in data-mining, I am extra pleased by the scheduling of this talk:

"High Dimensional Nonlinear Learning using Local Coordinate Coding"
Prof. Tong Zhang, Rutgers University
Abstract: We present a new method for learning nonlinear functions in high dimension using semisupervised learning. Our method includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point on a high dimensional manifold can be locally approximated by a linear combination of its nearby anchor points, with the linear weights offering its local-coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. The empirical success of our approach has been demonstrated in a recent PASCAL image classification competition, where the top performance was achieved by an NEC system using this idea.
(Joint work with Kai Yu at NEC Lab America.)
Time and place: 4 pm on Monday, 12 October 2009, in Doherty Hall 310

As always, the seminar is free and open to the public.

Enigmas of Chance

Posted by crshalizi at October 08, 2009 15:01 | permanent link

October 05, 2009

In re John Holland

Having vowed two weeks ago to post something positive at least once a week, I missed last week, with the excuse of being back in Ann Arbor for the celebration of John Holland's 80th birthday at the Center for the Study of Complex Systems. There was no time to post, or even to see everyone I wanted to, but I did actually start writing something about Holland's scientific work, only to realize yesterday I was merely engaged in self-plagiarism, from this, this and this, and probably other things I'd written too, because reading Holland has quite profoundly shaped my thinking. So I'll just point you to the back-catalogue, as it were, and get back to revising a paper I'd never have written if I hadn't read Adaptation in Natural and Artificial Systems.

(So long as I'm talking about the workshop, and without any slight to the other presentations, the neatest work was that by Stephanie Forrest et al. on using genetic programming to evolve bug fixes.)

Complexity; Minds, Brains, and Neurons; Enigmas of Chance

Posted by crshalizi at October 05, 2009 14:30 | permanent link

October 02, 2009

"Analyzing Networks and Learning with Graphs"

See you in Whistler?

Analyzing Networks and Learning with Graphs


a workshop in conjunction with

23rd Annual Conference on Neural Information Processing Systems (NIPS 2009)


December 11 or 12, 2009 (exact date TBD) Whistler, BC, Canada

Deadline for Submissions: Friday, October 30, 2009
Notification of Decision: Monday, November 9, 2009

Overview:

Recent research in machine learning and statistics has seen the proliferation of computational methods for analyzing networks and learning with graphs. These methods support progress in many application areas, including the social sciences, biology, medicine, neuroscience, physics, finance, and economics.

The primary goal of the workshop is to actively promote a concerted effort to address statistical, methodological and computational issues that arise when modeling and analyzing large collections of data that are largely represented as static and/or dynamic graphs. To this end, we aim at bringing together researchers from applied disciplines such as sociology, economics, medicine and biology, together with researchers from more theoretical disciplines such as mathematics and physics, within our community of statisticians and computer scientists. Different communities use diverse ideas and mathematical tools; our goal is to foster cross-disciplinary collaborations and intellectual exchange.

Presentations will include novel graph models, the application of established models to new domains, theoretical and computational issues, limitations of current graph methods and directions for future research.

Online Submissions

We welcome the following types of papers:
  1. Research papers that introduce new models or apply established models to novel domains,
  2. Research papers that explore theoretical and computational issues, or
  3. Position papers that discuss shortcomings and desiderata of current approaches, or propose new directions for future research.
All submissions will be peer-reviewed; exceptional work will be considered for oral presentation. We encourage authors to emphasize the role of learning and its relevance to the application domains at hand. In addition, we hope to identify current successes in the area, and will therefore consider papers that apply previously proposed models to novel domains and data sets.

Submissions should be 4-to-8 pages long, and adhere to NIPS format. Please email your submissions to: nipsgraphs2009 [at] gmail [dot] com

Workshop Format

This is a one-day workshop. The program will feature invited talks, poster sessions, poster spotlights, and a panel discussion. All submissions will be peer-reviewed; exceptional work will be considered for oral presentation. More details about the program will be announced soon.

Organizers

Networks; Enigmas of Chance; Incestuous Amplification

Posted by crshalizi at October 02, 2009 10:24 | permanent link

September 30, 2009

Books to Read While the Algae Grow in Your Fur, September 2009

Alexander Rosenberg, Economics: Mathematical Politics or Science of Diminishing Returns?
Rosenberg is a philosopher of science focused on biology and neo-classical economics. This is his second attempt at assessing the latter. (I haven't read his first go-round and it doesn't seem to be necessary.) Ordinarily, he says, the goal of science is to make increasingly accurate and precise predictions about the world. (This includes the historical sciences like geology and paleontology, which predict new evidence rather than new events at definite future times. [Different branches of astronomy actually make both kinds of predictions.]) Economics, however, is incredibly bad at prediction, certainly at the improving-precision part. Rosenberg does not argue this point very strongly, taking it to be more or less obvious to anyone familiar with the state of economics, and I'm inclined to agree. This picture might be complicated by a detailed consideration of applied econometric models, but even when those work, they are very poorly grounded in economic theory*. (Incidentally, one of the pleasures of reading this was seeing Rosenberg assault Friedman's "Methodology of Positive Economics" essay, whose influence has been profound and utterly malign.) Rosenberg then has two questions: (1) if economics does share the usual goal of science, what are its prospects for achieving it? (2) if it does not have that goal, what is it trying to achieve, or, perhaps better, what is the kind of thing economists do and want to keep doing fitted to achieve?
As to (1) he is intensely skeptical, because he sees microeconomic explanations as grounded in intentional explanations, a not-too-compelling formalization of folk psychology (desires mapping to utility functions and beliefs to subjective probability distributions). He is extremely skeptical, on strictly philosophical grounds, about intentional explanations being made much better in the future than they have been through recorded history. He's also skeptical that we could ever have something like a cognitive-scientific or neuro-scientific theory which explains behavior and recovers folk psychology as a useful approximation in certain domains; I completely fail to follow his argument here. What I seem to understand would imply that he thinks thermostats and self-guided missiles are impossible, so if I am right he should really listen to Uncle Norbert, especially before the inevitable robot uprising (I can just see his last words being, a la pp. 142--143, "this robot can't really be trying to kill me, because if that intention were represented in one part of its computational system, l1, who is the interpreter who treats the configuration of memory registers in l1 as expressing this goal? Surely it must be some other sub-system, call it l2, which reads l1, but then we face the same question all over again for l2... urk!"); but no doubt I am wrong and he has some more reasonable idea which he does not, however, convey. The resounding experimental failure of maximizing-subjectively-expected-utility theory is not addressed. (Perhaps this was less clear in 1994 than it is now, but I doubt it.) I suspect that he would feel any of the models of choice proposed in behavioral economics are subject to the same critique, mutatis mutandis, that he makes of conventional microeconomics, because they're basically intentional.
As to (2), Rosenberg argues as follows. There is a three-way relationship between a discipline's goals, its theories, and its methods: given the goals (say, maximizing predictive accuracy), the theories tell us something about how well different methods will meet the goals. Likewise if you fix the goals and methods, only certain kinds of theories will be acceptable or reachable. And if you fix the theories and methods, you constrain the goals you can attain. (Rosenberg's argument here is very close to that of Larry Laudan in his great book Science and Values, and I think it's correct.) If we take neo-classical methods and theories as given, what might economics be successfully aiming at? Clearly not, by the previous argument, scientific prediction. Rather, Rosenberg offers two possibilities, not mutually exclusive. On the one hand, maybe it's really a species of hyper-formalized social-contract theory from political philosophy, with (as he says) the Walrasian auctioneer in the role of Hobbes's sovereign. Or: maybe it's a species of applied mathematics, interested in the implications of interacting transitive preference orderings. As he says, applied mathematicians are rarely interested in whether their math can, in fact, be applied to the real world — that's not their department.
Excusing economics's poor track-record as an empirical science by saying it's really political philosophy and/or applied math may be a defense worse than the original accusation. As Rosenberg notes, it makes the idea of attending to what economists have to say about policy matters rather odd; at best one should listen to them as much as to any other sect of political philosophers. I would suspect that Rosenberg was proposing this maliciously, but he seems to be sincere and not just good at writing with a straight face. I don't think economics is in quite such a plight as he does, but having just put the book down I admit I'm hard-pressed to articulate why.
*: For instance, when real business cycle theorists and their kin fit dynamic stochastic "general equilibrium"** models to empirical time-series of macro-economic quantities, these time series are first de-trended, i.e., made stationary. This has no justification in the representative-agent story underlying the models, but seems, at present, to be essential to actually getting estimates. Typically the de-trending is done through the "Hodrick-Prescott" filter***, again with no theoretical justification, and the business cycle is operationally defined as "the residuals of the filter". I suspect that most of the predictive ability of DSGEs comes from the filter, plus implicitly doing a moving-average smoothing of the residuals. It would be interesting to pit them against naive nonparametric forecasting (along say these lines).
**: I use the scare-quotes because I don't agree that representative agent models are general equilibrium models.
***: Known in statistics decades before Hodrick and Prescott as a "smoothing spline". (The word "spline" does not appear in their paper, and they are entirely innocent of the vast literature on how much smoothing to do.)
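To make the first footnote concrete, here is a minimal sketch of the Hodrick-Prescott filter written as what it is, a penalized least-squares smoother: the trend minimizes the sum of squared residuals plus lambda times the sum of squared second differences of the trend, and "the business cycle" is whatever is left over. The function name `hp_filter`, the dense-matrix solver, and the test series are my own illustrative choices; lam=1600 is the conventional value for quarterly data, not anything derived from theory. Plain Python, so only suitable for short series.

```python
def hp_filter(y, lam=1600.0):
    """Split y into (trend, cycle) by penalized least squares.

    The trend solves (I + lam * D'D) trend = y, where D takes second
    differences; the cycle is just the residual y - trend.
    """
    n = len(y)
    # Build A = I + lam * D'D as a dense matrix (fine for short series).
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    second_diff = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * second_diff[a] * second_diff[b]
    # Solve A * trend = y by Gaussian elimination with partial pivoting.
    M = [row[:] + [float(v)] for row, v in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            if factor != 0.0:
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * trend[j] for j in range(i + 1, n))
        trend[i] = (M[i][n] - known) / M[i][i]
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle


# A made-up series: linear growth plus a regular wobble.
series = [2.0 + 0.5 * t + (1.0 if t % 8 < 4 else -1.0) for t in range(40)]
trend, cycle = hp_filter(series)
print(["%.2f" % c for c in cycle[:8]])
```

Note that nothing in the procedure knows any economics: the split between "trend" and "cycle" is set entirely by lam, which is the sense in which the cycle is operationally defined by the filter.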
Sarah Graves, Wicked Fix
More cozy comfort-reading about sordid multiple homicide. (But whatever happened to Sam's girlfriend from the previous book?)
John Billheimer, Highway Robbery
Well-written, amusing and absorbing mystery novel about a family of highway engineers in West Virginia. The only thing keeping it from being perfect-for-me mind-candy is that part of the plot turns on making fun of environmentalists; but you can't have everything. This is the second book in a series; I've not read the others but will look them up.
Bent Jesper Christensen and Nicholas M. Kiefer, Economic Modeling and Inference
Review: An Optimal Path to a Dead End.
Joe Hill and Gabriel Rodriguez, Locke and Key, vol. 2: Head Games
High-grade comic book mind-candy. Definitely needs the earlier book.
Chelsea Cain, Evil at Heart
Great, if somewhat stomach-turning, mind-candy. (Probably needs the earlier books.) I actually wish Cain did more with the media-frenzy angle, however.
Possible continuity error: Isn't Susan awfully unconcerned about leaving her car alone in really dodgy neighborhoods, after she broke out one of its windows?
Thomas Levenson, Newton and the Counterfeiter: The Unknown Detective Career of the World's Greatest Scientist
Or: Newton demands the noose. A wonderfully readable little biography of Newton, with the hook of looking at how he tackled his second career as Warden of the Mint, in charge of actually producing the English currency, and of catching and punishing counterfeiters. In particular, Levenson focuses on Newton's pursuit of a counterfeiter of particular skill and temerity, one William Chaloner, providing a great opportunity to explain the criminal underworld in which such figures lived, and the vast opportunities opening up for them as a result of the social transformations of which Newton was at once symbol, beneficiary and further driver. (Any idiot understands stealing a hunk of metal, and almost any idiot can grasp substituting pewter for silver, but the higher reaches of monetary crime require numeracy and comfort with sophisticated abstractions.) This is, in short, a portrait of the foundations of our world being laid, from the intellectual system of rational scientific explanation, to states powerful enough to enforce written laws on millions and raise the funds needed to wage war across the world, to global commerce and flows of money, and stock-market Ponzi schemes in which geniuses lose fortunes. Enthusiastically recommended if any of this sounds the least bit appealing.
Kat Richardson, Vanished
Mind-candy: An American shaman in London. Ends in media res, though not with a cliff-hanger.
Halbert White, Estimation, Inference, and Specification Analysis
Review: How to Tell That Your Model Is Wrong; and, What Happens Afterwards.
House of Mystery: Love Stories for Dead People
More tales from the bar, plus a really unfortunate basement.
Tiziano Sclavi et al., The Dylan Dog Case Files
No purchase link because I actually dis-recommend it: predictable, tedious, implausible, not scary, excruciating when it tries to be funny, ultimately tiresome. (The drawing is I admit pretty good, but nowhere near the covers Mignola provides for the translated edition.) Is this really that popular in Italy? If so, does the original have virtues which did not survive translation, or does the old country simply have no taste at all in comics?
Phil and Kaja Foglio, Agatha Heterodyne and the Circus of Dreams and Agatha Heterodyne and the Clockwork Princess
Volumes 4 and 5 of Girl Genius. Go read.
I. J. Parker, The Convict's Sword
Converging murder cases in Heian-era Japan. Stands alone, but I enjoyed it more for knowing the back-story. (Previous volumes in the series: 1 and 2, 3, 4, 5.)
Madeleine E. Robins, Point of Honour
Your basic hard-boiled female private-eye detective novel, which also happens to be a historical mystery and a Regency romance; the charming love-child of Jane Austen, or perhaps Georgette Heyer, and Dashiell Hammett. I read it in one sitting from the beginning — "It is a truth universally acknowledged that a Fallen Woman of good family must, soon or late, descend to whoredom" — to the end, and really want the sequel.
(Read following up on an old review by Kate Nepveu.)
Update: The sequel is as good.

Books to Read While the Algae Grow in Your Fur; Enigmas of Chance; The Dismal Science; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; Philosophy; Writing for Antiquity; The Great Transformation

Posted by crshalizi at September 30, 2009 23:59 | permanent link

September 25, 2009

Miniature Pearl

It would be wrong to say that Judea Pearl knows more about causal inference than anyone else — I can think of some rivals very close to where I'm writing this — but he certainly knows a lot, and has worked tirelessly to formulate and spread the modern way of thinking about the subject, centered around graphical models and their associated structural equations. I remember spending many happy hours with his book Causality when it came out in 2000, and look forward to spending more with the new edition, which is making its way to me through the mail now. In the meanwhile, however, there is what he describes as "A new survey paper, gently summarizing everything I know about causation (in only 43 pages)":

"Causal Inference in Statistics: An Overview", forthcoming in Statistics Surveys 3 (2009): 96--146 [Free PDF]
Abstract: This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underly all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions, (also called "causal effects" or "policy evaluation") (2) queries about probabilities of counterfactuals, (including assessment of "regret," "attribution" or "causes of effects") and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.

The paper assumes a reader who's reasonably well-grounded in statistics, though not necessarily in the causal-inference literature. (Of such readers, I imagine applied economists might have more unlearning to do than most, because they will keep asking "but when do I start estimating beta?") It's not ideally calibrated for a reader coming from, say, machine learning.

One theme running through the paper is the futility of trying to define causality in purely probabilistic terms, and the fact that cases where it looks like one can do so are really cases where causal assumptions have been smuggled in. Another is that once you realize counterfactual or mechanistic assumptions are needed, the graphical-models/structural equation framework makes it immensely easier to reason about them than does the rival "potential outcomes" framework. In fact, the objects which the potential outcomes framework takes as its primitives can be constructed within the structural framework, so the correct part of the former is a subset of the latter. And by reasoning on graphical models it is easy to see that confounding can be introduced by "controlling for" the wrong variables, something explicitly denied by leading members of the potential-outcomes school. (Pearl quotes them making this mistake, and manages to pull off a more-in-sorrow-than-in-glee tone while doing so.) Mostly, however, the paper is about showing off what can be done within the new framework, which is really pretty impressive, and ought to be part of the standard tool-kit of data analysis. If you are not already familiar with it, this is an excellent place to begin, and if you are you will enjoy the elegant and comprehensive presentation.
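The point about "controlling for" the wrong variables is easy to see in a small simulation. This is my own toy example, not one taken from Pearl's paper: X and Y are generated independently, and Z is a common effect of both (a "collider" in graphical-models terms). The noise levels, the width of the stratum, and the function names are arbitrary illustrative choices.

```python
import random


def correlation(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, seed=1):
    """Return (corr of X and Y overall, corr of X and Y with Z held near 0)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
    # Z is caused by both X and Y, plus a little noise of its own.
    z = [xi + yi + rng.gauss(0, 0.5) for xi, yi in zip(x, y)]
    overall = correlation(x, y)
    # "Controlling for" Z, crudely: look only at cases where Z is near zero.
    kept = [(xi, yi) for xi, yi, zi in zip(x, y, z) if abs(zi) < 0.25]
    within = correlation([p[0] for p in kept], [p[1] for p in kept])
    return overall, within


overall, within = simulate()
print("corr(X, Y) overall:         %+.3f" % overall)
print("corr(X, Y) given Z near 0:  %+.3f" % within)
```

Overall the two variables are uncorrelated, as they should be; within a stratum of Z they are strongly negatively correlated, even though neither causes the other and they share no common cause. Adjusting for Z manufactures an association that was not there.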


Looking back over what I write in this blog, I feel like, on the one hand, there's too little of it lately, and on the other hand, it's too tilted towards negative, critical stuff. While not regretting at all being negative and critical about stupid ideas that need to be criticized (or, really, pulverized), I will try to expand and balance my output by posting at least once a week on some good science. We'll see how this goes.


Enigmas of Chance

Posted by crshalizi at September 25, 2009 10:12 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems