<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0.2" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Three-Toed Sloth</title>
    <link>http://bactra.org/weblog</link>
    <description>Slow Takes from the Canopy (My Very Own Internet Tradition)</description>
    <language>en</language>

  <item>
    <title>(I Don't Care About My) Bad Reputation</title>
    <link>http://bactra.org/weblog/634.html</link>
    <description>
&lt;blockquote&gt;&lt;em&gt;Attention conservation notice:&lt;/em&gt; Unskillful nattering about
pop-culture ephemera.&lt;/blockquote&gt;

For the sake of my own sanity, I prefer to remain ignorant of the occult
processes by which the direct mail gods decide which catalogues to send to
which people.
(There's &lt;a href=&quot;http://ebusiness.mit.edu/research/papers/180_Simester_Catalog.pdf&quot;&gt;too
much dynamic programming involved&lt;/a&gt;.)  Today, for instance, they decided to
inflict upon me the official Barbie doll spring 2010 collection preview, and
like a fool I couldn't resist looking through it.  Thus my life is made that
much worse by learning that there is
a &lt;a href=&quot;http://www.barbiecollector.com/shop/product.aspx?sku=R4461&amp;shelfID=150007&quot;&gt;Joan
Jett Barbie doll&lt;/a&gt;.  (I thought about embedding an image, but in this case
pain shared is &lt;em&gt;not&lt;/em&gt; pain eased.)  I think I finally grasp what
people mean when they talk about later cultural products
assaulting &lt;a href=&quot;http://www.youtube.com/watch?v=mPWFgJGjPOM&quot;&gt;parts of their
childhood&lt;/a&gt;, in this case one I didn't even realize I valued.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_linkage.html&quot;&gt;Linkage&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;Homophily, Contagion, Confounding: Pick Any Three&quot;</title>
    <link>http://bactra.org/weblog/633.html</link>
    <description>
&lt;P&gt;A number of people have asked for my slides from
the &lt;a href=&quot;http://www.iq.harvard.edu/mersih_conference&quot;&gt;MERSIH&lt;/a&gt; conference
the other week.  So,
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/MERSIH.pdf&quot;&gt;here they are&lt;/a&gt;.
(Anyone who was at my talk at SFI about a year ago will recognize the title,
and much of the content.)  I'm presently turning this into a proper manuscript,
so comments are welcome.  Please don't rip it off; I'll become very cross and
may even hold my breath until I turn blue and pass out, and won't you be sorry
then?

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_networks.html&quot;&gt;Networks&lt;/a&gt;;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;cat_complexity.html&quot;&gt;Complexity&lt;/a&gt;;
&lt;a href=&quot;cat_selfcentered.html&quot;&gt;Self-Centered&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;Statistical Analysis of Stellar Evolution&quot; (Next Week at the Statistics Seminar)</title>
    <link>http://bactra.org/weblog/632.html</link>
    <description>
&lt;P&gt;In which the starry heavens above submit to statistical analysis:

&lt;dl&gt;
&lt;dt&gt;&lt;a href=&quot;http://www.ics.uci.edu/~dvd/&quot;&gt;David van Dyk&lt;/a&gt;, &quot;Statistical
Analysis of Stellar Evolution&quot;&lt;/dt&gt;
&lt;dd&gt;&lt;em&gt;Abstract:&lt;/em&gt; Color-Magnitude Diagrams (CMDs) are plots that compare
the magnitudes (luminosities) of stars in different wavelengths of light
(colors).  Highly non-linear correlations among the mass, color and surface
temperature of newly formed stars induce a long narrow curved point cloud in a
CMD known as the main sequence. Aging stars form new CMD groups of red giants
and white dwarfs.  The physical processes that govern this evolution can be
described with mathematical models and explored using complex computer
models. These calculations are designed to predict the plotted magnitudes as a
function of parameters of scientific interest such as stellar age, mass, and
metallicity.  Here, we describe how we use the computer models as a component of
a complex likelihood function in a Bayesian analysis that requires
sophisticated computing, corrects for contamination of the data by field stars,
accounts for complications caused by unresolved binary-star systems, and aims
to compare competing physics-based computer models of stellar evolution. &lt;/dd&gt;
&lt;dd&gt;This is joint work with Steven DeGennaro, Nathan Stein, William
H. Jefferys, Ted von Hippel, and Elizabeth Jeffery.&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Place and time:&lt;/em&gt; Doherty Hall A310, Monday, 23 November, 4--5 pm.&lt;/dd&gt;
&lt;/dl&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;cat_the_eternal_silence_of_these_infinite_spaces.html&quot;&gt;The Eternal Silence of These Infinite Spaces&lt;/a&gt;;
&lt;a href=&quot;cat_physics.html&quot;&gt;Physics&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;Some Things Statisticians Do at Google&quot; (Next Week at the Statistics Seminar)</title>
    <link>http://bactra.org/weblog/631.html</link>
    <description>
&lt;P&gt;&lt;blockquote&gt;&lt;em&gt;Attention conservation notice:&lt;/em&gt; Of no use to you unless
(1) you want to know what statisticians do at search-engine companies
and (2) you are in Pittsburgh.&lt;/blockquote&gt;

&lt;dl&gt;
&lt;dt&gt;Mike Meyer, &quot;Some Things Statisticians Do at Google&quot;&lt;/dt&gt;
&lt;dd&gt;&lt;em&gt;Abstract:&lt;/em&gt; I'll talk about a number of projects at Google where statisticians
have made a large contribution.  There will not be a lot of technical
details.  In some cases I will just describe the problem.&lt;/dd&gt;
&lt;dd&gt;The major example will be a description of the statistical and
engineering infrastructure to support live traffic experiments
at Google.&lt;/dd&gt;
&lt;dd&gt;A common theme of the problems is the importance of understanding
basic statistical principles that can be applied and modified to
handle new data and new circumstances.&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Place and time:&lt;/em&gt; Monday, 16 November at 4 pm, in Doherty Hall
A310&lt;/dd&gt;
&lt;/dl&gt;

&lt;P&gt;As always, the talk is free and open to the public.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>The Shadow Price of Power</title>
    <link>http://bactra.org/weblog/630.html</link>
    <description>
&lt;blockquote&gt;&lt;em&gt;Attention conservation notice:&lt;/em&gt; Quasi-teaching note giving
an economic interpretation of the Neyman-Pearson lemma on statistical
hypothesis testing.&lt;/blockquote&gt;

&lt;P&gt;Suppose we want to pick out some sort of signal from a background of noise.
As every schoolchild knows, any procedure for doing this,
or &lt;strong&gt;test&lt;/strong&gt;, divides the data space into two parts, the one where
it says &quot;noise&quot; and the one where it says &quot;signal&quot;.* Tests will make two kinds
of mistakes: they can take noise to be signal, a &lt;strong&gt;false
alarm&lt;/strong&gt;, or can ignore a genuine signal as noise,
a &lt;strong&gt;miss&lt;/strong&gt;.  Both the signal and the noise are stochastic, or we
can treat them as such anyway.  (Any determinism distinguishable from chance is
just insufficiently complicated.)  We want tests where
the &lt;em&gt;probabilities&lt;/em&gt; of both types of errors are small.  The probability
of a false alarm is called the &lt;strong&gt;size&lt;/strong&gt; of the test; it is the
measure of the &quot;say 'signal'&quot; region under the noise distribution.  The
probability of a miss, as opposed to a false alarm, has no short name in the
jargon, but one minus the probability of a miss &amp;mdash; the probability of
detecting a signal when it's present &amp;mdash; is called &lt;strong&gt;power&lt;/strong&gt;.

&lt;P&gt;Suppose we know the probability density of the noise &lt;i&gt;p&lt;/i&gt; and that of
the signal is &lt;i&gt;q&lt;/i&gt;.  The Neyman-Pearson lemma, as many though not all
schoolchildren know, says that then, among all tests of a given size &lt;i&gt;s&lt;/i&gt;,
the one with the smallest miss probability, or highest power, has the form &quot;say
'signal' if &lt;i&gt;q&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;)/&lt;i&gt;p&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;) &gt; &lt;i&gt;t&lt;/i&gt;(&lt;i&gt;s&lt;/i&gt;),
otherwise say 'noise',&quot; and that the threshold &lt;i&gt;t&lt;/i&gt; varies inversely
with &lt;i&gt;s&lt;/i&gt;.  The quantity &lt;i&gt;q&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;)/&lt;i&gt;p&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;) is
the &lt;strong&gt;likelihood ratio&lt;/strong&gt;; the Neyman-Pearson lemma says that to
maximize power, we should say &quot;signal&quot; if it's sufficiently &lt;em&gt;more likely&lt;/em&gt;
than noise.
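
&lt;P&gt;To make the lemma concrete, here is a minimal numerical sketch.  The two
Gaussians are assumptions chosen purely for illustration (noise with mean 0,
signal with mean 1, both with unit variance), and the function names are
hypothetical.  For this pair the likelihood ratio is increasing
in &lt;i&gt;x&lt;/i&gt;, so thresholding the ratio is the same as thresholding &lt;i&gt;x&lt;/i&gt;
itself.

&lt;pre&gt;
```python
from statistics import NormalDist

noise = NormalDist(mu=0.0, sigma=1.0)   # density p (assumed for illustration)
signal = NormalDist(mu=1.0, sigma=1.0)  # density q (assumed for illustration)


def likelihood_ratio(x):
    """q(x)/p(x) for the two Gaussians above."""
    return signal.pdf(x) / noise.pdf(x)


def np_test(size):
    """Return (cutoff c, threshold t, power) of the likelihood-ratio test of this size.

    Here q(x)/p(x) = exp(x - 1/2) is increasing in x, so "ratio > t" is the
    same event as "x > c", with t = q(c)/p(c).
    """
    c = noise.inv_cdf(1.0 - size)   # P(X > c) = size under the noise
    t = likelihood_ratio(c)         # the ratio threshold matching that cutoff
    power = 1.0 - signal.cdf(c)     # Q(X > c), probability of detecting a signal
    return c, t, power


for s in (0.01, 0.05, 0.20):
    c, t, power = np_test(s)
    print(f"size={s:.2f}  cutoff={c:.3f}  threshold t={t:.3f}  power={power:.3f}")
```
&lt;/pre&gt;

&lt;P&gt;Running it for a few sizes shows the threshold falling as the allowed size
grows, while the power rises.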

&lt;P&gt;The likelihood ratio indicates how different the two distributions &amp;mdash;
the two &lt;strong&gt;hypotheses&lt;/strong&gt; &amp;mdash; are at &lt;i&gt;x&lt;/i&gt;, the data-point we
observed.  It makes sense that the outcome of the hypothesis test should depend
on this sort of discrepancy between the hypotheses.  But why
the &lt;em&gt;ratio&lt;/em&gt;, rather than, say, the difference &lt;i&gt;q&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;)
- &lt;i&gt;p&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;), or a signed squared difference, etc.?  Can we make this
intuitive?

&lt;P&gt;Start with the fact that we have an optimization problem under a constraint.
Call the region where we proclaim &quot;signal&quot; &lt;i&gt;R&lt;/i&gt;.  We want to maximize its
probability when we are seeing a signal, &lt;i&gt;Q&lt;/i&gt;(&lt;i&gt;R&lt;/i&gt;), while constraining
the false-alarm probability, &lt;i&gt;P&lt;/i&gt;(&lt;i&gt;R&lt;/i&gt;)
= &lt;i&gt;s&lt;/i&gt;.  &lt;a href=&quot;http://dbpubs.stanford.edu:8091/~klein/lagrange-multipliers.pdf&quot;&gt;Lagrange&lt;/a&gt;
tells us that the way to do this is to maximize &lt;i&gt;Q&lt;/i&gt;(&lt;i&gt;R&lt;/i&gt;)
- &lt;i&gt;t&lt;/i&gt;[&lt;i&gt;P&lt;/i&gt;(&lt;i&gt;R&lt;/i&gt;) - &lt;i&gt;s&lt;/i&gt;] over &lt;i&gt;R&lt;/i&gt; and &lt;i&gt;t&lt;/i&gt; jointly.
So far the usual story; the next turn is usually &quot;as you remember from the
calculus of variations...&quot;

&lt;P&gt;Rather than actually doing math, let's think like economists.  Picking the
set &lt;i&gt;R&lt;/i&gt; gives us a certain benefit, in the form of the
power &lt;i&gt;Q&lt;/i&gt;(&lt;i&gt;R&lt;/i&gt;), and a cost, &lt;i&gt;t&lt;/i&gt;&lt;i&gt;P&lt;/i&gt;(&lt;i&gt;R&lt;/i&gt;).
(The &lt;i&gt;ts&lt;/i&gt; term is the same for all &lt;i&gt;R&lt;/i&gt;.)  Economists, of course, tell
us to equate &lt;em&gt;marginal&lt;/em&gt; costs and benefits.  What is the marginal
benefit of expanding &lt;i&gt;R&lt;/i&gt; to include a small neighborhood around the point
&lt;i&gt;x&lt;/i&gt;?  Just, by the definition of &quot;probability
density&quot;, &lt;i&gt;q&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;).  The marginal cost is
likewise &lt;i&gt;t&lt;/i&gt;&lt;i&gt;p&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;).  We should include &lt;i&gt;x&lt;/i&gt; in &lt;i&gt;R&lt;/i&gt;
if &lt;i&gt;q&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;) &gt; &lt;i&gt;t&lt;/i&gt;&lt;i&gt;p&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;),
or &lt;i&gt;q&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;)/&lt;i&gt;p&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;) &gt; &lt;i&gt;t&lt;/i&gt;.  The boundary of &lt;i&gt;R&lt;/i&gt;
is where marginal benefit equals marginal cost, and that is why we need the
likelihood &lt;em&gt;ratio&lt;/em&gt; and not the likelihood &lt;em&gt;difference&lt;/em&gt;, or
anything else.  (Except for a monotone transformation of the ratio, e.g. the
log ratio.)  The likelihood ratio threshold &lt;i&gt;t&lt;/i&gt; is, in fact, the
&lt;a href=&quot;http://en.wikipedia.org/wiki/Shadow_price&quot;&gt;shadow price&lt;/a&gt; of
statistical power.
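
&lt;P&gt;The marginal argument can be checked by brute force on a toy problem.
What follows is only a sketch under stated assumptions: the six-point data
space and its two probability tables are made up, and the function names are
hypothetical.  It builds &lt;i&gt;R&lt;/i&gt; by including every point whose marginal
benefit exceeds its marginal cost at price &lt;i&gt;t&lt;/i&gt;, then compares the result
with an exhaustive search over all regions of no greater size.

&lt;pre&gt;
```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical noise (p) and signal (q) probabilities on six data points.
p = [F(3, 10), F(2, 10), F(2, 10), F(1, 10), F(1, 10), F(1, 10)]
q = [F(1, 20), F(2, 20), F(3, 20), F(3, 20), F(5, 20), F(6, 20)]


def marginal_region(t):
    """Include x exactly when marginal benefit q(x) exceeds marginal cost t*p(x)."""
    return frozenset(x for x in range(len(p)) if q[x] > t * p[x])


def size(region):
    """False-alarm probability P(R)."""
    return sum((p[x] for x in region), F(0))


def power(region):
    """Detection probability Q(R)."""
    return sum((q[x] for x in region), F(0))


def best_by_brute_force(s):
    """Most powerful region among all subsets whose size is at most s."""
    points = range(len(p))
    regions = (frozenset(c) for k in range(len(p) + 1)
               for c in combinations(points, k))
    return max((r for r in regions if s >= size(r)), key=power)


t = F(1)                 # price charged per unit of false-alarm probability
R = marginal_region(t)
s = size(R)              # the size this price happens to buy
best = best_by_brute_force(s)
print("marginal rule:", sorted(R), "size", s, "power", power(R))
print("brute force:  ", sorted(best), "size", size(best), "power", power(best))
```
&lt;/pre&gt;

&lt;P&gt;The size here is whatever the chosen price buys, which keeps the comparison
exact; hitting an arbitrary size on a discrete space can require the randomized
tests relegated to the footnote.  Raising the price &lt;i&gt;t&lt;/i&gt; shrinks the
region, which is the inverse relation between threshold and size again.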

&lt;P&gt;I am pretty sure I have not seen or heard the Neyman-Pearson lemma explained
marginally before, but in retrospect it seems too simple to be new, so pointers
would be appreciated.

&lt;P&gt;&lt;em&gt;Manual trackback&lt;/em&gt;: &lt;a href=&quot;http://barrdear.com/john/2009/11/09/the-likelihood-ratio-threshold-is-the-shadow-price-of-statistical-power/&quot;&gt;John Barrdear&lt;/a&gt;

&lt;P&gt;&lt;em&gt;Updates&lt;/em&gt;: Thanks to David Kane for spotting a typo.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;*: Yes, you could have a randomized test procedure,
but the situations where those actually help pretty much define &quot;boring,
merely-technical complications.&quot;&lt;/span&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>36-350, Data Mining: Course Materials (Fall 2009)</title>
    <link>http://bactra.org/weblog/617.html</link>
    <description>
&lt;P&gt;My &lt;a href=&quot;615.html&quot;&gt;lesson-plan&lt;/a&gt; having survived first contact with
the &lt;strike&gt;enemy&lt;/strike&gt; students, it's time to start posting the lecture
handouts &amp;amp; c.  This page will be updated as the semester goes on; the RSS
feed for it should be &lt;a href=&quot;617.rss&quot;&gt;here&lt;/a&gt;.
The &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/&quot;&gt;class homepage&lt;/a&gt; has more
information.

&lt;ol start=&quot;0&quot;&gt;
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/00/00.html&quot;&gt;Introduction
to the course&lt;/a&gt; (24 August) What is data mining? how is it used? where did it
come from? Some themes.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/01/lecture-01.pdf&quot;&gt;Information
retrieval and similarity searching I&lt;/a&gt; (26 August) Finding the data you are
looking for.  Ideas we will avoid: meta-data and cataloging; meanings.  Textual
features.  The bag-of-words representation; its vector form.  Measuring
similarity and distance for vectors.  Example with the &lt;cite&gt;&lt;a href=&quot;http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19&quot;&gt;New York Times
Annotated Corpus&lt;/a&gt;&lt;/cite&gt;.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/02/lecture-02.pdf&quot;&gt;IR continued&lt;/a&gt; (28 August).  The
trick to searching: queries are documents.  Search evaluation: precision,
recall, precision-recall curves; error rates.  Classification: nearest
neighbors and prototypes; classifier evaluation by mis-classification rate and
by confusion matrices.  Inverse document frequency weighting.  Visualizing
high-dimensional data by multi-dimensional scaling.  Miscellaneous topics:
stemming, incorporating user feedback.

&lt;P&gt;Homework 1, due 4 September: &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/01/01.pdf&quot;&gt;assignment&lt;/a&gt;,
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/01/01.R&quot;&gt;R&lt;/a&gt;, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/01/nyt_corpus.zip&quot;&gt;data&lt;/a&gt;; &lt;strong&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/01/solution-01.pdf&quot;&gt;SOLUTIONS&lt;/A&gt;&lt;/strong&gt;

&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/03/03.pdf&quot;&gt;Page
Rank&lt;/a&gt; (31 August).  Links as pre-existing feedback.  How to exploit link
information?  The random walk on the graph; using the ergodic theorem.
Eigenvector formulation of page-rank.  Combining page-rank with textual
features.  Other applications.  Further reading on information retrieval.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/04/lecture-04.pdf&quot;&gt;Image
Search, Abstraction and Invariance&lt;/a&gt; (2 September).  Similarity search for
images.  Back to representation design.  The advantages of abstraction:
simplification, recycling.  The bag-of-colors representation.  Examples.
Invariants.  Searching for images by searching text.  An example in practice.
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/04/Lecture_04_Image_Search.pdf&quot;&gt;Slides for this lecture&lt;/a&gt;.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/05/lecture-05.pdf&quot;&gt;Information
Theory I&lt;/a&gt; (4 September).  Good features help us guess what we can't
represent.  Good features discriminate between different values of unobserved
variables.  Quantifying uncertainty with entropy.  Quantifying reduction in
uncertainty/ discrimination with mutual information.  Ranking features based on
mutual information.  Examples, with code, of informative words for
the &lt;cite&gt;Times&lt;/cite&gt;.  &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/05/lecture-05.R&quot;&gt;Code&lt;/a&gt;.
&lt;br&gt;Supplementary reading: David
P. Feldman, &lt;a href=&quot;http://hornacek.coa.edu/dave/Tutorial/&quot;&gt;Brief Tutorial on
Information Theory&lt;/a&gt;, chapter 1

&lt;P&gt;Homework 2, due 11 September: &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/02/hw-02.pdf&quot;&gt;assignment&lt;/a&gt;;
&lt;strong&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/02/solutions-02.pdf&quot;&gt;SOLUTIONS
TEXT&lt;/A&gt;; &lt;A HREF=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/02/solutions-02.R&quot;&gt;SOLUTIONS R&lt;/A&gt;&lt;/strong&gt;

&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/06/lecture-06.pdf&quot;&gt;Information Theory II&lt;/a&gt; (9
September).  Dealing with multiple features.  Joint entropy, the chain rule for
entropy.  Information in multiple features.  Conditional information, chain
rule for information, conditional independence.  Interactions, positive and
negative, and redundancy.  Greedy feature selection with low redundancy.
Example, with code, of selecting words for the &lt;cite&gt;Times&lt;/cite&gt;.  Sufficient
statistics and the information
bottleneck. &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/06/lecture-06.R&quot;&gt;Code&lt;/a&gt;.
&lt;br&gt;Supplementary reading: Aleks Jakulin and Ivan Bratko, &quot;Quantifying and
Visualizing Attribute
Interactions&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.AI/0308002&quot;&gt;arxiv:cs.AI/0308002&lt;/a&gt;
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/07/lecture-07.pdf&quot;&gt;Categorization;
Clustering I&lt;/a&gt; (11 September).  Dividing the world up into categories.
Classification: known categories with labeled examples.  Taxonomy of learning
problems (supervised, unsupervised, semi-supervised, feedback, ...).
Clustering: discovering unknown categories from unlabeled data.  Benefits of
clustering, with a digression on where official classes come from.  Basic
criterion for good clusters: lots of information about features from little
information about cluster.  Practical considerations: compactness, separation,
parsimony, balance.  Doubts about parsimony and balance.  The &lt;em&gt;k&lt;/em&gt;-means
clustering algorithm, or unlabeled prototype classification: analysis,
geometry, search.  Appendix: geometric aspects of the prototype and
nearest-neighbor method.

&lt;P&gt;Homework 3, due 18
September: &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/03/hw-03.pdf&quot;&gt;assignment&lt;/a&gt;; &lt;strong&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/03/solutions-03.pdf&quot;&gt;SOLUTIONS&lt;/a&gt;&lt;/strong&gt;

&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/08/lecture-08.pdf&quot;&gt;Clustering II&lt;/a&gt; (14 September).
Distances between partitions; variation-of-information distance.
Hierarchical clustering by agglomeration and its varieties.  Picking the
number of clusters by merging costs.  Performance of different clustering
methods on various doodles.  Why we would like to pick the number of clusters
by predictive performance, and why it is hard to do at this stage.  Reifying clusters.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/09/lecture-09.pdf&quot;&gt;Transformations: Rescaling and
Low-Dimensional Summaries&lt;/a&gt; (16 September).  Improving on our original
features.  Re-scaling, standardization, taking logs, etc., of individual
features.  Forcing things to be Gaussian considered harmful.  Low-dimensional
summaries by combining features.  Exploiting geometry to eliminate redundancy.
Projections on to linear subspaces.  Searching for structure-preserving
projections.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/10/lecture-10.pdf&quot;&gt;Principal Components I&lt;/a&gt; (18
September).  Principal components are the directions of maximum variance.
Derivation of principal components as the best approximation to the data in a
linear subspace.  Equivalence to variance maximization.  Avoiding explicit
optimization by finding eigenvalues and eigenvectors of the covariance matrix.
Example of principal components with cars; how to tell a sports car from a
minivan.  The standard recipe for doing PCA.  Cautions in interpreting
PCA.  &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/10/cars-fixed04.dat&quot;&gt;Data-set used in the notes&lt;/a&gt;.

&lt;P&gt;Homework 4, due 25
September: &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/04/hw-04.pdf&quot;&gt;assignment&lt;/a&gt;; &lt;strong&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/04/solutions-04.pdf&quot;&gt;SOLUTIONS&lt;/a&gt;&lt;/strong&gt;

&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/11/lecture-11.pdf&quot;&gt;Principal
Components II&lt;/a&gt; (21 September).  PCA + information retrieval = latent
semantic indexing; why LSI is a Good Idea.  PCA and multidimensional scaling.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/12/lecture-12.pdf&quot;&gt;Factor
Analysis&lt;/a&gt; (23 and 25 September).  From PCA to factor analysis by adding
noise.  Roots of factor analysis in causal discovery: Spearman's general factor
model and the tetrad equations.  Problems with estimating factor models: number
of equations does not equal number of unknowns.  Solution 1, &quot;principal
factors&quot;, a.k.a. estimation through heroic feats of linear algebra.  Solution
2, maximum likelihood, a.k.a. estimation through imposing distributional
assumptions.  The rotation problem: the factor model &lt;em&gt;is&lt;/em&gt;
unidentifiable; the number of factors may be meaningful, but the individual
factors are not.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/13/lecture-13.pdf&quot;&gt;The
Truth about PCA and Factor Analysis&lt;/a&gt; (28 September) PCA is data reduction
without any probabilistic assumptions about where the data came from.  Picking
number of components.  Faking predictions from PCA.  Factor analysis makes
stronger, probabilistic assumptions, and delivers stronger, predictive
conclusions --- which could be wrong.  Using probabilistic assumptions and/or
predictions to pick how many factors.  Factor analysis as a first, toy
instance of a graphical causal model.  The rotation problem once more with
feeling.  Factor models and mixture models.  Factor models and Thomson's
sampling model: an outstanding fit to a model with a few factors is actually
evidence of a &lt;em&gt;huge&lt;/em&gt; number of &lt;em&gt;badly measured&lt;/em&gt; latent variables.
Final advice: it all depends, but if you can only do one, try PCA.
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/13/thomson-model.R&quot;&gt;R
code for the Thomson sampling model&lt;/a&gt;.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/14/lecture-14.pdf&quot;&gt;Nonlinear
Dimensionality Reduction I: Locally Linear Embedding&lt;/a&gt; (5 October).  Failure
of PCA and all other linear methods for nonlinear structures in data; spirals,
for example.  Approximate success of linear methods on small parts of nonlinear
structures.  Manifolds: smoothly curved surfaces embedded in higher-dimensional
Euclidean spaces.  Every manifold looks like a linear subspace on a
sufficiently small scale, so we should be able to patch together many small
local linear approximations into a global manifold.  Local linear embedding:
approximate each vector in the data as a weighted linear combination of
its &lt;em&gt;k&lt;/em&gt; nearest neighbors, then find the low-dimensional vectors best
reconstructed by these weights.  Solving the optimization problems by linear
algebra.  Coding up LLE.  A spiral
rainbow.  &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/14/lecture-14.R&quot;&gt;R&lt;/a&gt;.

&lt;li&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/15/lecture-15.pdf&quot;&gt;Nonlinear
Dimensionality Reduction II: Diffusion Maps&lt;/a&gt; (9 October).  Making a graph
from the data; random walks on this graph.  The diffusion operator,
a.k.a. Laplacian.  How the Laplacian encodes the shape of the data.
Eigenvectors of the Laplacian as coordinates.  Connection to page-rank.
Advantages when data are not actually on a manifold.  Example.

&lt;P&gt;Pre-midterm review (12 October): highlights of the course to date; no
handout.
&lt;br&gt;MIDTERM (14
October): &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/exams/midterm/midterm.pdf&quot;&gt;exam&lt;/a&gt;, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/exams/midterm/midterm-solutions.pdf&quot;&gt;solutions&lt;/a&gt;

&lt;P&gt;Homework 5, due 23 October:
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/05/hw-05.pdf&quot;&gt;assignment&lt;/a&gt;;
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/05/solutions-05.pdf&quot;&gt;solutions&lt;/a&gt;

&lt;li&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/16/lecture-16.pdf&quot;&gt;Regression
I: Basics&lt;/a&gt;.  Guessing a real-valued random variable; why expectation values
are mean-square optimal point forecasts.  The regression function; why its
estimation must involve assumptions beyond the data.  The bias-variance
decomposition and the bias-variance trade-off.  First example of improving
prediction by introducing bias.  Ordinary least squares linear regression
as smoothing.  Other linear smoothers: &lt;em&gt;k&lt;/em&gt;-nearest-neighbors and kernel
regression.  How much should we
smooth?  &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/16/lecture-16-for-students.R&quot;&gt;R&lt;/a&gt;, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/16/examples.dat&quot;&gt;data
for running example&lt;/a&gt;
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/17/lecture-17.pdf&quot;&gt;Regression
II: The Truth About Linear Regression&lt;/a&gt; (21 October).  Linear regression is
optimal linear (mean-square) prediction; we do this because we hope a linear
approximation will work well enough over a small range.  What linear regression
does: decorrelate the input features, then correlate them separately with the
response and add up.  The extreme weakness of the probabilistic assumptions
needed for this to make sense.  Difficulties of linear regression;
collinearity, errors in variables, shifting distributions of inputs, omitted
variables.  The usual extra probabilistic assumptions and their implications.
Why you should always look at residuals.  Why you generally shouldn't use
regression for causal inference.  How to torment angels.  Likelihood-ratio
tests for restrictions of nice models.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/18/lecture-18.pdf&quot;&gt;Regression III: Extending Linear
Regression&lt;/a&gt; (23 October).  Weighted least squares.  Heteroskedasticity:
variance is not the same everywhere.  Going to consult the oracle.  Weighted
least squares as a solution to heteroskedasticity.  Nonparametric estimation of
the variance function.  Local polynomial regression: local constants (= kernel
regression), local linear regression, higher-order local polynomials.  Lowess =
locally-linear smoothing for scatter plots.  The oracles fall silent.

&lt;P&gt;Homework 6, due Friday, 30 October: &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/06/hw-06.pdf&quot;&gt;assignment&lt;/a&gt;, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/06/cadata.dat&quot;&gt;data set&lt;/a&gt;; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/06/solutions-06.pdf&quot;&gt;solutions&lt;/a&gt;

&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/19/lecture-19.pdf&quot;&gt;Evaluating Predictive Models&lt;/a&gt; (26
and 28 October).  In-sample, out-of-sample and generalization loss or error;
risk as expected loss on new data.  Under-fitting, over-fitting, and examples
with polynomials.  Methods of model selection and controlling over-fitting:
empirical risk minimization, penalization, constraints/sieves, formal learning
theory, cross-validation.  Limits of
generalization.  &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/19/lecture-19.R&quot;&gt;R&lt;/a&gt; for creating figures.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/20/lecture-20.pdf&quot;&gt;Smoothing
Methods in Regression&lt;/a&gt; (30 October).  How much smoothing should we do?
Approximation by local averaging.  How much smoothing we &lt;em&gt;should&lt;/em&gt; do to
find the unknown curve depends on how smooth the curve &lt;em&gt;really&lt;/em&gt; is,
which is unknown.  Adaptation as a partial substitute for actual knowledge.
Cross-validation for adapting to unknown smoothness.  Application: testing
parametric regression models by comparing them to nonparametric fits.  The
bootstrap principle.  Why ever bother with parametric
regressions?  &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/20/lecture-20.R&quot;&gt;R
code for some of the examples&lt;/a&gt;.

&lt;P&gt;Homework 7, due Friday, 6 November: &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/hw/07/hw-07.pdf&quot;&gt;assignment&lt;/a&gt;

&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/21/lecture-21.pdf&quot;&gt;Additive
Models&lt;/a&gt; (2 November).  A nice feature of linear models: partial responses,
partial residuals, and backfitting estimations.  Additive models: regression
curve is a sum of partial response functions; partial residuals and the
backfitting trick generalize.  Parametric and non-parametric rates of
convergence.  The curse of dimensionality for unstructured nonparametric
models.  Additive models as a compromise, introducing bias to reduce variance.
Example with the data from homework 6.
&lt;li&gt; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/350/lectures/22/lecture-22.pdf&quot;&gt;Classification
and Regression Trees&lt;/a&gt; (4 and 6 November).  Prediction trees.  A
classification tree we can believe in.  Prediction trees combine simple local
models with recursive partitioning; adaptive nearest neighbors.  Regression
trees: example; a little math; pruning by cross-validation; more R mechanics.
Classification trees: basics; measuring error by mis-classification; weighted
errors; likelihood; Neyman-Pearson classifiers.  Uncertainty for trees.
&lt;/ol&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_corrupting_the_young.html&quot;&gt;Corrupting the Young&lt;/a&gt;;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Blosxom Fading in November</title>
    <link>http://bactra.org/weblog/629.html</link>
    <description>
&lt;P&gt;My old &lt;a href=&quot;http://blosxom.com/&quot;&gt;Blosxom&lt;/a&gt; installation (v. 2.0.2),
after several years of working nicely, is growing increasingly cranky, and
mulishly refusing to generate or update posts as the whim takes it.  (I am not
sure how much kicking and shoving it will need to produce this.)  I'd
appreciate a pointer to something which works similarly, but &lt;em&gt;does&lt;/em&gt;
work: I write posts in plain HTML in Emacs and drop them in a directory; it
makes them look nice.  If it handles tags and/or LaTeX nicely, so much the
better.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_selfcentered.html&quot;&gt;Self-Centered&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Books to Read While the Algae Grow in Your Fur, October 2009</title>
    <link>http://bactra.org/weblog/algae-2009-10.html</link>
    <description>
&lt;dl&gt;
&lt;dt&gt;Rosemary Kirstein, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/0345462297&quot; name=&quot;lost-steersman&quot;&gt;The Lost Steersman&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Sequel to &lt;cite&gt;Steerswoman's Road&lt;/cite&gt; (below); excellent and perfectly
continuous, despite a long gap in the writing.  The trick of celebrating
intelligence while maintaining the tone and color of a good fantasy novel is
not something I have encountered elsewhere, and I find it deeply addictive.&lt;/dd&gt;
&lt;dd&gt;Everything else I have to say is a spoiler: &lt;font color=&quot;#ffffff&quot;&gt;This owes
a massive debt to Lovecraft's &lt;cite&gt;At the Mountains of Madness&lt;/cite&gt;.  The
plot-hinge mystery here has to do with &quot;demons&quot;, amphibious barrel-shaped
creatures with quadrilateral symmetry, very like (though not exactly the same
as) Lovecraft's Antarctic Old Ones.  There are scenes of dissecting demons
under the impression that they are just animals, and realizing they belong to
some radically different division of life than familiar terrestrial organisms;
an exploring expedition to an unknown part of the world where the demons are
found; explorations of demons' cities and observations of their customs,
including subterranean chambers used for their rituals, etc.; and the dawning
realization that the creatures are in fact sapient.  (HPL: &quot;Radiates,
vegetables, monstrosities, star spawn &amp;mdash; whatever they had been, they were
men!&quot; RK does not put such florid outbursts in her characters' mouths; she just
has Rowan come to see that the demons are &quot;people&quot;.) Kirstein does a better
job, in my view, of making the creatures actually alien, in particular starting
from giving them a &lt;em&gt;very&lt;/em&gt; inhuman sensorium (continual sonar, without
any vision) and means of communication (excreting specially shaped lumps of
organic material, reminiscent of the pieces of carved soapstone Lovecraft
associated with his Old Ones), and building out logically from there.  Needless
to say, this complicates the ethics of terraforming the steerswomen's planet
considerably. &amp;mdash; Janus's speeches (pp. 43--44 and p. 356) about the
dangers of learning too much about the world also seem drawn from Lovecraft,
though they bring to mind the opening of &quot;The Call of Cthulhu&quot; more
than &lt;cite&gt;Mountains&lt;/cite&gt;.&lt;/font&gt;&lt;/dd&gt;
&lt;dt&gt;Jeffrey
D. Hart, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/0387949801&quot;
name=&quot;hart-on-smoothing&quot;&gt;Nonparametric Smoothing and Lack-of-Fit
Tests&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;A sound, friendly but reasonably theoretical introduction to nonparametric
regression, giving about equal attention to kernel-based methods and to series
expansions (Fourier series, orthogonal polynomials, etc.).  The first half of
the book, through ch. 4, introduces these methods, considers their ability to
predict new data (emphasizing, naturally, the bias-variance trade-off), and
looks at methods for selecting how much smoothing to do based on the data being
smoothed, with a fondness for leave-one-out cross-validation and its variants.
(I can't recall if &lt;em&gt;k&lt;/em&gt;-fold CV is even mentioned.)  The second half is
about testing parametric regression specifications.  Chapter 5 reviews some
classical tests for fully-specified and especially for linear-in-the-parameters
parametric models, including Neyman's smooth tests: the latter involve, roughly
speaking, fitting an orthogonal series to the deviations from the null model,
and checking that all the coefficients are small, and so form a bridge to the
smoothing-based tests used in the rest of the book.  Basically, one can either
smooth the parametric residuals, which should have mean zero and constant
variance under the null hypothesis, or compare the parametric estimate to the
nonparametric smooth.  Hart prefers the former approach, and develops tests for
regression functions being constants in chapter 7, which in chapter 8 are
turned into tests for departures from arbitrary parametric regression models.
The distribution of these test statistics is too complicated for anything
except bootstrapping, which needs to be done carefully to preserve power.  To
simplify the math, up to this point Hart assumes that the input variable takes
values at a deterministic set of points on the unit interval (&quot;fixed-design
univariate regression&quot;); chapter 9 generalizes to random-design and
multivariate regressions, as well as lifting some other restrictions.  Chapter
10 contains some case studies of real data.&lt;/dd&gt;
&lt;dd&gt;This book should be accessible to anyone who understands parametric
inference at the level of,
say, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/cgi-bin/partner?partner_id=27627&amp;cgi=product&amp;isbn=0387251456&quot;&gt;All
of Statistics&lt;/a&gt;&lt;/cite&gt;; no prior exposure to smoothing methods is really
needed.  The series-expansion methods will probably go down more easily with
some prior exposure to Fourier analysis.  People who are serious about using
parametric regression models in the real world (&lt;em&gt;cough&lt;/em&gt;
econometricians &lt;em&gt;cough&lt;/em&gt;) owe it to themselves to test them with these
methods.&lt;/dd&gt;
&lt;dt&gt;Rosemary Kirstein, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/0345461053&quot; name=&quot;steerswomans-road&quot;&gt;Steerswoman's Road&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;First two books in an epic fantasy series about the scientific method,
reprinted in one volume.  There are more books, which I now covet powerfully,
but the series is &lt;em&gt;not finished&lt;/em&gt;.&lt;/dd&gt;
&lt;dd&gt;Spoilers: &lt;font color=&quot;#ffffff&quot;&gt;&quot;Epic fantasy&quot; here is, I am pretty sure,
totally misleading.  Initially, and from most of the &lt;em&gt;characters'&lt;/em&gt;
perspectives, the world looks like a bog-standard medieval fantasyland, only
with the addition of an itinerant semi-monastic order of geographers and
natural philosophers, the eponymous steerswomen.  By the end of this volume,
however, I am pretty sure that the setting is actually another planet in this
universe, with no magic at all.  The steerswomen's world is being terraformed;
the Guidestars are satellites in geosynchronous orbit.  The native ecology
(based on &quot;blackgrass&quot; and &quot;redgrass&quot;) is being systematically destroyed (by
microwave heating from the orbiting Guidestars (&quot;the spell Routine Bioform
Clearance&quot;), and by the Outskirters' goats [which may be genetically
modified?]) and replaced by terrestrial flora (&quot;greengrass&quot;), microbes and
fauna.  Wizards are simply the inhabitants of the planet who retain the old
technology, such as electricity and explosives.  &lt;em&gt;Why&lt;/em&gt; most of the
colonists have regressed to medieval technology, and why, having done so, they
have an institution like the steerswomen, I couldn't tell you.  (I can tell you
that sailors and steerswomen are immune to some &quot;curses&quot; because they wear
rubber-soled, i.e. electrically insulating, boots.)  But I am dying to find
out.&lt;/font&gt;&lt;/dd&gt;
&lt;dt&gt;J. K. Ghosh and R. V. Ramamoorthi, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/0387955372&quot; name=&quot;ghosh-ramamoorthi&quot;&gt;Bayesian Nonparametrics&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;I have written extensively about the general subject of Bayesian
nonparametrics and especially of its consistency elsewhere
(&lt;a href=&quot;../notebooks/bayesian-consistency.html&quot;&gt;here&lt;/a&gt;, &lt;a href=&quot;601.html&quot;&gt;here&lt;/a&gt;,
or, indeed, &lt;a href=&quot;http://arxiv.org/abs/0901.1342&quot;&gt;here&lt;/a&gt;), so I'll just
plunge in.  This 2003 monograph is the best overview of Bayesian nonparametrics
from the viewpoint of theoretical statistics which I've found, though there has
been a great deal of work since it was written, and I know that a number of new
books are coming out soon.&lt;/dd&gt;
&lt;dd&gt;The authors begin (ch. 1) by reviewing* results on the consistency of
Bayesian learning on finite sample spaces and Dirichlet prior distributions.
They then carefully (ch. 2) consider the measure-theoretic issues involved in
constructing prior probability distributions over infinite-dimensional spaces,
especially priors over all probability measures (or all probability densities)
on the real line.  Chapter 3 describes in detail the properties of
Dirichlet-process priors and of Polya tree priors.  Chapter 4 is concerned with
consistency for Bayesian updating with IID data, emphasizing the
&quot;Kullback-Leibler property&quot; (the prior must put sufficient weight on
distributions with small relative entropy from the truth) and the
exponentially-consistent-testing conditions which go back to Schwartz.  Chapter
5 specializes to inferring probability densities; this is the only place they
use Gaussian process priors.  Chapter 6 considers inferring the location
parameter of distributions of unknown shape, and outlines (without full detail)
the notorious examples, due to Freedman and Diaconis, of how Bayesian learning
can fail to be consistent.  Chapter 7 considers linear regression with an
unknown noise distribution; this is the only departure from assuming IID data
made here.  The remaining chapters try to construct uniform distributions on
infinite-dimensional spaces, look at some issues in survival analysis, and
treat technical aspects of &quot;neutral to the right&quot; priors, ones whose cumulative
hazard functions have independent increments.  It is assumed throughout that
the true, data-generating distribution lies within the support of the
prior.&lt;/dd&gt;
&lt;dd&gt;Ghosh and Ramamoorthi focus on mathematical issues, to the exclusion of
computational and statistical considerations.  (There are no applications to
data, or even to elaborate simulations.) The writing is adequate for a work of
&quot;theorem-proof, theorem-proof&quot; math, but no more.  Those proofs, however, are
really clear and clean, without tricks or complications.  I recommend the book
for those who want to understand, in depth, the technicalities of constructing
priors on infinite-dimensional spaces, and of establishing their consistency
when updated with IID data.  There are a handful of exercises at the end of the
book, but I do not think it would be suitable as a classroom textbook.  It
could work as the first part of an advanced graduate seminar, or for self-study
for motivated and mathematically mature readers.&lt;/dd&gt;
&lt;dd&gt;&lt;span class=&quot;blognotes&quot;&gt;*: Actually, it's my impression that lots of
introductions to Bayesian statistics, even at the graduate level,
do &lt;em&gt;not&lt;/em&gt; cover these results.  This is, I think, something of a scandal
for the profession.  That goes double if it's due to the attitude which Ghosh
and Ramamoorthi (p. 122) paraphrase as &quot;the prior and the posterior given by
Bayes theorem [sic] are imperatives arising out of axioms of rational behavior
--- and since we are already rational why worry about one more&quot; criterion,
namely convergence to the truth.  One does indeed find pernicious relativism
and epistemic nihilism everywhere these days!&lt;/span&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&quot;http://www.nadiagordon.com/&quot;&gt;Nadia
Gordon&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9780811858014&quot; name=&quot;lethal-vintage&quot;&gt;Lethal
Vintage&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Continuing amateur-sleuthing adventures of a Napa Valley restaurant-owner
and her foodie (and wine-y) friends.  No prior acquaintance with the series [&lt;a href=&quot;algae-2008-11.html#sharpshooter&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;algae-2005-09.html&quot;&gt;2&lt;/a&gt;, &lt;a href=&quot;algae-2009-08.html#murder-alfresco&quot;&gt;3&lt;/a&gt;] needed.&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&quot;http://www.larrygonick.com/&quot;&gt;Larry
Gonick&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9780060760083&quot;&gt;The
Cartoon History of the Modern World, Part II: From the Bastille to
Baghdad&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;My parents got the first part of the &lt;cite&gt;Cartoon History of the
Universe&lt;/cite&gt; (in its original, shorter edition) for my brother and me in
1981, when I was seven.  We loved it so much they ended up having to get
us &lt;em&gt;two&lt;/em&gt; copies.  I have thus been reading the &lt;cite&gt;History&lt;/cite&gt;, as
it came out, all my conscious life.  (And re-reading it, without any
visitations from
the &lt;a href=&quot;http://www.tor.com/index.php?option=com_content&amp;view=blog&amp;id=57633&quot;&gt;Suck
Fairy&lt;/a&gt; so far.)  This latest volume is, as always, a delight, but not a pure
one, because it's also the last.  I can understand wanting to be finished with
the work of a lifetime, especially one which in the nature of things could be
spun out indefinitely; but I can't help wishing for more.&lt;/dd&gt;
&lt;dt&gt;G. Willow Wilson and M. K. Perker, &lt;cite&gt;Air&lt;/cite&gt;,
vol. 2: &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9781401224837&quot; name=&quot;flying-machine&quot;&gt;Flying Machine&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dt&gt;James Sethna, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/978-0198566779&quot; name=&quot;sethna&quot;&gt;Statistical Mechanics: Entropy, Order Parameters, and
Complexity&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;The best introductory statistical mechanics book I have ever seen.
(Meaning: advanced undergraduates, not the graduate level of Landau and
Lifshitz.)  The reader is supposed to have some familiarity with classical and
quantum mechanics, a little electromagnetism, and the very barest rudiments of
thermodynamics, the latter not going beyond what's in a good first-year physics
course.  Beyond the basics of differential equations and linear algebra, the
only real pieces of math used here are Fourier transforms and elementary
probability (such as one sees in undergraduate quantum mechanics).  On this
basis Sethna erects classical (and, in one chapter, quantum) statistical
mechanics, emphasizing the modern applications of the theory and physical
intuition.&lt;/dd&gt;
&lt;dd&gt;The exposition begins with random walks, including diffusion and the
central limit theorem.  The micro-canonical ensemble comes next, along with a
very nice chapter on its &lt;a href=&quot;../notebooks/ergodic-theory.html&quot;&gt;ergodic&lt;/a&gt;
basis and failures of ergodicity (such
as &lt;a href=&quot;http://en.wikipedia.org/wiki/Kolmogorov-Arnold-Moser_theorem&quot;&gt;KAM
theory&lt;/a&gt;).  The other ensembles are derived from imposing the micro-canonical
ensemble on the whole system, and looking at the marginal distributions of
sub-systems. The elaborate axiomatic structure of pure thermodynamics is
touched on only briefly; thermodynamic quantities are seen, quite properly, as
derivative of statistical-mechanical ones.  The question of what macroscopic
variables need to be included in the free energy leads naturally to a superb
chapter on the meaning and identification of order parameters.  This in turn is
followed by a really lucid treatment of the connections between spontaneous
fluctuations, the decay of correlations, response to external forces, and the
dissipative approach to equilibrium.  The whole is capped off by chapters on
abrupt (e.g., ice-water, water-steam) and continuous (e.g., magnetic) phase
transitions, including a nice hand-waving discussion of the renormalization
group.  In addition to the main thread of exposition, each chapter has a large
collection of problems, ranging from mathematical proofs through calculations
to simulation challenges, which contain a &lt;em&gt;lot&lt;/em&gt; of neat applications and
special topics, and should at least be read if not attempted.&lt;/dd&gt;
&lt;dd&gt;There are a few places where I would quibble &amp;mdash;
per &lt;a href=&quot;http://arxiv.org/abs/math-ph/0010018&quot;&gt;Lebowitz&lt;/a&gt;, surely the
Boltzmann entropy is more useful out of equilibrium than the Gibbs?; couldn't
he have been more explicit about
the &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0009219&quot;&gt;probabilistic foundations
of renormalization&lt;/a&gt;? &amp;mdash; but mostly I just wish this book had been
written sixteen years ago when I was taking stat. mech.&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Disclaimer:&lt;/em&gt; Friends of mine used to work for Sethna, and he's
lectured at the SFI summer school (the chapter on order parameters began as a
lecture there in 1991), but I've never met him, and have no stake in the
success of the book.&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Update:&lt;/em&gt; Thanks to &lt;a href=&quot;http://nanopolitan.blogspot.com/&quot;&gt;T.
A. Abinandanan&lt;/a&gt; for alerting me to the fact that there's
a &lt;a href=&quot;http://pages.physics.cornell.edu/sethna/StatMech/&quot;&gt;free PDF&lt;/a&gt; of
the whole book!&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&quot;http://ancestralstars.com/&quot;&gt;Laura E. Reeve&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9780451462985&quot; name=&quot;vigilante&quot;&gt;Vigilante&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Sequel
to &lt;cite&gt;&lt;a href=&quot;algae-2009-06.html#peacekeeper&quot;&gt;Peacekeeper&lt;/a&gt;&lt;/cite&gt;, with
an even more awful and totally-misleading cover.  (The synopses at the link are
accurate, however.)  Tasty mind-candy.&lt;/dd&gt;
&lt;dt&gt;C. L. Anderson, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9780553592177&quot; name=&quot;bitter-angels&quot;&gt;Bitter Angels&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Space opera about active struggles to prevent war, and other
morally-compromising endeavors; military science fiction that lets me respect
myself in the morning.  The climax, where it becomes clear what is going on,
and why and how, and what the peace-keepers will do about it, with what
consequences, was very fine indeed.  Picked up after reading the author's
&lt;a href=&quot;http://whatever.scalzi.com/2009/08/25/the-big-idea-c-l-anderson/&quot;&gt;self-advertisement&lt;/a&gt;
on Scalzi's blog, which has more.&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&quot;http://madrobins.livejournal.com/&quot;&gt;Madeleine
E. Robins&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9780765343062&quot; name=&quot;petty-treason&quot;&gt;Petty Treason&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;a href=&quot;algae-2009-09.html#point-of-honour&quot;&gt;More&lt;/a&gt;
alternate-history Regency England private-eye detection (romance-free this
time).  Very enjoyable; I wish there were more.&lt;/dd&gt;
&lt;/dl&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;http://bactra.org/weblog/cat_algae.html&quot;&gt;Books to Read While the
Algae Grow in Your Fur&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_scientifiction.html&quot;&gt;Scientifiction and Fantastica&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_detection.html&quot;&gt;Pleasures of Detection, Portraits of Crime&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_writing_for_antiquity.html&quot;&gt;Writing for Antiquity&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_the_great_transformation.html&quot;&gt;The Great Transformation&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_physics.html&quot;&gt;Physics&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_complexity.html&quot;&gt;Complexity&lt;/a&gt;;
&lt;a href=&quot;http://bactra.org/weblog/cat_cthulhiana.html&quot;&gt;Cthulhiana&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>The Professions Considered as Pitchers of Icy Refreshing Lemonade</title>
    <link>http://bactra.org/weblog/628.html</link>
    <description>
&lt;blockquote&gt;&lt;em&gt;Attention conservation notice:&lt;/em&gt; Idle economic musings of a
non-economist.  Sparked
by &lt;a href=&quot;http://nobelprize.org/nobel_prizes/economics/laureates/2009/&quot;&gt;recent
developments&lt;/a&gt;, but if you're interested in that you'd be better off 
&lt;a href=&quot;http://crookedtimber.org/2009/10/12/the-ostrom-nobel/&quot;&gt;elsewhere&lt;/a&gt;.&lt;/blockquote&gt;

&lt;P&gt;The usual libertarian story about professional licensing requirements
&amp;mdash; e.g., requiring someone who wants to practice medicine to go to medical
school and pass exams, on pain of fines or jail &amp;mdash; is that these are
simply professionals conspiring in restraint of trade.  Licensing simply erects
a barrier to entry into the market for medical services, restricting supply and
driving up price.  Eliminate it, they say, and supply will expand and prices
fall.

&lt;P&gt;This presumes, however, that the demand for unlicensed professionals will be
equal to the demand for licensed ones.  It seems to me very easy to tell a
&quot;market for lemons&quot; story here: someone in the market for professional services
generally knows very little about how skilled various potential providers
actually are.  The sellers, however, generally know a lot about their own skill
level, or at least more than the potential clients do.  (There are no doubt
exceptions, such as sincere quacks and the
&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/pubmed/10626367&quot;&gt;Dunning-Kruger
effect&lt;/a&gt;, but I don't think they matter for the story.)  This is the classic
&lt;a href=&quot;http://nobelprize.org/nobel_prizes/economics/laureates/2001/public.html&quot;&gt;asymmetric
information problem from Akerlof&lt;/a&gt;, with the usual result: the skilled
providers demand more, but the clients have no way of telling them from the
unskilled ones, so the only equilibrium is for only unskilled providers to be
on the market and for trade to be depressed, or indeed absent.  By putting a
floor on the competence of professionals, licensing requirements stop the
unraveling of the market and increase demand.  They get us out of the market
for lemons.

&lt;P&gt;This occurred to me the other day, but it's obvious enough that I'm sure
someone wrote it up long ago; where?  (And did I read it and forget about it?)

&lt;P&gt;(After-notes: 1. Of course, having told the story I have no idea if it's
true of actual markets for professional services; learning that would require
rather delicate empirical investigations.  Checking the restraint-of-trade
fable from Milton Friedman would, naturally, require those same investigations.
2. This doesn't rationalize why professions should be so largely
self-governing, nor does it rule out the idea that some licensing
requirements &lt;em&gt;are&lt;/em&gt; counter-productive barriers to entry.  3. Replacing
professional certification with some sort of market-based entity telling
consumers about the quality of professional service-sellers won't work, for all
the usual reasons that competitive markets are incapable of adequately
providing information &amp;mdash; to say nothing of the difficulty of telling
whether the raters know what they're talking about.  4. Universities are
accredited because students and parents would otherwise be in a market for
lemons.  Universities themselves, however, can tell how skilled those selling
academic services are &amp;mdash; or at least they're &lt;em&gt;supposed&lt;/em&gt; to have
that ability.  5. I should re-read Phil Agre
on &lt;a href=&quot;http://polaris.gseis.ucla.edu/pagre/notes/00-8-16.html&quot;&gt;the
professionalization of everything&lt;/a&gt; and see if it holds up.)

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_the_dismal_science.html&quot;&gt;The Dismal Science&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Twilight of the Market Gods</title>
    <link>http://bactra.org/weblog/627.html</link>
    <description>
&lt;P&gt;My &lt;a href=&quot;http://www.americanscientist.org/bookshelf/pub/twilight-of-the-efficient-markets&quot;&gt;review&lt;/a&gt;
of &lt;a href=&quot;http://curiouscapitalist.blogs.time.com/&quot;&gt;Justin
Fox&lt;/a&gt;'s &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/27627/biblio/9780060598990&quot;&gt;Myth
of the Rational Market&lt;/a&gt;&lt;/cite&gt;
in &lt;cite&gt;&lt;a href=&quot;http://www.americanscientist.org/&quot;&gt;American
Scientist&lt;/a&gt;&lt;/cite&gt; is out.  (Shorter me: read the book.)  Sometime soon I'll
put up a version with links, which alas don't work in print.

&lt;P&gt;&lt;em&gt;Manual
trackback&lt;/em&gt;: &lt;a href=&quot;http://www.3quarksdaily.com/3quarksdaily/2009/10/the-theory-of-efficient-markets-in-finance-should-be-relegated-to-the-museum-of-nice-tries.html&quot;&gt;3
Quarks Daily&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_the_dismal_science.html&quot;&gt;The Dismal Science&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Wit and Wisdom of Pittsburgh Bar Patrons (Part 1)</title>
    <link>http://bactra.org/weblog/626.html</link>
    <description>
&lt;P&gt;&quot;They [= the Steelers] are like this utterly adorable, totally hot girl next
door, who you suddenly realize is everything you've ever wanted in a football
team &amp;mdash; I mean, girlfriend.&quot;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_pgh.html&quot;&gt;Heard About Pittsburgh PA&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;Completely Random Measures for Bayesian Nonparametrics&quot; (This Year at the DeGroot Lecture)</title>
    <link>http://bactra.org/weblog/625.html</link>
    <description>
&lt;blockquote&gt;&lt;em&gt;Attention conservation notice&lt;/em&gt;: Only of interest if you (1)
care about specifying probability distributions on infinite-dimensional spaces
for use in nonparametric Bayesian inference, and (2) are in
Pittsburgh.&lt;/blockquote&gt;

&lt;P&gt;The CMU statistics department sponsors an annual distinguished lecture
series in memory of our sainted
founder, &lt;a href=&quot;http://projecteuclid.org/euclid.ss/1177011925&quot;&gt;Morris
H. DeGroot&lt;/a&gt;.  This year, the lecturer
is &lt;a href=&quot;http://www.stat.berkeley.edu/~jordan/&quot;&gt;Michael Jordan&lt;/a&gt;.  (I
realize that's a common name; I mean the one my peers and I wanted to be when
we grew up.)

&lt;dl&gt;
&lt;dt&gt;&quot;Completely Random Measures for Bayesian Nonparametrics&quot;&lt;/dt&gt;
&lt;dd&gt;&lt;em&gt;Abstract:&lt;/em&gt; Bayesian nonparametric modeling and inference are based
on using general stochastic processes as prior distributions.  Despite the
great generality of this definition, the great majority of the work in Bayesian
nonparametrics is based on only two stochastic processes: the Gaussian process
and the Dirichlet process.  Motivated by the needs of applications, I present a
broader approach to Bayesian nonparametrics in which priors are obtained from a
class of stochastic processes known as &quot;completely random measures&quot; (Kingman,
1967). In particular I will present models based on the beta process and the
Bernoulli process, and will discuss an application of these models to the
analysis of motion capture data in computational vision.&lt;/dd&gt;

&lt;dd&gt;(Joint work with Emily Fox, Erik Sudderth and Romain Thibaux.)&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Time and place:&lt;/em&gt; 4:15 pm on Friday, 16 October 2009, in the Giant
Eagle Auditorium in Baker Hall (room A51)&lt;/dd&gt;
&lt;/dl&gt;

&lt;P&gt;&lt;strong&gt;Update&lt;/strong&gt;: I counted over 210 people in the audience.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;High Dimensional Nonlinear Learning using Local Coordinate Coding&quot; (Next Week at the Statistics Seminar)</title>
    <link>http://bactra.org/weblog/624.html</link>
    <description>
&lt;blockquote&gt;&lt;em&gt;Attention conservation notice&lt;/em&gt;: Only of interest if you (1)
care about statistical learning in high-dimensional spaces and (2) are in
Pittsburgh.&lt;/blockquote&gt;

&lt;P&gt;Since manifold learning has been on my mind this week, owing to trying to
teach it in &lt;a href=&quot;617.html&quot;&gt;data-mining&lt;/a&gt;, I am extra pleased by the
scheduling of this talk:

&lt;dl&gt;
&lt;dt&gt;&quot;High Dimensional Nonlinear Learning using Local Coordinate Coding&quot;&lt;/dt&gt;
&lt;dd&gt;Prof. &lt;a href=&quot;http://www.stat.rutgers.edu/~tzhang/&quot;&gt;Tong Zhang&lt;/a&gt;,
Rutgers University&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Abstract:&lt;/em&gt; We present a new method for learning nonlinear functions
in high dimension using semisupervised learning. Our method includes a phase of
unsupervised basis learning and a phase of supervised function learning. The
learned bases provide a set of anchor points to form a local coordinate system,
such that each data point on a high dimensional manifold can be locally
approximated by a linear combination of its nearby anchor points, with the
linear weights offering its local-coordinate coding. We show that a high
dimensional nonlinear function can be approximated by a global linear function
with respect to this coding scheme, and the approximation quality is ensured by
the locality of such coding. The method turns a difficult nonlinear learning
problem into a simple global linear learning problem, which overcomes some
drawbacks of traditional local learning methods. The empirical success of our
approach has been demonstrated in a recent PASCAL image classification
competition, where the top performance was achieved by an NEC system using this
idea.&lt;/dd&gt;
&lt;dd&gt;(Joint work with Kai Yu at NEC Lab America.)&lt;/dd&gt;
&lt;dd&gt;&lt;em&gt;Time and place&lt;/em&gt;: 4 pm on Monday, 12 October 2009, in Doherty Hall
310&lt;/dd&gt;
&lt;/dl&gt;

&lt;P&gt;As always, the seminar is free and open to the public.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>In re John Holland</title>
    <link>http://bactra.org/weblog/623.html</link>
    <description>
&lt;P&gt;Having &lt;a href=&quot;621.html&quot;&gt;vowed&lt;/a&gt; two weeks ago to post something positive
at least once a week, I missed last week, with the excuse of being back in Ann
Arbor for
the &lt;a href=&quot;http://cscs.umich.edu/events/talks-2009-F.html&quot;&gt;celebration of
John Holland's 80th birthday&lt;/a&gt; at
the &lt;a href=&quot;http://www.cscs.umich.edu/&quot;&gt;Center for the Study of Complex
Systems&lt;/a&gt;.  There was no time to post, or even to see everyone I wanted to,
but I did actually start writing something about Holland's scientific work,
only to realize yesterday I was merely engaged in self-plagiarism, from
&lt;a href=&quot;../notebooks/evol-comp.html&quot;&gt;this&lt;/a&gt;, &lt;a href=&quot;../reviews/hhnt-induction/&quot;&gt;this&lt;/a&gt;
and &lt;a href=&quot;../reviews/holland-on-emergence/&quot;&gt;this&lt;/a&gt;, and probably other
things I'd written too, because reading Holland has quite profoundly shaped my
thinking.  So I'll just point you to the back-catalogue, as it were, and get
back to revising &lt;a href=&quot;http://arxiv.org/abs/0901.1342&quot;&gt;a paper&lt;/a&gt; I'd never
have written if I hadn't read
&lt;cite&gt;Adaptation in Natural and Artificial Systems&lt;/cite&gt;.

&lt;P&gt;(So long as I'm talking about the workshop, and without any slight to the
other presentations, the &lt;em&gt;neatest&lt;/em&gt; work was that
by &lt;a href=&quot;http://www.cs.unm.edu/~forrest/&quot;&gt;Stephanie Forrest&lt;/a&gt; et al. on
using genetic programming
to &lt;a href=&quot;http://www.cs.unm.edu/~forrest/publications/gecco09.pdf&quot;&gt;evolve bug
fixes&lt;/a&gt;.)

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_complexity.html&quot;&gt;Complexity&lt;/a&gt;;
&lt;a href=&quot;cat_cognition.html&quot;&gt;Minds, Brains, and Neurons&lt;/a&gt;;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;Analyzing Networks and Learning with Graphs&quot;</title>
    <link>http://bactra.org/weblog/622.html</link>
    <description>
&lt;P&gt;See you in Whistler?

&lt;blockquote&gt;
&lt;h1&gt;&lt;a href=&quot;http://snap.stanford.edu/nipsgraphs2009/&quot;&gt;Analyzing Networks and Learning with Graphs&lt;/a&gt;&lt;/h1&gt;
&lt;br&gt;a workshop in conjunction with
&lt;h2&gt;&lt;a href=&quot;http://nips.cc/&quot;&gt;23rd Annual Conference on Neural Information Processing Systems (NIPS 2009)&lt;/a&gt;&lt;/h2&gt;
&lt;br&gt;December 11 or 12, 2009 (exact date TBD) Whistler, BC, Canada

&lt;P&gt;Deadline for Submissions: Friday, October 30, 2009
&lt;br&gt;Notification of Decision: Monday, November 9, 2009


&lt;h4&gt;Overview:&lt;/h4&gt;

&lt;P&gt;Recent research in machine learning and statistics has seen the proliferation of computational methods for analyzing networks and learning with graphs. These methods support progress in many application areas, including the social sciences, biology, medicine, neuroscience, physics, finance, and economics.

&lt;P&gt;The primary goal of the workshop is to actively promote a concerted effort to address statistical, methodological and computational issues that arise when modeling and analyzing large collections of data that are largely represented as static and/or dynamic graphs. To this end, we aim to bring together researchers from applied disciplines such as sociology, economics, medicine and biology with researchers from more theoretical disciplines such as mathematics and physics, within our community of statisticians and computer scientists. Different communities use diverse ideas and mathematical tools; our goal is to foster cross-disciplinary collaborations and intellectual exchange.

&lt;P&gt;Presentations will include novel graph models, the application of established models to new domains, theoretical and computational issues, limitations of current graph methods and directions for future research.


&lt;h4&gt;Online Submissions&lt;/h4&gt;
We welcome the following types of papers:

&lt;ol&gt;
&lt;li&gt; Research papers that introduce new models or apply established models to novel domains,
&lt;li&gt; Research papers that explore theoretical and computational issues, or
&lt;li&gt; Position papers that discuss shortcomings and desiderata of current approaches, or propose new directions for future research.
&lt;/ol&gt;

All submissions will be peer-reviewed; exceptional work will be considered for oral presentation. We encourage authors to emphasize the role of learning and its relevance to the application domains at hand. In addition, we hope to identify current successes in the area, and will therefore consider papers that apply previously proposed models to novel domains and data sets.

&lt;P&gt;Submissions should be 4-to-8 pages long, and adhere to &lt;a href=&quot;http://nips.cc/PaperInformation/StyleFiles&quot;&gt;NIPS format&lt;/a&gt;. Please email your
submissions to: nipsgraphs2009 [at] gmail [dot] com

&lt;h4&gt;Workshop Format&lt;/h4&gt;

This is a one-day workshop. The program will feature invited talks, poster sessions, poster spotlights, and a panel discussion.  More details about the program will be announced soon.

&lt;h4&gt;Organizers&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.people.fas.harvard.edu/~airoldi/&quot;&gt;Edoardo Airoldi&lt;/a&gt;, Harvard University
&lt;li&gt;&lt;a href=&quot;http://www.cs.cornell.edu/home/kleinber/&quot;&gt;Jon Kleinberg&lt;/a&gt;, Cornell University
&lt;li&gt;&lt;a href=&quot;http://cs.stanford.edu/~jure/&quot;&gt;Jure Leskovec&lt;/a&gt;, Stanford University
&lt;li&gt;&lt;a href=&quot;http://web.mit.edu/cocosci/josh.html&quot;&gt;Josh Tenenbaum&lt;/a&gt;, MIT
&lt;/ul&gt;

&lt;/blockquote&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_networks.html&quot;&gt;Networks&lt;/a&gt;;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;cat_incestuous_amplification.html&quot;&gt;Incestuous Amplification&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  </channel>
</rss>
