<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0.2" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Three-Toed Sloth</title>
    <link>http://bactra.org/weblog/</link>
    <description>Slow Takes from the Canopy (My Very Own Internet Tradition)</description>
    <language>en</language>

  <item>
    <title>Ten Years of Monster Raving Egomania and Utter Batshit Insanity</title>
    <link>http://bactra.org/weblog/915.html</link>
    <description>
&lt;P&gt;Sometimes, all you can do is quote verbatim* from your inbox:


&lt;blockquote&gt;&lt;pre&gt;
Date: Tue, 17 Apr 2012 09:31:57 -0400
From: Stephen Wolfram
To: Cosma Shalizi
Subject: 10-year followup on &quot;A New Kind of Science&quot;

Next month it'll be 10 years since I published &quot;A New Kind of Science&quot;
... and I'm planning to take stock of the decade of commentary, feedback and
follow-on work about the book that's appeared.

My archives show that you wrote an early review of the book:
http://www.cscs.umich.edu/~crshalizi/reviews/wolfram/

At the time reviews like yours appeared, most of the modern web apparatus
for response and public discussion had not yet developed.  But now it has,
and there seems to be considerable interest in the community in me using
that venue to give my responses and comments to early reviews.

I'm writing to ask if there's more you'd like to add before I embark on my
analysis in the next week or so.

I'd like to take this opportunity to thank you for the work you put into
writing a review of my book.  I know it was a challenge to review a book of
its size, especially quickly.  I plan to read all reviews with forbearance,
and hope that---especially leavened by the passage of a decade---useful
intellectual points can be derived from discussing them.

If you don't have anything to add to your early review, it'd be very helpful
to know that as soon as possible.

Thanks in advance for your help.

-- Stephen Wolfram

P.S. Nowadays you can find the whole book online at
http://www.wolframscience.com/nksonline/toc.html  If you'd like a new
physical copy, just let me know and I can have it shipped...


&lt;/pre&gt;&lt;/blockquote&gt;

&lt;P&gt;I wrote &lt;a href=&quot;http://bactra.org/reviews/wolfram/&quot;&gt;my review&lt;/a&gt; in
2002 (though I didn't &lt;a href=&quot;387.html&quot;&gt;put it out until 2005&lt;/a&gt;).  The idea
that complex patterns can arise from simple rules was already old then, and has
only become more commonplace since.  A lot of interesting, substantive,
specific science has been done on that theme in the ensuing decade.  To this
effort, neither Wolfram nor his book has contributed anything of any note.
The one respect in which I was overly pessimistic is that I have not, in fact,
had to spend much time &quot;de-programming students [who] read &lt;cite&gt;A New Kind of
Science&lt;/cite&gt; before knowing any better&quot; &amp;mdash; but I get a rather different
class of students these days than I did in 2002.

&lt;P&gt;Otherwise, and for the record, I do indeed still stand behind the review.


&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt; *: I removed our e-mail addresses, because no one deserves spam.&lt;/span&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_selfcentered.html&quot;&gt;Self-Centered&lt;/a&gt;;
&lt;a href=&quot;cat_complexity.html&quot;&gt;Complexity&lt;/a&gt;;
&lt;a href=&quot;cat_psychoceramica.html&quot;&gt;Psychoceramica&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Installing &lt;tt&gt;pcalg&lt;/tt&gt;</title>
    <link>http://bactra.org/weblog/914.html</link>
    <description>
&lt;blockquote&gt;&lt;em&gt;Attention conservation notice&lt;/em&gt;: Boring details about
getting finicky statistical software to work; or, please read the friendly
manual.&lt;/blockquote&gt;

&lt;P&gt;Some of my students are finding it difficult to install
the &lt;a href=&quot;ftp://ftp.math.ethz.ch/sfs/Manuscripts/buhlmann/pcalg-software.pdf&quot;&gt;R
package &lt;tt&gt;pcalg&lt;/tt&gt;&lt;/a&gt;; I share these instructions in case others are also
in difficulty.

&lt;ol&gt;
&lt;li&gt; For representing graphs, &lt;tt&gt;pcalg&lt;/tt&gt; relies on two packages
called &lt;tt&gt;&lt;a href=&quot;http://bioconductor.org/packages/release/bioc/html/RBGL.html&quot;&gt;RBGL&lt;/a&gt;&lt;/tt&gt;
and &lt;tt&gt;&lt;a href=&quot;http://bioconductor.org/packages/release/bioc/html/graph.html&quot;&gt;graph&lt;/a&gt;&lt;/tt&gt;.
These are not available on &lt;a href=&quot;http://cran.r-project.org/&quot;&gt;CRAN&lt;/a&gt;, but
rather are on the &lt;em&gt;other&lt;/em&gt; R software
repository, &lt;a href=&quot;http://bioconductor.org/&quot;&gt;BioConductor&lt;/a&gt;.  To install
them, follow the instructions at those links; to summarize, run this:

&lt;blockquote&gt;
&lt;tt&gt;source(&quot;http://bioconductor.org/biocLite.R&quot;)
&lt;br&gt;biocLite(&quot;RBGL&quot;)&lt;/tt&gt;
&lt;/blockquote&gt;

(Since &lt;tt&gt;RBGL&lt;/tt&gt; depends on &lt;tt&gt;graph&lt;/tt&gt;, this should automatically also
install &lt;tt&gt;graph&lt;/tt&gt;; if not, run &lt;tt&gt;biocLite(&quot;graph&quot;)&lt;/tt&gt;,
then &lt;tt&gt;biocLite(&quot;RBGL&quot;)&lt;/tt&gt;.)

&lt;li&gt;Now
install &lt;a href=&quot;http://cran.r-project.org/web/packages/pcalg/index.html&quot;&gt;&lt;tt&gt;pcalg&lt;/tt&gt;
from CRAN&lt;/a&gt;, along with the packages it depends on.  You will get a warning
about not having the &lt;tt&gt;Rgraphviz&lt;/tt&gt; package.  However, you will be able to
load &lt;tt&gt;pcalg&lt;/tt&gt; and run it.  You should be able to step through the example
labeled &quot;Using Gaussian Data&quot; at the end of &lt;tt&gt;help(pc)&lt;/tt&gt;, though it will &lt;em&gt;not&lt;/em&gt; produce any plots.

&lt;P&gt;You can still extract the graph by hand from the fitted models returned by
functions like &lt;tt&gt;pc&lt;/tt&gt; --- if one of those objects is &lt;tt&gt;fit&lt;/tt&gt;,
then &lt;tt&gt;fit@graph@edgeL&lt;/tt&gt; is a list of lists, where each node has its
own list, naming the other nodes it has arrows to (not from).  If you are doing
this for the final in ADA, you don't actually &lt;em&gt;need&lt;/em&gt; anything beyond
this to do the assignment, as explained in question A1a.

&lt;li&gt;&lt;tt&gt;&lt;a href=&quot;http://www.bioconductor.org/packages/release/bioc/html/Rgraphviz.html&quot;&gt;Rgraphviz&lt;/a&gt;&lt;/tt&gt;
is what &lt;tt&gt;pcalg&lt;/tt&gt; relies on for drawing pictures of causal graphs.  Its installation is somewhat tricky, so there is a &lt;a href=&quot;http://www.bioconductor.org/packages/release/bioc/readmes/Rgraphviz/README&quot;&gt;README
file&lt;/a&gt;, which you should read.
&lt;br&gt;The key point is that &lt;tt&gt;Rgraphviz&lt;/tt&gt; itself relies on a &lt;em&gt;non-R&lt;/em&gt;
suite of programs called &lt;tt&gt;graphviz&lt;/tt&gt;.  You will want to install these.
Go to &lt;a href=&quot;http://www.graphviz.org/&quot;&gt;&lt;tt&gt;graphviz.org&lt;/tt&gt;&lt;/a&gt;, and
download and install the software.  (If you use a Mac, the &lt;a href=&quot;http://www.graphviz.org/Download_macos.php&quot;&gt;standard download&lt;/a&gt; also includes
&lt;tt&gt;Graphviz.app&lt;/tt&gt;, which is a nice visual interface to the actual
graph-drawing functions, and what I use for drawing the DAGs in the lecture
notes.)

&lt;li&gt; You have to make sure that your operating system will let other software
(like R) call on &lt;tt&gt;graphviz&lt;/tt&gt;.  The way to do this is to add the directory
(or folder) where you installed &lt;tt&gt;graphviz&lt;/tt&gt; to the list of places your
computer recognizes as containing executable programs --- the system's &quot;command
path&quot;.  The README for installing &lt;tt&gt;Rgraphviz&lt;/tt&gt; explains what you have to
add to the path.  (If you are a Windows user and do not know how to alter the
command path, &lt;a href=&quot;http://www.computerhope.com/issues/ch000549.htm&quot;&gt;read
this&lt;/a&gt;.)

&lt;li&gt; If you have R open, close it.  (If you do not, it will probably not know
about the new software you've just gotten the system to recognize.)  Re-open R,
and install &lt;tt&gt;Rgraphviz&lt;/tt&gt;.  The basic installation command is just
&lt;blockquote&gt;

&lt;tt&gt;source(&quot;http://bioconductor.org/biocLite.R&quot;)
&lt;br&gt;biocLite(&quot;Rgraphviz&quot;)&lt;/tt&gt;
&lt;/blockquote&gt;

The README for &lt;tt&gt;Rgraphviz&lt;/tt&gt; gives some checks which you should be able to
run if everything is working; try them.

&lt;li&gt; You should now be able to generate pictures of DAGs with &lt;tt&gt;pc&lt;/tt&gt; and
the other functions in &lt;tt&gt;pcalg&lt;/tt&gt;; try stepping through all the examples at
the end of &lt;tt&gt;help(pc)&lt;/tt&gt;.
&lt;/ol&gt;
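
&lt;P&gt;As a rough sketch of the by-hand extraction described in step 2 (it needs
only &lt;tt&gt;pcalg&lt;/tt&gt;, &lt;tt&gt;graph&lt;/tt&gt; and &lt;tt&gt;RBGL&lt;/tt&gt;,
not &lt;tt&gt;Rgraphviz&lt;/tt&gt;): the toy data and variable names below are made up
for illustration, and the call to &lt;tt&gt;pc&lt;/tt&gt; assumes the argument names
documented in &lt;tt&gt;help(pc)&lt;/tt&gt;, so check them against your installed
version.

&lt;blockquote&gt;&lt;pre&gt;
library(pcalg)

# Hypothetical toy data: a linear-Gaussian chain x1 -&gt; x2 -&gt; x3
set.seed(1)
n &lt;- 1000
x1 &lt;- rnorm(n)
x2 &lt;- x1 + rnorm(n)
x3 &lt;- x2 + rnorm(n)
x &lt;- cbind(x1, x2, x3)

# Fit with the Gaussian conditional-independence test
fit &lt;- pc(suffStat = list(C = cor(x), n = n), indepTest = gaussCItest,
          p = ncol(x), alpha = 0.01)

fit@graph@edgeL            # the raw list of lists described in step 2
graph::edges(fit@graph)    # the same adjacencies, via the graph package's accessor
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;P&gt;Because the call gives &lt;tt&gt;p&lt;/tt&gt; rather than node labels, the nodes should
come back numbered in column order.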

&lt;P&gt;When I installed &lt;tt&gt;pcalg&lt;/tt&gt; on my laptop two weeks ago, it was painless,
because (1) I already had &lt;tt&gt;graphviz&lt;/tt&gt;, and (2) I knew about BioConductor.
(In fact, the R graphical interface on the Mac will switch between installing
packages from CRAN and from BioConductor.)  To check these instructions, I just
now deleted all the packages from my computer and re-installed them, and
everything worked; elapsed time, ten minutes, mostly downloading.
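
&lt;P&gt;A caveat for anyone following these steps on a current version of R:
the &lt;tt&gt;biocLite.R&lt;/tt&gt; script has since been retired by BioConductor in favor
of the &lt;tt&gt;BiocManager&lt;/tt&gt; package on CRAN.  A minimal sketch of the R side of
the steps above, assuming a recent R and that &lt;tt&gt;graphviz&lt;/tt&gt; is already
installed and on the command path:

&lt;blockquote&gt;&lt;pre&gt;
# Install BiocManager from CRAN if it is not already present
if (!requireNamespace(&quot;BiocManager&quot;, quietly = TRUE))
    install.packages(&quot;BiocManager&quot;)

# BioConductor packages: graph and RBGL are required, Rgraphviz only for plots
BiocManager::install(c(&quot;graph&quot;, &quot;RBGL&quot;, &quot;Rgraphviz&quot;))

# pcalg itself comes from CRAN
install.packages(&quot;pcalg&quot;)
library(pcalg)
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;P&gt;The checks in the &lt;tt&gt;Rgraphviz&lt;/tt&gt; README are still the way to confirm
that the plotting side works.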

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Final Exam (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/913.html</link>
    <description>
&lt;P&gt;In which we are devoted to two problems of political economy, viz., strikes,
and macroeconomic forecasting.

&lt;P&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/exams/3/exam-3.pdf&quot;&gt;Assignment&lt;/a&gt;; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/exams/3/macro.csv&quot;&gt;macro.csv&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Time Series I (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/912.html</link>
    <description>
What time series are.  Properties: autocorrelation or serial correlation; other
notions of serial dependence; strong and weak stationarity.  The correlation
time and the world's simplest ergodic theorem; effective sample size.  The
meaning of ergodicity: a single increasingly long time series becomes
representative of the whole process.  Conditional probability estimates; Markov
models; the meaning of the Markov property.  Autoregressive models, especially
additive autoregressions; conditional variance estimates.  Bootstrapping time
series.  Trends and de-trending.

&lt;dd&gt;&lt;em&gt;Reading&lt;/em&gt;: Notes, &lt;A href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch26.pdf&quot;&gt;chapter 26&lt;/a&gt;;
&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/time-series.R&quot;&gt;R
for
examples&lt;/a&gt;; &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/gdp-pc.csv&quot;&gt;&lt;tt&gt;gdp-pc.csv&lt;/tt&gt;&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Books to Read While the Algae Grow in Your Fur, April 2012</title>
    <link>http://bactra.org/weblog/algae-2012-04.html</link>
    <description>
&lt;P&gt;&lt;em&gt;Attention conservation notice&lt;/em&gt;: I have no taste.

&lt;dl&gt;
&lt;dt&gt;&lt;a href=&quot;http://www.bl.uk/researchregister/1.10/?app_cd=RR&amp;page_cd=RESEARCHER&amp;l_researcher_id=34&quot;&gt;Susan Whitfield&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780520232143&quot; name=&quot;silk-road&quot;&gt;Life along the Silk Road&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Not-quite-historical fiction: life stories of sundry Silk Road characters
&amp;mdash; merchants, monks, soldiers, artists, ordinary widows &amp;mdash;
distributed from Samarkand to Chang-an, and from 700 to 900 AD.  These are all
more or less composites of actual people, glimpsed from the archaeological
record, and especially through the manuscripts
&lt;a href=&quot;../reviews/on-ancient-central-asian-tracks/&quot;&gt;preserved at Dunhuang&lt;/a&gt;
and saved/stolen by &lt;a href=&quot;../reviews/lives-of-aurel-stein/&quot;&gt;Aurel Stein&lt;/a&gt;.
(In fact the whole book owes a great deal to Stein, with a lot of input from
&lt;a href=&quot;http://dannyreviews.com/h/Tibetan_Empire.html&quot;&gt;Beckwith's &lt;cite&gt;The Tibetan Empire in Central Asia&lt;/cite&gt;&lt;/a&gt;.)  The lack of
references makes it hard to know how much is stitched together from sources and
how much is Whitfield's invention, but at the very least it's well-told.

&lt;dt&gt;&lt;a href=&quot;http://www.sabrepunk.com/&quot;&gt;Nathan
Long&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9781597803960&quot;
name=&quot;waar&quot;&gt;Jane Carver of Waar&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Mind candy.  This is at once a parody of, and homage to, Barsoom.  Unlike
Burroughs, Long's book can be enjoyed after the Golden Age of Science Fiction
(i.e., by those over the age of sixteen): his characters are all &lt;em&gt;at
least&lt;/em&gt; two-dimensional (Jane herself is an engaging narrator, though
definitely at the Hill end of
the &lt;a href=&quot;http://pulllist.comixology.com/articles/167/Moby-vs-Hill&quot;&gt;Moby-Hill
spectrum&lt;/a&gt;), his style is decent, and the plot is actually interesting.  I
think it would be enjoyable even if you &lt;em&gt;hadn't&lt;/em&gt; dosed up on planetary
romances as a kid.&lt;/dd&gt;


&lt;dt&gt;&lt;a href=&quot;http://the-expanse.com/&quot;&gt;James S. A. Corey&lt;/a&gt; (i.e., Daniel Abraham and Ty Franck), &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780316129084&quot; name=&quot;leviathan-wakes&quot;&gt;Leviathan Wakes&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Mind candy.  Space opera, confined to the solar system a few centuries
hence.  This has gotten a lot of favorable attention, but I found it merely OK;
perhaps I'd have enjoyed it more if my expectations had been lower.  It's split
between two plot lines, with two point-of-view characters; I enjoyed (but
wasn't blown away by) one of them, but found the other both over-predictable
and irritating.  It does some things well (a reasonably-sized solar system!
minimal handwavium! a
non-&lt;a href=&quot;http://www.jwz.org/blog/2005/09/the-grim-meathook-future/&quot;&gt;grim-meathook-future&lt;/a&gt;
future! some decent characterization!), but it never really managed to grab me.
It's definitely nowhere near as good as
say, &lt;a href=&quot;algae-2010-11.html#quiet-war&quot;&gt;McAuley's &lt;cite&gt;The Quiet
War&lt;/cite&gt;&lt;/a&gt;, to name a recent and thematically-similar book.  The sequel
will be out soon, and seems like it will be continuing along the better of the
two narrative threads here, so I might pick it up, but I won't rush to do
so.&lt;/dd&gt;
&lt;dd&gt;Spoiler-laden griping: Bar bs gur gjb cybg yvarf vf n uneq-obvyrq vairfgvtngvba, pbzcyrgr jvgu na nypbubyvp zvqqyr-ntrq qrgrpgvir, pbeehcg vagevthrf, naq n zlfgrevbhf qnzr jub gur qrgrpgvir snyyf va ybir jvgu. V qba'g yvxr gur uneq-obvyrq traer, orpnhfr, juvyr V nz irel fragvzragny, vgf cnegvphyne pbzovangvba bs fragvzragnyvgl naq plavpvfz vf bss-chggvat. Fb onfvpnyyl V jnagrq gb fxvc nyy gur puncgref sebz Zvyyre'f cbvag bs ivrj, naq whfg sbyybj gubfr jvgu Ubyqra naq uvf perj. Yrff crefbanyyl (v.r., nf n engvbanyvmngvba), abve cerfhccbfrf fhpu n irel cnegvphyne, uvfgbevpnyyl-yvzvgrq phygheny frggvat gung frrvat vg fvzcyl qhzcrq vagb jung fubhyq or n enqvpnyyl arj xvaq bs fbpvrgl jnf wneevat. (Rirelguvat ba Prerf jbexf yvxr Puvpntb pvepn 1940 orpnhfr ubj ryfr?).&lt;/dd&gt;
&lt;dd&gt;Ba n qvssrerag cynar nygbtrgure, Cebgbtra'f ernfbaf sbe jnagvat gb gel bhg gur nyvra ivehf/znpuvar ba gur jubyr cbchyngvba bs Rebf ner jrnx. Vs gur cbvag bs gur znpuvar vf gb gnxr bire rkvfgvat ovbznff naq erfuncr vg nppbeqvat gb fbzr cebtenz, vg jbhyq frrz vasvavgryl rnfvre gb tvir vg hzcgrra gbaf bs lrnfg gb cynl jvgu, guna gb fcraq lrnef bepurfgengvat gur gnxr-bire bs n pbybal jvgu bire n zvyyvba crbcyr, gb fnl abguvat bs gur erqhprq cbffvovyvgl sbe oybj-onpx, frphevgl oernpurf, rgp. Ab qbhog gurl'q jnag gb gel vg ba crbcyr riraghnyyl, ohg fgnegvat gurer, jvgu ab pbageby bire rssrpgf, vf whfg onq rkcrevzragny qrfvta. Cyhf &quot;tvir gur napvrag fhcre-nqinaprq nyvra jne znpuvar pbageby bire na nfgrebvq&quot; qbrf abg fbhaq yvxr n cyna juvpu jbhyq qrirybc gb n fbpvbcngu'f nqinagntr. (Gurl jbhyqa'g pner nobhg gur qnzntr gb bguref, ohg gurzfryirf?)&lt;/dd&gt;
&lt;dd&gt;In conclusion, bring me back my cane and then get off my lawn, you're
trampling the lilies.&lt;/dd&gt;

&lt;dt&gt;&lt;a href=&quot;http://www.zatrikion.blogspot.com/&quot; name=&quot;fall-from-earth&quot;&gt;Matthew Johnson&lt;/a&gt;, &lt;cite&gt;Fall from Earth&lt;/cite&gt; [buying: &lt;a href=&quot;http://store.bundoranpress.com/science-fiction/fall-from-earth.html&quot;&gt;publisher&lt;/a&gt;, &lt;a href=&quot;http://iambik.com/books/fall-earth-by-matthew-johnson/&quot;&gt;audio&lt;/a&gt;]&lt;/dt&gt;
&lt;dd&gt;Mind candy. Scheme-laden first-contact space opera with a social setting I
can only call &quot;The Ming Dynasty IN SPAAAAACE&quot;.  Good enough that I will keep a look out for more from Johnson.&lt;/dd&gt;
&lt;dd&gt;&lt;span class=&quot;blognotes&quot;&gt;It's a small thing, but Johnson shows no
appreciation of the energy required to move food from planet to planet, which
makes his &quot;equitable marketing system&quot; a complete non-starter.  (But he shares
this flaw with
Cherryh's &lt;a href=&quot;http://www.tor.com/blogs/2008/12/norway-has-her-standards-cj-cherryhs-downbelow-station&quot;&gt;deservedly-admired&lt;/a&gt; &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780756405502&quot;&gt;Downbelow
Station&lt;/a&gt;&lt;/cite&gt;.)  If, however, the magistracy wants to make sure that no
world can become self-sufficient, the way to do it would be to restrict
their &lt;em&gt;manufacturing&lt;/em&gt;, since any colony would be dependent for survival
on a complex industrial infrastructure.&lt;/span&gt;&lt;/dd&gt;

&lt;dt&gt;Bernard Williams, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780691117911&quot; name=&quot;williams-on-truthfulness&quot;&gt;Truth and Truthfulness: An Essay in Genealogy&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Shorter Williams: &quot;Say what you mean. Bear witness.  Iterate.&quot;  (The late
John M. Ford, &lt;a href=&quot;http://nielsenhayden.com/electrolite/archives/003789.html#29472&quot;&gt;in a different context&lt;/a&gt;.)&lt;/dd&gt;
&lt;dd&gt;Slightly longer: You can get a decent sense of what the book is about from the
&lt;a href=&quot;http://press.princeton.edu/titles/7328.html&quot;&gt;publishers&lt;/a&gt;, so I'll
comment without much exposition.&lt;/dd&gt;
&lt;dd&gt;When Williams talks about a &quot;genealogy&quot; of some idea or practice, he means
an account of why, if it did not exist, we would have to invent it.
Specifically, he spins a state-of-nature story about how if, in the state of
nature, human beings did not have an idea of truth, but nonetheless were social
and rational animals, and so dependent on a division of epistemic labor, they
would have to form one, and two &quot;virtues of truthfulness&quot;, namely &quot;sincerity&quot;
(Ford's &quot;say what you mean&quot;) and &quot;accuracy&quot; (Ford's &quot;bear witness&quot;) to make it
effective.  This is not intended as history or pre-history (Williams: &quot;the
state of nature is not the Pleistocene&quot;), but it is a bit mysterious to me how
then it is supposed to &lt;em&gt;explain&lt;/em&gt; our notions of truth, truthfulness,
sincerity, accuracy, etc., much less explain them &quot;non-reductively&quot;.  Perhaps
&amp;mdash; this is suggested by his section on &quot;Shameful Origins&quot; &amp;mdash; it is
just supposed to make us feel better about having them, by convincing us that
we could have acquired such ideas in a way which doesn't discredit them.  (We
are not suckers.)&lt;/dd&gt;
&lt;dd&gt;It may sound odd to describe &quot;accuracy&quot; as a virtue, but being accurate ---
bearing &lt;em&gt;good&lt;/em&gt; witness --- means things like checking tendencies to leap to
conclusions, choosing appropriate methods of inquiry, taking pains to secure all
the relevant facts (Williams is especially good on the notion of &quot;facts&quot;), etc.
Williams is indeed eloquent on how the virtues of accuracy are one of the
things which have made the pursuit of science a source of human values,
especially in circumstances where honesty otherwise was hard.&lt;/dd&gt;
&lt;dd&gt;As this last suggests, culture lets us articulate the raw virtues of
sincerity and accuracy into incredibly elaborate and interlocking complexes of
attitudes and practices (Ford's &quot;iterate&quot;).  From the inside, these have, or at
least seem to have, intrinsic as well as instrumental value, and indeed they
would not work at all if their value was &lt;em&gt;just&lt;/em&gt; instrumental.  I confess
that I do not fully follow Williams's attempt to explain when or why or
how the virtues of truth become &quot;intrinsic values&quot;.  It seems to be something
like: people find these values &lt;em&gt;compelling&lt;/em&gt;, in a way which they would
not if they saw them just as handy tools for achieving selfish ends; this in
turn makes these values successful commitment devices &lt;a href=&quot;#n1&quot; name=&quot;b1&quot;&gt;[1]&lt;/a&gt;. Williams seems
to me to equivocate as to whether these virtues &lt;em&gt;really do&lt;/em&gt; have such
intrinsic value, but on balance I am just as happy that he strayed no deeper
into the swamp of meta-ethics, and wisely turned back to the sounder terrain of
looking at certain episodes in the articulation of these virtues.  The two main
case-studies he gives are contrasts of Thucydides and Herodotus on history, and
of Rousseau and Diderot on authenticity and the self.  Both of these really
have a wider, philosophical import, and as such they would both have been
stronger for a more comparative, cross-cultural perspective &amp;mdash; not in the
service of the small virtue of courtesy (Williams has mercifully few &quot;what you
mean 'we', white man?&quot;  moments), but rather in the service of the great virtue
of accuracy &lt;a href=&quot;#n2&quot; name=&quot;b2&quot;&gt;[2]&lt;/a&gt;.&lt;/dd&gt;
&lt;dd&gt;But I see that I am descending into my usual quibbling.  This is a
profoundly thoughtful and profoundly learned book, which has interesting
things to say about some of the deepest and most humanly-important problems in
philosophy, and says them elegantly.  Go read.&lt;/dd&gt;

&lt;dd&gt;&lt;span class=&quot;blognotes&quot;&gt;&lt;a name=&quot;n1&quot;&gt;[1]&lt;/a&gt; I cannot help but be reminded of &lt;a href=&quot;http://psychclassics.yorku.ca/James/Principles/prin24.htm&quot;&gt;William James&lt;/a&gt;:
&lt;blockquote&gt;
&lt;em&gt;Now, why do the various animals do what seem to us such strange things&lt;/em&gt;, in the presence of such outlandish stimuli? Why does the hen, for example, submit herself to the tedium of incubating such a fearfully uninteresting set of objects as a nestful of eggs, unless she have some sort of a prophetic inkling of the result?  The only answer is &lt;em&gt;ad hominem&lt;/em&gt;.  We can only interpret the instincts of brutes by what we know of instincts in ourselves.  Why do men always lie down, when they can, on soft beds rather than on hard floors?  Why do they sit round the stove on a cold day?  Why, in a room, do they place themselves, ninety-nine times out of a hundred, with their faces towards its middle rather than to the wall?  Why do they prefer saddle of mutton and champagne to hard-tack and ditch-water?  Why does the maiden interest the youth so that everything about her seems more important and significant than anything else in the world?  Nothing more can be said than that these are human ways, and that every creature likes its own ways, and takes to the following them as a matter of course.  Science may come and consider these ways, and find that most of them are useful.  But it is not for the sake of their utility that they are followed, but because at the moment of following them we feel that that is the only appropriate and natural thing to do.  Not one man in a billion, when taking his dinner, ever thinks of utility.  He eats because the food tastes good and makes him want more.  If you ask him why he should want to eat more of what tastes like that, instead of revering you as a philosopher he will probably laugh at you for a fool.  The connection between the savory sensation and the act it awakens is for him absolute and &lt;em&gt;selbstverst&amp;auml;ndlich&lt;/em&gt;, an &quot;a priori synthesis&quot; of the most perfect sort, needing no proof but its own evidence.
It takes, in short, what Berkeley calls a mind debauched by learning to carry the process of making the natural seem strange, so far as to ask for the why of any instinctive human act.  To the metaphysician alone can such questions occur as: Why do we smile, when pleased, and not scowl?  Why are we unable to talk to a crowd as we talk to a single friend?  Why does a particular maiden turn our wits so upside-down?  The common man can only say, &quot;Of course we smile, of course our heart palpitates at the sight of the crowd, of course we love the maiden, that beautiful soul clad in that perfect form, so palpably and flagrantly made from all eternity to be loved!&quot;

&lt;br&gt;And so, probably, does each animal feel about the particular things it tends to do in presence of particular objects.  They, too, are a priori syntheses.  To the lion it is the lioness which is made to be loved; to the bear, the she-bear.  To the broody hen the notion would probably seem monstrous that there should be a creature in the world to whom a nestful of eggs was not the utterly fascinating and precious and never-to-be-too-much-sat-upon object which it is to her.

&lt;br&gt;Thus we may be sure that, however mysterious some animals' instincts may appear to us, our instincts will appear no less mysterious to them.  And we may conclude that, to the animal which obeys it, every impulse and every step of every instinct shines with its own sufficient light, and seems at the moment the only eternally right and proper thing to do.  It is done for its own sake exclusively.  What voluptuous thrill may not shake a fly, when she at last discovers the one particular leaf, or carrion, or bit of dung, that out of all the world can stimulate her ovipositor to its discharge?  Does not the discharge then seem to her the only fitting thing?  And need she care or know anything about the future maggot and its food?&lt;/blockquote&gt;

More soberly, or at least with fewer hens and maggots, this is highly
reminiscent of Robert
Frank's &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780393960228&quot;&gt;Passion
within Reason&lt;/a&gt;&lt;/cite&gt;, which I do not believe Williams mentions.
&lt;a href=&quot;#b1&quot;&gt;^&lt;/a&gt;&lt;/span&gt;

&lt;dd&gt;&lt;span class=&quot;blognotes&quot;&gt;&lt;a name=&quot;n2&quot;&gt;[2]&lt;/a&gt; Williams claims, quite
plausibly, that Thucydides had different ideas about historical explanation and
historical evidence than did Herodotus &amp;mdash; ones which are both stricter
about what counts as acceptable history, and which are supported by compelling
rationales even within the older framework.  He also claims, more sketchily,
that Herodotus was immersed in a culture which was still partly oral and partly
literate, while Thucydides was &lt;em&gt;not&lt;/em&gt;.  If all this was right, should
not the same contrast show up in the historical traditions of China, the
Islamic world, etc.?  Why does such a tradition not seem to be indigenous to
India?  (Cf., on all this,
Brown's &lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780816510603&quot;&gt;&lt;cite&gt;History,
Hierarchy, and Human Nature&lt;/cite&gt;&lt;/a&gt;.) Western Europe, after the fall of the
western Roman Empire, never lost literacy, but it certainly didn't produce
histories like Thucydides's for many centuries: why, on Williams's account,
not?  (Actually, outside of Italy, did western Europe ever produce such
histories &lt;em&gt;before&lt;/em&gt; the fall of the empire?)  If there are important
distinctions between these cases, such that Williams's account applies only in
the special circumstances of the Aegean around 500--300 BC, what are those
circumstances?  &amp;mdash; Let me add that it was &lt;em&gt;Williams&lt;/em&gt; who made all
these considerations relevant, not me. &lt;a href=&quot;#b2&quot;&gt;^&lt;/a&gt;&lt;/span&gt;&lt;/dd&gt;






&lt;dt&gt;J. C. W. Rayner and D. J. Best, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/0-19-505610-8&quot; name=&quot;rayner-best&quot;&gt;Smooth Tests of Goodness of Fit&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Suppose a random variable \( Y \) is confined to the unit interval \( [0,1]
\), and we want to test whether it is uniformly distributed.  One way to do
this would be to construct alternative distributions which are in some sense
smooth departures from uniformity, with densities \( g(y;\theta) =
e^{\sum_{j=1}^{d}{\theta_j h_j(y)}}/z(\theta) \), where it is convenient to
choose the \( h_j \) functions to be an orthonormal basis --- the cosine basis,
say, or the Legendre polynomials.  (That is, they are orthonormal in \( L_2 \),
the space of square-integrable functions on the unit interval.)  Uniformity is
then the special case \( \theta = 0 \), and we can test it against the
alternative that \( \theta \neq 0 \) by the usual devices of a likelihood-ratio
test, a score test, etc., which will all, under the null hypothesis, have an
asymptotic \( \chi^2_d \) distribution.  This is Neyman's original smooth test,
which &lt;a href=&quot;http://ssrn.com/abstract=272888&quot;&gt;seems to have originated&lt;/a&gt;
from the problem of how to combine &lt;i&gt;p&lt;/i&gt;-values from independent
experiments, which should all be uniformly distributed under the null
hypothesis.  One nice feature of this test is that if we reject the null, we
immediately have an alternative, namely our maximum likelihood estimate of \(
\theta \), for what the actual distribution is --- it tells us not just that
the null model is wrong, but how, and what a better one would be like.&lt;/dd&gt;
&lt;dd&gt;The real power of this comes from the following observation.  If \( X \) is
distributed according to some continuous CDF \( F \), then \( Y=F(X) \) is
uniformly distributed on \( [0,1] \).  The smooth alternatives for \( Y \)
translate into smooth alternatives for \( X \), with densities \( g_X(x;\theta)
= f(x) e^{\sum_{j=1}^{d}{\theta_j h_j(F(x))}}/z(\theta) \), where \( f \) is the density of \( F \).  We can test
whether \( X \sim F \) by, once again, testing whether \( \theta = 0 \), and the
theory works just as before.  If \( F \) is not fixed but involves some
parameters \( \beta \), then we consider the smooth alternative densities \(
g_{X}(x;\beta,\theta) = f(x;\beta) e^{\sum_{j=1}^{d}{\theta_j
h_j(F(x;\beta))}}/z(\theta) \), and again we test the specification by testing
\( \theta = 0 \).  Since this always involves fixing \( d \) parameters, we
always get a \( \chi^2_d \) asymptotic distribution under the null.&lt;/dd&gt;
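&lt;dd&gt;To make the score version of the test concrete, here is a sketch under
one extra assumption which the description above leaves implicit: that each \(
h_j \) integrates to zero over the unit interval (true of the cosine basis and
of the non-constant Legendre polynomials).  With \( y_i = F(x_i) \), the
log-likelihood is \( \ell(\theta) = \sum_{i=1}^{n}{\sum_{j=1}^{d}{\theta_j
h_j(y_i)}} - n \log{z(\theta)} \), so the score at \( \theta = 0 \) has
components \( \sum_{i}{h_j(y_i)} \), and orthonormality makes the Fisher
information there \( n \) times the identity.  The score statistic is therefore
\( \sum_{j=1}^{d}{\left( n^{-1/2} \sum_{i=1}^{n}{h_j(y_i)} \right)^2} \), which
is what gets compared to the \( \chi^2_d \) distribution.  In R, with the cosine
basis and a fully specified null (no \( \beta \)); the function name and
defaults are assumptions made for illustration, not anything from Rayner and
Best:

&lt;blockquote&gt;&lt;pre&gt;
# Score form of Neyman's smooth test that x was drawn from the CDF `cdf`.
# Assumptions for this sketch: cosine basis, fully specified null (no beta).
smooth.test &lt;- function(x, cdf = punif, d = 4, ...) {
  y &lt;- cdf(x, ...)                 # probability integral transform, Y = F(X)
  n &lt;- length(y)
  # n-by-d matrix with entries h_j(y_i) = sqrt(2) * cos(j * pi * y_i)
  h &lt;- sqrt(2) * cos(pi * outer(y, 1:d))
  u &lt;- colSums(h) / sqrt(n)        # standardized score components
  stat &lt;- sum(u^2)
  list(statistic = stat, df = d,
       p.value = pchisq(stat, df = d, lower.tail = FALSE))
}

set.seed(42)
smooth.test(rnorm(500), cdf = pnorm)             # null hypothesis is true
smooth.test(rnorm(500, mean = 1), cdf = pnorm)   # null hypothesis is false
&lt;/pre&gt;&lt;/blockquote&gt;

Leaving &lt;tt&gt;cdf&lt;/tt&gt; at its default of &lt;tt&gt;punif&lt;/tt&gt; gives the original test
of uniformity on the unit interval.&lt;/dd&gt;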
&lt;dd&gt;Rayner and Best's monograph is a clear, if now somewhat old-fashioned,
exposition of Neyman's smooth test and its relatives and extensions.  They
actually begin with Pearson's \( X^2 \) or \( \chi^2 \) test, which can be seen
as a smooth test for multinomial (rather than continuous) data, before going on
to consider the general theory of likelihood ratio and score tests, and
Neyman's smooth tests.  Much of the book is taken up with various permutations
of discretizing continuous variables and/or allowing estimation of the
parameters I have written \( \beta \); the latter concern seems less important
these days.&lt;/dd&gt;
&lt;dd&gt;An important set of developments which does not get as much attention here
as a more recent treatment would give is that of picking the order of the
alternatives \( d \).  Neyman suggested \( d = 4 \) but emphasized it was a
guess; some later workers guessed \( d = 2 \) should be enough.  Really,
however, this is a problem of &lt;a href=&quot;../reviews/claeskens-hjort.html&quot;&gt;model
selection&lt;/a&gt; or capacity control, and so all the usual tools, like
cross-validation or information criteria, can be applied.  This is one place
where &lt;a href=&quot;http://doc.utwente.nl/62408/&quot;&gt;BIC has proved particularly
useful&lt;/a&gt;, leading
to &lt;a href=&quot;http://CRAN.R-project.org/package=ddst&quot;&gt;&quot;data-driven&quot; smooth
tests&lt;/a&gt;.  These no longer have nice \( \chi^2 \) asymptotics, but it's pretty
easy to get their sampling distributions from simulation.&lt;/dd&gt;
&lt;dd&gt;Despite these limits, this is still a useful reference for people interested in specification checking.&lt;/dd&gt;




&lt;dt&gt;&lt;a href=&quot;http://aliettedebodard.com/&quot;&gt;Aliette De Bodard&lt;/a&gt;, &lt;cite&gt;&lt;a href=&quot;http://www.powells.com/partner/35751/biblio/9780857660312&quot; name=&quot;servant-of-the-underworld&quot;&gt;Servant of the
Underworld&lt;/a&gt;&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;Mind candy: historical fantasy/mystery set in Tenochtitlan (a few
generations before what would be the Conquest), only with the mythology of the
Aztecs being literally true and magic very much a part of actual life.  It had
some typical first-novel flaws (too much exposition, the plot drags in places),
but overall decent.&lt;/dd&gt;

&lt;/dl&gt;


&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_algae.html&quot;&gt;Books to Read While the Algae Grow in Your Fur&lt;/a&gt;;
&lt;a href=&quot;cat_scientifiction.html&quot;&gt;Scientifiction and Fantastica&lt;/a&gt;;
&lt;a href=&quot;cat_detection.html&quot;&gt;Pleasures of Detection, Portraits of Crime&lt;/a&gt;;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;cat_central_asia.html&quot;&gt;Central Asia&lt;/a&gt;;
&lt;a href=&quot;cat_philosophy.html&quot;&gt;Philosophy&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Brought to You by the Letters D, A, and G (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/911.html</link>
    <description>
&lt;P&gt;In which the arts of estimating causal effects from observational data are practiced on &lt;cite&gt;Sesame Street&lt;/cite&gt;.

&lt;P&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/hw/11/hw-11.pdf&quot;&gt;Assignment&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Estimating Causal Effects from Observations (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/910.html</link>
    <description>
&lt;P&gt;Estimating graphical models: substituting consistent estimators into the
formulas for front and back door identification; average effects and
regression; tricks to avoid estimating marginal distributions; matching and
propensity scores as computational short-cuts in
back-door adjustment.  Instrumental variables estimation: the Wald estimator,
two-stage least-squares.  Summary recommendations for estimating causal
effects.

&lt;P&gt;&lt;em&gt;Reading&lt;/em&gt;:
Notes, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch24.pdf&quot;&gt;chapter
24&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Separated at Birth (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/909.html</link>
    <description>
&lt;P&gt;In which we use graphical causal models to understand twin studies and variance components.

&lt;P&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/hw/10/hw-10.pdf&quot;&gt;Assignment&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Identifying Causal Effects from Observations (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/908.html</link>
    <description>
&lt;P&gt;Reprise of causal effects vs. probabilistic conditioning.  &quot;Why think, when
you can do the experiment?&quot;  Experimentation by controlling everything
(Galileo) and by randomizing (Fisher).  Confounding and identifiability.  The
back-door criterion for identifying causal effects: condition on covariates
which block undesired paths.  The front-door criterion for identification: find
isolated and exhaustive causal mechanisms.  Deciding how many black boxes to
open up.  Instrumental variables for identification: finding some exogenous
source of variation and tracing its effects.  Critique of instrumental
variables: vital role of theory, its fragility, consequences of weak
instruments.  Irremovable confounding: an example with the detection of social
influence; the possibility of bounding unidentifiable effects.  Summary
recommendations for identifying causal effects.

&lt;P&gt;&lt;em&gt;Reading&lt;/em&gt;:
Notes, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch23.pdf&quot;&gt;chapter
23&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Just How Quickly Do We Forget?</title>
    <link>http://bactra.org/weblog/907.html</link>
    <description>
&lt;!-- 
&lt;script type=&quot;text/javascript&quot;
   src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;&lt;/script&gt;
!--&gt;

&lt;blockquote&gt;&lt;em&gt;Attention conservation notice&lt;/em&gt;: 2500+ words on estimating
how quickly time series forget their own history.  Only of interest if you care
about the intersection of stochastic processes and statistical learning theory.
Full of jargon, equations, log-rolling and self-promotion, yet utterly
abstract.&lt;/blockquote&gt;

&lt;P&gt;I &lt;a href=&quot;902.html&quot;&gt;promised to say something
about the content of Daniel's thesis&lt;/a&gt;, so let me talk about two of his
papers, which go into chapter 4; there is a short conference version and a long
journal version.
&lt;dl&gt;
&lt;dt&gt;Daniel J. McDonald, Cosma Rohilla Shalizi and Mark Schervish, &quot;Estimating beta-mixing coefficients&quot;, &lt;a href=&quot;http://jmlr.csail.mit.edu/proceedings/papers/v15/mcdonald11a.html&quot;&gt;AIStats 2011&lt;/a&gt;, &lt;a href=&quot;http://arxiv.org/abs/1103.0941&quot;&gt;arxiv:1103.0941&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;em&gt;Abstract&lt;/em&gt;: The literature on statistical learning for time series
assumes the asymptotic independence or &quot;mixing&quot; of the data-generating
process. These mixing assumptions are never tested, nor are there methods for
estimating mixing rates from data. We give an estimator for the \( \beta
\)-mixing rate based on a single stationary sample path and show it is \( L_1
\)-risk consistent.&lt;/dd&gt;
&lt;dt&gt;----, &quot;Estimating beta-mixing coefficients via histograms&quot;, &lt;a href=&quot;http://arxiv.org/abs/1109.5998&quot;&gt;arxiv:1109.5998&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;em&gt;Abstract&lt;/em&gt;: The literature on statistical learning for time series often assumes asymptotic independence or &quot;mixing&quot; of data sources. Beta-mixing has long been important in establishing the central limit theorem and invariance principle for stochastic processes; recent work has identified it as crucial to extending results from empirical processes and statistical learning theory to dependent data, with quantitative risk bounds involving the actual beta coefficients. There is, however, presently no way to actually estimate those coefficients from data; while general functional forms are known for some common classes of processes (Markov processes, ARMA models, etc.), specific coefficients are generally beyond calculation. We present an \( L_1 \)-risk consistent estimator for the beta-mixing coefficients, based on a single stationary sample path. Since mixing coefficients involve infinite-order dependence, we use an order-d Markov approximation. We prove high-probability concentration results for the Markov approximation and show that as \( d \rightarrow \infty \), the Markov approximation converges to the true mixing coefficient. Our estimator is constructed using d dimensional histogram density estimates. Allowing asymptotics in the bandwidth as well as the dimension, we prove \( L_1 \) concentration for the histogram as an intermediate step.&lt;/dd&gt;
&lt;/dl&gt;

&lt;P&gt;&lt;a href=&quot;668.html&quot;&gt;Recall&lt;/a&gt; the world's simplest ergodic theorem: suppose \( X_t \) is a sequence
of random variables with common expectation \( m \) and variance \( v \), and
stationary covariance \( \mathrm{Cov}[X_t, X_{t+h}] = c_h \).  Then
the time average \( \overline{X}_n \equiv \frac{1}{n}\sum_{i=1}^{n}{X_i} \) also has expectation \( m \), and the question is whether it converges on that expectation.
The world's simplest ergodic theorem asserts that
if the correlation time
\[
T = \frac{\sum_{h=1}^{\infty}{|c_h|}}{v} &lt; \infty
\]
then
\[
\mathrm{Var}\left[ \overline{X}_n \right] \leq \frac{v}{n}(1+2T)
\]

&lt;P&gt;Since, as I said, the expectation of \( \overline{X}_n \) is \( m \) and its variance is
going to zero, we say that \( \overline{X}_n \rightarrow m \) &quot;in mean square&quot;.

&lt;P&gt;From this, we can get a crude but often effective &lt;a href=&quot;http://bactra.org/notebooks/deviation-inequalities.html&quot;&gt;deviation
inequality&lt;/a&gt;, using &lt;a href=&quot;http://en.wikipedia.org/wiki/Chebyshev's_inequality&quot;&gt;Chebyshev's inequality&lt;/a&gt;:

\[
\Pr{\left(|\overline{X}_n - m| &gt; \epsilon\right)} \leq \frac{v}{\epsilon^2}\frac{1+2T}{n}
\]


&lt;P&gt;The meaning of the condition that the correlation time \( T \) be finite is
that the correlations themselves have to trail off as we consider events which
are widely separated in time &amp;mdash; they don't ever have to be zero, but they
do need to get smaller and smaller as the separation \( h \) grows.  (One can
actually weaken the requirement on the covariance function to just \(
\lim_{n\rightarrow \infty}{\frac{1}{n}\sum_{h=1}^{n}{c_h}} = 0 \), but this
would take us too far afield.)  In fact, as these formulas show, the
convergence looks just like what we'd see for independent data, only with \(
\frac{n}{1+2T} \) samples instead of \( n \), so we call the former the
effective sample size.
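
&lt;P&gt;As a rough numerical check on these formulas, the sketch below computes the correlation time, the exact variance of the time average, and the two bounds.  The geometric covariance function \( c_h = v \rho^h \) and all the function names are assumptions made for the example, and the infinite sum for \( T \) is simply truncated.

&lt;pre&gt;
```python
def correlation_time(cov, v, max_lag=1000):
    """T = sum_{h >= 1} |c_h| / v, truncated at max_lag (an approximation)."""
    return sum(abs(cov(h)) for h in range(1, max_lag + 1)) / v

def variance_of_mean(cov, v, n):
    """Exact variance of the average of n terms with covariances cov(h)."""
    weighted = sum((1.0 - h / n) * cov(h) for h in range(1, n))
    return (v + 2.0 * weighted) / n

def variance_bound(v, T, n):
    """The bound (v / n) * (1 + 2 T) from the ergodic theorem."""
    return v * (1.0 + 2.0 * T) / n

def chebyshev_bound(v, T, n, eps):
    """Chebyshev bound on the probability of missing m by more than eps."""
    return min(1.0, variance_bound(v, T, n) / eps ** 2)

# Hypothetical example: geometric covariances c_h = v * rho^h.
v, rho, n = 1.0, 0.5, 255
cov = lambda h: v * rho ** h
T = correlation_time(cov, v)            # geometric series, rho / (1 - rho)
n_eff = n / (1.0 + 2.0 * T)             # effective sample size
print(T, n_eff)
print(variance_of_mean(cov, v, n), variance_bound(v, T, n))
print(chebyshev_bound(v, T, n, eps=0.5))
```
&lt;/pre&gt;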

&lt;P&gt;All of this is about the convergence of averages of \( X_t \), and based on
its covariance function \( c_h \).  What if we care not about \( X \) but about
\( f(X) \)?  The same idea would apply, but unless \( f \) is linear, we can't
easily get its covariance function from \( c_h \).  The mathematicians'
solution to this has been to invent stronger notions of decay-of-correlations,
called &quot;mixing&quot;.  Very roughly speaking, we say that \( X \) is mixing when, if
you pick any two (nice) functions \( f \) and \( g \), I can always show that

\[
\lim_{h\rightarrow\infty}{\mathrm{Cov}\left[ f(X_t), g(X_{t+h}) \right]} = 0
\]

&lt;P&gt;Note (or believe) that this is &quot;convergence in distribution&quot;; it happens if,
and only if, the distribution of events up to time \( t \) is becoming
independent of the distribution of events from time \( t+h \) onwards.

&lt;P&gt;To get useful results, it is necessary to quantify mixing, which is usually
done through somewhat stronger notions of dependence.  (Unfortunately, none of
these have meaningful names.
The &lt;a href=&quot;http://arxiv.org/abs/math/0511078&quot;&gt;review by Bradley&lt;/a&gt; ought to
be the standard reference.)  For instance, the &quot;total variation&quot; or \( L_1 \)
distance between probability measures \( P \) and \( Q \), with densities \( p
\) and \( q \), is

\[
d_{TV}(P,Q) = \frac{1}{2}\int{|p(u) - q(u)| du}
\]

This has several interpretations, but the easiest to grasp is that it says how
much \( P \) and \( Q \) can differ in the probability they give to any one
event: for any \( E \), \( d_{TV}(P,Q) \geq |P(E) - Q(E)| \).  One use of this
distance is to measure the dependence between random variables, by seeing how
far their joint distribution is from the product of their marginal
distributions.  Abusing notation a little to write \( P(U,V) \) for the joint
distribution of \( U \) and \( V \), we measure dependence as

\[
\beta(U,V) \equiv d_{TV}(P(U,V), P(U) \otimes P(V)) = \frac{1}{2}\int{|p(u,v)-p(u)p(v)|du dv}
\]

This will be zero just when \( U \) and \( V \) are statistically independent,
and one when, on average, conditioning on \( U \) confines \( V \) to a set
which would otherwise have probability zero.  (For instance if \( U \) has a
continuous distribution and \( V \) is a function of \( U \) &amp;mdash; or one of
two randomly chosen functions of \( U \).)
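
&lt;P&gt;For discrete variables the integrals become sums, and the definitions can be tried out directly.  This is only an illustrative sketch: distributions are represented as plain dictionaries, and the function names are made up for the example.

&lt;pre&gt;
```python
def total_variation(p, q):
    """Half the L1 distance between two pmfs given as dicts {outcome: prob}."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in support)

def beta_dependence(joint):
    """TV distance between a joint pmf {(u, v): prob} and the product of its marginals."""
    p_u, p_v = {}, {}
    for (u, v), pr in joint.items():
        p_u[u] = p_u.get(u, 0.0) + pr
        p_v[v] = p_v.get(v, 0.0) + pr
    product = {(u, v): p_u[u] * p_v[v] for u in p_u for v in p_v}
    return total_variation(joint, product)

# V is a copy of U, with U uniform on k values: complete dependence.
for k in (2, 10, 100):
    copy = {(u, u): 1.0 / k for u in range(k)}
    print(k, beta_dependence(copy))
```
&lt;/pre&gt;

In the copy example the value works out to \( 1 - 1/k \), which approaches the limiting value of one as the number of possible values grows, in line with the continuous case described above.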

&lt;P&gt;We can relate this back to the earlier idea of correlations between functions by realizing that

\[
\beta(U,V) = \frac{1}{2}\sup_{|r|\leq 1}{\left|\int{r(u,v) dP(U,V)} - \int{r(u,v)dP(U)dP(V)}\right|} ~,
\]

that \( \beta \) says how much the expected value of a bounded function \( r \)
could change between the dependent and the independent distributions.  (There
is no assumption that the test function \( r \) factorizes, and in fact it's
important to allow \( r(u,v) \neq f(u)g(v) \).)





&lt;P&gt;We apply these ideas to time series by looking at the dependence between the past and the future:

\[
\begin{eqnarray*}
\beta(h) &amp; \equiv &amp; d_{TV}(P(X^t_{-\infty}, X_{t+h}^{\infty}), P(X^t_{-\infty}) \otimes P(X_{t+h}^{\infty})) \\
&amp; = &amp; \frac{1}{2}\int{|p(x^t_{-\infty},x_{t+h}^{\infty})-p(x^t_{-\infty})p(x^{\infty}_{t+h})|dx^t_{-\infty}dx^{\infty}_{t+h}}
\end{eqnarray*}
\]

(By stationarity, the integral actually does not depend on \( t \).)  When \(
\beta(h) \rightarrow 0 \) as \( h \rightarrow \infty \), we have a
&quot;beta-mixing&quot; process.  (These are also called
&quot;&lt;a href=&quot;http://absolutely-regular.blogspot.com/&quot;&gt;absolutely regular&lt;/a&gt;&quot;.)
Convergence in total variation implies convergence in distribution, but not
vice versa, so beta-mixing is stronger than common-or-garden mixing.

&lt;P&gt;Notions like beta-mixing were originally introduced purely for probabilistic
convenience, to handle questions
like &lt;a href=&quot;http://www.pnas.org/cgi/reprint/42/1/43&quot;&gt;&quot;when does the central
limit theorem hold for stochastic processes?&quot;&lt;/a&gt;  These are interesting for
people who like stochastic processes, or indeed for those who want to do Markov
chain Monte Carlo and want to know how long to let the chain run.  For our
purposes, though, what's important is that when people in statistical learning
theory have
given &lt;a href=&quot;http://bactra.org/notebooks/dependent-learning.html&quot;&gt;serious
attention to dependent data&lt;/a&gt;, they have usually relied on a beta-mixing
assumption.

&lt;P&gt;The reason for this focus on beta-mixing is that it &quot;plays nicely&quot; with
approximating dependent processes by independent ones.  The usual form of such
arguments is as follows.  We want to prove a result about our dependent but
mixing process \( X \).  For instance, we realize that our favorite prediction
model will tend to do worse out-of-sample than on the data used to fit it, and
we might want to bound the probability that this over-fitting will exceed \(
\epsilon \).  If we know the beta-mixing coefficients \( \beta(h) \), we can
pick a separation, call it \( a \), where \( \beta(a) \) is reasonably small.
Now we divide \( X \) up into \( \mu = n/a \) blocks of length \( a \).  If we
take every other block, they're nearly independent of each other (because \(
\beta(a) \) is small) but not quite (because \( \beta(a) \neq 0 \)).  Introduce
a (fictitious) random sequence \( Y \), where blocks of length \( a \) have the
same distribution as the blocks in \( X \), but there's no dependence between
blocks.  Since \( Y \) is an IID process, it is easy for us to prove that, for
instance, the probability of over-fitting \( Y \) by more than \( \epsilon \)
is at most some small \( \delta(\epsilon,\mu/2) \).  Since \( \beta \) tells us
about how well dependent probabilities are approximated by independent ones,
the probability of the bad event happening with the dependent data is at most
\( \delta(\epsilon,\mu/2) + (\mu/2)\beta(a) \).  We can make this as small as
we like by letting \( \mu \) and \( a \) both grow as the time series gets
longer.  Basically, any result which holds for an IID process will also
hold for a beta-mixing one, with a penalty in the probability that depends on
\( \beta \).  There are some details to fill in here (how to pick the
separation \( a \)?  should the blocks always be the same length as the
&quot;filler&quot; between blocks?), but this is the basic frame.
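
&lt;P&gt;The bookkeeping of the blocking argument can be sketched in a few lines of Python.  Everything specific here is an assumption for illustration: the Hoeffding-style bound merely stands in for whatever \( \delta \) the IID theory supplies, the geometric mixing rate in the usage lines is hypothetical, and the function names are invented.

&lt;pre&gt;
```python
import math

def every_other_block(xs, a):
    """Cut xs into consecutive blocks of length a and keep blocks 0, 2, 4, ..."""
    blocks = [xs[i:i + a] for i in range(0, len(xs) - a + 1, a)]
    return blocks[::2]

def hoeffding_delta(eps, m):
    """Illustrative IID bound for the mean of m values in [0, 1] (an assumption)."""
    return min(1.0, 2.0 * math.exp(-2.0 * m * eps ** 2))

def dependent_bound(eps, n, a, beta_a, delta=hoeffding_delta):
    """delta(eps, mu/2) + (mu/2) * beta(a), with mu = n / a blocks."""
    half_mu = (n // a) / 2.0
    return min(1.0, delta(eps, half_mu) + half_mu * beta_a)

# Hypothetical geometric mixing rate, scanned over candidate separations.
n, eps = 10_000, 0.2
for a in (5, 10, 20, 40):
    print(a, dependent_bound(eps, n, a, beta_a=0.5 ** a))
```
&lt;/pre&gt;

Scanning over \( a \) like this shows the trade-off: short blocks leave a large mixing penalty, while long blocks leave few independent blocks for the IID bound to work with.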

&lt;P&gt;What it leaves open, however, is how to &lt;em&gt;estimate&lt;/em&gt; the mixing
coefficients \( \beta(h) \).  For Markov models, one could in principle
calculate them from the transition probabilities.  For more general processes,
though, calculating beta from the known distribution is not easy.  In fact, we
are not aware of any previous work on &lt;em&gt;estimating&lt;/em&gt; the \( \beta(h) \)
coefficients from observational data.  (References welcome!)  Because of this,
even in learning theory, people have just assumed that the mixing coefficients
were known, or that it was known they went to zero at a certain rate.  This was
not enough for what we wanted to do, which was actually calculate bounds on
error from data.

&lt;P&gt;There were two tricks to actually coming up with an estimator.  The first
was to reduce the ambitions a little bit.  If you look at the equation for \(
\beta(h) \) above, you'll see that it involves integrating over the
infinite-dimensional distribution.  This is daunting, so instead of looking at
the whole past and future, we'll introduce a horizon, \( d \) steps away, and
cut things off there:

\[
\begin{eqnarray*}
\beta^{(d)}(h) &amp; \equiv &amp; d_{TV}(P(X^t_{t-d}, X_{t+h}^{t+h+d}), P(X^t_{t-d}) \otimes P(X_{t+h}^{t+h+d})) \\
&amp; = &amp; \frac{1}{2}\int{|p(x^t_{t-d},x_{t+h}^{t+h+d})-p(x^t_{t-d})p(x^{t+h+d}_{t+h})|dx^t_{t-d}dx^{t+h+d}_{t+h}}
\end{eqnarray*}
\]

If \( X \) is a Markov process, then there's no difference between \(
\beta^{(d)}(h) \) and \( \beta(h) \).  If \( X \) is a Markov process of order
\( p \), then \( \beta^{(d)}(h) = \beta(h) \) once \( d \geq p \).  If \( X \)
is not Markov at any order, it is still the case that \( \beta^{(d)}(h)
\rightarrow \beta(h) \) as \( d \) grows.  So we have an approximation to \(
\beta \) which only involves finite-dimensional integrals, which we might have
some hope of doing.

&lt;P&gt;The other trick is to get rid of those integrals.  Another way of writing
the beta-dependence between the random variables \( U \) and \( V \) is

\[
\beta(U,V) =
\sup_{\mathcal{A},\mathcal{B}}{\frac{1}{2}\sum_{a\in\mathcal{A}}{\sum_{b\in\mathcal{B}}{\left| \Pr{(a
\cap b)} - \Pr{(a)}\Pr{(b)} \right|}}}
\]

where \( \mathcal{A} \) runs over finite partitions of values of \( U \), and
\( \mathcal{B} \) likewise runs over finite partitions of values of \( V \).  I
won't try to show that this formula is equivalent to the earlier definition,
but I will contend that if you think about how that integral gets cashed out as
a sum, you can sort of see how it would be.  If we want \( \beta^{(d)}(h) \),
we can take \( U = X^{t}_{t-d} \) and \( V = X^{t+h+d}_{t+h} \), and we could
find the dependence by taking the supremum over partitions of those two
variables.

&lt;P&gt;Now, suppose that the joint density \( p(x^t_{t-d},x_{t+h}^{t+h+d}) \) was
piecewise constant, with those pieces being rectangles parallel to the
coordinate axes.  Then sub-dividing those rectangles would not change the sum,
and the \( \sup \) would actually be attained for that particular partition.
Most densities are not of course piecewise constant, but we
can &lt;em&gt;approximate&lt;/em&gt; them by such piecewise-constant functions, and make
the approximation arbitrarily close (in total variation).  More, we
can &lt;a href=&quot;algae-2009-11.html#combinatorial-methods&quot;&gt;&lt;em&gt;estimate&lt;/em&gt;
those piecewise-constant approximating densities&lt;/a&gt; from a time series.  Those
estimates are, simply, histograms, which are about the oldest form of
&lt;a href=&quot;../notebooks/density-estimation.html&quot;&gt;density estimation&lt;/a&gt;.  We show
that histogram density estimates converge in total variation on the true
densities, when the bin-width is allowed to shrink as we get more data.

&lt;P&gt;Because the total variation distance is in fact a metric, we can use the
triangle inequality to get an upper bound on the true beta coefficient, in
terms of the beta coefficients of the estimated histograms, and the expected
error of the histogram estimates.  All of the error terms shrink to zero as the
time series gets longer, so we end up with consistent estimates of \(
\beta^{(d)}(h) \).  That's enough if we have a Markov process, but in general
we don't.  So we can let \( d \) grow as \( n \) does, and that (after a
surprisingly long measure-theoretic argument) turns out to do the job: our
histogram estimates of \( \beta^{(d)}(h) \), with suitably-growing \( d \),
converge on the true \( \beta(h) \).
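
&lt;P&gt;A naive plug-in version of this idea fits in a short Python sketch.  It is not the estimator analyzed in the papers: it assumes equal-width bins with a fixed number &lt;tt&gt;k&lt;/tt&gt; per coordinate, uses blocks of &lt;tt&gt;d&lt;/tt&gt; consecutive values, and ignores the rates at which the bin-width and \( d \) must change with \( n \).  The function names are hypothetical.

&lt;pre&gt;
```python
from collections import Counter

def bin_index(x, lo, hi, k):
    """Index of the equal-width bin (out of k) containing x on [lo, hi]."""
    if hi == lo:
        return 0
    i = int((x - lo) / (hi - lo) * k)
    return min(max(i, 0), k - 1)

def beta_hat(xs, h, d=1, k=2):
    """Histogram plug-in estimate of the order-d beta coefficient at lag h."""
    lo, hi = min(xs), max(xs)
    cells = [bin_index(x, lo, hi, k) for x in xs]
    joint, past, future = Counter(), Counter(), Counter()
    # Past block: the d values ending at t.  Future block: d values from t + h.
    starts = range(d - 1, len(xs) - h - d + 1)
    for t in starts:
        a = tuple(cells[t - d + 1:t + 1])
        b = tuple(cells[t + h:t + h + d])
        joint[(a, b)] += 1
        past[a] += 1
        future[b] += 1
    m = len(starts)
    total = 0.0
    for a in past:
        for b in future:
            # Counter returns 0 for cells that were never observed.
            total += abs(joint[(a, b)] / m - (past[a] / m) * (future[b] / m))
    return 0.5 * total

# A deterministic alternating series: the next value is fixed by the last one.
alternating = [i % 2 for i in range(1000)]
print(beta_hat(alternating, h=1, d=1, k=2))
```
&lt;/pre&gt;

The number of histogram cells grows like \( k^{2d} \), so even modest values of \( d \) need a long series before the cell frequencies mean much.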

&lt;P&gt;To confirm that this works, the papers go through some simulation examples,
where it's possible to cross-check our estimates.  We can of course also do
this for empirical time series.  For instance, in his thesis Daniel took four
standard macroeconomic time series for the US (GDP, consumption, investment,
and hours worked, all de-trended in the usual way).  This data goes back to
1948, and is measured four times a year, so there are 255 quarterly
observations.  Daniel estimated a \( \beta \) of 0.26 at one quarter's
separation, \( \widehat{\beta}(2) = 0.15 \), \( \widehat{\beta}(3) = 0.02 \),
and somewhere between 0 and 0.11 for \(\widehat{\beta}(4) \).  (That last is a
sign that we don't have enough data to go beyond \( h = 4 \).)  Optimistically
assuming no dependence beyond a year, one can calculate the effective number of
independent data points, which is not 255 but 31.  This has morals for
macroeconomics which are worth dwelling on, but that will have to wait for
another time.  (Spoiler: \( \sqrt{\frac{1}{31}} \approx 0.18 \), and that's if
you're lucky.)
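
&lt;P&gt;The calculation behind the 31 is not spelled out here.  One reading, and it is only a guess, is the blocking scheme described earlier with blocks of \( a = 4 \) quarters and every other block kept; the arithmetic under that assumption is below.

&lt;pre&gt;
```python
import math

n, a = 255, 4                    # quarterly observations; assumed block length
n_blocks = n // a                # whole one-year blocks that fit in the series
n_eff = n // (2 * a)             # keep every other block (assumed scheme)
print(n_blocks, n_eff)
print(round(1 / math.sqrt(n_eff), 2))
```
&lt;/pre&gt;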


&lt;P&gt;It's inelegant to have to construct histograms when all we want is a single
number, so it wouldn't surprise us if there were a slicker way of doing this.
(For &lt;a href=&quot;http://bactra.org/notebooks/entropy-estimation.html&quot;&gt;estimating
mutual information&lt;/a&gt;, which is in many ways analogous, estimating the joint
distribution as an intermediate step is neither necessary nor desirable.)  But
for now, we &lt;em&gt;can&lt;/em&gt; do it, when we couldn't before.

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;cat_incestuous_amplification.html&quot;&gt;Kith and Kin&lt;/a&gt;;
&lt;a href=&quot;cat_selfcentered.html&quot;&gt;Self-Centered&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Graphical Causal Models (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/906.html</link>
    <description>
&lt;P&gt;Probabilistic prediction is about passively selecting a sub-ensemble,
leaving all the mechanisms in place, and seeing what turns up after applying
that filter.  Causal prediction is about actively &lt;em&gt;producing&lt;/em&gt; a new
ensemble, and seeing what would happen if something were to change
(&quot;counterfactuals&quot;).  Graphical causal models are a way of reasoning about
causal prediction; their algebraic counterparts are structural equation models
(generally nonlinear and non-Gaussian).  The causal Markov property.
Faithfulness.  Performing causal prediction by &quot;surgery&quot; on causal graphical
models.  The d-separation criterion.  Path diagram rules for linear models.

&lt;P&gt;&lt;em&gt;Reading&lt;/em&gt;:
Notes, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch22.pdf&quot;&gt;chapter
22&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Exam: Is This Test Really Necessary? (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/905.html</link>
    <description>
&lt;P&gt;In which the analysis of multivariate data is recursively applied.

&lt;P&gt;&lt;em&gt;Reading&lt;/em&gt;:
Notes, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/exams/exam-2.pdf&quot;&gt;assignment&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Graphical Models (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/904.html</link>
    <description>

&lt;P&gt;Conditional independence and dependence properties in factor models.  The
generalization to graphical models.  Directed acyclic graphs.  DAG models.
Factor, mixture, and Markov models as DAGs.  The graphical Markov property.
Reading conditional independence properties from a DAG.  Creating conditional
dependence properties from a DAG.  Statistical aspects of DAGs.  Reasoning with
DAGs; does asbestos whiten teeth?


&lt;P&gt;&lt;em&gt;Reading&lt;/em&gt;:
Notes, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch21.pdf&quot;&gt;chapter
21&lt;/a&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>Mixture Models (Advanced Data Analysis from an Elementary Point of View)</title>
    <link>http://bactra.org/weblog/903.html</link>
    <description>

&lt;P&gt;From factor analysis to mixture models by allowing the latent variable to
be discrete.  From kernel density estimation to mixture models by reducing the
number of points with copies of the kernel.  Probabilistic formulation of
mixture models.  Geometry: planes again.  Probabilistic clustering.  Estimation
of mixture models by maximum likelihood, and why it leads to a vicious circle.
The expectation-maximization (EM, Baum-Welch) algorithm replaces the vicious
circle with iterative approximation.  More on the EM algorithm: convexity,
Jensen's inequality, optimizing a lower bound, proving that each step of EM
increases the likelihood.  Mixtures of regressions.  Other extensions.

&lt;P&gt;Extended example: Precipitation in Snoqualmie Falls revisited.  Fitting a
two-component Gaussian mixture; examining the fitted distribution; checking
calibration.  Using cross-validation to select the number of components to use.
Examination of the selected mixture model.  Suspicious patterns in the
parameters of the selected model.  Approximating complicated distributions
vs. revealing hidden structure.  Using bootstrap hypothesis testing to select
the number of mixture components.


&lt;P&gt;&lt;em&gt;Reading&lt;/em&gt;:
Notes, &lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch20.pdf&quot;&gt;chapter
20&lt;/a&gt;; &lt;tt&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/mixture-examples.R&quot;&gt;mixture-examples.R&lt;/a&gt;&lt;/tt&gt;

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_36-402.html&quot;&gt;Advanced Data Analysis from an Elementary Point of View&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  <item>
    <title>&quot;Generalization Error Bounds for Time Series&quot;</title>
    <link>http://bactra.org/weblog/902.html</link>
    <description>
&lt;P&gt;On Friday, my student &lt;a href=&quot;http://www.stat.cmu.edu/~danielmc/&quot;&gt;Daniel
McDonald&lt;/a&gt;, who I have been lucky enough to jointly advise
with &lt;a href=&quot;http://www.stat.cmu.edu/~mark/&quot;&gt;Mark
Schervish&lt;/a&gt;, &lt;a href=&quot;http://www.mcsweeneys.net/articles/faq-the-snake-fight-portion-of-your-thesis-defense&quot;&gt;defeated
the snake&lt;/a&gt; &amp;mdash; that
is, &lt;a href=&quot;http://wondermark.com/238/&quot;&gt;defended&lt;/a&gt; his thesis:

&lt;dl&gt;
&lt;dt&gt;&lt;cite&gt;Generalization Error Bounds for Time Series&lt;/cite&gt;&lt;/dt&gt;
&lt;dd&gt;In this thesis, I derive generalization error bounds &amp;mdash; bounds on the
expected inaccuracy of the predictions &amp;mdash; for time series forecasting
models.  These bounds allow forecasters to select among competing models, and
to declare that, with high probability, their chosen model will perform well
&amp;mdash; without making strong assumptions about the data generating process or
appealing to asymptotic theory.  Expanding upon results from statistical
learning theory, I demonstrate how these techniques can help time series
forecasters to choose models which behave well under uncertainty.  I also show
how to estimate the beta-mixing coefficients for dependent data so that my
results can be used empirically. I use the bound explicitly to evaluate
different predictive models for the volatility of IBM stock and for a standard
set of macroeconomic variables.  Taken together my results show how to control
the generalization error of time series models with fixed or growing
memory.&lt;/dd&gt;
&lt;dd&gt;&lt;a href=&quot;http://www.stat.cmu.edu/~danielmc/wp-content/uploads/ClassicThesis.pdf&quot;&gt;PDF&lt;/a&gt;
&lt;/dl&gt;

&lt;P&gt;I hope to have a follow-up post very soon about the substance of Daniel's
work, which is part of our &lt;a href=&quot;700.html&quot;&gt;INET grant&lt;/a&gt;, but in the
meanwhile: congratulations, Dr. McDonald!

&lt;P&gt;&lt;span class=&quot;blognotes&quot;&gt;
&lt;a href=&quot;cat_incestuous_amplification.html&quot;&gt;Kith and Kin&lt;/a&gt;;
&lt;a href=&quot;cat_enigmas_of_chance.html&quot;&gt;Enigmas of Chance&lt;/a&gt;;
&lt;a href=&quot;cat_the_dismal_science.html&quot;&gt;The Dismal Science&lt;/a&gt;
&lt;/span&gt;
</description>
  </item>
  </channel>
</rss>

