Cosma

Research

Changing my shape
I feel like an accident

I work on methods for building predictive models from data generated by stochastic processes, and applying those models to questions about neural information processing, self-organization in cellular automata, and so forth. All of this is about using tools from probability, statistics, and machine learning to understand large, complex, nonlinear dynamical systems. This is why my dissertation was in statistical physics, but I now teach in a statistics department.

My departmental homepage has a fuller explanation of what I do, and how I came to do it, in terms which (I hope) make sense to statisticians.

[Summaries]   [Talks and Presentations]   [Papers and Book Chapters]   [Lecture Notes]

Summaries

Curriculum Vitae/Resume: Lists my publications, talks given, and other professional data.

My doctoral dissertation, Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata (2001). A unified presentation of the material from my previous papers on computational mechanics, plus several chapters of until-then unpublished work. Includes quantitative definitions of "emergence" and "self-organization", of which I am fairly fond.

Talks and Presentations

For a complete list of my talks, see my CV.

"Is the Primordial Soup Done Yet? Quantifying Self-Organization, Especially in Cellular Automata" is based on a talk of the same title I gave to the Madison Chaos and Complex Systems Seminar; also on my presentation for my oral exam (in June 1996). It had too few numbers or results and too many jokes and conjectures for a real paper. Update, 2003: see "Quantifying Self-Organization with Optimal Predictors", below, for actual results.

"Spatiotemporal Emergent Structures from Causal Architecture": slides and figures (both gzipped Postscript) for my March 2001 job-talk on this work at the University of Michigan's Center for the Study of Complex Systems. Based on joint work with Rob Haslinger and Jim Crutchfield, reported in my dissertation.

"Pattern Discovery and Networks" (PDF slides, 101k), presented 18 July 2001 at the Air Force Office of Scientific Research workshop "Infospherics: Science for Building Large-scale Global Information Systems," George Mason University, Fairfax, Virginia, 17--19 July 2001.

"Predicting Random Fields on Networks" (PDF slides, 293k), presented at Discrete Models for Complex Systems, June 2003. Includes some material not in the associated paper (below).

"Building Markov Models from Time Series", slides (PDF, 165k) for a talk at the Applied Research Laboratory, Penn State, 16 July 2003.

Papers and Book Chapters

In order of completion. Lecture notes are on my teaching page.

[1] Jim Crutchfield & CRS, "Thermodynamic Depth of Causal States: Objective Complexity via Minimal Representation", Physical Review E 59 (1999): 275--283 (PDF) = cond-mat/9808147. It explains why thermodynamic depth, a notion advanced by the late, great Heinz Pagels, while nifty in concept and certainly not just Yet Another Complexity Measure, doesn't work very well as originally proposed, and what needs to be changed to make it work, namely adding causal states. (In the words of the poet, "we subsume what we do not obliterate.")

[2] Cris Moore, Mats Nordahl, Nelson Minar & CRS, "Entropic Coulomb Forces in Ising and Potts Antiferromagnets and Ice Models," cond-mat/9902200; final published version, "Vortex Dynamics and Entropic Forces in Antiferromagnets and Antiferromagnetic Potts Models," Physical Review E 60 (1999): 5344--5351. (To be honest, I didn't write any of this, but did a large chunk of the simulation work.) Only of interest to people who care about pretty abstract models in statistical mechanics.

[3] Jim Crutchfield, Dave Feldman & CRS, "Comment on `Simple Measure for Complexity,' " chao-dyn/9907001 = Physical Review E 62 (2000): 2996--2997. Short, critical remarks on Yet Another Complexity Measure.

[4] CRS & Jim Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity," Journal of Statistical Physics 104 (2001): 816--879 (PDF) = cond-mat/9907176. I'll let Mathematical Reviews describe this one for me:

The main concern of this article, written in a very general setting, is how to maximally compress the past of a random process still keeping all the useful information, so as to predict its future as well as if all the past were remembered. In this connection the authors introduce "causal states" into which all the space of possible past histories is cut. Within each causal state all the individual histories have one and the same conditional distribution for future observables. An epsilon-machine is defined which uses the causal states for prediction. The authors prove several theorems about them, whose intuitive meaning is natural: that causal states are maximally prescient, that they have the minimal statistical complexity among all prescient rivals, that they are unique, that epsilon-machines have the minimal entropy among all the prescient rivals and some inequalities. This voluminous article, containing 140 references, may also be used as a survey in the area of abstract theory of computational complexity of prediction.

[5] CRS & Bill Tozier, "A Simple Model of the Evolution of Simple Models of Evolution," adap-org/9910002; accepted by JWAS; rejected by Theoretical Population Biology for lack of decorum. A not entirely unserious critique of recent attempts to model evolution by physicists who don't know biology.

[6] CRS & Jim Crutchfield, "Pattern Discovery and Computational Mechanics," cs.LG/0001027. Why people who are interested in machine learning should care about what we do. Amicably rejected by the Proceedings of the 17th International Conference on Machine Learning, with remarks on the order of "interesting, but you really need to say more about how to code it up," which was fair enough, and provided some of the impetus for "An Algorithm for Pattern Discovery" and "Blind Construction" (below).

[7] CRS & Jim Crutchfield, "Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction," nlin.AO/0006025. Discussion of several related ways of extracting the information one variable contains about another, and using it to model the functional relationship or transducer connecting the two. Advances in Complex Systems 5 (2002): 91--95.

[8] Wim Hordijk, CRS & Jim Crutchfield, "Upper Bound on the Products of Particle Interactions in Cellular Automata," nlin.CG/0008038 = Physica D 154 (2001): 240--258. A proof of a limit on how complicated the interactions between propagating emergent structures in (one-dimensional) CAs can get. Wim is too modest to call it Hordijk's Rule, so I will.

[9] CRS and Dave Albers, "Symbolic Dynamics for Discrete Adaptive Games," cond-mat/0207407, SFI Working Paper 02-07-031. Why the hyper-stylized game-theory models which have taken over econophysics in the last few years are not complex, dynamically or statistically. (We were going to call it "no chaos and little complexity in the minority game," but settled on something more neutral.) Which isn't to say they're not worth studying; just that they need to justify themselves by what they can tell us about real(er) systems. Submitted to Physics Letters A. (Update, 2003: Revised to placate an unusually appalling referee.)

[10] CRS, Kristina Klinkner and Jim Crutchfield, "An Algorithm for Pattern Discovery in Time Series," cs.LG/0210025. (This version supersedes the SFI Working Paper one.) A statistically reliable, linear-time algorithm for inferring causal states from data. The code and documentation are available, released under the Gnu Public License. I'd recommend reading the "Blind Construction" paper first, since I think that has a clearer presentation of the algorithm and its motivation.

[11] CRS and Cris Moore, "What Is a Macrostate? Subjective Measurements and Objective Dynamics," cond-mat/0303625; also PITT-PHIL-SCI-1119 at the Phil-Sci Archive. Why thermodynamic macrostates are neither completely objective nor (as some argue) completely epistemic, but are instead causal states, in the sense of computational mechanics. Submitted to Studies in the History and Philosophy of Modern Physics.

[12] CRS and Kristina Klinkner, "Quantifying Self-Organization in Cyclic Cellular Automata," pp. 108--117 in Lutz Schimansky-Geier, Derek Abbott, Alexander Neiman and Christian Van den Broeck (eds.), Noise in Complex Systems and Stochastic Dynamics (Bellingham, Washington: SPIE, 2003), part of the proceedings of Fluctuations and Noise 2003. A preliminary report on the work that became "Quantifying Self-Organization with Optimal Predictors", below, which however has more details on the algorithm and related literature, because we were less space-constrained. nlin.AO/0507067.

[13] "Optimal Nonlinear Prediction of Random Fields on Networks," for the conference Discrete Models for Complex Systems 2003, printed in Discrete Mathematics and Theoretical Computer Science, AB(DMCS) (2003): 11--30. Available online from either the journal or arxiv.org (math.PR/0305160).

[14] "Methods and Techniques of Complex Systems Science: An Overview", chapter 1 (pp. 33--114) in Thomas S. Deisboeck and J. Yasha Kresh (eds.), Complex Systems Science in Biomedicine (NY: Springer, 2006) = nlin.AO/0307015. A summary of the tools people should use to study complex systems, covering statistical learning and data-mining, time series analysis, cellular automata, agent-based models, evaluation techniques and simulation, information theory and complexity measures, with 288 references (a personal record).

[15] CRS and Kristina Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", pp. 504--511 in Max Chickering and Joseph Halpern (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference = cs.LG/0406011 (Arlington, Virginia: AUAI Press, 2004). An eight-page paper on CSSR, including experimental comparison to the standard heuristic of fitting hidden Markov models via the EM algorithm, and selecting among them with cross-validation. (We're better.) I think this is the clearest description yet of the algorithm, though proofs were omitted to save space.

[16] CRS, Kristina Klinkner and Rob Haslinger, "Quantifying Self-Organization with Optimal Predictors", Physical Review Letters 93 (2004): 118701 = nlin.AO/0409024. Why self-organization should be identified with increasing complexity over time, and how to measure that complexity by measuring the amount of information needed for optimal prediction. This is the experiment I said I was going to do at the end of "Is the Primordial Soup Done Yet?", after only eight years. But one of those years we spent waiting for the referees to see the light, so it doesn't count. With neat color pictures!

[17] "The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic", cond-mat/0410063. Why we should not identify thermodynamic entropy with uncertainty (Shannon entropy) in a distribution over microstates. Doing so, in combination with the ordinary laws of motion and Bayesian probability updating, shows that entropy is non-increasing. Replacing Bayesian updating with repeated entropy maximization is bad statistics, and actually makes things worse. Submitted to Journal of Statistical Mechanics: Theory and Experiment. Comments unusually welcome, particularly if they point out basic fallacies.

[18] Michael T. Gastner, CRS and M. E. J. "Mark" Newman, "Maps and cartograms of the 2004 US presidential election results", Advances in Complex Systems, 8 (2005): 117--123 [PDF preprint, web page]. The only work I have ever done, or likely will do, which generated hate mail.

[19] Kristina Klinkner, CRS and Marcelo Camperi, "Measuring Shared Information and Coordinated Activity in Neuronal Networks", pp. 667--674 in Yair Weiss, Bernhard Schölkopf and John C. Platt (eds.), Advances in Neural Information Processing Systems 18 (MIT Press, 2006), a.k.a. NIPS 2005 = q-bio.NC/0506009. The best way to measure those things is to look at the mutual information between the causal states of the different neurons, suitably normalized: this handles nonlinear, stochastic relationships between extended patterns of behavior without any fuss or issues, and extends naturally to truly global measures of coordination, not just pairwise averages. (Plus, we can find the causal states with CSSR.) Because practice is the sole criterion of truth, we also show that this "informational coherence" works very nicely on a model of beta and gamma rhythms, where standard approaches get confused.

[20] Matthew J. Berryman, Scott W. Coussens, Yvonne Pamula, Declan Kennedy, Kurt Lushington, CRS, Andrew Allison, A. James Martin, David Saint and Derek Abbott, "Nonlinear Aspects of the EEG During Sleep in Children", pp. 40--48 in Nigel G. Stocks, Derek Abbott and Robert P. Morse (eds.), Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III (Bellingham, Washington: SPIE, 2005) = q-bio.NC/0506015.

[21] "Functionalism, Emergence, and Collective Coordinates", Behavioral and Brain Sciences 27 (2004): 635--636. This is part of the peer commentary (pp. 627--637, same issue) on Don Ross and David Spurrett's "What to Say to a Skeptical Metaphysician: A Defense Manual for Cognitive and Behavioral Scientists" (pp. 603--627), and is followed by Ross and Spurrett's reply to comments (pp. 637--647). [PDF]

[22] CRS, Rob Haslinger, Jean-Baptiste Rouquier, Kristina Klinkner and Cristopher Moore, "Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems", nlin.CG/0508001 = Physical Review E 73 (2006): 036104. Two different filters --- one based on local perturbation, the other based on statistical forecasting complexity --- which are both able to recover the known coherent structures of various cellular automata, without using prior knowledge of the rule or the structures. See the auto-vulgarization for more, but not so much more as the paper. (To meet the arxiv's requirements, we had to replace our original figures with highly-compressed jpegs. Here is a PDF version with the original, full-resolution figures; it is, oddly enough, more than a megabyte smaller than the arxiv-generated PDF.) The source code is available, in Objective Caml.

[23] CRS, Marcelo F. Camperi and Kristina Lisa Klinkner, "Discovering Functional Communities in Dynamical Networks", q-bio.NC/0609008 = pp. 140--157 in Anna Goldenberg, Edo Airoldi, Stephen E. Fienberg, Alice Zheng, David M. Blei and Eric P. Xing (eds.), Statistical Network Analysis: Models, Issues and New Directions (New York: Springer-Verlag, forthcoming/2007), the proceedings of the ICML 2006 workshop of the same name. How the combination of informational coherence (see Klinkner et al. above) and community discovery algorithms let you identify functional modules, rather than just anatomical ones. We just look at an example from computational neuroscience here, but there is no intrinsic reason you couldn't do this for any kind of network dynamical system.

[24] "Maximum Likelihood Estimation for q-Exponential (Tsallis) Distributions", math.ST/0701854, submitted to Physical Review E. Tsallis's q-exponential distributions are heavy-tailed distributions which are related to the Pareto distributions (in fact, a special case of what some people call a "type II generalized Pareto"). They've recently become a big deal in statistical physics --- much too big, if you ask me. But if you do want to use them with real data, this is the right way to do it. The paper is accompanied by free, open-source code, written in R, implementing the maximum likelihood estimator and bootstrapped standard errors and confidence intervals.

[25] Aaron Clauset, CRS, and M. E. J. Newman, "Power-law distributions in empirical data", arxiv:0706.1062, SIAM Review 51 (2009): 661--703. What power laws are (not just sort of straight lines on log-log plots), how to estimate their parameters from data (use maximum likelihood, not linear regression), and how to tell if you have one (by actual hypothesis testing). With accompanying code in R and Matlab.

[26] CRS, "Social Media as Windows on the Social Life of the Mind", arxiv:0710.4911, to appear in the AAAI spring 2008 symposium on social information processing. A programmatic paper, pulling together some of my ideas about cultural diffusion in networks and collective cognition, and how those topics might be studied with data from user-driven Web sites.

[27] Rob Haslinger, Kristina Klinkner and CRS, "The Computational Structure of Spike Trains", Neural Computation 22 (2010): 121--157 = arxiv:1001.0036. Applying CSSR to real neural spike trains to recover their computatinal structure and statistical complexity, and detect the influence of external stimuli.

[28] CRS, "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", arxiv:0901.1342 = Electronic Journal of Statistics 3 (2009): 1039--1074. What happens when all of your models are wrong, but you use Bayesian updating to weight them anyway? Answer: a process of natural selection, in which fitness is proportional to likelihood. This actually converges on the models with the smallest relative entropy rate, if your prior distribution is smart enough to respect a sieve (but then non-parametric non-Bayesian methods work too).