<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Graphical Models</title>
    <link>http://bactra.org/notebooks/2010/01/04#graphical-models</link>
    <description>
&lt;P&gt;A.k.a. causal models, causal graphs, Bayes graphs, Bayes networks, Bayesian
networks.  (Here &quot;Bayes&quot; is a metonym for &quot;conditional probability&quot;.  There are
perfectly good frequentist interpretations of these models.)  I'm sticking
latent-variable and path-analysis models in here, too, because they all pretty
much work the same way.&lt;/P&gt;

&lt;P&gt;Everyone who takes basic statistics has it drilled into them that
&quot;correlation is not causation.&quot; (When I took psych. 1, the professor said he
hoped that, if he were to come to us on our death-beds and prompt us with
&quot;Correlation is,&quot; we would all respond &quot;not causation.&quot;) This is a problem,
because one can infer correlation from data, and would &lt;em&gt;like&lt;/em&gt; to be able
to make inferences about causation.  There are typically two ways out of this.
One is to perform an experiment, preferably a randomized double-blind
experiment, to eliminate accidental sources of correlation, common causes, etc.
That's nice when you can do it, but impossible with supernovae, and not even
easy with people.  The other out is to look for correlations, say that of
course they don't equal causations, and then act as if they did anyway.  The
technical names for this latter course of action are &quot;linear regression&quot; and
&quot;analysis of variance,&quot; and they form the core of applied quantitative social
science, e.g., &lt;cite&gt;The Bell Curve.&lt;/cite&gt;&lt;/P&gt;

&lt;P&gt;Graphical models are, in part, a way of escaping from this impasse.&lt;/P&gt;

&lt;P&gt;The basic idea is as follows.  You have a bunch of variables, and you want
to represent the causal relationships, or at least the probabilistic
dependencies, between them.  You do so by means of a graph.  Each node in the
graph stands for a variable.  If variable A is a cause of B, then an arrow runs
from A to B.  If A is a cause of B, we also say that A is one of B's
&lt;em&gt;parents,&lt;/em&gt; and B one of A's &lt;em&gt;children.&lt;/em&gt; If there is a causal path
from A to B, then A is an &lt;em&gt;ancestor&lt;/em&gt; of B, and B is a
&lt;em&gt;descendant&lt;/em&gt; of A.  If a variable has no parents in the graph, it is
&lt;em&gt;exogenous,&lt;/em&gt; otherwise it is &lt;em&gt;endogenous.&lt;/em&gt;&lt;/P&gt;
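
&lt;P&gt;To make the vocabulary concrete, here is a minimal sketch in Python.  The toy graph, its variable names and the helper functions are all made up for illustration; the only thing taken from the text above is the set of definitions (parents, children, ancestors, descendants, exogenous).&lt;/P&gt;

```python
# Hypothetical toy graph: each key is a variable, each value lists its parents.
PARENTS = {
    "rain": [],
    "sprinkler": ["rain"],
    "wet_grass": ["rain", "sprinkler"],
    "slippery": ["wet_grass"],
}


def children(node):
    """Variables that have `node` as one of their parents."""
    return sorted(v for v, ps in PARENTS.items() if node in ps)


def ancestors(node):
    """Every variable with a directed path into `node`."""
    found = set()
    stack = list(PARENTS[node])
    while stack:
        v = stack.pop()
        if v not in found:
            found.add(v)
            stack.extend(PARENTS[v])
    return found


def descendants(node):
    """Every variable reachable from `node` by following the arrows."""
    return {v for v in PARENTS if node in ancestors(v)}


def exogenous():
    """Variables with no parents in the graph."""
    return sorted(v for v, ps in PARENTS.items() if not ps)
```

&lt;P&gt;Storing only the parent lists is enough: children, ancestors and descendants can all be derived from them, which is also how the factorization of the joint distribution is usually written down.&lt;/P&gt;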

&lt;P&gt;Part of what we mean by &quot;cause&quot; is that, when we know the immediate causes,
the remoter causes are irrelevant --- given the parents, remoter ancestors
don't matter.  The standard example is that applying a flame to a piece of
cotton will cause it to burn, whether the flame came from a match, spark,
lighter or what-not.  Probabilistically, this is a conditional independence
property, or a Markov property: a variable is independent of its ancestors
conditional on its parents.  In fact, given its parents, its children, and its
children's other parents, a variable is conditionally independent of all other
variables.  This is called the graphical or causal Markov property.  When this
holds, we can factor the joint probability distribution for all the variables
into the product of the distribution of the exogenous variables, and the
conditional distribution for each endogenous variable given its parents.&lt;/P&gt;
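
&lt;P&gt;A small numerical sketch of the factorization, assuming a three-variable binary chain in which A is the parent of B and B is the parent of C.  All the probabilities are invented for illustration.  The joint distribution is built as the product of the exogenous distribution and the conditionals given parents, and the code then checks that C does not depend on its remoter ancestor A once its parent B is given.&lt;/P&gt;

```python
from itertools import product
import math

# Hypothetical numbers for a binary chain: A is the parent of B, B of C.
P_A = {0: 0.7, 1: 0.3}                                      # exogenous variable
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}    # P_C_GIVEN_B[b][c]


def joint(a, b, c):
    """Markov factorization: P(a) * P(b | a) * P(c | b)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


def prob_c_given_ab(c, a, b):
    """P(C = c | A = a, B = b), computed from the joint distribution alone."""
    return joint(a, b, c) / sum(joint(a, b, k) for k in (0, 1))


# The factorized joint should sum to one over all eight configurations.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Given its parent B, C should not depend on its remoter ancestor A.
screened_off = all(
    math.isclose(prob_c_given_ab(c, 0, b), prob_c_given_ab(c, 1, b))
    for b, c in product((0, 1), repeat=2)
)
```

&lt;P&gt;The check is true by construction here, which is the point: any distribution that factors this way has the Markov property for the chain.&lt;/P&gt;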

&lt;P&gt;(You may be wondering what happens if A is a parent of B and B is a parent
of A, as can happen when there is feedback between the variables.  This leads
to difficulties, traditionally dealt with by explicitly limiting the discussion
to acyclic graphs.  I shall follow this wise precedent here.)&lt;/P&gt;

&lt;P&gt;Now, there are certain rules which let us infer conditional independence
relations from each other.  For instance, if X is independent of the
combination of Y and W, given Z, then X is independent of Y alone given Z.  So,
if we have a graph which obeys the causal Markov condition, there are generally
other conditional independence relations which follow from the basic ones.  If
these are the only conditional independences which hold in the distribution, it is
said to be &lt;em&gt;faithful&lt;/em&gt; to the graph (or vice versa); otherwise it is
unfaithful.  For a graph to be Markov and unfaithful, there must (as it were)
be an elaborate conspiracy among the conditional distributions, so elaborate
that it will generally be destroyed by any change in any of those
distributions.  So faithfulness is a robust property.&lt;/P&gt;
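
&lt;P&gt;One standard way to picture such a conspiracy, sketched under assumptions of my choosing: a linear model in which X affects Z both directly and through Y, with coefficients tuned so that the two paths cancel exactly.  The model, the coefficient names and the numbers are hypothetical.  With Y = a*X + noise and Z = b*Y + c*X + noise (independent noises), substituting gives Z = (a*b + c)*X + other noise, so the covariance of X and Z is (a*b + c) times the variance of X.&lt;/P&gt;

```python
def cov_xz(a, b, c, var_x=1.0):
    """Covariance of X and Z in the linear model
    Y = a*X + noise, Z = b*Y + c*X + noise, with independent noises."""
    return (a * b + c) * var_x


# Direct and indirect paths cancel: X causes Z, yet they are uncorrelated.
tuned = cov_xz(a=2.0, b=0.5, c=-1.0)

# A small change to one coefficient brings the dependence back.
perturbed = cov_xz(a=2.0, b=0.5, c=-0.9)
```

&lt;P&gt;If the noises are Gaussian, zero covariance means independence, so the tuned model has an independence which the graph does not imply; it is Markov but unfaithful, and the cancellation does not survive the perturbation.&lt;/P&gt;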

&lt;P&gt;This may sound pretty arcane, but that's just because it &lt;em&gt;is&lt;/em&gt; arcane.
The point, however, is that if you can make the three assumptions above (no
causal cycles, Markov property, faithfulness), you're in business in a really
remarkable way.  There are very powerful statistical techniques that will let
you infer the causal structure connecting your variables.  This comes in two
flavors.  One is the Bayesian way: cook up a prior distribution over all
possible causal graphs; compute the likelihood of the data under each graph;
update your distribution over graphs; iterate.  This is generally
computationally intractable, assuming you can come up with a meaningful prior
in the first place.  The other approach is to use tests for conditional
independence to eliminate possible connections between variables, and so to
narrow down the range of candidate structures; it is basically frequentist, and
can be shown, under a broad range of circumstances, to be asymptotically
reliable.&lt;/P&gt;
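
&lt;P&gt;A rough sketch of the second approach, under strong simplifying assumptions: three variables, a linear-Gaussian model, conditioning sets of size at most one, and exact population correlations standing in for statistical tests on data.  The correlation values and function names are hypothetical.  An edge between two variables is kept only if no test finds them independent; this recovers the undirected skeleton, not the directions of the arrows.&lt;/P&gt;

```python
import math
from itertools import combinations

# Hypothetical population correlations for a linear-Gaussian chain
# in which A is the parent of B and B is the parent of C.
CORR = {
    frozenset("AB"): 0.8,
    frozenset("BC"): 0.5,
    frozenset("AC"): 0.4,   # equals 0.8 * 0.5, as the chain implies
}
VARIABLES = ["A", "B", "C"]


def corr(x, y):
    return CORR[frozenset((x, y))]


def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on the single variable z."""
    num = corr(x, y) - corr(x, z) * corr(y, z)
    den = math.sqrt((1 - corr(x, z) ** 2) * (1 - corr(y, z) ** 2))
    return num / den


def is_zero(value):
    # Stand-in for a statistical test; with real data this would be a
    # hypothesis test on a sample (partial) correlation.
    return math.isclose(value, 0.0, abs_tol=1e-9)


def skeleton():
    """Keep an edge only if the pair is dependent both marginally and
    given each single other variable."""
    edges = set()
    for x, y in combinations(VARIABLES, 2):
        others = [v for v in VARIABLES if v not in (x, y)]
        separated = is_zero(corr(x, y)) or any(
            is_zero(partial_corr(x, y, z)) for z in others
        )
        if not separated:
            edges.add(frozenset((x, y)))
    return edges
```

&lt;P&gt;Here A and C are correlated, but their partial correlation given B vanishes, so the A-C edge is dropped and only A-B and B-C remain.  Dropping an edge because of an observed independence is exactly where the faithfulness assumption gets used.&lt;/P&gt;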

&lt;P&gt;Once you have your causal graph --- whether through estimation or through
simply being handed one --- you can do lots of great things with it, like
predict the effects of manipulating some of the variables, or make backward
inferences from effects to causes.  Of course, if the graph is big, doing the
necessary calculations can be very troublesome in itself, and so people work on
approximation methods and even ways of doing statistical inference on models of
statistical distributions...&lt;/P&gt;
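
&lt;P&gt;A minimal sketch of both uses, assuming a hypothetical binary model in which Z is a common cause of X and Y, and X also causes Y.  All numbers are invented.  Manipulating X is modeled by dropping the factor for X from the factorization and keeping the others, which is the usual reading of the graph; observing X is ordinary conditioning; and the backward inference from X to its cause Z is Bayes's rule.&lt;/P&gt;

```python
# Hypothetical binary model: Z is a parent of X and of Y; X is a parent of Y.
P_Z1 = 0.5                                   # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | X = x, Z = z)
                 (1, 0): 0.3, (1, 1): 0.7}


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def p_y1_do_x(x):
    """P(Y = 1 | do(X = x)): drop the factor for X, keep the rest."""
    return sum(p_z(z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


def p_y1_given_x(x):
    """P(Y = 1 | X = x): ordinary conditioning on an observed X."""
    num = sum(p_z(z) * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))
    den = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return num / den


def p_z1_given_x(x):
    """Backward inference by Bayes's rule: P(Z = 1 | X = x)."""
    den = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return p_z(1) * p_x_given_z(x, 1) / den
```

&lt;P&gt;With these numbers, setting X to 1 rather than 0 raises the probability of Y = 1 from 0.3 to 0.5, while merely observing X = 1 rather than X = 0 moves it from 0.18 to 0.62; the gap is the part of the correlation which is due to the common cause.&lt;/P&gt;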

&lt;P&gt;It's probably obvious I think this is incredibly neat, and even one of the
most important ideas to come out of &lt;a
href=&quot;learning-inference-induction.html&quot;&gt;machine learning&lt;/a&gt;.  Of course it
doesn't &lt;em&gt;really&lt;/em&gt; solve the problem of establishing causal relations, in
the way &lt;a href=&quot;hume.html&quot;&gt;Hume&lt;/a&gt; objected to; it says, assuming there are
causal relations, of a certain stochastic form, and that these are stable, then
they can be learned.  But that, and the more general questions of what we ought
to mean by &quot;cause&quot;, deserve a &lt;a href=&quot;causality.html&quot;&gt;notebook of their
own&lt;/a&gt;.&lt;/P&gt;

&lt;P&gt;Things I want to understand better: frequentist inference procedures.
Computational learning theory for graphical models (the paper by Janzing and
Herrmann is good).  How to treat systems with feedback?  How to
treat &lt;a href=&quot;chaos.html&quot;&gt;dynamical systems&lt;/a&gt;
and &lt;a href=&quot;time-series.html&quot;&gt;time series&lt;/a&gt;?  How does all of this fit
together with &lt;a href=&quot;computational-mechanics.html&quot;&gt;computational
mechanics&lt;/a&gt;?&lt;/P&gt;

&lt;P&gt;&lt;em&gt;Not even a conjecture.&lt;/em&gt;  Back in the 1960s, Chow and Liu (reference
below) gave a polynomial algorithm for finding the best approximation to a
global joint probability distribution using only pairwise interactions among
the variables, i.e., the one which minimized the Kullback-Leibler divergence
between the true and the approximating distribution.  I have read that
extending this to even three-way interactions is NP, though I don't know if
it's NP-complete.  (1) How is the intractability result established?  (2) Is
this the same as the computational phase transition one finds in going from
2-SAT to 3-SAT, where the critical point is at two-point-something SAT?
(Presumably the answer to (1) would shed some light on this.)  (3) Even if not,
is there an analogous phase transition, perhaps in a different universality
class? (Update in 2009, several years later: Bento and Montanari,
below, &lt;em&gt;sounds&lt;/em&gt; relevant, but I haven't read it yet.)
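
&lt;P&gt;For concreteness, a sketch of the Chow-Liu construction as it is usually stated: weight every pair of variables by its mutual information and take a maximum-weight spanning tree.  The three-variable joint distribution, the names and the simple Kruskal-style tree builder are assumptions for illustration, not anything from the original paper's presentation.&lt;/P&gt;

```python
import math
from itertools import combinations, product

NAMES = ["A", "B", "C"]

# Hypothetical joint distribution generated by a binary chain
# in which A is the parent of B and B is the parent of C.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_C_GIVEN_B = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}
JOINT = {
    (a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(indices):
    """Marginal distribution over the variables at the given positions."""
    out = {}
    for values, p in JOINT.items():
        key = tuple(values[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(i, j):
    """Mutual information (in nats) between the variables at positions i, j."""
    pi, pj, pij = marginal([i]), marginal([j]), marginal([i, j])
    return sum(
        p * math.log(p / (pi[(x,)] * pj[(y,)]))
        for (x, y), p in pij.items()
        if p  # skip zero-probability cells
    )


def chow_liu_tree():
    """Maximum-weight spanning tree over pairwise mutual information."""
    pairs = sorted(
        combinations(range(len(NAMES)), 2),
        key=lambda ij: mutual_information(*ij),
        reverse=True,
    )
    component = list(range(len(NAMES)))   # component label for each variable
    tree = set()
    for i, j in pairs:
        if component[i] != component[j]:  # adding this edge makes no cycle
            old, new = component[j], component[i]
            component = [new if label == old else label for label in component]
            tree.add(frozenset((NAMES[i], NAMES[j])))
    return tree
```

&lt;P&gt;On this example the tree comes out as A-B and B-C, matching the chain that generated the distribution.  Sorting the pairs and building the tree takes polynomial time in the number of variables, which is the tractable case the questions above start from.&lt;/P&gt;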

&lt;ul&gt;Recommended, more general:
	&lt;li&gt;Clark Glymour, &lt;cite&gt;The Mind's Arrows: Bayes Nets and Graphical
Causal Models in Psychology&lt;/cite&gt;
[&lt;a href=&quot;../weblog/algae-2006-07.html#glymour-arrows&quot;&gt;Mini-review&lt;/a&gt;]
	&lt;li&gt;&lt;a href=&quot;http://www.cs.berkeley.edu/~jordan/&quot;&gt;Michael Irwin
Jordan&lt;/a&gt; (ed.), &lt;cite&gt;Learning in Graphical Models&lt;/cite&gt;
	&lt;li&gt;Jordan and Sejnowski (eds.), &lt;cite&gt;Graphical Models&lt;/cite&gt; [Nice
collection of papers from &lt;cite&gt;Neural Computation&lt;/cite&gt;]
	&lt;li&gt;Judea Pearl
		&lt;ul&gt;
		&lt;li&gt;&quot;Causal Inference in Statistics: An Overview&quot;,
&lt;cite&gt;Statistics Surveys&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009): 96--146
[&lt;a href=&quot;http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf&quot;&gt;PDF&lt;/a&gt;]
		&lt;li&gt;&lt;cite&gt;Causality: Models, Reasoning and
Inference&lt;/cite&gt;
		&lt;/ul&gt;
	&lt;li&gt;Peter Spirtes, Clark Glymour and Richard Scheines, &lt;cite&gt;Causation,
Prediction, and Search&lt;/cite&gt; [&lt;a href=&quot;../weblog/algae-2009-12.html#SGS&quot;&gt;Comments&lt;/a&gt;]
	&lt;/ul&gt;

&lt;ul&gt;Recommended, more specialized:
	&lt;li&gt;C. K. Chow and C. N. Liu, &quot;Approximating Discrete Probability
Distributions with Dependence Trees&quot;, &lt;cite&gt;IEEE Transactions on Information
Theory&lt;/cite&gt; &lt;strong&gt;14&lt;/strong&gt; (1968): 462--467 [An old but very nice result
on how to get the optimal approximation to a global probability distribution
using only pairwise interactions among the variables]
	&lt;li&gt;Ghahramani, &quot;Learning Dynamic Bayesian Networks,&quot; in
Giles and Gori (eds.), &lt;cite&gt;Adaptive Processing of Sequences and Data
Structures&lt;/cite&gt;
	&lt;li&gt;Ghahramani and Jordan, &quot;Factorial Hidden Markov Models,&quot;
&lt;cite&gt;Machine Learning&lt;/cite&gt; &lt;strong&gt;29&lt;/strong&gt; (1997): 245--273
	&lt;li&gt;Dominik Janzing and Daniel J. L. Herrmann, &quot;Reliable and
Efficient Inference of Bayesian Networks from Sparse Data by Statistical
Learning Theory&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.LG/0309015&quot;&gt;cs.LG/0309015&lt;/a&gt;
	&lt;li&gt;Lauritzen, &lt;cite&gt;Graphical Models&lt;/cite&gt; [A fairly abstract
probabilistic/mathematical-statistical treatment; I have to confess I'm only
about half-way done with it.]
	&lt;li&gt;Han Liu, John Lafferty and Larry Wasserman, &quot;The Nonparanormal:
Semiparametric Estimation of High Dimensional Undirected Graphs&quot;,
&lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v10/liu09a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;10&lt;/strong&gt; (2009): 2295--2328&lt;/a&gt;
= &lt;a href=&quot;http://arxiv.org/abs/0903.0649&quot;&gt;arxiv:0903.0649&lt;/a&gt;
	&lt;li&gt;John C. Loehlin, &lt;cite&gt;Latent Variable Models: An Introduction to
Factor, Path, and Structural Analysis&lt;/cite&gt; [An intro. to old-school linear
latent-variable models, especially of the sort used by psychologists.  Good in
its own domain, but does not make enough contact with modern graphical models.]
	&lt;li&gt;Eric Mjolsness, &quot;Stochastic Process Semantics for Dynamical Grammar
Syntax: An Overview&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.AI/0511073&quot;&gt;cs.AI/0511073&lt;/a&gt;
	&lt;li&gt;Pawel Wocjan, Dominik Janzing, and Thomas Beth, &quot;Required
sample size for learning sparse Bayesian networks with many variables,&quot; &lt;a
href=&quot;http://arxiv.org/abs/cs.LG/0204052&quot;&gt;cs.LG/0204052&lt;/a&gt;
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Francis R. Bach and Michael I. Jordan, &quot;Learning Graphical Models
for Stationary Time Series&quot;, &lt;a
href=&quot;http://www.stat.berkeley.edu/tech-reports/650.abstract&quot;&gt;UCB Statistics
Tech. Rep. 650&lt;/a&gt;
	&lt;li&gt;Onureena Banerjee, Laurent El Ghaoui, Alexandre d'Aspremont, &quot;Model
Selection Through Sparse Maximum Likelihood
Estimation&quot;, &lt;a href=&quot;http://arxiv.org/abs/0707.0704&quot;&gt;arxiv:0707.0704&lt;/a&gt;
	&lt;li&gt;Jose Bento, Andrea Montanari, &quot;Which graphical models are difficult to learn?&quot;, &lt;a href=&quot;http://arxiv.org/abs/0910.5761&quot;&gt;arxiv:0910.5761&lt;/a&gt;
	&lt;li&gt;David Brillinger, &quot;Remarks Concerning Graphical Models for
Time Series and Point Processes,&quot; &lt;cite&gt;Revista de Econometria&lt;/cite&gt;
&lt;strong&gt;16&lt;/strong&gt; (1996): 1--23 [&lt;a
href=&quot;http://stat-www.berkeley.edu/users/brill/Papers/econometria.ps&quot;&gt;PS&lt;/a&gt;]
	&lt;li&gt;Michael Chertkov and Vladimir Y. Chernyak
		&lt;ul&gt;
		&lt;li&gt;&quot;Loop calculus in statistical physics and information
science&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.73.065102&quot;&gt;&lt;cite&gt;Physical
Review E&lt;/cite&gt; &lt;strong&gt;73&lt;/strong&gt; (2006): 065102&lt;/a&gt;
= &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0601487&quot;&gt;cond-mat/0601487&lt;/a&gt;
		&lt;li&gt;&quot;Loop series for discrete statistical models on graphs&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cond-mat/0603189&quot;&gt;cond-mat/0603189&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;David Maxwell Chickering, &quot;Optimal Structure Identification
With Greedy Search,&quot; &lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt;
&lt;strong&gt;3&lt;/strong&gt; (2002): 507--554
	&lt;li&gt;Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen and David
J. Spiegelhalter, &lt;cite&gt;Probabilistic Networks and Expert Systems&lt;/cite&gt;
	&lt;li&gt;David Cox and Nanny Wermuth, &lt;cite&gt;Multivariate Dependencies: Models, Analysis, and Interpretation&lt;/cite&gt;
	&lt;li&gt;Rainer Dahlhaus, &quot;Graphical interaction models for
multivariate time series,&quot; &lt;cite&gt;Metrika&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt;
(2000): 157--172
	&lt;li&gt;Luis M. de Campos, &quot;A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests&quot;,
&lt;a href=&quot;http://jmlr.csail.mit.edu/papers/v7/decampos06a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2006): 2149--2187&lt;/a&gt;
[Sounds damn cool]
	&lt;li&gt;Amir Dembo and Andrea Montanari, &quot;Gibbs Measures and Phase Transitions on Sparse Random Graphs&quot;, &lt;a href=&quot;http://arxiv.org/abs/0910.5460&quot;&gt;arxiv:0910.5460&lt;/a&gt;
	&lt;li&gt;Michael Eichler, &quot;Graphical modelling of multivariate time
series&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0610654&quot;&gt;math.ST/0610654&lt;/a&gt;
	&lt;li&gt;Seif Eldawlatly, Yang Zhou, Rong Jin
and Karim G. Oweiss, &quot;On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles&quot;, &lt;a href=&quot;http://dx.doi.org/&quot;&gt;&lt;cite&gt;Neural Computation&lt;/cite&gt; &lt;strong&gt;22&lt;/strong&gt; (2010): 158--189&lt;/a&gt;
	&lt;li&gt;Gal Elidan, Iftach Nachman and Nir Friedman, &quot;'Ideal Parent'
Structure Learning for Continuous Variable Bayesian
Networks&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v8/elidan07a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;8&lt;/strong&gt; (2007): 1799--1833&lt;/a&gt;
	&lt;li&gt;Sergi Elizalde and Kevin Woods, &quot;Bounds on the number of inference
functions of a graphical
model&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.CO/0610233&quot;&gt;math.CO/0610233&lt;/a&gt;
	&lt;li&gt;Juan Ferr&amp;aacute;ndiz, Enrique F. Castillo and Pilar Sanmartin,
&quot;Temporal aggregation in chain graph models&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.jspi.2004.03.012&quot;&gt;&lt;cite&gt;Journal of
Statistical Planning and Inference&lt;/cite&gt; &lt;strong&gt;133&lt;/strong&gt; (2005):
69--93&lt;/a&gt;
	&lt;li&gt;Freedman, &quot;On Specifying Graphical Models for Causation,&quot;
UCB Stat. Tech. Rep. 601 [&lt;a
  href=&quot;http://www.stat.berkeley.edu/tech-reports/601.abstract&quot;&gt;abstract&lt;/a&gt;, &lt;a
href=&quot;http://www.stat.berkeley.edu/~census/601.pdf&quot;&gt;pdf&lt;/a&gt;]
	&lt;li&gt;Frey, &lt;cite&gt;Graphical Models in Machine Learning and Data
Communication&lt;/cite&gt;
	&lt;li&gt;Roland Fried and Vanessa Didelez, &quot;Latent variable analysis and
partial correlation graphs for multivariate time series&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.spl.2005.04.002&quot;&gt;&lt;cite&gt;Statistics and
Probability Letters&lt;/cite&gt; &lt;strong&gt;73&lt;/strong&gt; (2005): 287--296&lt;/a&gt;
	&lt;li&gt;Cyril Furtlehner, Jean-Marc Lasgouttes, Arnaud De La Fortelle,
&quot;Belief Propagation and Bethe approximation for Traffic Prediction&quot;,
&lt;a href=&quot;http://arxiv.org/abs/physics/0703159&quot;&gt;physics/0703159&lt;/a&gt;
	&lt;li&gt;Dan Geiger, David Heckerman, Henry King, and Christopher
Meek, &quot;Stratified exponential families: Graphical models and model selection&quot;,
&lt;a href=&quot;http://dx.doi.org/10.1214/aos/1009210550&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;29&lt;/strong&gt; (2001): 505--529&lt;/a&gt;
	&lt;li&gt;Christophe Giraud, &quot;Estimation of Gaussian graphs by model
selection&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.2044&quot;&gt;arxiv:0710.2044&lt;/a&gt;
	&lt;li&gt;Glymour and Cooper (eds.), &lt;cite&gt;Computation, Causation and
Discovery&lt;/cite&gt;
	&lt;li&gt;Jorge Goncalves and Sean Warnick, &quot;Dynamical Structure Functions
for the Estimation of LTI Networks with Limited Information&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.MN/0610008&quot;&gt;q-bio.MN/0610008&lt;/a&gt;
[LTI = &quot;linear, time-invariant&quot;]
	&lt;li&gt;Green, Hjort and Richardson (eds.), &lt;cite&gt;Highly Structured
Stochastic Systems&lt;/cite&gt;
	&lt;li&gt;Vikas Hamine and Paul Helman, &quot;Learning Optimal Augmented Bayes
Networks&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0509055&quot;&gt;cs.LG/0509055&lt;/a&gt;
	&lt;li&gt;Holger H&amp;ouml;fling and Robert Tibshirani, &quot;Estimation of Sparse
Binary Pairwise Markov Networks using
Pseudo-likelihoods&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v10/hoefling09a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;10&lt;/strong&gt; (2009): 883--906&lt;/a&gt;
	&lt;li&gt;Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, &quot;Stochastic
Reasoning, Free Energy, and Information
Geometry&quot;, &lt;a
href=&quot;http://neco.mitpress.org/cgi/content/abstract/16/9/1779&quot;&gt;&lt;cite&gt;Neural
Computation&lt;/cite&gt; &lt;strong&gt;16&lt;/strong&gt; (2004): 1779--1810&lt;/a&gt;
	&lt;li&gt;Manfred Jaeger &amp;amp; co., &lt;a
href=&quot;http://www.cs.aau.dk/~jaeger/Primula/&quot;&gt;Primula&lt;/a&gt; [Java implementation
of a modeling language for relational Bayesian networks; released under GPL]
	&lt;li&gt;Markus Kalisch and Peter B&amp;uuml;hlmann, &quot;Estimating
High-Dimensional Directed Acyclic Graphs with the
PC-Algorithm&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v8/kalisch07a.html&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;8&lt;/strong&gt; (2007): 616--636&lt;/a&gt;
	&lt;li&gt;Nicole Kraemer, Juliane Schaefer, Anne-Laure Boulesteix,
&quot;Regularized estimation of large-scale gene association networks using
graphical Gaussian
models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0905.0603&quot;&gt;arxiv:0905.0603&lt;/a&gt;
	&lt;li&gt;Sanjiang Li, &quot;Causal models have no complete axiomatic
characterization&quot;, &lt;a href=&quot;http://arxiv.org/abs/0804.2401&quot;&gt;arxiv:0804.2401&lt;/a&gt;
	&lt;li&gt;Stephen Luttrell, &quot;Adaptive Cluster Expansion (ACE): A Hierarchical
Bayesian Network&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.NE/0410020&quot;&gt;cs.NE/0410020&lt;/a&gt;
	&lt;li&gt;Dhafer Malouche and Sylvie Sevestre-Ghalila, &quot;Estimating High
dimensional faithful Gaussian graphical Models :
uPC-algorithm&quot;, &lt;a href=&quot;http://arxiv.org/abs/0705.1613&quot;&gt;arxiv:0705.1613&lt;/a&gt;
	&lt;li&gt;Giovanni M. Marchetti, Nanny Wermuth, &quot;Matrix representations and independencies in directed acyclic graphs&quot;, &lt;a href=&quot;http://arxiv.org/abs/0904.0333&quot;&gt;arxiv:0904.0333&lt;/a&gt;
	&lt;li&gt;Eric Mjolsness, &quot;Labeled graph notations for graphical models&quot;, UCI
School of Information and Computer Science Technical Report 04-03 [&lt;a
href=&quot;http://computableplant.ics.uci.edu/papers/graphNotationsTR.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Jennifer Neville and David Jensen, &quot;Relational Dependency Networks&quot;,
&lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/volume8/neville07a/neville07a.pdf&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;8&lt;/strong&gt; (2007): 653--692&lt;/a&gt;
	&lt;li&gt;Lior Pachter and Bernd Sturmfels
		&lt;ul&gt;
		&lt;li&gt;&quot;Tropical Geometry of Statistical Models&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.QM/0311009&quot;&gt;q-bio.QM/0311009&lt;/a&gt;
		&lt;li&gt;&quot;Parametric Inference for Biological Sequence Analysis&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.GN/0401033&quot;&gt;q-bio.GN/0401033&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Alessandro Pelizzola, &quot;Cluster variation method in statistical
physics and probabilistic graphical models&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1088/0305-4470/38/33/R01&quot;&gt;&lt;cite&gt;Journal of Physics
A: Mathematical and General&lt;/cite&gt; &lt;strong&gt;38&lt;/strong&gt; (2005): R309--R339&lt;/a&gt;
= &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0508216&quot;&gt;cond-mat/0508216&lt;/a&gt;
	&lt;li&gt;Tapani Raiko, Harri Valpola, Markus Harva and Juha Karhunen,
&quot;Building Blocks for Variational Bayesian Learning of Latent Variable
Models&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/volume8/raiko07a/raiko07a.pdf&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;8&lt;/strong&gt; (2007): 155--201&lt;/a&gt;
	&lt;li&gt;Pradeep Ravikumar, Martin J. Wainwright, John D. Lafferty,
&quot;High-Dimensional Graphical Model Selection Using $\ell_1$-Regularized Logistic
Regression&quot;, &lt;a href=&quot;http://arxiv.org/abs/0804.4202&quot;&gt;arxiv:0804.4202&lt;/a&gt;
	&lt;li&gt;Marco Reale, &lt;cite&gt;A Graphical Modelling Approach to Time
Series,&lt;/cite&gt; Ph.D. thesis, Lancaster University, 1998 [&lt;a
href=&quot;http://www.math.canterbury.ac.nz/~mathmcr/research.html&quot;&gt;Reale's
website&lt;/a&gt;]
	&lt;li&gt;Marco Reale and Granville Tunnicliffe Wilson
		&lt;ul&gt;
		&lt;li&gt;&quot;Identification of vector AR models with recursive
structural errors using conditional independence graphs&quot;
		&lt;li&gt;&quot;The Sampling Properties of Conditional Independence Graphs
for Structural Vector Autoregressions&quot;
		  &lt;/ul&gt;
	&lt;li&gt;T. Rizzo, B. Wemmenhove, H.J. Kappen, &quot;On Cavity Approximations for
Graphical
Models&quot;, &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0608312&quot;&gt;cond-mat/0608312&lt;/a&gt;
	&lt;li&gt;Philipp R&amp;uuml;timann and Peter B&amp;uuml;hlmann, &quot;High
dimensional sparse covariance estimation via directed acyclic graphs&quot;,
&lt;a href=&quot;http://arxiv.org/abs/0911.2375&quot;&gt;arxiv:0911.2375&lt;/a&gt; = &lt;a href=&quot;http://projecteuclid.org/euclid.ejs/1259677088&quot;&gt;&lt;cite&gt;Electronic
Journal of Statistics&lt;/cite&gt; &lt;strong&gt;3&lt;/strong&gt; (2009): 1133--1160&lt;/a&gt;
	&lt;li&gt;Marco Scutari, &quot;Learning Bayesian Networks with the bnlearn
Package&quot;, &lt;a href=&quot;http://arxiv.org/abs/0908.3817&quot;&gt;arxiv:0908.3817&lt;/a&gt;
	&lt;li&gt;Bill Shipley, &lt;cite&gt;Cause and Correlation in Biology: A User's
Guide to Path Analysis, Structural Equations and Causal Inference&lt;/cite&gt;
	&lt;li&gt;Tomi Silander, Teemu Roos, Petri Myllymaki, &quot;Locally Minimax Optimal Predictive Modeling with Bayesian Networks&quot;, &lt;cite&gt;Journal of Machine
Learning Research Workshop and Conference Proceedings&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (AISTATS 2009): 504--511
	&lt;li&gt;Milan Studeny, &lt;cite&gt;Probabilistic Conditional Independence
Structures&lt;/cite&gt; [&quot;The main topic of the monograph is a non-graphical
algebraic method for describing probabilistic CI structures. However, one of
the first two chapters in the book recalls and gathers basic mathematical tools
for study of probabilistic conditional independence (CI) and the other one is a
sketchy overview of recent advanced graphical approaches to the description of
CI structures. The next four chapters develop the non-graphical method. The
last standard chapter is an attempt to apply the method in practice: it is
devoted to learning Bayesian nets and it is more mathematical (and
'philosophical') revision of some methods for learning Bayesian networks. The
main aim of that chapter is to indicate that an algebraic approach can also be
applied in this area.&quot;]
	&lt;li&gt;Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh,
&quot;Dynamic Conditional Random Fields: Factorized Probabilistic Models for
Labeling and Segmenting Sequence Data&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/volume8/sutton07a/sutton07a.pdf&quot;&gt;&lt;cite&gt;Journal
of Machine Learning Research&lt;/cite&gt; &lt;strong&gt;8&lt;/strong&gt; (2007): 693--723&lt;/a&gt;
	&lt;li&gt;Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong and Alan
S. Willsky, &quot;A Large-Deviation Analysis of the Maximum-Likelihood Learning of
Markov Tree
Structures&quot;, &lt;a href=&quot;http://arxiv.org/abs/0905.0940&quot;&gt;arxiv:0905.0940&lt;/a&gt;
[Large deviations for Chow-Liu trees]
	&lt;li&gt;Robert E. Tillman, Arthur Gretton and Peter Spirtes,
&quot;Nonlinear directed acyclic structure learning with weakly additive
noise models&quot; [Thanks to Prof. Spirtes for a preprint]
	&lt;li&gt;Achim Tresch, Florian Markowetz, &quot;Structure Learning in Nested
Effects Models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.4481&quot;&gt;arxiv:0710.4481&lt;/a&gt;
	&lt;li&gt;M. J. Wainwright and M. I. Jordan, &quot;Graphical models, exponential
families, &amp;amp; variational inference&quot;, &lt;a
href=&quot;http://www.stat.berkeley.edu/tech-reports/649.abstract&quot;&gt;UCB Statistics
Tech. Rep. 649&lt;/a&gt;
	&lt;li&gt;Xianchao Xie, Zhi Geng, &quot;A Recursive Method for Structural Learning
of Directed Acyclic Graphs&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v9/geng08a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (2008): 459--483&lt;/a&gt;
	&lt;li&gt;Jonathan S. Yedidia, William T. Freeman and Yair Weiss,
&quot;Understanding Belief Propagation and its Generalizations&quot;, &lt;a
href=&quot;http://www.merl.com/publications/TR2001-022/&quot;&gt;Mitsubishi Electric Research
Laboratories Tech. Rep. 2001-22&lt;/a&gt;
	&lt;li&gt;Marco Zaffalon and Marcus Hutter, &quot;Robust Inference of Trees&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.LG/0511087&quot;&gt;cs.LG/0511087&lt;/a&gt;
	&lt;/ul&gt;

&lt;P&gt;(Thanks to Gustavo Lacerda for pointing out a goof.)
</description>
  </item>
  </channel>
</rss>