<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Statistics with Structured Data</title>
    <link>http://bactra.org/notebooks/2009/04/10#structured-data</link>
    <description>
&lt;P&gt;A lot of &lt;a href=&quot;statistics.html&quot;&gt;statistical theory and methods&lt;/a&gt; are
developed for fairly unstructured data --- each data-point is just that, a
single unarticulated point in some space.  What to do when observations have
complicated internal structure?

&lt;P&gt;One option: Break the structures into a collection of &lt;em&gt;dependent&lt;/em&gt;
unstructured observations, i.e., each observation is a realization of a
non-trivial stochastic process.  Examples: multivariate analysis (including its
grown-up version, &lt;a href=&quot;graphical-models.html&quot;&gt;graphical models&lt;/a&gt;),
&lt;a href=&quot;time-series.html&quot;&gt;time
series&lt;/a&gt;, &lt;a href=&quot;spatial-statistics.html&quot;&gt;spatial
statistics&lt;/a&gt;, &lt;a href=&quot;network-data-analysis.html&quot;&gt;network data analysis&lt;/a&gt;.  Time
series and spatial statistics are much better developed than network statistics,
not least because the dependency structures there are &lt;em&gt;much simpler&lt;/em&gt;.
Directed networks give us essentially arbitrary binary relations; hypergraphs,
arbitrary relational structures.  This threatens (or promises) to bring up all sorts of issues from &lt;a href=&quot;mathematical-logic.html&quot;&gt;logic&lt;/a&gt;.

&lt;P&gt;Can one do inference on the relational structure of complex observations?
How?  (&lt;a href=&quot;grammatical-inference.html&quot;&gt;Grammatical inference&lt;/a&gt;, for
instance?  &lt;a href=&quot;community-discovery.html&quot;&gt;Community discovery&lt;/a&gt;?)

&lt;P&gt;See also:
	&lt;a href=&quot;data-mining.html&quot;&gt;Data Mining&lt;/a&gt;;
	&lt;a href=&quot;learning-inference-induction.html&quot;&gt;Machine Learning, Statistical Inference and Induction&lt;/a&gt;

&lt;ul&gt;Recommended, big-picture:
	&lt;li&gt;&lt;a href=&quot;http://www.cs.umd.edu/~getoor/&quot;&gt;Lise Getoor&lt;/a&gt; and Ben Taskar (eds.), &lt;cite&gt;Introduction to Statistical Relational Learning&lt;/cite&gt; [&lt;a href=&quot;http://mitpress.mit.edu/978-0-262-07288-5&quot;&gt;Official blurb&lt;/a&gt;, Lise's &lt;a href=&quot;http://www.cs.umd.edu/srl-book/&quot;&gt;book site&lt;/a&gt; with more links]
	&lt;li&gt;Ulf Grenander, &lt;cite&gt;Elements of Pattern Theory&lt;/cite&gt;
	&lt;/ul&gt;

&lt;ul&gt;Recommended, miscellaneous close-ups:
	&lt;li&gt;Tommi S. Jaakkola and David Haussler, &quot;Exploiting generative models
in discriminative classifiers&quot;, &lt;cite&gt;NIPS 11&lt;/cite&gt; (1998)
[&lt;a href=&quot;http://books.nips.cc/papers/files/nips11/0487.pdf&quot;&gt;PDF&lt;/a&gt;]
	&lt;li&gt;Leonid Peshkin, &quot;Structure induction by lossless graph compression&quot;,
&lt;a href=&quot;http://arxiv.org/abs/cs.DS/0703132&quot;&gt;cs.DS/0703132&lt;/a&gt; [Adapting
data-compression ideas to discover hierarchical structures in graphs, e.g., the
4 bases from a tinker-toy model of DNA.]
	&lt;/ul&gt;

&lt;ul&gt;To read:
	&lt;li&gt;Yonatan Amit, Shai Shalev-Shwartz and Yoram Singer, &quot;Online Learning
of Complex Prediction Problems Using Simultaneous
Projections&quot;, &lt;a
href=&quot;http://jmlr.csail.mit.edu/papers/v9/amit08a.html&quot;&gt;&lt;cite&gt;Journal of
Machine Learning Research&lt;/cite&gt; &lt;strong&gt;9&lt;/strong&gt; (2008): 1399--1435&lt;/a&gt;
	&lt;li&gt;G&amp;ouml;khan Bakir, Thomas Hofmann, Bernhard Sch&amp;ouml;lkopf, Alexander J. Smola, Ben Taskar and S. V. N. Vishwanathan (eds.), &lt;cite&gt;Predicting Structured Data&lt;/cite&gt; [&lt;a href=&quot;http://mitpress.mit.edu/978-0-262-02617-8&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Peter J. Green, Nils Lid Hjort and Sylvia Richardson (eds.),
&lt;cite&gt;Highly Structured Stochastic Systems&lt;/cite&gt;
	&lt;li&gt;Ulf Grenander and Michael Miller, &lt;cite&gt;Pattern Theory:
From Representation to Inference&lt;/cite&gt;
	&lt;li&gt;Fionn Murtagh, &quot;Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy&quot;, &lt;a href=&quot;http://arxiv.org/abs/0805.2744&quot;&gt;arxiv:0805.2744&lt;/a&gt;
	&lt;li&gt;Marlos A. G. Viana, &lt;cite&gt;Symmetry Studies: An Introduction
to the Analysis of Structured Data in Applications&lt;/cite&gt; [&lt;a href=&quot;http://cambridge.org/9780521841030&quot;&gt;blurb&lt;/a&gt;]
	&lt;li&gt;Haonan Wang and J. S. Marron, &quot;Object oriented data analysis: Sets of
trees&quot;, &lt;a href=&quot;http://arxiv.org/abs/0711.3147&quot;&gt;arxiv:0711.3147&lt;/a&gt; [&quot;Object
oriented data analysis is the statistical analysis of populations of complex
objects&quot;]
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>