<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks   </title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Statistical Inference for Markov and Hidden Markov Models</title>
    <link>http://bactra.org/notebooks/2009/10/29#inference-markov</link>
    <description>
&lt;P&gt;I am concerned here with inferring the parameters and/or the structure of
the model, not with the estimation of the hidden state (in the HMM case); that
problem falls under &lt;a href=&quot;filtering.html&quot;&gt;filtering&lt;/a&gt;.

&lt;P&gt;Parameter inference (what machine learning types would call &quot;learning&quot;)
given a known, fixed structure.  Determining the structure (&quot;discovery&quot;).
Order estimation as a particular case of discovery, and
of &lt;a href=&quot;model-selection.html&quot;&gt;model selection&lt;/a&gt;.

&lt;P&gt;See also:
	&lt;a href=&quot;markov.html&quot;&gt;Markov models&lt;/a&gt;;
	&lt;a href=&quot;statistics.html&quot;&gt;Statistics&lt;/a&gt;;
	&lt;a href=&quot;time-series.html&quot;&gt;Time Series&lt;/a&gt;;
	&lt;a href=&quot;universal-prediction.html&quot;&gt;Universal Prediction Algorithms&lt;/a&gt;


&lt;ul&gt;Recommended (big picture):
	&lt;li&gt;Patrick Billingsley, &lt;cite&gt;Statistical Inference for Markov Chains&lt;/cite&gt;
	&lt;li&gt;Andrew Fraser, &lt;cite&gt;Hidden Markov Models and Dynamical
Systems&lt;/cite&gt; [&lt;a href=&quot;../reviews/fraser-on-HMMs&quot;&gt;Review: The Statistics of Moving Shadows&lt;/a&gt;]
	&lt;li&gt;Peter Guttorp, &lt;cite&gt;Stochastic Modelling of Scientific Data&lt;/cite&gt;
	&lt;/ul&gt;

&lt;ul&gt;Recommended (close-ups):
	&lt;li&gt;David Blackwell and Lambert Koopmans, &quot;On the Identifiability
Problem for Functions of Finite Markov Chains&quot;, &lt;cite&gt;The Annals of
Mathematical Statistics&lt;/cite&gt; &lt;strong&gt;28&lt;/strong&gt; (1957): 1011--1015 [An old,
but very clear, paper on the problems presented by what we now call hidden
Markov models]
	&lt;li&gt;Peter B&amp;uuml;hlmann and Abraham J. Wyner, &quot;Variable Length Markov
Chains&quot;, &lt;cite&gt;The Annals of Statistics&lt;/cite&gt; &lt;strong&gt;27&lt;/strong&gt; (1999):
480--513 [Preprint available as &lt;a
href=&quot;http://www.stat.berkeley.edu/tech-reports/479.abstract&quot;&gt;Berkeley
statistics department technical report 479&lt;/a&gt;]
	&lt;li&gt;Olivier Capp&amp;eacute;, &quot;Online EM Algorithm for Hidden Markov
Models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0908.2359&quot;&gt;arxiv:0908.2359&lt;/a&gt;
	&lt;li&gt;George Cybenko and Valentino Crespi, &quot;Learning Hidden Markov Models
using Non-Negative Matrix
Factorization&quot;, &lt;a href=&quot;http://arxiv.org/abs/0809.4086&quot;&gt;arxiv:0809.4086&lt;/a&gt;
[Though it contains an error about the capacities of our CSSR algorithm]
	&lt;li&gt;Subhashis Ghosal and Yongqiang Tang, &quot;Bayesian Consistency for Markov Processes&quot;, &lt;a href=&quot;http://sankhya.isical.ac.in/search/68_2/2006010.html&quot;&gt;&lt;cite&gt;Sankhya&lt;/cite&gt; &lt;strong&gt;68&lt;/strong&gt; (2006): 227--239&lt;/a&gt;
	&lt;li&gt;Subhashis Ghosal and Aad van der Vaart, &quot;Convergence Rates of Posterior Distributions for Non-IID
Observations&quot;, &lt;cite&gt;Annals of Statistics&lt;/cite&gt; &lt;strong&gt;35&lt;/strong&gt;
(2007): 192--223 = &lt;a href=&quot;http://arxiv.org/abs/0708.0491&quot;&gt;arxiv:0708.0491&lt;/a&gt;
	&lt;li&gt;Gusztav Morvai and Benjamin Weiss
		&lt;ul&gt;
		&lt;li&gt;&quot;Estimating the Lengths of Memory
Words&quot;, &lt;a href=&quot;http://dx.doi.org/10.1109/TIT.2008.926316&quot;&gt;&lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;54&lt;/strong&gt; (2008):
3804--3807&lt;/a&gt;
		&lt;li&gt;&quot;On estimating the memory for finitarily Markovian processes&quot;, &lt;cite&gt;Ann. I. H. Poincar&amp;eacute;-PR&lt;/cite&gt; &lt;strong&gt;43&lt;/strong&gt; (2007): 15--30 &lt;a href=&quot;http://arxiv.org/abs/0712.0105&quot;&gt;arxiv:0712.0105&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;F. Papangelou, &quot;Large Deviations and the Bayesian Estimation of Higher-Order Markov Transition Functions&quot;, &lt;cite&gt;Journal of Applied Probability&lt;/cite&gt; &lt;strong&gt;33&lt;/strong&gt; (1996): 18--27 [&lt;a href=&quot;http://www.jstor.org/stable/3215260&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Yuval Peres and Paul Shields, &quot;Two new Markov order estimators&quot;,
&lt;a href=&quot;http://arxiv.org/abs/math.ST/0506080&quot;&gt;math.ST/0506080&lt;/a&gt;
[&lt;em&gt;Very&lt;/em&gt; nice.]
	&lt;li&gt;Christopher C. Strelioff, James P. Crutchfield, Alfred W. Hubler,
&quot;Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate,
and Out-of-class
Modeling&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0703715&quot;&gt;math.ST/0703715&lt;/a&gt;
	&lt;li&gt;Daniel R. Upper, &lt;cite&gt;Theory and Algorithms for Hidden Markov
Models and Generalized Hidden Markov Models&lt;/cite&gt; [Ph.D. thesis, math dept.,
Berkeley, 1997; &lt;a href=&quot;http://www.santafe.edu/~cmg/compmech/pubs/TAHMMGHMM.htm&quot;&gt;online&lt;/a&gt;]

	&lt;/ul&gt;

&lt;ul&gt;Not &lt;em&gt;quite&lt;/em&gt; recommended:
	&lt;li&gt;E. Racca, F. Laio, D. Poggi and L. Ridolfi, &quot;Test to determine the
Markov order of a time
series&quot;, &lt;a href=&quot;http://dx.doi.org/10.1103/PhysRevE.75.011126&quot;&gt;&lt;cite&gt;Physical
Review E&lt;/cite&gt; &lt;strong&gt;75&lt;/strong&gt; (2007): 011126&lt;/a&gt; [The test is to linearly
regress &lt;em&gt;x(t+1)&lt;/em&gt; on &lt;em&gt;x(t)&lt;/em&gt;, &lt;em&gt;x(t-1)&lt;/em&gt;, etc., out to some
finite order, and see how far back you have to go before the regression
coefficients are insignificantly different from zero.  This is not crazy as a
first-cut idea, but it's not generally valid, and in fact fails for the
logistic map.]
	&lt;/ul&gt;

&lt;ul&gt;&lt;em&gt;Definitely&lt;/em&gt; not recommended:
	&lt;li&gt;S. S. Melnyk, O. V. Usatenko, V. A. Yampol'skii and V. A. Golick,
&quot;Competition between Two Kinds of Correlations in Literary Texts&quot;, &lt;a
href=&quot;http://arxiv.org/abs/physics/0402042&quot;&gt;physics/0402042&lt;/a&gt; [This has
surely got to be one of the ugliest ways of parameterizing a Markov chain I
have seen; it's a miracle they don't get probabilities greater than 1, if
indeed they don't.]
	&lt;/ul&gt;

&lt;ul&gt;Modesty forbids me to recommend:
	&lt;li&gt;CRS, &quot;Dynamics of Bayesian Updating with Dependent Data and
Misspecified Models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0901.1342&quot;&gt;arxiv:0901.1342&lt;/a&gt;
	&lt;li&gt;CRS and Kristina Lisa Klinkner, &quot;Blind Construction of Optimal
Nonlinear Recursive Predictors for Discrete
Sequences&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0406011&quot;&gt;cs.LG/0406011&lt;/a&gt;
[CSSR]
	&lt;/ul&gt;


&lt;ul&gt;To read:
	&lt;li&gt;Enrique E. Alvarez, &quot;Estimation in Stationary Markov Renewal
Processes, with Application to Earthquake Forecasting in Turkey&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s11009-005-6658-2&quot;&gt;&lt;cite&gt;Methodology and
Computing in Applied Probability&lt;/cite&gt; &lt;strong&gt;7&lt;/strong&gt; (2005): 119--130&lt;/a&gt;
	&lt;li&gt;Sofia Andersson and Tobias Ryd&amp;eacute;n, &quot;Subspace estimation and
prediction methods for hidden Markov
models&quot;, &lt;a href=&quot;http://projecteuclid.org/euclid.aos/1256303539&quot;&gt;&lt;cite&gt;Annals
of Statistics&lt;/cite&gt; &lt;strong&gt;37&lt;/strong&gt; (2009): 4131--4152&lt;/a&gt;
	&lt;li&gt;Ana Arribas-Gil, Elisabeth Gassiat and Catherine Matias, &quot;Parameter
estimation in pair hidden Markov
models&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0509280&quot;&gt;math.ST/0509280&lt;/a&gt;
	&lt;li&gt;Patrice Bertail and St&amp;eacute;phan Cl&amp;eacute;men&amp;ccedil;on,
&quot;Edgeworth expansions of suitably normalized sample mean statistics for atomic
Markov chains&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1007/s00440-004-0360-0&quot;&gt;&lt;cite&gt;Probability Theory and
Related Fields&lt;/cite&gt; &lt;strong&gt;130&lt;/strong&gt; (2004): 388--414&lt;/a&gt; [I need
to learn more about Edgeworth expansions anyway]
	&lt;li&gt;P. J. Bickel and Y. Ritov, &quot;Inference in Hidden Markov Models I:
Local Asymptotic Normality in the Stationary Case&quot;, UCB Statistics Technical
Report 383 [&lt;a
href=&quot;http://www.stat.berkeley.edu/tech-reports/383.abstract&quot;&gt;link&lt;/a&gt;]
	&lt;li&gt;Jose Borges and Mark Levene, &quot;Evaluating Variable Length Markov
Chain Models for Analysis of User Web Navigation Sessions&quot;, &lt;a
href=&quot;http://arxiv.org/abs/cs.AI/0606115&quot;&gt;cs.AI/0606115&lt;/a&gt;
	&lt;li&gt;J. Borwanker, G. Kallianpur and B. L. S. Prakasa Rao, &quot;The
Bernstein-von Mises Theorem for Markov Processes&quot;, &lt;cite&gt;The Annals of
Mathematical Statistics&lt;/cite&gt; &lt;strong&gt;42&lt;/strong&gt; (1971): 1241--1253
[&lt;a href=&quot;http://www.jstor.org/stable/2240025&quot;&gt;JSTOR&lt;/a&gt;]
	&lt;li&gt;Olivier Capp&amp;eacute;, Eric Moulines and Tobias Ryd&amp;eacute;n,
&lt;cite&gt;Inference in Hidden Markov Models&lt;/cite&gt; [This is a superb book, treating
all the main statistical problems connected with HMMs in a rigorous manner.
There is a chapter/appendix which reminds the reader who has forgotten about
Markov chains on general state spaces, but remembers measure-theoretic
probability &lt;em&gt;very well&lt;/em&gt;, about their properties.  (The idea that the
reader of a book on HMMs may not know what the Viterbi algorithm is, but will
definitely recall the Hahn-Jordan decomposition, strikes me as very much a
product of the French school of probability theory &amp;mdash; from which I have
learned much!  But it's not so far gone as to make the whole thing an exercise
in the &quot;general theory of
processes&quot;.)  Listed down here until I finish the last chapter. &lt;a
href=&quot;http://www.springeronline.com/sgw/cda/frontpage/0,11855,5-0-22-45006977-0,00.html&quot;&gt;Blurb&lt;/a&gt;]
	&lt;li&gt;Pavel Chigansky, &quot;Maximum Likelihood Estimator for Hidden Markov Models in continuous time&quot;, &lt;a href=&quot;http://arxiv.org/abs/0707.0271&quot;&gt;arxiv:0707.0271&lt;/a&gt;
	&lt;li&gt;C. C. Y. Dorea and L. C. Zhao, &quot;Nonparametric Density Estimation
in Hidden Markov Models&quot;, &lt;cite&gt;Statistical Inference for Stochastic
Processes&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (2002): 55--64
	&lt;li&gt;Randal Douc, &quot;Non singularity of the asymptotic Fisher information
matrix in hidden Markov
models&quot;, &lt;a href=&quot;http://arxiv.org/abs/math.ST/0511631&quot;&gt;math.ST/0511631&lt;/a&gt;
	&lt;li&gt;P. Dupont, F. Denis and Y. Esposito, &quot;Links between probabilistic
automata and hidden Markov models: probability distributions, learning models
and induction algorithms&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.patcog.2004.03.020&quot;&gt;&lt;cite&gt;Pattern
Recognition&lt;/cite&gt; &lt;strong&gt;38&lt;/strong&gt; (2005): 1349--1371&lt;/a&gt;
	&lt;li&gt;Farzad Eskandari and Mohammad R. Meshkani, &quot;Empirical Bayes
analysis of log-linear models for a generalized finite stationary Markov
chain&quot;, &lt;cite&gt;Metrika&lt;/cite&gt; &lt;strong&gt;59&lt;/strong&gt; (2004): 173--191 [&lt;a
href=&quot;http://dx.doi.org/doi:10.1007/s001840300278&quot;&gt;abstract&lt;/a&gt;]
	&lt;li&gt;Florence Forbes and Nathalie Peyrard, &quot;Hidden Markov Random Field
Model Selection Criteria Based on Mean Field-Like Approximations&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TPAMI.2003.1227985&quot;&gt;&lt;cite&gt;IEEE Transactions on
Pattern Analysis and Machine Intelligence&lt;/cite&gt; &lt;strong&gt;25&lt;/strong&gt; (2003):
1089--1101&lt;/a&gt; [&lt;a
href=&quot;http://www.inrialpes.fr/is2/people/forbes/ForbesPeyrard.ps&quot;&gt;PostScript
preprint&lt;/a&gt;]
	&lt;li&gt;Cheng-Der Fuh, &quot;Asymptotic operating characteristics of an optimal
change point detection in hidden Markov models&quot;, &lt;a
href=&quot;http://dx.doi.org/10%2E1214/009053604000000580&quot;&gt;&lt;cite&gt;Annals of
Statistics&lt;/cite&gt; &lt;strong&gt;32&lt;/strong&gt; (2004): 2305--2339&lt;/a&gt; = &lt;a
href=&quot;http://arxiv.org/abs/math.ST/0503682&quot;&gt;math.ST/0503682&lt;/a&gt;
	&lt;li&gt;Antonio Galves, Charlotte Galves, Nancy L. Garcia, Florencia Leonardi, &quot;Context tree selection and linguistic rhythm retrieval from written texts&quot;, &lt;a href=&quot;http://arxiv.org/abs/0902.3619&quot;&gt;arxiv:0902.3619&lt;/a&gt;
	&lt;li&gt;Antonio Galves, Florencia Leonardi, &quot;Exponential inequalities for empirical unbounded context trees&quot;, &lt;a href=&quot;http://arxiv.org/abs/0710.5900&quot;&gt;arxiv:0710.5900&lt;/a&gt;
	&lt;li&gt;H. Ito, S.-I. Amari and K. Kobayashi, &quot;Identifiability of Hidden
Markov Information Sources and Their Minimum Degrees of Freedom&quot;, &lt;cite&gt;IEEE
Transactions on Information Theory&lt;/cite&gt; &lt;strong&gt;38&lt;/strong&gt; (1992): 324--333
	&lt;li&gt;Hans R. K&amp;uuml;nsch, &quot;State Space and Hidden Markov Models&quot;,
pp. 109--173 in Ole E. Barndorff-Nielsen, David R. Cox and Claudia
Kl&amp;uuml;ppelberg (eds.), &lt;cite&gt;Complex Stochastic Systems&lt;/cite&gt;
	&lt;li&gt;J. Lember, A. Koloydenko, &quot;Adjusted Viterbi training for hidden Markov models&quot;, &lt;a href=&quot;http://arxiv.org/abs/0709.2317&quot;&gt;arxiv:0709.2317&lt;/a&gt;
	&lt;li&gt;Florencia Leonardi, &quot;Some upper bounds for the rate of convergence of penalized likelihood context tree estimators&quot;, &lt;a href=&quot;http://arxiv.org/abs/0701810&quot;&gt;arxiv:0701810&lt;/a&gt;
	&lt;li&gt;E. Locherbach, &quot;Likelihood Ratio Processes for Markovian Particle
Systems with Killing and Jumps&quot;, &lt;cite&gt;Statistical Inference for Stochastic
Processes&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (2002): 153--177
	&lt;li&gt;Philipp Metzner, Frank Noe and Christof Schutte, &quot;Estimating
the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data&quot;, &lt;cite&gt;Physical Review E&lt;/cite&gt;
&lt;strong&gt;80&lt;/strong&gt; (2009): 021106 [I presume they have a good reason for not
just using the delta method, and/or bootstrapping, but I'll have to read it to
see what that is]
	&lt;li&gt;G. Morvai and B. Weiss, &quot;Order Estimation of Markov Chains&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TIT.2005.844093&quot;&gt;&lt;cite&gt;IEEE Transactions on
Information Theory&lt;/cite&gt; &lt;strong&gt;51&lt;/strong&gt; (2005): 1496--1497&lt;/a&gt;
	&lt;li&gt;Adam Paszkiewicz, &quot;When transition count for Markov chains is a
complete sufficient statistic&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.spl.2005.09.010&quot;&gt;&lt;cite&gt;Statistics and
Probability Letters&lt;/cite&gt; &lt;strong&gt;76&lt;/strong&gt; (2006): 757--763&lt;/a&gt; [When the
initial and final states are fixed, for example]
	&lt;li&gt;Spiridon Penev, Hanxiang Peng, Anton Schick and Wolfgang
Wefelmeyer, &quot;Efficient estimators for functionals of Markov chains with
parametric marginals&quot;, &lt;cite&gt;Statistics and Probability Letters&lt;/cite&gt;
&lt;strong&gt;66&lt;/strong&gt; (2004): 335--345
	&lt;li&gt;Amr Sadek and Nikolaos Limnios, &quot;Nonparametric estimation of
reliability and survival function for continuous-time finite Markov
processes&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1016/j.jspi.2004.03.010&quot;&gt;&lt;cite&gt;Journal of
Statistical Planning and Inference&lt;/cite&gt; &lt;strong&gt;133&lt;/strong&gt; (2005):
1--21&lt;/a&gt;
	&lt;li&gt;Anton Schick and Wolfgang Wefelmeyer, &quot;Estimating Joint
Distributions of Markov Chains&quot;, &lt;cite&gt;Statistical Inference for Stochastic
Processes&lt;/cite&gt; &lt;strong&gt;5&lt;/strong&gt; (2002): 1--22
	&lt;li&gt;Iuliana Teodorescu, &quot;Maximum Likelihood Estimation for Markov Chains&quot;, &lt;a href=&quot;http://arxiv.org/abs/0905.4131&quot;&gt;arxiv:0905.4131&lt;/a&gt;
	&lt;li&gt;M. J. van der Heyden et al., &quot;Testing the Order of Discrete Markov
Chains Using Surrogate Data&quot;, &lt;cite&gt;Physica D&lt;/cite&gt; &lt;strong&gt;117&lt;/strong&gt;
(1998): 299--313
	&lt;li&gt;Ramon van Handel, &quot;On the minimal penalty for Markov
order estimation&quot;, &lt;a href=&quot;http://arxiv.org/abs/0908.3666&quot;&gt;arxiv:0908.3666&lt;/a&gt;
	&lt;li&gt;Martin J. Wainwright, &quot;Inconsistent parameter estimation in Markov
random fields: Benefits in the computation-limited
setting&quot;, &lt;a href=&quot;http://arxiv.org/abs/cs.LG/0602092&quot;&gt;cs.LG/0602092&lt;/a&gt;
	&lt;li&gt;L. C. Zhao, C. C. Y. Dorea and C. R. Gon&amp;ccedil;alves, &quot;On
Determination of the Order of a Markov Chain&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1023/A:1012245821183&quot;&gt;&lt;cite&gt;Statistical Inference
for Stochastic Processes&lt;/cite&gt; &lt;strong&gt;4&lt;/strong&gt; (2001): 273--282&lt;/a&gt;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>