<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Notebooks</title>
    <link>http://bactra.org/notebooks</link>
    <description>Cosma's Notebooks</description>
    <language>en</language>

  <item>
    <title>Bioinformatics</title>
    <link>http://bactra.org/notebooks/2002/10/04#bioinformatics</link>
    <description>
&lt;P&gt;An ugly name.  The use of computation-intensive techniques to study
biological data, especially data generated from sequencing long macromolecules
(chromosomal DNA, proteins, etc.) or otherwise related to them.

&lt;P&gt;Now, it so happens that I've written a whole dissertation about
computation-intensive techniques for discovering patterns in sequences...
This notebook will contain more, when I have more to put in it.

&lt;P&gt;&lt;em&gt;Things to look into&lt;/em&gt;: State of the art of using hidden Markov models
(seems poor, frankly).  Using &lt;a href=&quot;gramamtical-inference.html&quot;&gt;grammatical
inference&lt;/a&gt; to characterize sequence families, regulatory motifs, etc.
Inferring metabolic or regulatory structure from large-scale expression data,
especially gene chip data.  Massaging gene chip data.  Characterizing membrane
proteins and their activity.

&lt;P&gt;I've just heard that some people are using hidden Markov models to
characterize gene-chip data at a &lt;em&gt;single&lt;/em&gt; time, using some odd mapping
of different genes into a serial order.  This seems absurd to me, but I have it
from a reliable source.  If people are really doing that, there's a much better
alternative easily available, namely using &lt;a
href=&quot;graphical-models.html&quot;&gt;graphical models&lt;/a&gt;.  Memo to self: investigate,
and if there's a niche, publish!

&lt;P&gt;See also:
	&lt;a href=&quot;ai.html&quot;&gt;Artificial intelligence&lt;/a&gt;;
	&lt;a href=&quot;biotechnology.html&quot;&gt;Biotechnology&lt;/a&gt;;
	&lt;a href=&quot;development-bio.html&quot;&gt;Developmental Biology&lt;/a&gt;;
	&lt;a href=&quot;evolution.html&quot;&gt;Evolution of Organisms&lt;/a&gt;;
	&lt;a href=&quot;gene-expression-data.html&quot;&gt;Gene Expression Data Analysis&lt;/a&gt;;
	&lt;a href=&quot;learning-inference-induction.html&quot;&gt;Machine Learning,
Statistical Inference and Induction&lt;/a&gt;;
	&lt;a href=&quot;molecular-biology.html&quot;&gt;Molecular Biology&lt;/a&gt;;
	&lt;a href=&quot;signal-transduction.html&quot;&gt;Signal Transduction, Control of
Metabolism, and Gene Regulation&lt;/a&gt;

&lt;ul&gt;Recommended:
	&lt;li&gt;Baldi and Brunak, &lt;cite&gt;Bioinformatics: The Machine Learning
Approach&lt;/cite&gt;
	&lt;li&gt;Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow and Terence P.
Speed, &quot;Statistical Methods for Identifying Differentially Expressed Genes in
Replicated cDNA Microarray Experiments,&quot; &lt;a
href=&quot;http://www.stat.berkeley.edu/tech-reports/&quot;&gt;UCB Statistics Technical
Report&lt;/a&gt; 578 [&lt;a
href=&quot;http://www.stat.berkeley.edu/tech-reports/578.abstract&quot;&gt;Abstract&lt;/a&gt;]
	&lt;li&gt;Neal S. Holter, Madhusmita Mitra, Amos Maritan, Marek Cieplak,
Jayanth R. Banavar and Nina V. Fedoroff, &quot;Fundamental Patterns Underlying Gene
Expression Profiles: Simplicity from Complexity,&quot; &lt;cite&gt;PNAS&lt;/cite&gt;
&lt;strong&gt;97&lt;/strong&gt;: 8409--8414
	&lt;/ul&gt;

	&lt;ul&gt;To read:
	&lt;li&gt;Sven Bergmann, Jan Ihmels and Naama Barkai, &quot;Iterative signature algorithm for the analysis of large-scale gene expression data,&quot; &lt;cite&gt;Physical Review E&lt;/cite&gt; &lt;strong&gt;67&lt;/strong&gt; (2003): 031902
	&lt;li&gt;Bower and Bolouri (eds.), &lt;cite&gt;Computational Modeling of Genetic
and Biochemical Networks&lt;/cite&gt;
	&lt;li&gt;A. J. Butte and I. S. Kohane, &quot;Mutual Information Relevance
Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements&quot;
[&lt;a href=&quot;http://www.smi.stanford.edu/projects/helix/psb00/butte.pdf&quot;&gt;online&lt;/a&gt;]
	&lt;li&gt;M. Caselle, F. Di Cunto, M. Pellegrino and P. Provero, &quot;Finding
regulatory sites from statistical analysis of nucleotide frequencies in the
upstream region of eukaryotic genes,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0201033&quot;&gt;physics/0201033&lt;/a&gt;
	&lt;li&gt;Josh M. Deutsch, &quot;Algorithm for Finding Optimal Gene Sets in
Microarray Prediction,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0108011&quot;&gt;physics/0108011&lt;/a&gt;
	&lt;li&gt;Eytan Domany, &quot;Cluster Analysis of Gene Expression Data,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0206056&quot;&gt;physics/0206056&lt;/a&gt;
	&lt;li&gt;R. Durbin, S. Eddy, A. Krogh and G. Mitchison, &lt;cite&gt;Biological
Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids&lt;/cite&gt;
	&lt;li&gt;Richard Durrett, &lt;cite&gt;Probability Models for DNA Sequence
Evolution&lt;/cite&gt;
	&lt;li&gt;Warren Ewens and Gregory Grant, &lt;cite&gt;Statistical Methods in
Bioinformatics: An Introduction&lt;/cite&gt;
	&lt;li&gt;Luca Ferraro, Andrea Giansanti, Giovanni Giuliano and Vittorio
Rosato, &quot;Co-expression of statistically over-represented peptides in proteomes:
a key to phylogeny?&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.MN/0410011&quot;&gt;q-bio.MN/0410011&lt;/a&gt;
	&lt;li&gt;Gad Getz, Hilah Gal, Itai Kela, Eytan Domany and Dan A. Notterman,
&quot;Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene
Expression Data,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0206060&quot;&gt;physics/0206060&lt;/a&gt;
	&lt;li&gt;Gad Getz, Michele Vendruscolo, David Sachs and Eytan Domany,
&quot;Automated assignment of SCOP and CATH protein structure classification from
FSSP scores,&quot; &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0102280&quot;&gt;cond-mat/0102280&lt;/a&gt;
	&lt;li&gt;Alexander N. Gorban's &lt;a href=&quot;http://mystic.math.neu.edu/gorban/&quot;&gt;Home Page at Northeastern University&lt;/a&gt;
	&lt;li&gt;Alexander N. Gorban, Andrey Yu. Zinovyev and Tatyana G. Popova,
&quot;Self-organizing Approach for Automated Gene Identification in Whole
Genomes,&quot; &lt;a href=&quot;http://arxiv.org/abs/physics/0108016&quot;&gt;physics/0108016&lt;/a&gt;
[Fuller version online &lt;a
href=&quot;http://mystic.math.neu.edu/gorban/geneid.pdf&quot;&gt;here&lt;/a&gt; or &lt;a
href=&quot;http://linkage.rockefeller.edu/wli/gene/gorban_pre.pdf&quot;&gt;here&lt;/a&gt;]
	&lt;li&gt;Dan Gusfield, &lt;cite&gt;Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology&lt;/cite&gt;
	&lt;li&gt;Alexander K. Hartmann, &quot;Sampling rare events: statistics of local
sequence alignments,&quot; &lt;a
href=&quot;http://arxiv.org/abs/cond-mat/0108201&quot;&gt;cond-mat/0108201&lt;/a&gt;
	&lt;li&gt;Lenwood S. Heath, Naren Ramakrishnan, Ronald R. Sederoff, Ross
W. Whetten, Boris I. Chevone, Craig A. Struble, Vincent Y. Jouenne, Dawei Chen,
Leonel van Zyl and Ruth G. Alscher, &quot;The Expresso Microarray Experiment
Management System: The Functional Genomics of Stress Responses in Loblolly
Pine,&quot; &lt;a href=&quot;http://arxiv.org/abs/cs.OH/0110047&quot;&gt;cs.OH/0110047&lt;/a&gt;
	&lt;li&gt;Trinh Xuan Hoang, Marek Cieplak, Jayanth R. Banavar and Amos
Maritan, &quot;Prediction of Protein Secondary Structures From Conformational
Biases,&quot; &lt;a href=&quot;http://arxiv.org/abs/cond-mat/0201311&quot;&gt;cond-mat/0201311&lt;/a&gt;
	&lt;li&gt;Rui Hu and Bin Wang, &quot;Statistically Significant Strings are
Related to Regulatory Elements in the Promoter Regions of &lt;em&gt;Saccharomyces
cerevisiae&lt;/em&gt;,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0009002&quot;&gt;physics/0009002&lt;/a&gt;
	&lt;li&gt;Thomas B. Kepler, Lynn Crosby and Kevin T. Morgan, &quot;Normalization
and Analysis of DNA Microarray Data by Self-Consistency and Local Regression,&quot;
SFI Working Paper 00-09-055
	&lt;li&gt;Cyril Laboulais, Mohammed Ouali, Marc Le Bret and Jacques
Gabarro-Arpa, &quot;Hamming distance geometry of a protein conformational
space. Application to the clustering of a 4 ns molecular dynamics trajectory of
the HIV-1 integrase catalytic core,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0110067&quot;&gt;physics/0110067&lt;/a&gt;
	&lt;li&gt;Ming Li, Xin Li, Bin Ma, Paul Vitanyi, &quot;Normalized Information
Distance and Whole Mitochondrial Genome Phylogeny Analysis,&quot; &lt;a
href=&quot;http://arxiv.org/abs/cs.CC/0111054&quot;&gt;cs.CC/0111054&lt;/a&gt; [Pardon me if I
don't exactly swoon over approximations to intrinsically uncomputable distance
measures]
	&lt;li&gt;Wentian Li
		&lt;ul&gt;
		&lt;li&gt;&quot;DNA Segmentation as A Model Selection Process,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0104027&quot;&gt;physics/0104027&lt;/a&gt;
		&lt;li&gt;&quot;New stopping criteria for segmenting DNA sequences,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0104026&quot;&gt;physics/0104026&lt;/a&gt;
		&lt;li&gt;&quot;Zipf's Law in Importance of Genes for Cancer
Classification Using Microarray Data,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0104028&quot;&gt;physics/0104028&lt;/a&gt;
		&lt;/ul&gt;
	&lt;li&gt;Wentian Li, Fengzhu Sun and Ivo Grosse, &quot;Extreme Value Distribution
Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using
Logistic Regression&quot;, &lt;a
href=&quot;http://arxiv.org/abs/q-bio.QM/0403038&quot;&gt;q-bio.QM/0403038&lt;/a&gt;
	&lt;li&gt;Wentian Li and Yaning Yang, &quot;How Many Genes Are Needed for a
Discriminant Microarray Data Analysis?&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0104029&quot;&gt;physics/0104029&lt;/a&gt;
	&lt;li&gt;Christopher Loose, Kyle Jensen, Isidore Rigoutsos and Gregory
Stephanopoulos, &quot;A linguistic model for the rational design of antimicrobial
peptides&quot;, &lt;a href=&quot;http://dx.doi.org/10.1038/nature05233&quot;&gt;&lt;cite&gt;Nature&lt;/cite&gt;
&lt;strong&gt;443&lt;/strong&gt; (2006): 867--869&lt;/a&gt;
	&lt;li&gt;Felix Naef, Daniel A. Lim, Nila Patil and Marcelo O. Magnasco,
&quot;From Features to Expression: High-Density Oligonucleotide Array Analysis
Revisited,&quot; &lt;a href=&quot;http://arxiv.org/abs/physics/0102010&quot;&gt;physics/0102010&lt;/a&gt;
	&lt;li&gt;Felix Naef, Nicholas D. Socci, and Marcelo Magnasco, &quot;Extracting
more signal at high intensities in oligonucleotide arrays,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0205031&quot;&gt;physics/0205031&lt;/a&gt;
	&lt;li&gt;Jerome K. Percus, &lt;cite&gt;Mathematics of Genome Analysis&lt;/cite&gt;
	&lt;li&gt;Pavel A. Pevzner, &lt;cite&gt;Computational Molecular Biology: An
Algorithmic Approach&lt;/cite&gt;
	&lt;li&gt;Y. Sakakibara, &quot;Grammatical Inference in Bioinformatics&quot;, &lt;a
href=&quot;http://dx.doi.org/10.1109/TPAMI.2005.140&quot;&gt;&lt;cite&gt;IEEE Transactions on
Pattern Analysis and Machine Intelligence&lt;/cite&gt; &lt;strong&gt;27&lt;/strong&gt; (2005):
1051--1062&lt;/a&gt;
	&lt;li&gt;David Sankoff and Joseph Kruskal (eds.), &lt;cite&gt;Time Warps, String
Edits, and Macromolecules: The Theory and Practice of Sequence
Comparison&lt;/cite&gt;
	&lt;li&gt;Federico Mattia Stefanini, &quot;Identification of Highly Informative
Molecular Profile Components Using Genetic Algorithms,&quot; SFI Working Paper
98-05-42
	&lt;li&gt;James Tisdall, &lt;cite&gt;Beginning Perl for Bioinformatics&lt;/cite&gt;
	&lt;li&gt;Erik van Nimwegen, Mihaela Zavolan, Nikolaus Rajewsky and Eric
D. Siggia, &quot;Probabilistic Clustering of Sequences: Inferring new bacterial
regulons by comparative genomics,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0206045&quot;&gt;physics/0206045&lt;/a&gt;
	&lt;li&gt;Jean-Philippe Vert, &quot;Kernel methods in genomics and computational
biology&quot;, &lt;a href=&quot;http://arxiv.org/abs/q-bio.QM/0510032&quot;&gt;q-bio.QM/0510032&lt;/a&gt;
	&lt;li&gt;Jean-Philippe Vert and Minoru Kanehisa, &quot;Graph-driven features
extraction from microarray data,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0206055&quot;&gt;physics/0206055&lt;/a&gt;
	&lt;li&gt;Jason L. T. Wang, Bruce A. Shapiro and Dennis Shasha (eds.),
&lt;cite&gt;Pattern Discovery in Biomolecular Data: Tools, Techniques, and
Applications&lt;/cite&gt;
	&lt;li&gt;Chris Wiggins and Ilya Nemenman, &quot;Process Pathway Inference via
Time Series Analysis,&quot; &lt;a
href=&quot;http://arxiv.org/abs/physics/0206031&quot;&gt;physics/0206031&lt;/a&gt;
	&lt;li&gt;Andrey Zinovyev (any relation to &lt;em&gt;that&lt;/em&gt; Zinoviev?), &lt;a href=&quot;http://www.ihes.fr/~zinovyev/&quot;&gt;Genome Visualization Tools&lt;/a&gt;
	&lt;/ul&gt;

&lt;ul&gt;To write:
	&lt;li&gt;Kristina Lisa Shalizi, CRS, Walter Fontana, &quot;Pattern Discovery in
Artificially Evolved RNA Sequences&quot;
	&lt;/ul&gt;
</description>
  </item>
  </channel>
</rss>
