Linguistics
08 Sep 2009 14:29
Yet Another Inadequate Placeholder.
Things I want to learn more about: statistical language processing; pragmatics; semantics; "functional grammar".
Agent-based models of language change warrant their own notebook.
Query on the reliability of historical linguistics. A large part of historical linguistics consists of reconstructing languages which have left no written records, by means of extant or recorded descendants. The paradigm, as it were, is the reconstruction of proto-Indo-European from the recorded Indo-European languages. Accompanying such reconstructions, historical linguists also postulate regular rules for how the sounds in words in the ancestral language changed into different sounds in corresponding words in the descendant languages; similarly for other features of the language, like grammatical rules, conjugations, etc. (You could simply think of these as correspondence rules between the extant languages, without necessarily invoking an ancestor, if you liked, though the ancestor is a very natural hypothesis.) Now, obviously, I'm not competent to critique any of this, but I would like to know if the reliability of linguists at performing such reconstructions, and discovering correspondences, has ever been systematically tested. One test would be to give linguists corpora from related languages whose common ancestor is well-known, and see how well they could reconstruct that ancestor. (E.g., give them the modern Romance languages, and see how close they get to Latin.) Alternately, we could give them samples from languages which are actually unrelated, but tell them they are all connected, and see if they nonetheless come up with regular sound-change patterns and so forth. Has anyone ever done anything like these tests?
Update, 29 March 2005: John O'Neil writes to tell me that both the tests I describe above are, in fact, common exercises in graduate classes in historical and comparative linguistics! He doesn't know of any statistical studies on this kind of thing, however. Also, I am ashamed to learn that the immediate ancestor of the extant Romance languages was not, in fact, literary Latin but "proto-Romance", which had already, e.g., lost noun declensions. (Ashamed, because I should have known that.) I also should take this opportunity to stress that I am not skeptical about the reliability of mainstream historical linguistics in general, just curious if we can quantify that reliability, and about how general ideas about error and the growth of knowledge apply here.
Update, 20 September 2007: Brendan Shean points me to a very neat project on doing actual statistical inference for sound-change rules, and ultimately for linguistic phylogenetic trees. See Bouchard-Côté et al. below.
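The first test in the query above (reconstruct a known ancestor, then see how close the reconstruction gets) needs some score for "how close". Here is a minimal sketch of one possibility, assuming we are willing to use normalized Levenshtein distance over spelled forms as a crude stand-in for phonological distance. The function names and the word lists are mine, purely for illustration; the "reconstructed" forms are made-up placeholders and not the output of any linguist or algorithm, and, per the first update above, literary Latin is only an approximation to the true ancestor of the Romance languages.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so scores lie in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def score_reconstruction(reconstructed: dict, attested: dict) -> float:
    """Mean normalized distance over the glosses both word lists share."""
    shared = sorted(set(reconstructed) & set(attested))
    if not shared:
        raise ValueError("no glosses in common")
    total = sum(normalized_distance(reconstructed[g], attested[g])
                for g in shared)
    return total / len(shared)


# Attested Latin forms, keyed by English gloss.
attested = {"night": "noctem", "eight": "octo", "done": "factum"}

# HYPOTHETICAL reconstruction, invented here only to exercise the scorer.
reconstruction = {"night": "nokte", "eight": "okto", "done": "faktu"}

if __name__ == "__main__":
    print(round(score_reconstruction(reconstruction, attested), 3))
```

A score of 0 would mean the reconstruction matches the attested forms exactly. The same score, computed for "reconstructions" built from languages that are actually unrelated, would give a rough baseline for the second test. A serious version would compare phonemic transcriptions with a phonetically informed cost rather than raw spellings.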
See also: Analogy and Metaphor; Cognitive Science; Collective Cognition; Grammatical Inference; Narratives; Rhetoric; Semiotics; Structuralism
- Recommended (misc., in need of subdivision):
- Steven Abney, "Statistical Methods and Linguistics," in Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language (1996) [PDF; Abney's other papers]
- Alexandre Bouchard-Côté, Percy Liang, Thomas Griffiths, and Dan Klein, "A Probabilistic Approach to Diachronic Phonology", Conference on Empirical Methods in Natural Language Processing 2007 [free PDF, slides]
- William H. Calvin and Derek Bickerton, Lingua ex Machina: Reconciling Darwin and Chomsky with the Human Brain
- Noam Chomsky
- "A Review of B. F. Skinner's Verbal Behavior," Language 35 (1959): 26--58 [online]
- Syntactic Structures
- Catherine Emmott, Narrative Comprehension: A Discourse Perspective
- John Goldsmith, review of Bruce Nevin (ed.), The Legacy of Zellig Harris, in Language 81 (2005): 719--736 [PDF. Recommended as an interesting introduction to Harris. Makes the important connection to the minimum description length principle. Thanks to Prof. Goldsmith for letting me know about his paper.]
- Randy Allen Harris, The Linguistics Wars
- Zellig Harris, Language and Information [Interesting old review by Bruce Nevin. My comments.]
- Ray Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution [Review by Andrew Carstairs-McCarthy in American Scientist; my review: The Object-Oriented Turn in Generative Grammar]
- Language Log [Group weblog on linguistics, with contributions by McWhorter and Pullum]
- Mark Liberman and Geoffrey K. Pullum, Far from the Madding Gerund: And Other Dispatches from Language Log [Mini-review]
- John McWhorter, Word on the Street
- Neil Mercer, Words and Minds: How We Use Language to Think Together
- Fernando Pereira, "Formal grammar and information theory: together again?", Philosophical Transactions of the Royal Society 358 (2000): 1239--1253 [PDF preprint; commentary from Mark Liberman]
- Steven Pinker, The Language Instinct
- Steven Pinker and Ray Jackendoff, "The Faculty of Language: What's Special about It?", Cognition 95 (2005): 201--236 [preprint]
- Geoffrey K. Pullum
- The Great Eskimo Vocabulary Hoax, and Other Essays
- "Ideology, Power, and Linguistic Theory" [PDF]
- Dan Sperber and Deirdre Wilson, Relevance: Communication and Cognition
- To read:
- N. Asher and A. Lascarides, Logics of Conversation ["People often mean more than they say. Grammar on its own is typically insufficient for determining the full meaning of an utterance; the assumption that the discourse is coherent or 'makes sense' has an important role to play in determining meaning as well. Logics of Conversation presents a dynamic semantic framework called Segmented Discourse Representation Theory, or SDRT, where this interaction between discourse coherence and discourse interpretation is explored in a logically precise manner. Combining ideas from dynamic semantics, commonsense reasoning and speech act theory, SDRT uses its analysis of rhetorical relations to capture intuitively compelling implicatures. It provides a computable method for constructing these logical forms and is one of the most formally precise and linguistically grounded accounts of discourse interpretation currently available."]
- R. Harald Baayen, Analyzing Linguistic Data: A Practical Introduction to Statistics Using R [blurb]
- Mark C. Baker, The Atoms of Language: The Mind's Hidden Rules of Grammar
- Derek Bickerton
- Diane Blakemore, Relevance and Linguistic Meaning: The Semantics and Pragmatics of Discourse Markers
- Andreas Blume, "A Learning-Efficiency Explanation of Structure in Language", Theory and Decision 57 (2004): 265--285
- Rens Bod, Beyond Grammar: An Experience-based theory of language [Free online]
- Rens Bod, Jennifer Hay and Stefanie Jannedy (eds.), Probabilistic Linguistics [Blurb]
- Ted Briscoe (ed.), Linguistic Evolution Through Language Acquisition: Formal and Computational Models
- Penelope Brown and Stephen C. Levinson, Politeness: Some universals in language usage
- Gennaro Chierchia, Meaning and Grammar: An Introduction to Semantics [blurb]
- Herbert H. Clark, Using Language [blurb]
- Ewa Dabrowska, Language, Mind, and Brain: Some Psychological and Neurological Constraints on Theories of Grammar
- T. Deacon
- Lukasz Debowski, "Hilberg's Law and Its Links with Guiraud's Law", cs.CL/0507022 ["Hilberg (1990) supposed that finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's law, which states that the number of word types in a text is greater than proportional to the square root of the text length. Our derivation is based on some mathematical conjecture in coding theory and on several experiments suggesting that words can be defined approximately as the nonterminals of the shortest context-free grammar for the text."]
- Peter Ford Dominey, "From Sensorimotor Sequence to Grammatical Construction: Evidence from Simulation and Neurophysiology", Adaptive Behavior 13 (2005): 347--361 [Very cool, if it's right: "... describes a functional trajectory from sensorimotor sequence learning to the learning of grammatical constructions in language. ... review of the functional neurophysiology of the cortex and basal ganglia ... as background for a neural network model of this system in sensorimotor sequence learning. Sequential behavior ... defined in terms of serial, temporal and abstract structure. The resulting neuro-computational framework ... account[s] for observed sequence learning .... framework naturally extends to grammatical constructions as form-to-meaning mappings. Predictions ... concerning parallels in language and cognitive sequence processing are tested against behavioral and neurophysiological observations in humans, resulting in a refinement of the allocation of model functions to subdivisions of Broca's area. From a functional perspective this analysis will provide insight into the relation between the coding structure in human languages, and constraints derived from the underlying neurophysiological computational mechanisms." PDF preprint]
- Umberto Eco, The Search for the Perfect Language
- N. J. Enfield, Linguistic Epidemiology: Semantics and Grammar of Language Contact in Mainland Southeast Asia
- Adele Goldberg, Constructions at Work: The Nature of Generalization in Language
- Arthur C. Graesser, Keith K. Millis and Rolf A. Zwaan, "Discourse Comprehension", Annual Review of Psychology 48 (1997): 163--189
- Maria Teresa Guasti, Language Acquisition: The Growth of Grammar [Blurb]
- Patricia Hanna and Bernard Harrison, Word and World: Practice and the Foundations of Language
- Zellig Harris
- "A Theory of Language Structure", American Philosophical Quarterly 13 (1976): 237--255 [JSTOR]
- "Grammar on Mathematical Principles", Journal of Linguistics 14 (1978): 1--20 [JSTOR]
- "The Structure of Science Information", Journal of Biomedical Informatics 35 (2002): 215--221
- Arturo Hernandez, Ping Li and Brian MacWhinney, "The emergence of competing modules in bilingualism", Trends in Cognitive Sciences 9 (2005): 220--225
- Kathy Hirsh-Pasek and Roberta Michnick Golinkoff, The Origins of Grammar: Evidence from Early Language Comprehension [blurb]
- John C. L. Ingram, Neurolinguistics: An Introduction to Spoken Language Processing and its Disorders [Blurb]
- Dan Klein and Christopher D. Manning, "Natural language grammar induction with a generative constituent-context model", Pattern Recognition 38 (2005): 1407--1419 ["We present a generative probabilistic model for the unsupervised learning of hierarchical natural language syntactic structure. Unlike most previous work, we do not learn a context-free grammar, but rather induce a distributional model of constituents which explicitly relates constituent yields and their linear contexts.... [Gets the] best published unsupervised parsing results on the ATIS corpus...."]
- Chris Knight et al. (eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form
- Paul Kroeger, Analyzing Grammar: An Introduction [blurb]
- Patricia K. Kuhl, "Early Language Acquisition: Cracking the Speech Code", Nature Reviews Neuroscience 5 (2004): 831--843
- John Lawler and Helen A. Dry, Using Computers in Linguistics: A Practical Guide
- Stephen C. Levinson, Presumptive Meanings: The Theory of Generalized Conversational Implicature [Blurb]
- Margaret Masterman, Language, Cohesion and Form
- James D. McCawley, Everything that Linguists Have Always Wanted to Know about Logic --- but Were Ashamed to Ask
- Janet L. McDonald, "Language Acquisition: The Acquisition of Linguistic Structure in Normal and Special Populations", Annual Review of Psychology 48 (1997): 215--241
- Bob McMurray, "Defusing the Childhood Vocabulary Explosion", Science 317 (2007): 631 ["During the second year of life, the rate at which children acquire new words accelerates dramatically. ... [this] is a necessary by-product of learning if (i) multiple words are learned in parallel and (ii) words are distributed such that there are few words that can be acquired quickly and many difficult ones."]
- John McWhorter, The Power of Babel
- Adilson E. Motter, Alessandro P. S. de Moura, Ying-Cheng Lai, and Partha Dasgupta, "Topology of the conceptual network of language," cond-mat/0206530 = Physical Review E 65 (2002): 065102(R)
- Salikoko S. Mufwene, The Ecology of Language Evolution [Review by Danny Yee]
- Frederick J. Newmeyer, Language Form and Language Function [blurb]
- Johanna Nichols, Linguistic Diversity in Time and Space [In the words of a correspondent: "looked at a number of features of languages throughout the world, and argued that their distribution correlates to each other and to a possible initial migration of humans around the world"]
- Partha Niyogi, The Computational Nature of Language Learning and Evolution [Blurb]
- Elinor Ochs et al. (eds.), Interaction and Grammar
- Prashant Parikh, The Use of Language ["game-theoretic account of communication, speaker meaning, and addressee interpretation, extending this analysis to conversational implicature and the Gricean maxims, illocutionary force, miscommunication, visual representation and visual implicature, and aspects of discourse." Sounds promising.]
- Steven Pinker, Words and Rules
- Geoffrey K. Pullum and Barbara C. Scholz
- "Empirical assessment of stimulus poverty arguments", The Linguistic Review 19 (2002): 9--50
- "Contrasting applications of logic in natural language syntactic description" in Petr Hajek, Luis Valdes-Villanueva, and Dag Westerstahl (eds.), Logic, Methodology and Philosophy of Science: Proceedings of the Twelfth International Congress, pp. 481--503 [pdf]
- Geoffrey K. Pullum and James Rogers, "Animal Pattern-Learning Experiments: Some Mathematical Background" [PDF preprint]
- Friedemann Pulvermuller, The Neuroscience of Language: On Brain Circuits of Words and Serial Order [Blurb]
- Nikolaus Ritt, Selfish Sounds and Linguistic Evolution: A Darwinian Approach to Language Change
- David Rose, "A Systemic Functional Approach to Language Evolution", Cambridge Archaeological Journal 16 (2006): 73--96
- Deb Roy, "Grounding words in perception and action: computational insights", Trends in Cognitive Sciences 9 (2005): 389--396 [I heard Roy talk about his work at the "predictive knowledge" workshop at ICML 2005; it seemed very cool, but left me wanting details...]
- P. Thomas Schoenemann, "Syntax as an Emergent Characteristic of the Evolution of Semantic Complexity", Minds and Machines 9 (1999): 309--346
- Ann Senghas, Sotaro Kita, and Asli Özyürek, "Children Creating Core Properties of Language: Evidence from an Emerging Sign Language in Nicaragua", Science 305 (2004): 1779--1782
- Chung-Chieh Shan, "Linguistic Side Effects", in Chris Barker and Pauline Jacobson (eds.), Direct Compositionality [PDF. Summary of Shan's dissertation. He glosses the latter thus: "Apparently noncompositional phenomena in natural languages can be analyzed like computational side effects in programming languages: anaphora can be analyzed like state, intensionality can be analyzed like environment, quantification can be analyzed like delimited control, and so on. We thus term apparently noncompositional phenomena in natural languages linguistic side effects. We put this new, general analogy to work in linguistics as well as programming-language theory."]
- Paul Smolensky and Géraldine Legendre, The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar [2 volume set. Blurb, contents]
- Yuuya Sugita and Jun Tani, "Learning Semantic Combinatoriality from the Interaction between Linguistic and Behavioral Processes", Adaptive Behavior 13 (2005): 33--52
- John Taylor, Cognitive Grammar
- Geoff Thompson, Introducing Functional Grammar
- Michael Tomasello, Constructing a Language: A Usage-Based Theory of Language Acquisition
- Florian Wolf and Edward Gibson, Coherence in Natural Language: Data Structures and Applications ["The biggest step forward" in discourse research "since Aristotle" --- Mark Liberman. Blurb]
