Colloquium: Example-based learning and the dynamics of the lexicon

January 20, 3:30-5pm, Cobb 201
Janet B. Pierrehumbert, Northwestern University

A signature characteristic of human languages is an immense lexicon, which is just as remarkable as the recursive formal structures of generative theory. Literate adult speakers know some 100,000 word types, including morphologically complex words. The lexicon of English, a language with a large and diverse population, contained 13,000,000 distinct word types in 2006 (as estimated from the Google ngram corpus). This means that English speakers encounter new words all the time.

This talk discusses the relationship between word learning in individuals and the dynamics of the lexicon at the level of the linguistic community. I develop the analogy between word types and biological species, taking imitative behavior in human populations as the mechanism by which word types replicate themselves over time. Using Usenet discussion communities as model systems, I develop the concept of a word niche in terms of the dissemination of a word across people and across topics. It is already known that word frequency is a predictor of word fate at historical time scales (with low-frequency words being more likely to be regularized by analogy or replaced). Controlling for frequency, I show that the relative extent of the word niche is a much more powerful predictor of word fate at short time scales. The presentation concludes with connections to recent typological results on population size and language complexity.