Hockenmaier Colloquium

November 15
Cobb 201
University of Illinois at Urbana-Champaign
Unsupervised Grammar Induction with Combinatory Categorial Grammars

In recent years, there has been considerable interest in the unsupervised induction of grammars from part-of-speech-tagged text, but most approaches recover only unlabeled dependencies that are difficult to interpret linguistically, and disallow the crossing dependencies that arise in many languages.  I will present a simple grammar induction algorithm for Combinatory Categorial Grammar (CCG) that achieves state-of-the-art performance on a number of languages by relying on a minimal number of very general linguistic principles. CCG is a linguistically expressive formalism that pairs words with language-specific categories that capture their syntactic behavior and subcategorization information. Unlike previous work on unsupervised parsing with CCGs, our approach has no prior language-specific knowledge and discovers all categories automatically. Additionally, unlike other approaches, our grammar remains robust when parsing longer sentences, performing as well as or better than other systems. We believe this is a natural result of using an expressive grammar formalism with an extended domain of locality. I will compare three different probability models for this grammar: two simple maximum-likelihood based models, and a non-parametric Bayesian model.
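To make the category machinery concrete, here is a minimal sketch (not the speaker's implementation) of CCG categories and the two most basic combinators, forward and backward application. The toy lexicon entry and all class/function names are illustrative assumptions.

```python
# Minimal sketch of CCG categories with forward (>) and backward (<)
# application. Categories are either atomic (S, NP) or complex (X/Y, X\Y).
# All names here are illustrative, not from the talk's actual system.

class Cat:
    """An atomic category like S or NP, or a complex category X/Y or X\\Y."""
    def __init__(self, result=None, slash=None, arg=None, atom=None):
        self.atom, self.result, self.slash, self.arg = atom, result, slash, arg

    def __eq__(self, other):
        return isinstance(other, Cat) and repr(self) == repr(other)

    def __repr__(self):
        return self.atom if self.atom else f"({self.result}{self.slash}{self.arg})"

def atom(name):
    return Cat(atom=name)

def fwd(result, arg):
    """X/Y: a category seeking a Y to its right."""
    return Cat(result=result, slash="/", arg=arg)

def bwd(result, arg):
    """X\\Y: a category seeking a Y to its left."""
    return Cat(result=result, slash="\\", arg=arg)

def apply_cats(left, right):
    """Forward application: X/Y Y => X.  Backward application: Y X\\Y => X."""
    if left.slash == "/" and left.arg == right:
        return left.result
    if right.slash == "\\" and right.arg == left:
        return right.result
    return None  # the two categories do not combine by application

# Toy derivation: a transitive verb carries category (S\NP)/NP, encoding
# its subcategorization for an object NP (right) and a subject NP (left).
S, NP = atom("S"), atom("NP")
eats = fwd(bwd(S, NP), NP)      # (S\NP)/NP
vp = apply_cats(eats, NP)       # forward application  -> S\NP
sentence = apply_cats(NP, vp)   # backward application -> S
print(sentence)                 # prints: S
```

Because each category carries its subcategorization frame, a single lexicon plus a handful of universal combinators determines the derivations, which is what makes CCG a natural fit for induction from general principles.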