The Linguistica Project

Linguistica is a program designed to explore the unsupervised learning of natural language, with a primary focus on morphology (word structure). It runs under Windows, Mac OS X and Linux, and is written in C++ within the Qt development framework. Its memory requirements depend on the size of the corpus analyzed.

In the years since 2003, several groups of students have worked with me on developing versions of the Linguistica code. The central work behind Linguistica is a set of algorithms for determining the morphology of a natural language with no prior knowledge of the language. As our work has developed, we have added other functionality that was natural to include in the package.

The Linguistica group at the University of Chicago draws its membership from the Department of Linguistics and the Department of Computer Science. Our core interest is unsupervised learning of natural language structure, but this interest has led us to work in a number of other areas, including automatically obtaining corpora through the Internet and discovering structure in bioinformatic databases.