Building rich theory from sparse data: two case studies from Tōhoku Japanese
In the last few decades, phonological research has increasingly used probabilistic grammars not only to yield analytical insights into linguistic patterns, but also to confront questions of learnability, language acquisition, and language processing. This approach can provide valuable new evidence to adjudicate between existing theoretical proposals, and can spur new ideas. However, it typically relies on large quantitative datasets that are available only for high-resource languages and dialects, putting the field at risk of reaching skewed conclusions about universals of linguistic patterning or cognition. In this talk, I present a pair of case studies from the endangered Tōhoku dialect of Japanese showing how theoretically driven computational modeling, combined with careful experimental design, can redress this imbalance, allowing sparse data to inform contemporary phonological theory.
First, I present data from a dense-sampling experimental paradigm to quantitatively probe a case of optional paradigm uniformity involving velar nasalization (/g/ → [ŋ]). I find that the variable process is influenced both by lexical characteristics (here, frequency) and by phonological markedness, and I model the data using the Voting Theory of Bases, a theory originally proposed in Breiss (2024) to account for a different kind of output-oriented phonological phenomenon, lexical conservatism. The success of the model suggests that paradigm uniformity and lexical conservatism are two special cases of a general theory of how a psycholinguistically dynamic lexicon interacts with a probabilistic grammar. I then present data on how the frequency-conditioning of velar nasalization renders a classical “rule feeding” interaction between nasalization and Rendaku (a process of compound obstruent voicing) opaque and saltatory. Saltatory alternations have been argued to be difficult to learn and diachronically unstable, and I provide converging evidence that although speakers accurately represent the saltatory patterns in real words of their language, they fail to generalize them, instead repairing them to a non-opaque pattern. This finding highlights how variation in rule application can complicate apparently simple process interactions, shedding light on the inductive biases that guide learning and language change.