Carnegie Mellon University

[Image: an abstract image of blue sine waves on a black background]

From Acoustic Signal to Morphosyntactic Analysis in One End-to-End Neural System

By Lori Levin, Graham Neubig, Shinji Watanabe, and David Mortensen

There are approximately 7,000 languages in the world today, but this number is declining precipitously. Even many languages that currently have thousands upon thousands of speakers are likely to fall out of use within a generation. For the speakers of these languages, this represents a tragic loss of cultural and linguistic heritage, which is an important anchor of their social identity. Each language also carries irreplaceable data about language as a phenomenon of human behavior: the limits of its variation and the patterns in its structure and development. Linguists and language activists are currently working to quickly and comprehensively document as many languages as possible. In the unfortunate event that a language fades from use, documentation ensures that its data will remain available for future cultural or scientific analysis. This project partially automates the process of language documentation using tools from Natural Language Processing and Machine Learning. It differs from similar projects in using one integrated system to process the sounds of speech and the structure of words, instead of using two or more separate components. With the collaboration of native speaker scholars, the researchers are applying their methodology to four languages: Highland Puebla Nahuatl, Yoloxóchitl Mixtec, San Pedro Amuzgos Amuzgo, and North Slope Iñupiaq.

The proposed research will dramatically transform the landscape of automatic morphosyntactic and morphophonological analysis by introducing an end-to-end system that consumes speech as input and produces interlinear annotations as output. The research team proposes to build a single neural network that, with small amounts of labeled data produced by native speaker linguists, can directly convert recorded speech to analyzed text, producing four outputs: (1) a surface transcription, (2) a morphological segmentation of surface forms, (3) an underlying or canonical form for each morpheme, and (4) a gloss or standardized label for each morpheme. This system represents the first attempt to integrate these four tasks into one neural network, avoiding the error-propagation problems that have plagued earlier pipeline approaches and reducing the complexity of the technology for end users.

The researchers also propose innovative ways to incorporate linguistic knowledge into neural networks, including the use of differentiable weighted finite-state transducers, which are independently motivated by an iterative self-training architecture. This approach to iterative self-training will, in its own right, represent an advance in machine learning: a new algorithm for upweighting words and morphemes. The research also makes significant contributions to computational morphology. It includes a simple but expressive modification to existing schemes for segmentation and glossing, specifically for the representation of discontinuous morphemes. Furthermore, the proposal extends popular approaches to morphological analysis (e.g., UniMorph) by systematically addressing derivation as well as inflection, and it addresses the glossing of reduplication and noun incorporation, which earlier work has not.
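To make the four-tier output concrete, the sketch below models one interlinear record in plain Python. The class name, the example word, and its analysis are illustrative only (a Nahuatl-style form roughly meaning 'I want it'), not data from the project; the point is that the three morpheme-level tiers must stay aligned one-to-one.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InterlinearRecord:
    """One utterance with the four output tiers described above."""
    transcription: str   # (1) surface transcription
    segments: List[str]  # (2) morphological segmentation of the surface form
    underlying: List[str]  # (3) underlying/canonical form of each morpheme
    glosses: List[str]   # (4) gloss or standardized label for each morpheme

    def validate(self) -> bool:
        # Tiers (2)-(4) are per-morpheme, so they must align one-to-one.
        return len(self.segments) == len(self.underlying) == len(self.glosses)

# Illustrative, hypothetical Nahuatl-style example: "nikneki" ~ 'I want it'
rec = InterlinearRecord(
    transcription="nikneki",
    segments=["ni", "k", "neki"],
    underlying=["ni", "ki", "neki"],  # e.g. surface "k" from underlying "ki"
    glosses=["1SG.SUBJ", "3SG.OBJ", "want"],
)
assert rec.validate()
```

An end-to-end system as described would emit all four tiers jointly from the speech signal, rather than computing each tier from the previous one in a pipeline.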
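The idea behind differentiable weighted finite-state transducers can be sketched with a toy stand-in: a weighted lexicon mapping surface morphemes to candidate underlying forms. All arcs and weights below are invented for illustration. A hard Viterbi analysis takes the minimum-cost arc per segment; the differentiable relaxation replaces that min with a log-sum-exp over arc weights, which is the kind of smoothing that lets gradients flow through the transducer.

```python
import math

# Toy weighted "lexicon transducer": surface morpheme -> candidate
# underlying forms with negative-log weights (all values illustrative).
ARCS = {
    "ni":   [("ni", 0.1)],
    "k":    [("ki", 0.3), ("k", 1.2)],  # surface "k" may realize underlying "ki"
    "neki": [("neki", 0.1)],
}

def best_analysis(segments):
    """Hard (Viterbi-style) decoding: pick the min-cost arc per segment."""
    total, out = 0.0, []
    for seg in segments:
        form, w = min(ARCS[seg], key=lambda fw: fw[1])
        out.append(form)
        total += w
    return out, total

def soft_cost(segments):
    """Differentiable relaxation: -log-sum-exp over arc weights
    instead of a hard min, as in differentiable WFST frameworks."""
    return sum(
        -math.log(sum(math.exp(-w) for _, w in ARCS[seg]))
        for seg in segments
    )

analysis, cost = best_analysis(["ni", "k", "neki"])
# analysis recovers the canonical forms; soft_cost lower-bounds the hard cost.
```

In a real system the transducer would encode phonological and morphological constraints and be composed with the network's output lattice; this sketch only shows the hard-vs-soft scoring contrast.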