Integrating Language Documentation and Computational Tools for Yupik, an Alaska Native Language

Overview

Project Abstract

One locus of crosslinguistic variation in how languages build words is whether meaning is encoded in free morphemes (units of meaning) that stand alone as words, or whether those morphemes must combine with other morphemes to become words. While English has many free morphemes, the Alaska Native language St. Lawrence Island/Siberian Yupik uses the second strategy, producing very complex words that are often sentence-sized. These properties are known as agglutination and polysynthesis. Researchers will document critical structures in the language, digitize existing Yupik materials, and build computational tools to help the community and other researchers. The data from Yupik are extremely important to language science, since many of the phenomena displayed in the language are rare and not well understood. Creating computational tools for languages with very complex words, like Yupik, also benefits computer scientists and language scientists, because it helps researchers improve computational tools for languages like English. The Native American Languages Act, passed by the US Congress in 1990, enacted into policy the recognition of the unique status and importance of Native American languages. This project will build and improve tools such as a morphological analyzer, a spellchecker, and a searchable dictionary, all of value to the community in revitalizing their language. Graduate students will be trained in these methods, and researchers will hold outreach meetings with high school students in the language community to teach them important computer and coding skills that will enable them to build further tools. All data gathered will be permanently archived at the Alaska Native Language Archive.

The investigators, language and computer scientists from the University of Illinois at Urbana-Champaign and George Mason University, will undertake this project in three interconnected parts: digitizing existing materials on and in Yupik for use by community members and researchers; recording and analyzing the speech of Yupik speakers; and working with the community to build computer tools for Yupik while teaching students how to do so. A successful computational model of Yupik linguistic phenomena has implications for unsupervised and semi-supervised methods in morphology induction and grammar induction, because morphophonological change in Yupik is pervasive, much more so than the models used in other approaches to unsupervised morphology induction assume. This work is likely to have important implications for the appropriate computational modeling of polysynthetic agglutinative morphosyntax. The team will access materials at several archives, scan them, and clean and process the scans so that they are digitally accessible and searchable. This will create a digital corpus of Yupik materials for use by the community and for linguistic investigations into grammatical mood, tense, and aspect, to better understand these complex morphosemantic constructions. The data will also improve the computational tools being developed in this project, providing the Yupik community with access to modern tools like spellcheckers, electronically searchable dictionaries, and electronic books. Finally, in its tight integration of fieldwork and the development of computational tools for the analysis of the language, this project will serve as a model for future collaborations of this kind.

Logistics Summary

This collaboration between Schwartz (1761680, U of IL) and Schreiner (1760977, George Mason U) brings together computational methods with traditional fieldwork and language description to effectively and efficiently document critical aspects of the endangered St. Lawrence Island/Siberian Yupik (Yupik) language, while developing and improving computational tools that will aid in further documentation and analysis and support pedagogical and language revitalization efforts by the Yupik-speaking community. From 2019 to 2021, a field team of one to three people will conduct summer and fall fieldwork in Gambell, Savoonga, Nome, and possibly other towns or villages in Alaska where Yupik speakers live. The team will record and analyze the speech of Yupik speakers, work with the community to build computer tools for Yupik, and teach students how to do so. Researchers will also develop computational tools, analyze fieldwork data, and digitize existing materials on and in Yupik for use by community members and researchers.

Season  Field Site
2017    Alaska - Gambell
2017    Alaska - Nome
2018    Alaska - Gambell
2019    Alaska - Fairbanks
2019    Alaska - Gambell

Keywords

St. Lawrence Island, Alaska, Yupik, computational linguistics, language documentation, polysynthetic language, Bering Strait, phonology, morphology, syntax, morphosyntax

Members

Principal Investigator

Lane Schwartz

University of Illinois at Urbana-Champaign

Principal Investigator

Sylvia Schreiner

George Mason University

Resources

April 2020 Lightning Talk Video for NNA Award 1760977

A Navigating the New Arctic (NNA) project update video for award 1760977 produced for the April 2020 virtual NNA Investigators meeting.

Program: Navigating the New Arctic
Resource Type: video

April 2020 Project Update Report for NNA Award 1760977

A brief Navigating the New Arctic (NNA) project update report for NSF Award 1760977 produced for the April 2020 virtual NNA Investigators meeting.

Program: Navigating the New Arctic
Resource Type: report

Dates

1 August 2018 to 31 January 2022

Location

St. Lawrence Island, AK; Bering Strait

Program

Award Year: 2017