Abstract
While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. Transcribing a large documentation corpus is a tremendous feat in itself, yet segmentation remains a bottleneck for phonetic research on data of this kind. This paper examines whether a speech processing tool, forced alignment, can facilitate segmentation of small data sets, even when the target language differs from the language the aligner was trained on. It also examines whether a phone set with contextualized allophones outperforms a more general one. The accuracy of two forced aligners trained on English (hmalign and p2fa) was assessed using corpus data from Yoloxóchitl Mixtec. Agreement with manually placed boundaries was relatively good: 70.9% of boundaries fell within 30 ms for hmalign and 65.7% for p2fa. Accuracy also varied by segmental and tonal category. For instance, the additional stop allophones in hmalign's phone set improved alignment accuracy. Differences in agreement between the two aligners also corresponded closely to the types of data on which each was trained. Overall, existing alignment systems show potential for making the phonetic analysis of small corpora more efficient, and more allophonic phone sets yield better agreement than general ones.
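The 30 ms figures can be read as the proportion of aligner-placed boundaries that fall within 30 ms of the corresponding hand-placed boundaries. The sketch below shows that calculation under stated assumptions: automatic and manual boundaries are already paired one-to-one, times are in milliseconds, and "within 30 ms" is treated as inclusive. The function name `boundary_agreement` and the sample boundary times are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def boundary_agreement(
    auto_ms: Sequence[float],
    manual_ms: Sequence[float],
    tolerance_ms: float = 30.0,
) -> float:
    """Fraction of automatic boundaries within `tolerance_ms` of the paired manual boundary.

    Hypothetical helper: assumes the two sequences are aligned index-by-index
    and that times are given in milliseconds (avoids float rounding at the
    threshold that can occur when working in seconds).
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("automatic and manual boundaries must be paired one-to-one")
    if not manual_ms:
        raise ValueError("no boundaries to compare")
    # Count boundaries whose absolute offset is at most the tolerance (inclusive).
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms)
    return hits / len(manual_ms)


if __name__ == "__main__":
    # Hypothetical boundary times (ms), not data from the Yoloxóchitl Mixtec corpus.
    manual = [100.0, 250.0, 410.0, 560.0]
    auto = [112.0, 295.0, 405.0, 590.0]  # offsets: 12, 45, 5, 30 ms
    print(boundary_agreement(auto, manual))  # 0.75
```

Passing a different `tolerance_ms` (e.g. 20.0) gives agreement at other thresholds; whether the original study paired boundaries this way is an assumption of this sketch.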
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 2235-2246 |
| Number of pages | 12 |
| Journal | Journal of the Acoustical Society of America |
| Volume | 134 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - Sep 2013 |
| Externally published | Yes |
Bibliographical note
Funding Information: The Yoloxóchitl Mixtec (YM) corpus was elicited by Castillo García, Amith, and DiCanio with support from Hans Rausing Endangered Language Programme Grant No. MDP0201 and NSF Grant No. 0966462. The authors would like to thank Leandro DiDomenico for his help with transcription labeling. This work was supported by NSF Grant No. 0966411 to Haskins Laboratories. The first two authors listed contributed equally to this manuscript.
ASJC Scopus subject areas
- Arts and Humanities (miscellaneous)
- Acoustics and Ultrasonics