Kalvin Chang

Gates Hillman Complex

Language Technologies Institute

Carnegie Mellon University

I am a speech researcher who aims to build support for non-standard, low-resource language varieties. I have a track record of publication in top NLP and speech conferences, with a portfolio of 7 co-first authored publications across ASR, NLP, and computational linguistics. My work in both computational historical linguistics and low-resource speech recognition uniquely positions me to pursue my research agenda, which takes the unconventional approach of applying insights from historical linguistics to boost low-resource speech recognition.

I am currently a Visiting Scholar in Shinji Watanabe and David Mortensen’s labs at Carnegie Mellon, leading two teams working on speech in-context learning for low-resource dialects and on language-universal phone recognition. I graduated with a Master’s of Language Technologies (Rank 1) and a BS in Computer Science (with University Honors) from CMU.

news

Feb 13, 2025	Accepted to the Toyota Technological Institute at Chicago’s CS PhD program, the University of Cambridge’s Engineering and Computation, Cognition, and Language PhD programs, the University of Edinburgh’s PhD in Informatics program, the University of Waterloo’s CS PhD program, and UC Berkeley’s CS PhD program.
Jan 31, 2025	Awarded a Gates Cambridge Scholarship as one of 35 scholars selected from 600 US applicants.
Dec 08, 2024	Selected to attend the inaugural SDAIA Winter School on multi-modal LLMs as a Researcher to work on ASR for code-switching.
Oct 17, 2024	Presented four posters at the SANE 2024 Workshop [1] [2] [3] [4].
Sep 10, 2024	Won Honorable Mention at the Interspeech 2024 Responsible Speech Foundation Models Special Session for “Self-supervised Speech Representations Still Struggle with African American Vernacular English” (Chang* et al., 2024).
Aug 19, 2024	Returned to CMU LTI as a Visiting Scholar in WAVLab and ChangeLingLab, advised by Professors Shinji Watanabe and David Mortensen.

selected publications

Self-supervised Speech Representations Still Struggle with African American Vernacular English

Kalvin Chang^*, Yi-Hui Chou^*, Jiatong Shi, and 4 more authors

Interspeech, 2024
Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Yi-Hui Chou^*, Kalvin Chang^*, Meng-Ju Wu, and 8 more authors

In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023
Transformed Protoform Reconstruction

Young Min Kim^*, Kalvin Chang^*, Chenxuan Cui, and 1 more author

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jul 2023

Abstract

Protoform reconstruction is the task of inferring what morphemes or words looked like in the ancestral languages of a set of daughter languages. Meloni et al. (2021) achieved the state of the art on Latin protoform reconstruction with an RNN-based encoder-decoder model with attention. We update their model with the state-of-the-art seq2seq model: the Transformer. Our model outperforms theirs on a suite of metrics on two datasets: their Romance data of 8,000 cognates spanning 5 languages and a Chinese dataset (Hou 2004) of 800+ cognates spanning 39 varieties. We also probe our model for potential phylogenetic signal. Our code is publicly available at https://github.com/cmu-llab/acl-2023.
Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study

Kalvin Chang^*, Nathaniel Robinson^*, Anna Cai^*, and 3 more authors

In Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, Dec 2023

Abstract

We describe a set of new methods to partially automate linguistic phylogenetic inference given (1) cognate sets with their respective protoforms and sound laws, (2) a mapping from phones to their articulatory features, and (3) a typological database of sound changes. We train a neural network on these sound change data to weight articulatory distances between phones and predict intermediate sound change steps between historical protoforms and their modern descendants, replacing a linguistic expert in part of a parsimony-based phylogenetic inference algorithm. In our best experiments on Tukanoan languages, this method produces trees with a Generalized Quartet Distance of 0.12 from a tree that used expert annotations, a significant improvement over other semi-automated baselines. We discuss potential benefits and drawbacks of our neural approach and of parsimony-based tree prediction. We also experiment with a minimal generalization learner for automatic sound law induction, finding it less effective than sound laws from expert annotation. Our code is publicly available.
WikiHan: A New Comparative Dataset for Chinese Languages

Kalvin Chang, Chenxuan Cui, Youngmin Kim, and 1 more author

In Proceedings of the 29th International Conference on Computational Linguistics, Oct 2022

Abstract

Most comparative datasets of Chinese varieties are not digital; however, Wiktionary includes a wealth of transcriptions of words from these varieties. The usefulness of these data is limited by the fact that they use a wide range of variety-specific romanizations, making the data difficult to compare. The current work collects these data into a single consistent transcription (IPA, or International Phonetic Alphabet) and a structured form (TSV) for use in comparative linguistics and Chinese NLP. At the time of writing, the dataset contains 67,943 entries across 8 varieties and Middle Chinese. The dataset is validated on a protoform reconstruction task using an encoder-decoder cross-attention architecture (Meloni et al., 2021), achieving an accuracy of 54.11%, a PER (phoneme error rate) of 17.69%, and a FER (feature error rate) of 6.60%.
Phonotactic Complexity across Dialects

Ryan Soh-Eun Shim^*, Kalvin Chang^*, and David R. Mortensen

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Abstract

Received wisdom in linguistic typology holds that if the structure of a language becomes more complex in one dimension, it will simplify in another, building on the assumption that all languages are equally complex (Joseph and Newmeyer, 2012). We study this claim on a micro-level, using a tightly controlled sample of Dutch dialects (across 366 collection sites) and Min dialects (across 60 sites), which enables a fairer comparison across varieties. Even at the dialect level, we find empirical evidence for a tradeoff between word length and a computational measure of phonotactic complexity from an LSTM-based phone-level language model, a result previously documented only at the language level. A generalized additive model (GAM) shows that dialects with low phonotactic complexity concentrate around the capital regions, which we take to be consistent with prior hypotheses that language varieties of greater or more diverse populations show reduced phonotactic complexity. We also experiment with incorporating the auxiliary task of predicting syllable constituency, but do not find an increase in the strength of the negative correlation observed.
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate

Vilém Zouhar^*, Kalvin Chang^*, Chenxuan Cui, and 4 more authors

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Abstract

Mapping words into a fixed-dimensional vector space is the backbone of modern NLP. While most word embedding methods successfully encode semantic information, they overlook phonetic information that is crucial for many tasks. We develop three methods that use articulatory features to build phonetically informed word embeddings. To address the inconsistent evaluation of existing phonetic word embedding methods, we also contribute a task suite to fairly evaluate past, current, and future methods. We evaluate both (1) intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and (2) extrinsic performance on tasks such as rhyme and cognate detection and sound analogies. We hope our task suite will promote reproducibility and inspire future phonetic embedding research.