“Sorry, I Didn’t Quite Get That: The Misidentification of AAVE by Voice Recognition Software”

Shannon McCarty, Lam Pham, Alora Thresher, Alexandria Wasgatt, Emma Whamond

This study investigates how accurately AI speech recognition systems that use natural language processing transcribe Standard American English (SAE) compared with African American Vernacular English (AAVE). Using authentic speech from YouTube videos, we examine the percentage of words these systems misidentify and the degree to which the speech is misidentified. We assess recognition accuracy for AAVE by selecting for distinct AAVE features, such as G-dropping, the [θ] sound, reduction of consonant clusters, and non-standard usages of be. The methodology includes feeding YouTube clips of both SAE and AAVE speech through AI speech recognition software, as well as examining YouTube’s auto-generated transcripts, which are created by automatic speech recognition from the video’s audio. The purpose of this study is to draw attention to the need for diversity in technology with regard to language variation, so that AI speech systems such as Amazon’s Alexa or Apple’s Siri are more accessible to all members of society, and to help destigmatize a variety of American English that has carried social, cultural, and historical stigma for centuries.
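The abstract does not say which metric is used to compute the percentage of misidentified words. One common choice in speech recognition work is word error rate (WER): the word-level edit distance between a human reference transcript and the system's output, divided by the number of reference words. The sketch below is a minimal illustration under that assumption, not the authors' procedure; the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER stands in for the study's "percentage of misidentified
    words"; the abstract does not name its metric.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a G-dropped word rendered in its standard spelling
# counts as one substitution out of three reference words.
wer = word_error_rate("she runnin late", "she running late")
```

Comparing the average of this score over AAVE clips with the average over SAE clips would give one rough measure of the accuracy gap the study describes. Whether a G-dropped form transcribed in standard spelling should count as an error is a design decision that the reference transcripts would need to settle.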
