USC Secures Hat Trick at Interspeech 2025 with 3 Research Awards

Venice Tang | October 14, 2025 

USC SAIL Lab’s Best Paper and Dual Grand Challenge Wins Solidify its Leadership in Speech Communication Science and Technology

USC SAIL Lab members and PhD student Thanathai Lertpetchpun (middle) receiving Grand Challenge Awards at the 2025 Interspeech Conference. (Image courtesy of USC SAIL Lab)

USC’s Signal Analysis and Interpretation Laboratory (SAIL) earned three major honors at Interspeech 2025, the annual conference of the International Speech Communication Association, held Aug. 17–21 in Rotterdam, the Netherlands. The lab won the Best Student Paper Award for a paper that developed an AI model that uses audio clips to map the vocal tract movements distinguishing British and American accents. USC SAIL also secured first and second place in the conference’s Speech Emotion Recognition in Naturalistic Conditions Challenge, with breakthroughs in predicting emotions from speech more accurately by accounting for speaker attributes such as gender and by including speech samples that express mixed emotions. The achievements at this premier speech processing and human-centered AI conference highlight SAIL’s contributions to advancing automated speech recognition, a key component of human-centered AI and communication technologies.

“SAIL focuses on human-centered signal and information processing that addresses key societal needs,” said Shrikanth Narayanan, University Professor, Niki & C. L. Max Nikias Chair in Engineering, and Director of the Ming Hsieh Institute, who leads the USC SAIL Lab. “SAILers pioneer approaches that bridge science and engineering to tackle real-world problems, from understanding human speech and emotion to improving communication technologies.”

 

USC SAIL Lab members and PhD students Kevin Huang and Jihwan Lee receiving awards at the 2025 Interspeech Conference. (Image courtesy of USC SAIL Lab)

Best Student Paper winner introduces the first tool to combine audio data, historical articulatory data, and AI to map the vocal tract movements behind British and American accents

Ever wonder what makes a British accent different from an American one? It comes down to the physical way speech sounds are produced by the human vocal tract, known as articulatory features. Variations in these features distinguish one accent from another: British English speakers move their tongues, lips, and vocal cords differently from American speakers to form each speech sound.

Researchers know how British English sounds different from American English, but they struggle to pin down which specific articulatory features lie behind each accent and produce those acoustic differences. The technologies used in past studies, such as electromagnetic articulography (EMA), are highly resource-intensive, requiring specialized equipment and controlled lab environments. Because the sensors must be physically attached to speakers’ faces to track movements, data collection is not only limited and invasive but also inconsistent, since facial structures and vocal tract anatomy vary across individuals. These constraints make large-scale studies difficult, which further contributes to the scarcity of available data.

The USC SAIL Lab offers a clever solution in the paper “On the Relationship between Accent Strength and Articulatory Features”: an artificial intelligence (AI) model that relies on both audio recordings and existing articulatory data. Instead of following the traditional lab approach to collecting data, the paper’s first author, USC PhD student Kevin Huang, and his team take audio samples of individuals speaking with British and American accents and combine them with data from past EMA studies that mapped articulatory movements across accents. The team then feeds both kinds of data into an AI framework known as a “spark encoder.” Rather than outputting absolute spatial coordinates for the articulators behind each accent, this self-supervised learning model produces relative distances between parts of the vocal tract, yielding a more flexible and accurate map of how speech sounds are physically formed. The approach overcomes the limitations of traditional lab-based methods by using readily available audio data to analyze articulatory patterns, delivering a more precise prediction of the articulatory features that distinguish accents.
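The key representational idea, describing vocal tract configurations as relative distances between articulators rather than absolute sensor coordinates, can be sketched in a few lines. The snippet below is only an illustration under assumed inputs, not the team’s code: the articulator names, array shapes and simple pairwise-distance features are hypothetical stand-ins for whatever the actual encoder produces.

```python
# Illustrative sketch only (not the SAIL implementation): turning absolute
# articulator coordinates into relative-distance features, which are less
# sensitive to head position and individual anatomy than raw coordinates.
import numpy as np

# Hypothetical articulator set; real EMA setups differ.
ARTICULATORS = ["tongue_tip", "tongue_body", "lower_lip", "upper_lip", "jaw"]

def relative_distance_features(coords: np.ndarray) -> np.ndarray:
    """coords: (frames, n_articulators, 2) x/y positions per time frame.
    Returns (frames, n_pairs) pairwise Euclidean distances."""
    n = coords.shape[1]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return np.stack(
        [np.linalg.norm(coords[:, i] - coords[:, j], axis=-1) for i, j in pairs],
        axis=-1,
    )

def accent_contrast(british: np.ndarray, american: np.ndarray) -> np.ndarray:
    """Mean difference in each relative-distance feature between two accent groups."""
    return relative_distance_features(british).mean(0) - \
           relative_distance_features(american).mean(0)

# Toy usage with random trajectories standing in for encoder output.
rng = np.random.default_rng(0)
brit = rng.normal(size=(200, len(ARTICULATORS), 2))
amer = rng.normal(size=(200, len(ARTICULATORS), 2))
print(accent_contrast(brit, amer))
```

Because pairwise distances are unaffected by overall head position and less tied to raw sensor coordinates of any one speaker, they are easier to compare across individuals with different vocal tract anatomy.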

“This is the first time machine learning has been applied to audio data in a way that is able to predict physical movements behind different accents,” Huang explained. While the current study focuses on British and American accents, “the model can be applied to broader accent conversion or in other languages,” he added. From helping actors master new accents to supporting second-language learners and individuals with speech impairments, the model breaks new ground in accent adaptation, language learning and speech synthesis. This is the second year in a row that Huang has received a best paper award at Interspeech, having won recognition in 2024 for his paper “Analysis of articulatory setting for L1 and L2 English speakers using MRI data.”

USC SAIL Lab PhD student Kevin Huang receiving the Best Student Paper Award at the 2025 Interspeech Conference. (Image courtesy of USC SAIL Lab)

USC SAIL Paper on Improving Speech-Based Emotion Prediction Wins Two Grand Challenge Awards

Being able to read emotions just by listening to someone talk is no longer a superpower, thanks to studies in speech emotion recognition (SER), a growing research field that teaches computers to detect and interpret human feelings from voice signals.

USC’s SAIL Lab has made a major leap forward with its new SER system, which demonstrated groundbreaking accuracy in predicting emotional tones in day-to-day speech. Detailed in the paper “Developing a High-Performance Framework for Speech Emotion Recognition in Naturalistic Conditions,” the lab’s system achieved top results, scoring 35% higher than the first runner-up in the Speech Emotion Recognition in Naturalistic Conditions Challenge at this year’s Interspeech conference.

Traditionally, researchers have associated certain emotions with specific vocal pitches—higher frequencies often linked with excitement or anger, and lower tones with sadness or calmness. However, these earlier studies tended to generalize across all speakers without accounting for differences in vocal range between male and female speakers, which can significantly affect how pitch relates to emotion.

To address this issue, the paper’s first authors, Thanathai Lertpetchpun and Tiantian Feng of the USC SAIL Lab, introduced the speaker’s gender as an additional input to their SER model. By teaching the system that male and female voices operate in different pitch ranges, they dramatically improved its accuracy in detecting emotions: incorporating gender information let the model separate emotional cues from differences in vocal range. The team also found that adding other speaker identity information did not improve results and could even reduce the model’s prediction performance.
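One simple way to realize this kind of conditioning, shown purely as an illustration rather than the published SAIL architecture, is to append a learned gender embedding to pooled features from a pretrained speech encoder before the emotion classification head. Every module name and dimension below is an assumption.

```python
# Hypothetical sketch (not the published SAIL model): conditioning a speech
# emotion classifier on speaker gender via a learned embedding.
import torch
import torch.nn as nn

class GenderConditionedSER(nn.Module):
    def __init__(self, feat_dim=768, n_emotions=8, n_genders=2, gender_dim=16):
        super().__init__()
        self.gender_emb = nn.Embedding(n_genders, gender_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + gender_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_emotions),
        )

    def forward(self, speech_feats, gender_id):
        # speech_feats: (batch, frames, feat_dim) from a pretrained speech encoder
        pooled = speech_feats.mean(dim=1)                  # simple mean pooling over time
        g = self.gender_emb(gender_id)                     # (batch, gender_dim)
        return self.head(torch.cat([pooled, g], dim=-1))   # emotion logits

# Toy usage with random features standing in for encoder output.
model = GenderConditionedSER()
logits = model(torch.randn(4, 100, 768), torch.tensor([0, 1, 1, 0]))
print(logits.shape)  # torch.Size([4, 8])
```

The same pattern extends to any categorical speaker attribute, which is also how one could probe the team’s finding that identity information beyond gender does not necessarily help.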

Another key innovation came from training the system with “non-agreement” data—samples where human annotators disagreed on what emotion was being expressed, or where multiple emotions overlapped. In past studies, these samples were often thrown out. “It’s a no-brainer to keep them,” said Lertpetchpun, noting that real human emotions are rarely clear-cut. Including these complex, mixed-emotion examples made the AI model more robust and realistic in its understanding of emotional nuance.
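A common way to keep such non-agreement samples, sketched below as an assumption rather than the team’s actual training recipe, is to treat the distribution of annotator votes as a soft target and train against it with a divergence loss, so that a clip rated “happy” by some annotators and “neutral” by others contributes both signals instead of being discarded.

```python
# Illustrative sketch only: training on annotator vote distributions (soft labels)
# instead of discarding clips where annotators disagree.
import torch
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, annotator_votes: torch.Tensor) -> torch.Tensor:
    """logits: (batch, n_emotions); annotator_votes: (batch, n_emotions) raw vote counts."""
    target = annotator_votes / annotator_votes.sum(dim=-1, keepdim=True)  # vote shares
    return F.kl_div(F.log_softmax(logits, dim=-1), target, reduction="batchmean")

# A clip rated "happy" by 3 annotators and "neutral" by 2 keeps both signals.
votes = torch.tensor([[3.0, 2.0, 0.0, 0.0]])   # happy, neutral, sad, angry (hypothetical classes)
logits = torch.randn(1, 4, requires_grad=True)
print(soft_label_loss(logits, votes))
```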

Such breakthroughs have major implications for fields like mental health monitoring, customer service AI, and human-computer interaction, where accurate emotion detection can make technology more empathetic and responsive. With USC’s SAIL Lab leading the charge, machines are learning not just to listen—but to truly understand how we feel.

 

With 30 papers published or accepted in top conferences and journals in 2025 alone, USC SAIL continues to lead the field in integrating signal processing, machine learning and behavioral AI modeling to advance human-centric communication. A recipient of the IEEE Flanagan Award, the Shannon-Nyquist Technical Achievement Award, the Deswarte Prize and the ISCA Medal, Narayanan leads the lab in developing new tools and supporting influential datasets used across academia, industry and society.
