Research Projects
My research focuses on developing deep learning models for speech and on using well-understood dependencies in speech to interpret the internal representations of deep neural networks. More specifically, I build models that learn representations of spoken words from raw audio. I combine machine learning and statistical models with neuroimaging and behavioral experiments to better understand how neural networks learn internal representations of speech and how humans learn to speak. I have worked and published on the sound systems of several language families, including Indo-European, Caucasian, and Austronesian.
Understanding how AI learns
We use human language to better understand how AI models learn, and we use AI models to better understand how humans learn to communicate.
We have discovered techniques that allow us to understand the inner workings of AI.
Language, it turns out, is a window into the inner worlds of both humans and machines.
Building artificial baby language learners
What if we could build AI that learns language the way human babies do? In this line of work, I introduce an AI model that learns to speak by listening to and imitating language, much as human babies do (CiwGAN and fiwGAN, introduced in Neural Networks).
Trained on a few words of English, the model learns to create new words like “start” and “dust” despite never having heard them.
The model also learns to move the tongue and lips in ways highly similar to how humans do when saying words. Our model not only mimics how people speak; it can also be tested with the methods linguists have used for centuries to study human language (ciwaGAN, ICASSP 2023).
How our models (GANs) differ from large language models like GPT-4:
Our models learn from raw speech (not text)
Our models learn from a few words
Our models learn by imitation/imagination, "imagitation" (not next-word prediction)
Our models have communicative intent
Our models have representations of the mouth
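To make the setup concrete, here is a minimal PyTorch sketch of the kind of GAN these models are built on: a generator that maps a latent vector to a raw waveform and a discriminator that scores raw audio. The layer sizes, kernel widths, and the 16,384-sample output are illustrative assumptions for the sketch, not the published CiwGAN/fiwGAN architectures.

```python
import torch
import torch.nn as nn

# Minimal, illustrative sketch of a GAN that learns from raw audio
# (in the spirit of WaveGAN-based ciwGAN/fiwGAN). Layer sizes, kernel
# widths, and the ~1 s output length are assumptions for the example.

class Generator(nn.Module):
    """Maps a latent vector (optionally containing a categorical
    'message' code, as in ciwGAN) to a raw 16,384-sample waveform."""
    def __init__(self, latent_dim: int = 100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 16)

        def up(c_in, c_out):
            # Each block upsamples the signal by a factor of 4 in time.
            return nn.Sequential(
                nn.ConvTranspose1d(c_in, c_out, kernel_size=25, stride=4,
                                   padding=11, output_padding=1),
                nn.ReLU(),
            )

        self.net = nn.Sequential(
            up(256, 128), up(128, 64), up(64, 32), up(32, 16),
            nn.ConvTranspose1d(16, 1, kernel_size=25, stride=4,
                               padding=11, output_padding=1),
            nn.Tanh(),  # waveform samples in [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 16)  # (batch, channels, time)
        return self.net(x)                # (batch, 1, 16384)


class Discriminator(nn.Module):
    """Scores raw waveforms as real (training audio) or generated."""
    def __init__(self):
        super().__init__()

        def down(c_in, c_out):
            # Each block downsamples the signal by a factor of 4 in time.
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=25, stride=4, padding=11),
                nn.LeakyReLU(0.2),
            )

        self.net = nn.Sequential(down(1, 16), down(16, 32), down(32, 64),
                                 down(64, 128), down(128, 256))
        self.out = nn.Linear(256 * 16, 1)

    def forward(self, wav):
        h = self.net(wav).flatten(1)
        return self.out(h)


# In ciwGAN/fiwGAN-style training, an additional network is also trained to
# recover the categorical part of z from the generated audio, which is what
# pushes the generator toward word-like, informative outputs.
G, D = Generator(), Discriminator()
z = torch.randn(8, 100)   # batch of latent vectors
fake = G(z)               # 8 generated waveforms
score = D(fake)           # discriminator scores for the generated batch
```

Because the generator outputs raw waveforms rather than text tokens, its outputs can be analyzed with the same acoustic and phonological methods used on human speech.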
Comparing the brain and AI
We found one of the most similar signals between artificial intelligence agents and the human brain reported thus far, by comparing them directly in raw, untransformed form.
Building AI that learns language like a human can deepen our understanding of both artificial and human intelligence.
These AI agents were trained to learn spoken language in a manner akin to how humans learn to speak: by immersing them in the raw sounds of language without supervision.
The study, published in Scientific Reports, is the first to directly compare raw brainwaves and AI signals without performing any transformations.
This line of work helps us better understand how AI learns, as well as identify similarities and differences between humans and machines.
🔊The sound played to humans and machines: link
🔊What this sound sounds like in the brain: link
🔊What this sound sounds like in machines (1): link
🔊What this sound sounds like in machines (2): link
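As a rough illustration of what comparing raw, untransformed signals can look like, the sketch below correlates two 1-D waveforms directly in the time domain. The file names, the resampling step, and the lag search with Pearson correlation are assumptions made for the example, not the exact pipeline of the Scientific Reports study.

```python
import numpy as np
from scipy import signal
from scipy.stats import pearsonr

# Illustrative sketch: compare a brain response and a network-internal signal
# as raw 1-D waveforms, with no spectral or other transformations.

brain = np.load("brain_response.npy")      # hypothetical raw brain signal
network = np.load("network_response.npy")  # hypothetical raw model-internal signal

# Bring both signals to a common length before comparing them sample by sample.
n = min(len(brain), len(network))
brain_rs = signal.resample(brain, n)
network_rs = signal.resample(network, n)

def corr_at_lag(a, b, lag):
    """Pearson correlation between a and b, with b shifted by `lag` samples."""
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    elif lag < 0:
        a, b = a[:lag], b[-lag:]
    return pearsonr(a, b)[0]

# Allow a small temporal offset between brain and model and keep the best match.
lags = range(-100, 101)
best_lag = max(lags, key=lambda lag: corr_at_lag(brain_rs, network_rs, lag))
best_r = corr_at_lag(brain_rs, network_rs, best_lag)
print(f"best lag: {best_lag} samples, r = {best_r:.3f}")
```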
Analyzing large language models
GPT-4 is good at language. We test the next level of its ability, metacognition: can GPT-4 analyze language itself? Recursion is one of the few properties of human language not found in animal communication.
We show that GPT-4 is the first large language model that can not only use language, but also analyze language metalinguistically.
Can GPT do recursion? We set out to test whether GPT-4 can do explicit recursion (both linguistic and visual).
Quote from the preprint: “It appears that recursive reasoning with metacognitive awareness evolved in humans first and that similar behavior can emerge in deep neural network architectures trained on human language. It remains to be seen if animal communication in the wild or language-trained animals can approximate this recursive performance.”
Talk at the Simons Institute (Workshop on LLMs): link
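To give a flavor of what a metalinguistic recursion probe can look like, here is a small sketch that asks a chat model to bracket a center-embedded sentence and count its levels of embedding. The sentence and wording are an illustration only, not the prompts or scoring used in the preprint.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A center-embedded sentence: "The mouse [the cat [the dog chased] caught] escaped."
prompt = (
    "Here is a sentence with center embedding: "
    "'The mouse the cat the dog chased caught escaped.' "
    "Bracket the sentence to show which clause is embedded inside which, "
    "and state how many levels of embedding it contains."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```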
Using Generative AI to decode whale communication
Video on using Generative AI to decode whale communication (and find meaningful properties in unknown data): link
Preprint: Using GANs developed for speech and interpretability techniques proposed in our lab to find out what is meaningful in unknown communication systems.
Preprint: A discovery that sperm whales have equivalents to human vowels and diphthongs.
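One generic way to probe what a GAN has learned (a sketch of the general idea only, not the specific interpretability techniques from the preprints) is to hold a latent vector fixed, sweep a single latent dimension, and inspect how the generated audio changes. The dimension index, value range, and reuse of the Generator sketch above are assumptions made for illustration.

```python
import numpy as np
import torch

# Generic latent-manipulation probe: sweep one latent dimension while holding
# the rest of the latent vector fixed, and generate audio for each setting.
# `Generator` refers to the sketch defined earlier on this page.

G = Generator()   # in practice, trained weights would be loaded here
G.eval()

z = torch.randn(1, 100)   # one fixed latent vector
dim = 7                   # hypothetical latent dimension to probe
outputs = []
with torch.no_grad():
    for value in np.linspace(-4.0, 4.0, 9):  # includes values beyond the training range
        z_probe = z.clone()
        z_probe[0, dim] = float(value)
        wav = G(z_probe).squeeze().numpy()   # generated waveform for this setting
        outputs.append((value, wav))

# Each (value, waveform) pair can then be inspected acoustically,
# e.g. by measuring the duration, pitch, or amplitude of a unit of interest.
```

If a single latent variable consistently controls an acoustic property of the output, that is evidence the model has learned a meaningful representation of it, which is the general logic behind applying these probes to unknown communication systems.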
Unnatural phonology
Estimating historical and cognitive influences on sound patterns in language
I combine historical and experimental approaches to phonology to better understand which aspects of human phonology are primarily influenced by historical factors ("cultural evolution") and which are influenced by cognitive factors.
I argue that phonology offers a unique test case for distinguishing historical from cognitive influences on human behavior. The Language paper identifies a process called catalysis that explains how learning factors directly influence typology.
I develop a statistical model for deriving typology within the “historical bias” approach (Phonology paper)
Establishing the Minimal Sound Change Requirement and the Blurring Process (Journal of Linguistics 2018)
Applying the Blurring Process to final nasalization (Glossa) and intervocalic devoicing (Journal of Linguistics 2024)
Indo-European linguistics
In the paper on Vedic meter (JAOS), I argue for a new rule in the Rigveda and show that this rule, which restores lost v and y sounds, repairs several previously irregular lines into regular metrical lines.
I propose a new explanation for the development -aḥ > -o in Sanskrit and Avestan (preprint)
I propose a new explanation for the phonetics of the independent svarita (WeCIEC Proceedings)
In a project on the Vedic pitch accent system, I combine philological and comparative sources with acoustic analyses of present-day Vedic recitation to provide a more accurate reconstruction of the Vedic accent, one of the oldest known accent-marking systems.