Research Projects
My research focuses on developing deep learning models for speech data and on using well-understood dependencies in speech to interpret internal representations in deep neural networks. More specifically, I build models that learn representations of spoken words from raw audio inputs. I combine machine learning and statistical models with neuroimaging and behavioral experiments to better understand how neural networks learn internal representations of speech and how humans learn to speak. I have worked and published on the sound systems of several language families, including Indo-European, Caucasian, and Austronesian.
Understanding how AI learns
We use human language to better understand how AI models learn, and we use AI models to better understand how humans learn to communicate.
We have developed techniques that let us understand the inner workings of AI.
Building artificial baby language learners
We build AI models that learn language more like humans.
How our models (GANs) differ from large language models like GPT-4 (see the sketch below this list):
Our models learn from raw speech (not text)
Our models learn from a few words
Our models learn by imitation/imagination, "imagitation" (not next word prediction)
Our models have communicative intent
Our models have representations of the mouth (ciwaGAN, ICASSP 2023)
CiwGAN and fiwGAN introduced in Neural Networks
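To make the contrast concrete, here is a toy PyTorch sketch of the ciwGAN idea: a generator maps a one-hot "word" code plus noise to a raw waveform, and a Q-network tries to recover the code from the audio alone. All layer sizes, names, and hyperparameters below are illustrative, not the published architecture:

```python
# Toy sketch (not the published architecture): a ciwGAN-style generator
# turns a one-hot "word" code plus noise into raw audio; a Q-network
# tries to recover the code from the waveform alone.
import torch
import torch.nn as nn

NUM_WORDS = 4     # categorical code: which "word" to produce
NOISE_DIM = 96    # remaining latent dimensions

def up(cin, cout):    # each transposed conv upsamples 4x: 16 -> ... -> 16384
    return nn.ConvTranspose1d(cin, cout, kernel_size=24, stride=4, padding=10)

def down(cin, cout):  # each conv downsamples 4x: 16384 -> ... -> 16
    return nn.Conv1d(cin, cout, kernel_size=24, stride=4, padding=10)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(NUM_WORDS + NOISE_DIM, 256 * 16)
        self.net = nn.Sequential(
            up(256, 128), nn.ReLU(),
            up(128, 64), nn.ReLU(),
            up(64, 32), nn.ReLU(),
            up(32, 16), nn.ReLU(),
            up(16, 1), nn.Tanh(),   # raw audio in [-1, 1], ~1 s at 16 kHz
        )

    def forward(self, latent):
        return self.net(self.fc(latent).view(-1, 256, 16))

class QNetwork(nn.Module):
    """Guesses the categorical code from generated audio."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            down(1, 16), nn.LeakyReLU(0.2),
            down(16, 32), nn.LeakyReLU(0.2),
            down(32, 64), nn.LeakyReLU(0.2),
            down(64, 128), nn.LeakyReLU(0.2),
            down(128, 256), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(256 * 16, NUM_WORDS),
        )

    def forward(self, wav):
        return self.net(wav)

# One-hot codes for a batch of two "words", plus uniform noise:
code = torch.eye(NUM_WORDS)[torch.tensor([0, 2])]
z = torch.rand(2, NOISE_DIM)
wav = Generator()(torch.cat([code, z], dim=1))   # shape (2, 1, 16384)
logits = QNetwork()(wav)                         # Q's guess at the code
```

Roughly speaking, the generator is trained both to fool a discriminator on real speech and to keep the code recoverable by the Q-network; that second pressure is what pushes it toward word-like units without ever seeing text.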
Comparing the brain and AI
We found one of the most similar signals between artificial intelligence agents and the human brain reported thus far, by comparing them directly in raw, untransformed form
These AI agents were trained to learn spoken language in a manner akin to how humans learn to speak: by immersing them in the raw sounds of language without supervision.
The study, published in Scientific Reports, is the first to directly compare raw brainwaves and AI signals without performing any transformations.
This line of work helps us better understand how AI learns, as well as identify similarities and differences between humans and machines.
🔊The sound played to humans and machines: link
🔊What this sound sounds like in the brain: link
🔊What this sound sounds like in machines (1): link
🔊What this sound sounds like in machines (2): link
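In outline, a direct raw-signal comparison of this kind can be sketched as follows: average the responses to a repeated sound in both systems, put the two averages on a common time base, and correlate the untransformed waveforms. The array names, shapes, and the choice of Pearson correlation below are assumptions for illustration, not the paper's exact pipeline:

```python
# Schematic raw-signal comparison between an averaged brain response and
# an averaged activation from an intermediate network layer.
import numpy as np
from scipy.signal import resample
from scipy.stats import pearsonr

def average_response(trials: np.ndarray) -> np.ndarray:
    """Average over repeated presentations (trials x samples)."""
    return trials.mean(axis=0)

def compare_raw(brain_trials, layer_trials, n_points=1000):
    """Correlate two averaged signals on a common time base.

    No spectral or other transformation is applied: both signals are
    compared directly as raw waveforms, resampled only to equal length.
    """
    brain = resample(average_response(brain_trials), n_points)
    layer = resample(average_response(layer_trials), n_points)
    r, p = pearsonr(brain, layer)
    return r, p

# Hypothetical shapes: 100 brain-response trials of 512 samples and
# 100 model responses of 2048 samples to the same sound.
rng = np.random.default_rng(0)
brain_trials = rng.standard_normal((100, 512))
layer_trials = rng.standard_normal((100, 2048))
print(compare_raw(brain_trials, layer_trials))
```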
Analyzing large language models
GPT-4 is good at language. We test the next level of its ability, metacognitive ability: how well can GPT-4 analyze language itself? Recursion is one of the few properties of human language not found in animal communication.
We show that GPT-4 is the first large language model that can not only use language but also analyze it metalinguistically
Can GPT do recursion? We set out to test whether GPT-4 can do explicit recursion (both linguistic and visual).
Quote from the preprint: “It appears that recursive reasoning with metacognitive awareness evolved in humans first and that similar behavior can emerge in deep neural network architectures trained on human language. It remains to be seen if animal communication in the wild or language-trained animals can approximate this recursive performance.”
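For a sense of what "explicit recursion" asks of a model, here is a toy Python generator of center-embedded relative clauses; the preprint's actual stimuli and prompts differ:

```python
# Toy generator of center-embedded relative clauses (not the preprint's
# actual stimuli). Handling such structures at arbitrary depth, and
# describing them explicitly, is what the recursion test demands.
NOUNS = ["rat", "cat", "dog", "bird", "fox"]
VERBS = ["chased", "saw", "bit", "heard"]

def np_with_rc(depth: int) -> str:
    """Noun phrase with `depth` center-embedded relative clauses."""
    head = f"the {NOUNS[depth]}"
    if depth == 0:
        return head
    return f"{head} that {np_with_rc(depth - 1)} {VERBS[depth - 1]}"

for d in range(3):
    print(np_with_rc(d) + " slept.")
# the rat slept.
# the cat that the rat chased slept.
# the dog that the cat that the rat chased saw slept.
```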
Talk at the Simons Institute (Workshop on LLMs): link
Using Generative AI to decode whale communication
Video on using Generative AI to decode whale communication (and find meaningful properties in unknown data): link
Preprint: Using GANs developed for speech, together with interpretability techniques proposed in our lab, to identify what is meaningful in unknown communication systems.
Preprint: A discovery that sperm whales have equivalents to human vowels and diphthongs.
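One way to probe what a trained generator treats as meaningful, in the spirit of the interpretability techniques mentioned above, is to intervene on a single latent variable and regenerate the output. A schematic sketch, with a generic `generator` callable standing in for a trained model:

```python
# Latent-intervention sketch: hold all latent dimensions fixed, sweep
# one, and collect the regenerated outputs for acoustic inspection.
import torch

@torch.no_grad()
def sweep_latent(generator, base_z: torch.Tensor, dim: int, values):
    """Regenerate output while varying a single latent dimension."""
    outs = []
    for v in values:
        z = base_z.clone()
        z[0, dim] = v
        outs.append(generator(z))
    return torch.stack(outs)

# Hypothetical usage with a trained generator g mapping (1, 100) latents
# to waveforms:
#   audio = sweep_latent(g, torch.rand(1, 100), dim=5,
#                        values=torch.linspace(-4, 4, steps=9))
# Latent dimensions whose sweeps cause systematic acoustic changes are
# candidates for meaningful units in the unknown system.
```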
Unnatural phonology
Estimating historical and cognitive influences on sound patterns in language
We combine historical and experimental approaches to phonology to better understand which aspects of human phonology are primarily influenced by historical factors ("cultural evolution") and which aspects are influenced by cognitive factors
I argue that phonology offers a unique test case for distinguishing historical from cognitive influences on human behavior. The Language paper identifies a process called catalysis that explains how learning factors directly influence typology
We develop a statistical model for deriving typology within the “historical bias” approach (Phonology paper)
Establishing the Minimal Sound Change Requirement and the Blurring Process (Journal of Linguistics)
Applying the Blurring Process to final nasalization (Glossa) and intervocalic devoicing (Journal of Linguistics)
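The logic of the historical-bias approach can be made concrete with a small simulation: a pattern requiring several independent sound changes should be roughly as rare as the product of the individual change rates. The survey counts below are invented for illustration; the actual model and estimates are in the Phonology paper:

```python
# Bootstrap sketch of estimating a pattern's expected typological
# frequency from the rates of the sound changes it requires.
import random

rng = random.Random(0)

def bootstrap_rate(successes: int, trials: int, n_boot: int = 10_000):
    """Bootstrap replicates of a sound-change rate from survey counts."""
    p = successes / trials
    return [sum(rng.random() < p for _ in range(trials)) / trials
            for _ in range(n_boot)]

# Hypothetical survey counts: change A attested in 30 of 200 languages,
# change B in 10 of 200. A pattern requiring both changes in sequence:
a = bootstrap_rate(30, 200)
b = bootstrap_rate(10, 200)
combined = sorted(x * y for x, y in zip(a, b))
print("point estimate:", combined[len(combined) // 2])   # median
print("95% interval:", (combined[250], combined[9_750]))
```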
Indo-European linguistics
In the paper on Vedic meter (JAOS), I argue for a new rule in the Rigveda
A new explanation for independent svarita: WeCIEC Proceedings
In the project on the Vedic pitch accent system, I combine philological and comparative sources with acoustic analyses of present-day Vedic recitation to provide a more accurate reconstruction of the Vedic accent, one of the oldest known accent-marking systems
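On the acoustic side, the basic step is extracting a fundamental-frequency (F0) contour from a recitation recording so that pitch peaks can be aligned with the accents marked in the text. A minimal sketch using librosa, with a placeholder file name and illustrative parameter choices:

```python
# Extract an F0 contour from a recitation recording for comparison with
# the philologically attested accents.
import librosa
import numpy as np

y, sr = librosa.load("recitation.wav", sr=None)   # placeholder file name
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
times = librosa.times_like(f0, sr=sr)
# Keep only voiced frames; pitch peaks here can then be compared with
# the positions of the accents (e.g., the svarita) marked in the text.
contour = np.column_stack([times[voiced], f0[voiced]])
print(contour[:5])
```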