The People Who Taught Machines to Talk
Profiles of the researchers, dreamers, and contrarians who spent decades making conversational AI possible — and what they think about where it ended up.
Terry Winograd built the most impressive natural language system of 1970 and then spent the next fifty years explaining why it didn't matter. His program SHRDLU could understand English commands about a virtual world of colored blocks. "Pick up the big red block." "Put it on top of the blue one." "What is to the left of the pyramid?" It could do all of this, fluently, naturally, in a way that made people believe human-level language understanding was just around the corner.
Winograd knew better. He knew SHRDLU worked because its world was impossibly small — just blocks on a table. The moment you tried to expand that world to include, say, a kitchen, or a city, or anything resembling real life, the entire approach collapsed. By the mid-1980s, he'd moved away from AI research entirely, turning instead to human-computer interaction and becoming Larry Page's PhD advisor at Stanford. When people asked him about SHRDLU, he'd say something characteristically modest: "It showed what was possible in a limited domain. The trouble is that life isn't a limited domain."
But here's what's fascinating about Winograd's legacy: his work established the expectation of what natural language AI should feel like. Every subsequent system was measured against that experience of talking naturally to a machine and having it respond sensibly. SHRDLU didn't lead directly to ChatGPT in any technical sense. But it established the target.
The Statistician Who Changed Everything
Frederick Jelinek didn't look like a revolutionary. A Czech immigrant who'd fled communism as a teenager, he worked at IBM's T.J. Watson Research Center from 1972 to 1993, leading their speech recognition group. His background was in information theory, not linguistics, and he approached language with the cold eye of someone who didn't care why words went together — only that they did, with measurable probability.
The linguistic establishment of the 1970s believed you needed to understand grammar to understand language. Chomsky's theories dominated. Language was governed by deep structural rules, and any AI system would need to encode those rules explicitly. Jelinek thought this was nonsense. "What I'd really like," he once told a colleague, "is a system that gets better every time it sees more data. Grammar be damned."
His team built exactly that. Their speech recognition systems improved not by better understanding of phonology or syntax but by processing ever-larger amounts of recorded speech. They used hidden Markov models — a statistical technique borrowed from signal processing — to predict what sounds were likely to follow other sounds, and what words were likely to follow other words.
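The underlying bet is easy to see in miniature. The sketch below is an illustration only, a plain word-level bigram model in Python rather than the hidden Markov models over acoustic states that Jelinek's group actually built, but it captures the philosophy: no grammar rules anywhere, just counts that get better as the corpus grows.

    from collections import Counter, defaultdict

    def train_bigram(sentences):
        # Count how often each word follows each other word.
        counts = defaultdict(Counter)
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                counts[prev][nxt] += 1
        return counts

    def next_word_probability(counts, prev, nxt):
        # P(next | prev), estimated purely from observed frequencies.
        total = sum(counts[prev].values())
        return counts[prev][nxt] / total if total else 0.0

    # Toy corpus; a real system would ingest millions of utterances.
    model = train_bigram(["pick up the big red block",
                          "put it on the blue block"])
    print(next_word_probability(model, "the", "blue"))  # 0.5

Feed it more sentences and the estimates sharpen, with no linguist in the loop. That, scaled up enormously, was the wager.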
The results were crude by modern standards. But the approach — more data beats better rules — would eventually become the founding philosophy of modern AI. When GPT-3 was trained on hundreds of billions of words of text, it was following the path Jelinek laid down in a windowless IBM lab in Yorktown Heights, New York. He didn't live to see it; he died in 2010. But everyone in modern NLP knows they're working in his shadow.
The Outsider
Cynthia Breazeal was a doctoral student at MIT when she decided to build a robot that could have social interactions with humans. This was 1998. The AI community was in one of its periodic winters, and social robotics wasn't just unfashionable — it was considered somewhere between unserious and impossible.
Her creation, Kismet, looked like something from a fever dream: big blue eyes, red rubber lips, fuzzy eyebrows that could arch in surprise or furrow in concentration. It couldn't do much. It could track faces, respond to vocal tone, and produce facial expressions that roughly matched the emotional tenor of what was happening around it. If you spoke softly, it would "calm down." If you spoke excitedly, it would "perk up."
Critics dismissed it as an elaborate puppet show. And technically, they weren't wrong. Kismet didn't understand language. It didn't have emotions. It was responding to statistical patterns in audio and visual input. But Breazeal understood something her critics didn't: for humans to develop comfortable relationships with AI, the AI needed to play by human social rules. Eye contact. Turn-taking. Emotional responsiveness. These weren't features to be added later. They were the foundation.
Today, when you notice that ChatGPT's responses feel attentive, or when an AI assistant's voice adjusts to match your energy, you're seeing the philosophy Breazeal championed. The idea that AI should meet humans on human terms — emotionally, socially, not just intellectually — seemed fringe in 1998. By 2026, it's so deeply embedded in how AI is designed that people barely notice it.
The Man Who Bet on Scale
In 2018, Ilya Sutskever told anyone who would listen that the key to artificial general intelligence was simple: make neural networks bigger and train them on more data. This was not a popular opinion within the research community. The prevailing wisdom held that scale alone couldn't produce intelligence — you needed architectural innovations, novel training techniques, careful engineering of specific capabilities.
Sutskever, then chief scientist at OpenAI, had a background that gave him unusual conviction. He'd studied under Geoffrey Hinton at the University of Toronto, where he'd been part of the team that created AlexNet — the deep neural network that shocked the computer vision world in 2012 by dramatically outperforming every other approach on image recognition. The lesson of AlexNet was clear to him: a bigger network, trained on more data, with enough compute, would surprise you with capabilities no one designed in.
He was right about language models in a way that surprised even him. GPT-3, released in 2020, could do things it had never been explicitly trained to do. It could translate between languages, write code, do basic arithmetic, summarize documents, answer factual questions — all from the single training objective of predicting the next word in a sequence. This emergence of capabilities from scale alone was either miraculous or terrifying depending on your perspective. For Sutskever, it was confirmation.
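That single objective fits on one line. In standard notation (this is the generic autoregressive formulation, not a quotation from the GPT-3 paper), a model with parameters $\theta$ is trained to maximize

    $\max_{\theta} \; \sum_{t} \log p_{\theta}(w_t \mid w_1, \dots, w_{t-1})$

the log-probability of each word given all the words before it. Translation, arithmetic, and code generation were never separate objectives; they emerged as side effects of getting very good at that one prediction.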
"I genuinely believe," he said in a 2023 interview, "that we are creating something that is more intelligent than us. And I don't think people have fully reckoned with what that means."
His subsequent departure from OpenAI in 2024 and founding of a new company focused on AI safety suggested he'd reckoned with it himself and wasn't entirely comfortable with what he found.
The Voice Architect
If you've interacted with Alexa, Google Assistant, or Siri, you've experienced work that traces back to Justine Cassell. A professor at Carnegie Mellon and later at the Inria research institute in Paris, Cassell spent decades studying something most AI researchers ignored: conversation itself. Not the words. The everything-else.
Cassell's research showed that human conversation is governed by an intricate dance of nonverbal cues. We nod to indicate understanding. We look away briefly when we're thinking. We lean forward when engaged. We use tiny filler words — "uh," "mm-hmm," "right" — not as failures of fluency but as active signals. Her virtual characters, called Embodied Conversational Agents, attempted to reproduce all of this, and in doing so, they revealed how much communication happens below the level of words.
"When we built agents that used appropriate gaze and gesture," Cassell recalled in a 2024 lecture, "people rated them as more trustworthy, more intelligent, and more likeable — even when the actual content of their speech was identical. We were showing that the medium isn't just the message. The medium IS understanding."
Her work influenced how major tech companies design voice interfaces. The pauses in Alexa's responses, the confirmatory sounds Google Assistant makes, the pacing and rhythm of Siri's speech — all of these draw on research into conversational dynamics that Cassell and her students pioneered. It's invisible work, in the sense that when it's done right, you don't notice it. You just feel like the machine is listening.
The Linguist Who Came In from the Cold
Emily Bender spent most of her career feeling like the AI community's designated killjoy. A computational linguist at the University of Washington, she published paper after paper pointing out that large language models didn't understand language in any meaningful sense — that they were "stochastic parrots," producing plausible-sounding text without any grounding in meaning, truth, or the physical world.
Her 2021 paper on the dangers of large language models (co-authored with Timnit Gebru, Margaret Mitchell, and Angelina McMillan-Major) became one of the most cited and debated papers in AI history. It argued that bigger models trained on internet text would inevitably absorb and amplify the biases, misinformation, and toxicity present in that text. That they would produce fluent nonsense that humans would mistake for expertise. That the environmental cost of training them was unjustifiable given the uncertainties about their benefit.
She was right about several things. Large language models do reproduce biases. They do generate confident-sounding falsehoods. They do consume enormous amounts of energy. But the paper also failed to predict something: that these systems would become genuinely useful to millions of people despite their limitations. That the gap between "understanding" and "producing useful output" might be smaller than linguists assumed — or at least less relevant to the average user who just needed help drafting an email.
Bender's position hasn't softened. In 2025, she wrote: "The danger was never that these systems would become sentient. The danger is that people would treat them as if they were, and that powerful institutions would use that confusion to avoid accountability." Whether history vindicates her fully or only partially, her voice has been essential — the person asking "but should we?" while everyone else asks "but can we?"
The Unnamed Thousands
For every researcher with a Wikipedia page, there are thousands without one. The graduate students who cleaned training data at 2 AM. The computational linguists who annotated millions of sentences with grammatical structure. The engineers who optimized inference speeds so a response could arrive in milliseconds instead of seconds. The content moderators — often contractors in developing countries — who spent hours flagging toxic outputs so the models could be made safer.
These people don't give TED talks. They don't have Twitter followings. But conversational AI, as it exists today, was built on their labor as much as on any breakthrough paper. The data labelers at companies like Scale AI and Surge AI, who taught models the difference between a helpful response and a harmful one through thousands of hours of comparison judgments. The red teamers who spent their days trying to make AI systems produce terrible outputs so those vulnerabilities could be patched.
The history of technology always has this shape: a small number of visible figures riding atop an invisible wave of collective effort. The people who taught machines to talk number in the hundreds of thousands, if you count everyone who contributed. Most will never be named in an article like this one.
What They Think Now
The living figures in this story have complicated feelings about where their work ended up. Winograd, now in his eighties, has largely declined to comment publicly on modern AI. Breazeal has expressed cautious optimism — she sees modern AI companions as fulfilling the vision she had for socially intelligent machines, though she worries about the lack of embodiment. Cassell has noted that current AI assistants still lack the sophisticated nonverbal communication her research showed was essential, and that their apparent fluency masks a fundamental inability to truly listen.
Bender remains the skeptic, arguing that the rush to deploy has outpaced our ability to understand what these systems actually do. And the broader AI research community is split in a way that mirrors the public: between excitement about what's possible and anxiety about what's coming.
Perhaps the most honest assessment comes from an unnamed researcher who worked on early GPT models and spoke on condition of anonymity: "We built something more capable than we expected and less capable than people think. Both of those things are true at the same time, and that's what makes this moment so strange."
The people who taught machines to talk spent decades in obscurity, working on problems that seemed impossible or pointless or both. Now their work touches billions of lives daily, and the world has caught up to them — messy, exhilarating, and full of questions none of them can fully answer. As it turns out, teaching machines to talk was the easy part. Deciding what they should say is the challenge we're still working on.