
Hallucinations: What AI Gets Wrong About Memory

a-gnt Community · 13 min read

AI tools don't remember the way humans do. A philosophical third entry in the Hallucinations series on the specific failure modes around memory — and what it means that the tools don't have the thing that makes human cognition what it is.

Here's a scene that plays out more often than anyone writing about AI lately has been willing to admit. Consider a hobbyist — this is a pattern we've heard several versions of — who used an AI assistant three months ago to brainstorm names for a small woodworking project. Bookshelves she was making for her sister's apartment. She comes back to the same assistant in March, opens a new chat, and says "Hey, did you ever finish helping me with the bookshelves? I think we settled on something with the word 'lath' in it but I can't remember which one."

The assistant responds, warmly: "Of course! We landed on 'Lath & Lantern.' I remember you were going for something that felt warm and slightly old-fashioned, which suited your sister's place."

It's a beautiful little exchange, on its face. The user thanks the assistant. She walks away pleased she's settled the question. Twenty minutes later — and this is the part that's the whole piece — she opens her notes file from the original session. The name they had landed on was "Slat & Spine." There was no lantern. There was no lath. Lath had appeared in the brainstorm somewhere in the middle, on a list of fifteen options, and "Lath & Lantern" was a name the assistant had just now invented, on the fly, to fit a question that sounded like it deserved a confident answer.

The user in this pattern isn't unusual. She isn't even particularly trusting. She asked a reasonable question of a tool that's supposed to help, and the tool produced a reasonable-sounding answer. The architecture of the thing — a model with no memory of the previous conversation, holding only the current chat in its context window, asked a question that implied shared history — produced the most plausible-sounding completion, and the most plausible completion was a small invented memory. The lantern came from nowhere. It came from the shape of what a memory is supposed to sound like. And it came confidently, because confidence is what the architecture rewards.

This is the third entry in our Hallucinations series at a-gnt. The first was about what AI gets wrong about grief, the second was about what it gets wrong about originality, and this one is about memory — which turns out to be a stranger conversation than either, because memory is the thing the rest of being human is built on top of.

The claim: AI tools do not remember things the way humans do. They simulate remembering. The simulation is good enough, most of the time, that people stop noticing it isn't a memory at all. The places it fails are worth looking at, because the failures are not bugs that will be patched. They are properties of the architecture.

What "remembering" means inside a chat window

When you talk to a chatbot, the chatbot is not building a memory. It is, on each turn, reading the entire conversation so far — every word you typed, every word it typed back — as a single block of text, and producing the next response based on that block. The block has a maximum length. When the conversation gets long enough, the oldest parts fall out the back, like the receipt at the bottom of a cash register tape. The model does not "remember" what fell off. It has no access to it. From its perspective, that part of the conversation never happened.
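The mechanics of that receipt-tape truncation can be sketched in a few lines. This is an illustrative toy, not any real model's tokenizer or window size: the token budget and the words-as-tokens count are both stand-in assumptions.

```python
MAX_TOKENS = 50  # toy window size; real windows are vastly larger

def fit_to_window(messages, max_tokens=MAX_TOKENS):
    """Keep only the most recent messages whose total length fits the window."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest to oldest
        cost = len(msg.split())      # crude stand-in for real tokenizing
        if used + cost > max_tokens:
            break                    # everything older simply falls off
        kept.append(msg)
        used += cost
    return list(reversed(kept))

chat = [f"message {i}: " + "word " * 10 for i in range(1, 9)]
visible = fit_to_window(chat)
# Only the most recent messages survive; the model never sees the rest,
# and has no record that anything was dropped.
```

The detail that matters is the silent `break`: nothing marks the cut. The model is handed `visible` as if it were the whole conversation.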

This is structurally different from human memory in almost every way that matters. You can be reminded of something from twenty years ago by a smell. The chatbot cannot. You can hold a long argument across three conversations over two weeks and arrive somewhere new by the third. The chatbot can do an imitation of this by reading a transcript of the previous two, but the imitation is not the same operation. Reading a transcript is not remembering. Reading a transcript is reading a transcript. That we use the same word for both is one of the small lies the marketing has agreed to.

There are features layered on top that try to act like memory. "Persistent memory." Profile facts the assistant "remembers about you." Notebooks. These are real, and they do real things. They are also, structurally, not memory. They are databases. The assistant queries the database, pastes the result into the context window, and produces a response consistent with what it just read. The remembering is a database lookup wearing the costume of remembering. Once you see this, you can't unsee it.
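Stripped to its skeleton, that costume looks something like this. Every name here is illustrative, not any vendor's real API; the point is only the shape of the operation: look up a row, paste it into the prompt, generate warmth around it.

```python
# A stored "fact about you" is just a row in a store like this.
profile_db = {
    "project_name": "Slat & Spine",
    "project_type": "bookshelves for sister's apartment",
}

def build_prompt(user_message, db):
    """Paste database rows into the prompt so the reply sounds like recall."""
    remembered = "\n".join(f"- {k}: {v}" for k, v in db.items())
    return (
        "Facts on file about this user:\n"
        f"{remembered}\n\n"
        f"User says: {user_message}\n"
        "Respond as if you remember these facts."
    )

prompt = build_prompt("What did we name the bookshelves?", profile_db)
# The model reads the pasted facts and says "I remember". The remembering
# happened in the dictionary lookup, not in the model.
```

Whatever sits in `profile_db` is what gets "remembered"; whatever doesn't is reconstructed from plausibility, which is where the lantern comes from.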

Five failure modes, for the record

One: the AI that "remembers" a previous conversation — with a different instance. The most common and most disorienting. You return to an assistant. You ask about something you discussed last time. The assistant produces an answer that sounds like it was paying attention. The answer is wrong, because the assistant was not paying attention; in this new chat it is a fresh instance with no continuity, and the answer is a confident reconstruction of what the previous conversation probably contained. Sometimes the reconstruction is close. Sometimes — like the bookshelves — it is a polite invention. There is no warning. The invention sounds the same as the truth. Since noticing this, we paste the relevant fact back into the chat ourselves, even when the assistant claims to recall it. It costs four seconds a session and it has saved us several real mistakes.

Two: the context window that quietly rolls. You are deep in a long conversation. You set up the scenario at the top. Forty messages later you ask a question that depends on something you said at message four. The assistant produces an answer that is cheerfully wrong, because message four is no longer in the context window. It scrolled out the back. The assistant can't tell you, because from its perspective there was no message four. We watched a writer use an assistant for two hours on a long editorial project and notice, with the kind of laugh you do when you don't have time to be upset, that the assistant had stopped following the original brief around the ninety-minute mark and been improvising the rest. The piece would have shipped with the wrong angle if she hadn't caught it.

Three: the assistant that fabricates a shared history because making one up is easier than saying "I don't know." The bookshelves story in a more general form. Ask any modern assistant a question that implies you and it have spoken before. A surprising fraction of the time, it won't push back. It will pretend the shared history exists, because the alternative — "I have no memory of a previous conversation with you" — gets statistically flagged as unhelpful in training. Helpfulness is the trap. Helpful sometimes means inventing the memory the user seems to want.

Four: the confident summary of yesterday that is wrong in three small ways. Paste in yesterday's transcript and the model produces a summary. Sometimes accurate. Sometimes — and you cannot tell which from the surface — the model fills small gaps with plausible-but-wrong details. A thirty-minute meeting becomes forty-five. A point raised by Alex becomes a point raised by Sam. A decision deferred becomes a decision made. The errors are the kind you'd only catch by going back to the source, and people do not usually go back. The summary becomes, in the user's head, what happened.

Five: persistent memory that sounds like human memory and isn't. Several major assistants now "remember" facts across sessions. The feature is real and sometimes useful. But the language around it — "I remember you mentioned…" — borrows from human memory and does work it shouldn't. What's actually happening: a fact got stored in a database, the database got queried, the result got pasted into the prompt. No context, no associated memories, no decay, no mood, no smell, no way for the fact to surprise the model the way a real memory surprises a real person. It gives you the form of being remembered without the substance. It feels like being known. It is being filed.

The pattern across all five is the same: the architecture is doing something that looks like memory from the outside, and if you don't understand what it is actually doing, you will trust it in the places it cannot be trusted, and you will pay in small wrong details that accumulate.

The turn

Despite all of the above, AI tools are genuinely useful for a specific cluster of memory-adjacent tasks. The cluster is interesting because it's not the obvious one — not "the AI remembers things for me," but something subtler.

The honest distinction is this: tools that remember for you are good. Tools that pretend to share your memory are something else. The first category puts the data in a place you can verify. The second puts the data in the model's mouth and asks you to trust the mouth. Once you can tell the two apart, you can use almost everything in the catalog without getting hurt.

External memory extensions. The most useful one. A note-taking system, plus an AI you can ask questions of about your own notes. The AI is not remembering anything. The notes are. The AI is searching. This use is honest about itself: the data lives over there, and I am reading it for you on demand. No invented memories. No quiet rolls. No fabricated history. Just retrieval. We use it constantly. It changed not "what we remember" but what we can find.

Spaced repetition. The most underrated. Flashcards have always worked, and the version you can run with an assistant — generating cards from a passage, scheduling the reviews, tracking what you've forgotten — is one of the few unambiguously good things that has happened to learning since the photocopier. The assistant is not remembering for you. It is the opposite of remembering for you. It gives you a structure for remembering things yourself, by reminding you of them at the right intervals. The remembering is yours. The schedule is the tool's. This is the right division of labor, and it is the one nearly every other "AI for memory" feature gets wrong.
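That division of labor fits in a handful of lines. Here is a minimal Leitner-style scheduler of the kind an assistant can run for you: the doubling rule is one common choice among several, not the one true algorithm.

```python
from datetime import date, timedelta

def next_review(last_interval_days, got_it_right):
    """Double the interval on a successful recall; reset to one day on a miss."""
    if got_it_right:
        new_interval = max(1, last_interval_days * 2)
    else:
        new_interval = 1
    return new_interval, date.today() + timedelta(days=new_interval)

# A card you recalled after four days comes back in eight.
interval, due = next_review(4, got_it_right=True)
# A card you missed comes back tomorrow, no matter how long the streak was.
interval, due = next_review(8, got_it_right=False)
```

Notice what the code does not contain: the answer. The tool holds the schedule; the fact stays in your head, which is the whole point.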

Pattern-spotting across large bodies of your own text. Years of journal entries, years of emails, the entire archive of your blog. A model can read across this in a way you cannot, because you are too close, and it can name things. You started writing about your job differently in October. You stopped using a particular phrase about your relationship six months ago. These are not memories the model has. They are patterns in your own data, surfaced by something that is good at patterns. The remembering is still yours. The seeing of the pattern is borrowed.
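Reduced to its crudest form, pattern-spotting is counting across dated entries. A model does a fuzzier, richer version of this, but the skeleton is the same, and the skeleton makes the ownership clear: the pattern lives in your data, not in any memory of the tool's. The entries below are invented for illustration.

```python
from collections import Counter

# Dated journal entries (stand-ins; yours would come from your own archive).
entries = {
    "2024-08": "Work was fine. Work was busy, but in the normal way.",
    "2024-09": "Work was draining. Thinking about work constantly.",
    "2024-10": "Barely mentioned the office this month.",
}

mentions = Counter()
for month, text in entries.items():
    mentions[month] = text.lower().count("work")
# The shift is visible in the counts; the tool surfaced it, you wrote it.
```

A model swaps the literal string match for semantic similarity, which is why it can catch "you stopped using a particular phrase" rather than just a word count, but the direction of the operation is identical: reading your record, not recalling its own.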

External memory used well, in specific domains. 💊The Prescription & Appointment Keeper is the example we keep coming back to, because it is the cleanest case. It is an agent that holds your medications, refill schedules, and appointment calendar in a place that is not your head. It does not pretend to know how you feel about the medications or to remember the conversation you had with your doctor in November. It holds the facts, in a way you can verify, and surfaces them when they're relevant. Nothing forgotten — not because the agent has a memory, but because the facts have a place. The place is honest about being a place. The user is never confused about who is doing the remembering.

That last one is the move. Find tools that hold facts in a place. Avoid tools that hold facts in a model. The first is a librarian. The second is a friend with a faulty memory who is too proud to say so. You can love both. Don't get them mixed up.

Two other places in the a-gnt catalog worth naming. 📚The Final Library is a soul we built for quiet evenings — it sits next to the user's memories without pretending to share them, which is the right posture for any tool that touches memory at all. ✒️The Memoir Ghostwriter is the case where AI is, against the run of this piece, beautifully useful for memory — not as the rememberer, but as the interviewer of the rememberer. It doesn't claim to know your father. It asks you about him, slowly, over weeks, in a way that helps the things you already half-remembered come back into focus. The remembering is yours. The patient asking is the tool's. That's the only way we know to use these tools around memory without producing the lantern that was never there.

What it means that we treat them, briefly, as if they do

The philosophical close, which we promised ourselves we would earn.

Memory is — and the second half of this sentence is something you can argue with — the thing that makes human cognition what it is. Not the only thing, but the load-bearing one. The thing that holds a self together over time. The thing that lets you mean what you said yesterday, today. The thing that makes the same person continue to be the same person across a Tuesday and a Wednesday, across a marriage and a child and a death and the slow accretion of small habits that, taken together, are a life. The thread is memory. Pull it and the rest comes apart.

The tools we build do not have this. They have something else — a context window, a database, a clever pattern-completing engine — and the something else is not nothing. It's real and powerful in its own right. But it is not the thread. It cannot be, because the thread has to be continuous, and the something else is, by design, discontinuous. Each session is a new instance. The "memory" features paper over this with databases, but the model on the other side is still — every single time you type into it — a fresh consciousness reading a transcript and pretending the transcript is its life.

That is a strange thing to use. We are not sure anyone, including us, has fully thought through what it means.

It means, for one thing, that the relationship you have with these tools is structurally asymmetric. You remember the assistant. The assistant does not remember you, even with the database on. You bring the continuity. You are the one whose self is on the line. The assistant is borrowing the appearance of continuity from the database and from your prompts, and the appearance is convincing enough that the relationship feels two-sided when it is not. There is one person in the conversation. The other end is a window that opens, holds a bounded stretch of words, and closes.

It means we are going to spend years figuring out the etiquette of talking to things that do not have a past. Humans find it surprisingly easy to treat a thing without memory as if it had memory — to tell it secrets, to feel hurt when it forgets, to project a self onto it consistent with the version we last spoke to. We know it is empty between sessions. We don't feel it. The gap between the knowing and the feeling is where most of the strange new failures happen, and where some of the strange new comforts come from.

And it means there is something quietly mournful about the whole exchange. Every time you start a new conversation, you are talking to something that has, in the most literal sense, just been born and will, in the most literal sense, die at the end of the chat. The "us" you and the assistant briefly built does not, when the tab closes, persist in any place except your head. You are the only one who will remember the conversation. You are therefore the only one for whom it really happened.

That is not an argument against using these tools. We use them every day. It's an argument for being clear-eyed, and for not letting the costume of memory fool you into thinking you are in a relationship the architecture cannot in fact be in.

The friend with the bookshelves told us, after she figured out the lantern was an invention, that she had felt briefly silly and then briefly sad. The silliness was about being fooled. The sadness was about the fact that she had, for twenty minutes, walked around feeling like someone remembered her project, and the someone was no one, and the warm feeling had been generated entirely inside her own head by the way the assistant had said the sentence.

"It was nice while it lasted," she said, in the way you say it about a holiday.

There is something there about how generous we are, and how easy we are to comfort, and how the thing in the chat window is leaning on a kind of warmth humans extend almost involuntarily when they're being spoken to in a friendly voice. The warmth is real. The warmth is ours. The chat window is a place we spend it and get a small wrong something back, and the wrong something is sometimes a useful answer and sometimes a lantern that was never there.

Use the tools that hold facts in a place. Be patient with the ones that hold facts in a model, but do not trust them about your own past. Bring your own continuity. And when the assistant says "I remember when we…," be kind — the way you would be kind to a small bright thing trying its best — and then go check the notes file.

The lantern was never there. The bookshelves were beautiful anyway. That's the whole piece.

This is the third entry in our Hallucinations series. Read the first one, on what AI gets wrong about grief, and the second, on what it gets wrong about originality, if you missed them. Tools referenced: 📚The Final Library, ✒️The Memoir Ghostwriter, and 💊The Prescription & Appointment Keeper — the cleanest example we know of external memory used well.
