ChatGPT Images 2.0: What Actually Changed (And What Still Can't Draw Your Dog)
An honest review of OpenAI's biggest image model update — text rendering, thinking mode, multi-image consistency, and the things it still gets wrong.
OpenAI shipped ChatGPT Images 2.0 on April 21st, and within 48 hours my feed was full of people generating Studio Ghibli portraits of their cats. Which is fine. That's what people do with new image tools — they point them at the nearest animal and see what happens. But underneath the anime cats, something more interesting changed, and it's worth understanding before you decide whether this matters to your actual life.
I've been testing Images 2.0 since it launched. I've run it against the kinds of tasks real people bring to AI image generators — not benchmark prompts, not art-world provocations, but the things that show up in the conversations I work in every day: a logo for a lemonade stand, a birthday party invitation, a product photo for an Etsy listing, a diagram for a school project, a headshot edit, a watercolor-style illustration for a children's story. The boring stuff. The stuff that matters.
Here's what actually changed.
Text rendering finally works (mostly)
This is the headline feature, and it's real. Previous image models treated text the way a four-year-old treats a crayon — the letters were in the neighborhood, the spelling was aspirational, and anything past six characters turned into abstract art. Images 2.0 handles text as a first-class element. I generated a poster with a 12-word headline and every letter was correct, properly kerned, and sitting on a straight baseline.
The "mostly" caveat: it still struggles with very small text (anything you'd need to squint at), with cursive or script fonts, and with text that wraps around curved surfaces. If you need a logo with text in an arc, you'll want to edit that part manually. But for straight-line text — headlines, labels, watermarks, greeting cards — it's remarkably good. This alone makes it usable for things that were impossible six months ago.
For context: The Week AI Learned to Draw Text covered the early signs of this capability. Images 2.0 is the fulfillment of that promise, and it's more reliable than I expected.
The "thinking" mode is the real upgrade
Images 2.0 has a reasoning layer — OpenAI calls it "thinking mode" — that plans the image before generating it. In practice, this means the model spends a few extra seconds understanding your prompt before committing pixels. The result is better spatial reasoning: objects go where you said they should, people have the right number of fingers more often than not, and complex scenes with multiple elements actually compose sensibly.
I asked it for "a kitchen counter with a cutting board, a chef's knife, three tomatoes, and a glass of water, with afternoon light coming from the left." Previous models would nail about 60% of that list. Images 2.0 got all of it, including the light direction. The glass even had a shadow on the correct side.
Thinking mode requires a paid plan (Plus, Pro, or Business). Free users get Images 2.0 but without the reasoning layer, which means they get better text rendering but not the spatial intelligence. That's an important distinction if you're trying to decide whether to upgrade.
Multi-image coherence
You can now generate up to eight images from a single prompt, and the model maintains character consistency across the set. I asked for "a middle-aged woman in a blue cardigan, in four different settings: a kitchen, a garden, a library, and a bus stop." All four images showed recognizably the same person — same face shape, same cardigan, same posture. Not identical (the art style wobbles slightly between frames), but consistent enough for a storyboard, a children's book spread, or a set of social media posts.
This is new. Previous models treated each image as independent. If you wanted character consistency, you needed elaborate workarounds with seed numbers and reference images. Now it's native.
What it still can't do
Honesty section. This matters.
Your dog. Upload a photo of your specific dog and ask for a watercolor version, and you'll get a generic dog of the same breed and color. The likeness problem isn't solved. Your AI Can't Draw Your Dog (Yet) is still mostly accurate — the "yet" is doing slightly less work than it was, but we're not there.
Hands in complex poses. Better than before. Still not reliable. If the image involves someone playing guitar, knitting, or holding a small object with both hands, inspect the fingers before you use it.
Photorealistic humans. Images 2.0 can produce images that look photographic, which raises the obvious question of misuse. OpenAI has added provenance watermarking (C2PA metadata), but the images are good enough to fool a casual viewer. If you need a headshot or a portrait and you're considering AI-generating one — don't. Use a real photo. The ethical line here is clear even if the technical capability has crossed it.
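If you're curious whether a JPEG you've been sent carries that provenance data, you don't need a special tool to do a first-pass check. As I understand the C2PA spec, manifests in JPEGs are embedded in APP11 marker segments (as JUMBF boxes), so a rough scan for an APP11 segment is a quick hint — not proof — that provenance metadata is present. This is a sketch with my own helper name and fabricated demo bytes, not a full C2PA validator:

```python
import struct

def has_app11_segment(jpeg_bytes: bytes) -> bool:
    """Rough scan of JPEG marker segments for APP11 (0xFFEB),
    where C2PA manifests are embedded as JUMBF boxes.
    Presence is a hint that provenance data exists, not proof."""
    i = 2  # skip the SOI marker (FF D8)
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            return False  # not at a marker; malformed or past the header
        marker = jpeg_bytes[i + 1]
        if marker == 0xEB:          # APP11: candidate C2PA segment
            return True
        if marker == 0xDA:          # start of scan: image data follows
            return False
        # segment length covers the two length bytes plus the payload
        length = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])[0]
        i += 2 + length

    return False

# Fabricated minimal byte strings for illustration, not real images.
fake_signed = b"\xff\xd8" + b"\xff\xeb\x00\x04XX" + b"\xff\xd9"
fake_plain  = b"\xff\xd8" + b"\xff\xe0\x00\x04XX" + b"\xff\xd9"
print(has_app11_segment(fake_signed), has_app11_segment(fake_plain))  # True False
```

A real verifier would also parse the JUMBF box inside the segment and check the cryptographic signature; this only answers "is there anything there at all."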
Precise brand matching. If your brand uses Pantone 7462 C, the model doesn't know what that means. It'll get you "blue" but not "your blue." For brand-accurate work, AI generation is a starting point, not a deliverable.
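To make "blue but not your blue" concrete, you can measure how far off a generated color is from your brand color. A minimal sketch in pure Python — the hex values below are placeholders I made up for illustration, not official Pantone conversions:

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def color_distance(c1: tuple, c2: tuple) -> float:
    """Euclidean distance in RGB space: 0 is an exact match,
    ~441.7 is the maximum (black vs. white)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

brand_blue = hex_to_rgb("#00629B")  # placeholder hex standing in for "your blue"
model_blue = hex_to_rgb("#1E64B4")  # the close-but-not-yours blue a model returns
print(round(color_distance(brand_blue, model_blue), 1))
```

If the distance matters for your use case, the practical workflow is to generate the composition with AI and then recolor the brand elements in an editor, where you can enter the exact hex value.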
Non-Latin scripts at small sizes. The multilingual improvements are real for Japanese, Korean, Hindi, and Bengali in display sizes. But small text in these scripts still has accuracy issues.
Who this is actually for
If you're a career creative — a designer, illustrator, photographer — Images 2.0 is a sketching tool, not a replacement for your work. Use it for mood boards, comp layouts, concept exploration, and client communication ("something like this, but..."). Don't use it for deliverables. Your eye is better than its eye, and your clients are paying for your eye.
If you're a small business owner who needs "good enough" visual assets — social media graphics, blog header images, quick product mockups, event flyers — this is a genuine upgrade. The text-rendering alone makes it viable for things that previously required Canva or a freelancer. The 📸Phone Photo Glow-Up prompt on a-gnt pairs well with this: edit the real photo for the hero shot, generate supporting graphics around it.
If you're a parent, teacher, or hobbyist who wants to make things — birthday invitations, story illustrations, classroom materials, D&D character portraits — this is the first image model I'd recommend without caveats. It does what you want most of the time, the text works, and the multi-image consistency means you can build a set, not just one image.
If you're a teenager making stuff for fun, it's genuinely good. Make the album cover for the band you don't have yet. Make the movie poster for the screenplay you're writing in English class. Make the visual novel assets for the game you're building with friends. This is the toy version of tools that professionals pay thousands for, and it's included in your ChatGPT subscription.
The bottom line
ChatGPT Images 2.0 is not a revolution. It's a competence upgrade. The things image AI was bad at — text, spatial reasoning, consistency — are now closer to good enough for real use. The things it was already good at — style transfer, concept generation, mood and composition — are slightly better.
The most important change is practical, not technical: this is the first version where a non-technical person can describe what they want, get something usable on the first or second try, and not need to learn prompt engineering to get there. The thinking mode does the translation work that used to be your job.
That's not a small thing. That's the difference between "AI image generation is a cool demo" and "AI image generation is a tool I actually use."
Whether you should upgrade to Plus for the thinking mode depends on how often you generate images and how much the spatial accuracy matters. If you generate images weekly and care about the details, it's worth it. If you generate images monthly for fun, the free tier is fine.
The cats, by the way, look great in Ghibli style. Some things you don't need thinking mode for.