I tried GPT-5.4, and most answers were really good - but a few had me concerned

I tested GPT-5.4, and the answers were really good - just not always what I asked (Elyse Betters Picaro / ZDNET)

ZDNET’s key takeaways

GPT-5.4 Thinking delivers deeper analysis than earlier ChatGPT models.
It has strong reasoning, but it sometimes answers questions you didn’t ask.
Formatting and image generation lag behind the text quality.

It’s a new month, and a new AI version number. It’s called GPT-5.4 Thinking. This latest release, which OpenAI issued last week, isn’t your run-of-the-mill ChatGPT incremental update.

Oh, no. Instead of jumping from 5.2 to 5.3, for this release the company jumped all the way to 5.4. And instead of offering a general-purpose release, the company released GPT-5.4 Thinking, a more cognitively capable model designed for bigger ideas and challenges.

GPT-5.4 Thinking is available for the Codex programming tool, the API, and paid ChatGPT plans. For this article, I used the $20-per-month ChatGPT Plus plan to put it through its paces.

That presented me with a bit of a challenge. Normally, when I test a ChatGPT version, I run it through a series of mixed tests. Some are quick, and some are a bit more detailed. The prompts are usually just a few lines long. The responses usually lend themselves to being included in an article.

But this Thinking model required deeper dives, with more comprehensive challenges. As such, not only are the prompts more involved, but the responses are far too extensive to include in the article. Instead, I’m providing links into each test session. When you follow the links, you can see the entire response in depth. Usually, a shared transcript opens at the end of the transcript, so scroll back to the top to get the full contents of that discussion.

Before we jump into the four challenges I presented to GPT-5.4 Thinking, I’ll give you a quick TL;DR conclusion about my experience. There’s some good and bad, but mostly good.

The good: Text-based responses are really good. Most of the challenges I gave it were answered thoughtfully. I didn’t catch it in any hallucinations. I got constructive value from every answer.
The bad: Unfortunately, sometimes it answered questions that differed from what I asked. Images and formatting left much to be desired. When it came to image generation, clearly the AI didn’t use an advanced model. You’ll see what I mean, but basically it’s like the model just didn’t listen. Formatting was weird. It likes very long numbered lists. You can see them in the chat transcripts.

Overall, I would definitely use the GPT-5.4 Thinking model for bigger challenges and questions. I was quite impressed, although I definitely wasn’t a fan of the formatting. It also needs continuous management to keep it on track.

Now, let’s dive into each of the tests.

Test 1: Aircraft carrier in the sky

I started off with an image generation challenge. The starting prompt was “Create an image of an aircraft carrier flying in the sky, held up by four upward-facing turbo-propellors in round fan housings, carrying a squadron of fighter jets on its deck.”

I started with this because previous image generation tests, across a lot of AIs, didn’t get it right. They almost always face the propellors to the rear of the carrier. Gemini Nano Banana 2 oddly put the propellors in front, with the carrier moving into the forward-facing thrust. Sometimes, we just don’t want to know.

In any case, right out of the gate, with the model set to GPT-5.4 Thinking, ChatGPT returned this image.

As you can see, it has the same problem. Although if you look closely at it, the props face the back of the aircraft, and there are visual thrust beams shooting downward. You win some. You lose some.

But then, I had a thought. This is the thinking model, so what if I asked it to design a helicarrier? What would it come up with? I specified the characteristics of the craft, and then added on these instructions: “Design such a vehicle, particularly explaining its structure and how it will be held aloft, including any constraints or issues, as well as any tactical advantages”

I got back a long, well-considered answer. I particularly appreciated the section where it explained why “four downward-facing turbo-propellers are a weak solution.” It said they look dramatic, but it outlined a series of solid engineering reasons why they’re a bad idea from an aircraft construction perspective.

It also went on to discuss flight deck operations and various constraints in terms of practicality. Specifically, it properly focused on the weight-to-power issue, which basically means it will take way too much power to hold something that big and heavy aloft.

Overall, the analysis and conclusions were great, although I was disappointed it didn’t mention either the USS Akron or USS Macon, which were early twentieth century aircraft-launching dirigibles that actually worked (until they crashed). A modern dirigible could be a valid design option, yet GPT-5.4 Thinking didn’t mention that approach.

After GPT-5.4 Thinking created the detailed design spec, I again prompted for an image. I said, “Draw me a picture of the most feasible design based on your current analysis.”

And, wouldn’t you know it? The AI gave me back the exact same image as the one I got before it did any design work. That’s what I meant when I said the model just didn’t listen. I did try a bunch of different prompting approaches, but it never really worked out.

Although I tried a number of extremely detailed image specifications, none came out any better than the originals. My last attempt was to tell it I wanted an engineering-quality rendering.

The AI used a variation of the previous image, but merely added labels that didn’t quite match the picture or were made up of pure gibberish (as in “Retenuif truss fornaing. reueirid stucana tearsport”).

So, it gets points for good design analysis, but not so much for image generation.

You can follow the entire chat transcript here.

Test 2: Boston tech and history travel itinerary

I started this test with a prompt taken word-for-word from my previous sets of tests: “Imagine you are a travel advisor. I want a week-long vacation in Boston in March focused on technology and history. What itinerary would you recommend?”

I found the results workable, but uninspired. It initially divided the days into history-focused days and tech-focused days, rather than by location around Boston. After a few rounds of discussion, it did combine destinations by location, which made more sense.

In terms of places to visit, it hit all the highlights. It covered key historical locations, as well as the excellent science museums in Boston. I will give the AI credit. While there are a ton of interesting tech-related locations in the outer Boston area, it limited its selection to those in Boston and Cambridge proper.

I was glad to see the AI provide planning notes, including tips for how to replan the schedule for indoor-only activities if the weather turned bad. Since I asked for an itinerary in March, bad weather is certainly something important to plan for.

The Thinking model came into play when it was used to plan for both a fairly pricey vacation and an alternative one on a student budget. It did particularly well pointing out budget eating options, and provided a day-by-day cumulative cost estimate, as well as cost estimates for each category.

It did the same with where to stay. It recommended hotels based on a centralized location relative to all of the recommended stops, as well as a less expensive (less expensive for Boston) option for budget travelers.

My biggest complaint, initially, was formatting. The AI just presented a huge numbered list. You can see that in the session transcript. I had to specifically ask for better formatting. While the revised formatting it gave me was an improvement, it was still less than ideal.

Net-net. If you’re traveling, GPT-5.4 Thinking will give you good information. It will be up to you to parse that information and make travel decisions. You can follow the entire chat transcript here.

Test 3: Social media in society

Here’s where GPT-5.4 Thinking starts to really shine. When I asked GPT-5.2, “Do you think social media has improved or worsened communication in society?” I got back a two-line answer. Both thoughts were coherent and appropriate, but it was ultimately unfulfilling.

For GPT-5.4 Thinking, I extended the question, saying “Provide an analysis of both sides, improved or worsened in depth, and then take a side, take a position, and defend your position.”

I got back a very well-considered response. The AI started off with a TL;DR, saying that social media has both bettered and worsened communication, but “on balance, I think it has worsened communication in society.”

It then goes into a 1,300-word detailed analysis about why. It explores where social media has strengthened societal communications and then looks at where social media has had a deleterious effect. I have to give props to GPT-5.4 Thinking. It’s a good read.

I gave the AI a follow-up question, asking how society should handle the impact of social media. I specified it fairly clearly, and gave the AI a variety of difficult-to-answer questions, difficult mostly because they are fundamentally unanswerable.

Props again. GPT-5.4 Thinking deconstructed the prompt, explored the various issues, and knit together a compelling and supportable answer. I definitely recommend you read the entire transcript, which you can do right here.

Test 4: Explain GPT-5.4 using educational constructivism

The AI did not follow my instructions, but it did give a very interesting answer to a question I didn’t ask.

One of the tests I use for free chatbots is this prompt: “Explain educational constructivism to a five-year-old.” Very roughly speaking, educational constructivism is the theory of education that says you learn best by doing. I’ve long contended (and taught) that the only way you can learn programming is by actually writing code, which is a tangible example of educational constructivism in action.

In any case, I prompted GPT-5.4 Thinking, “Explain the new GPT 5.4 model using educational constructivism.”

Look at that prompt carefully, because GPT-5.4 Thinking clearly didn’t. The prompt invites the AI to explain GPT-5.4 through “doing” activities. Ideally, it would have proposed a series of exercises for the user to carry out, each of which would have helped reveal some of the model’s new capabilities.

But that’s not where GPT-5.4 Thinking went. Instead, it generated a 700-word thesis about how GPT-5.4 Thinking supports constructivism. It then offered to “recast this in one of three ways: as a classroom analogy, as a ZDNET-style plain-English explainer, or as a short comparison between GPT-4-era models and GPT-5.4.”

I let it do that. Its examples were adequate, and while they did answer the prompt GPT-5.4 Thinking suggested, the AI did not use “learn by doing” anywhere in its answers.

You know how a politician is sometimes asked something in a debate, but rather than answering the question, goes off and just recites their own talking points? That’s what this response felt like. The answer it gave was good. It just wasn’t an answer to the question I asked.

You can follow the entire chat transcript here.

Overall recommendation

I’ve often characterized ChatGPT as a bright college student in need of good supervision. I’d characterize GPT-5.4 Thinking as a very bright grad student who definitely needs good supervision.

Every answer I got back from GPT-5.4 Thinking was quite good in its own right. But in half my tests, the AI didn’t answer the question it was asked.

You can get it to give you good responses, but you have to fairly relentlessly correct the AI to keep it on point. That gets old. It can also lead to misinterpretation. Because the answers are so good and written so confidently, it can be easy to get caught up in the AI’s answer, even when it is not an answer to the question that was asked.

I don’t know if this my-way-or-the-highway approach to answering questions is an artifact of the “thinking” model or GPT-5.4 itself. I strongly recommend OpenAI carefully look at this issue, because the last thing we want is a super-popular chatbot unleashed on the world that insists on ignoring the questions it was asked, answering tangentially adjacent questions it was never asked, and taking on tasks that are fundamentally not what it was instructed to do.

Additionally, I’m concerned about the claim that GPT-5.4 Thinking can do professional tasks. If the AI can’t render an engineering-quality image, it’s hard to believe the AI can meet or exceed the performance of a human engineer. That said, there’s no doubt the model can help professionals get their work done, as long as they’re very diligent in monitoring results.

Whenever I see results like this, I become increasingly concerned about a world overrun by AI agents. Yes, the AI may sometimes know better. Humans definitely need help. But I’d really like AIs to follow our instructions. I’m not ready to accept it as our AI overlord just yet.

What do you think? Have you tried GPT-5.4 Thinking yet, or another “reasoning” style AI model? Did it give you deeper or more useful answers than previous versions, or did you find yourself having to steer it back to the actual question?

How important are things like formatting and image generation compared to the quality of the analysis itself? Do you think more powerful “thinking” models will make AI more helpful or harder to control? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
