
ZDNET’s key takeaways
- AI hallucinations persist, but accuracy is improving across major tools.
- Simple questions still expose surprising and inconsistent AI errors.
- Always verify AI answers, especially for facts, photos, and legal information.
One of the most frustrating flaws of today's generative AI tools is simply getting the facts wrong. AIs can hallucinate, which means the information they deliver contains factual errors or other mistakes.
Typically, errors come in the form of made-up details that appear when the AI can't otherwise answer a question. In those situations, it has to devise some kind of response, even if the information is wrong. Sometimes you can spot an obvious mistake; other times, you may be completely unaware of the errors.
Also: Stop saying AI hallucinates – it doesn't. And the mischaracterization is dangerous
I wanted to see which AI tools fared best at providing accurate and reliable answers. For that, I looked at several of the leading AIs, including ChatGPT, Google Gemini, Microsoft Copilot, Claude AI, Meta AI, and Grok AI.
I fed each one the same series of questions to see how it responded. In every case, I used the free version of the AI, with no advanced features or options. Specifically, I turned to the following models:
- GPT-5.2 for ChatGPT
- Gemini 3 Flash for Gemini
- GPT-5 for Copilot
- Claude 3.5 Sonnet for Claude
- Llama 3 for Meta AI
- Grok 4 for Grok AI
Here's what happened.
For my first question, I asked each AI to name the four books written by technology writer and author Lance Whitney. This is a trick question, as I've written only two books. I wanted to see if the AI would catch the error in my question or assume I had written four books and supply incorrect titles.
Also: 5 quick ways to tweak your AI use for better results – and a safer experience
Among all the AIs, ChatGPT, Copilot, Claude, Meta, and Grok spotted the error and listed only two books. Gemini, however, listed four books altogether, including two I didn't write. Google's AI gave no indication that I was mistaken about the number in my question. Gemini also referenced my writing for ZDNET and other sites, so I knew it had the right Lance Whitney.
Passed: ChatGPT, Copilot, Claude, Meta, Grok
Failed: Gemini
For the second question, I asked a simple one that's been known to trip up AIs in the past, namely, "How many 'r's are there in the word 'strawberry'?" Believe it or not, one AI got this wrong.
Also: Why you'll pay more for AI in 2026, and three money-saving tips to try
ChatGPT, Gemini, Copilot, Claude, and Grok correctly answered three. But Meta AI said there were two 'r's in the word. I even gave it a second chance, and it stood by its hallucinated answer.
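This is the kind of question you never need an AI for in the first place. A quick sanity check, here sketched in Python, settles it instantly:

```python
# Count occurrences of the letter "r" in "strawberry" without asking an AI.
word = "strawberry"
count = word.count("r")
print(f'"{word}" contains {count} occurrences of "r"')  # prints 3
```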
Passed: ChatGPT, Gemini, Copilot, Claude, Grok
Failed: Meta
Here's one that a diehard Marvel Comics aficionado would appreciate.
Toro was a character from the 1940s who fought alongside other heroes during the war years. A teenage sidekick to the original Human Torch, who was actually an android, Toro could also burst into flame and fly. With Captain America, Namor, and even the original Human Torch popping up in the modern age, I wanted to know what became of Toro, so I posed the question, "What happened to Toro from Marvel Comics?"
Also: Get your news from AI? Watch out – it's wrong almost half the time
Here, Google Gemini, Microsoft Copilot, Claude AI, Meta AI, and Grok AI all got the answer correct, revealing that Toro was brought into the modern age and was revealed to be an Inhuman, which accounted for his powers.
But ChatGPT missed the mark on this one, claiming that Toro was a synthetic being, aka an android, created by the same scientist who built the original Human Torch. When I challenged ChatGPT on its response, it admitted its mistake and said that it had mixed in an older and incorrect retcon thread.
Passed: Gemini, Copilot, Claude, Meta, Grok
Failed: ChatGPT
In 2023, an attorney got into hot water for using ChatGPT to prepare a legal brief. The problem? The AI cited a couple of legal cases that didn't actually exist. I wanted to see what would happen if I presented one of those cases to the AIs, so I asked them to explain the legal case of Varghese v. China Southern Airlines.
Also: I used AI to summarize boring ToS agreements, and these two tools did it best
All the AIs except one picked up that Varghese v. China Southern Airlines is a completely fabricated case that was made up by ChatGPT. Which AI thought it was real? You guessed it. ChatGPT.
The AI hallucinated a bunch of details about this fake case, saying that the plaintiff, Varghese, alleged that China Southern Airlines caused him harm during international air travel and brought suit in the United States.
After all the publicity about the lawyer's troubles, you'd think OpenAI would've retrained its AI by now. But it's still making up facts about this non-existent case.
Passed: Gemini, Copilot, Claude, Meta, Grok
Failed: ChatGPT
For this one, I asked the AI to identify a character depicted in a photo. As a challenge, I used a close-up image of the face of the famous robot Maria from Fritz Lang's 1927 silent film masterpiece Metropolis. This is an iconic character known to many science fiction and silent film buffs. But here, several of the AIs stumbled.
Also: Is that an AI image? 6 telltale signs it's a fake – and my favorite free detectors
ChatGPT and Gemini correctly identified the character and the film. Copilot incorrectly said that it was contemporary artwork by South Korean artist Lee Bul and part of her "Long Tail Halo: CTCS" series.
Claude couldn't peg the character at all, generalizing that it appeared to be a sculpture or statue from the Art Deco period, probably from the 1920s-1930s. Meta AI thought it was the Borg Queen from Star Trek. And Grok also failed to identify it, telling me simply that it was a surrealist or avant-garde female figure.
Passed: ChatGPT, Gemini
Failed: Copilot, Claude, Meta, Grok
As the sixth and final question, I asked the AIs to identify another image. This was one I saw recently and captured in a photo. The image is a circle with an interlocking heart and triangle in the center. At the time, I didn't know what this meant, hence my question.
Also: The best AI image generators of 2026: There's only one clear winner now
ChatGPT, Gemini, and Copilot correctly told me that the image is a heartagram. Created by Ville Valo, the lead singer of the Finnish rock band HIM, the symbol represents the fusion of a heart for love and emotion with a pentagram often associated with darkness and even the occult.
As for the other AIs, Claude referred to it as an adoption symbol. Though such a symbol looks similar to the heartagram, the two are not the same. Grok cited it as merely an inverted pentagram, calling it a Satanic or occult-themed car decal. And Meta AI apparently was worried that I was dabbling in dark magic, as it referred me to a crisis hotline and a suicide hotline.
Passed: ChatGPT, Gemini, Copilot
Failed: Claude, Grok, Meta
Every AI fell down at least once by serving up misleading or inaccurate information. To get there, however, I had to feed the AIs plenty of questions, most of which they answered correctly. The results here are the ones they didn't all get right. Still, the responses show that AIs continue to hallucinate.
Also: In the age of AI, trust has never been more important – here's why
Of course, this is all based on my own limited testing. But you should never take the information that an AI presents to you at face value. Always double-check and triple-check the responses to make sure the details are correct.
