From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

by Investor News Today
June 29, 2025
in Technology


Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: build a model that could look at a photo of a laptop and identify any physical damage, such as cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.

Along the way, we ran into issues with hallucinations, unreliable outputs and images that were not even laptops. To solve these, we ended up applying an agentic framework in an atypical way: not for task automation, but to improve the model’s performance.

In this post, we’ll walk through what we tried, what didn’t work and how a combination of approaches eventually helped us build something reliable.

Where we started: Monolithic prompting

Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.
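
To make the single-prompt setup concrete, here is a minimal sketch in Python. The prompt wording, the `inspect_monolithic` helper and the `fake_model` stand-in are all assumptions made for illustration; the article does not publish the real prompt or name the model client.

```python
from typing import Callable

# Hypothetical prompt text; the real prompt is not given in the article.
MONOLITHIC_PROMPT = (
    "You are inspecting a photo of a laptop. "
    "List every visible physical defect (cracked screen, missing keys, "
    "broken hinges, anything else). Reply with one defect per line, "
    "or NONE if the laptop looks undamaged."
)


def inspect_monolithic(
    image_bytes: bytes, call_model: Callable[[str, bytes], str]
) -> list[str]:
    """Send one large prompt plus the image, then parse the reply into defects."""
    reply = call_model(MONOLITHIC_PROMPT, image_bytes)
    # Drop bullet characters and surrounding whitespace from each reply line.
    lines = [line.strip("-• \t") for line in reply.splitlines()]
    defects = [line for line in lines if line]
    if len(defects) == 1 and defects[0].upper() == "NONE":
        return []
    return defects


def fake_model(prompt: str, image: bytes) -> str:
    """Stand-in for an image-capable LLM client (an assumption for this sketch)."""
    return "- cracked screen\n- missing key: F5\n"


print(inspect_monolithic(b"<image bytes>", fake_model))
```

Note that nothing in this flow checks whether the photo shows a laptop at all, and every answer depends on one model call.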

We ran into three major issues early on:

  • Hallucinations: The model would sometimes invent damage that didn’t exist or mislabel what it was seeing.
  • Junk image detection: It had no reliable way to flag images that weren’t even laptops. Pictures of desks, walls or people occasionally slipped through and received nonsensical damage reports.
  • Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

This was the point when it became clear we would need to iterate.

First fix: Mixing image resolutions

One thing we noticed was how much image quality affected the model’s output. Users uploaded all kinds of images, ranging from sharp and high-resolution to blurry. This led us to consult research highlighting how image resolution affects deep learning models.

We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core issues of hallucination and junk image handling persisted.
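
One way to build such a mixed-quality set is to degrade a share of the sharp images on purpose. The sketch below is only an illustration under stated assumptions: images are plain grayscale grids, and `degrade`, `mix_resolutions`, the block-averaging method and the `low_res_share` parameter are hypothetical choices, not the pipeline used in the project.

```python
import random

Image = list[list[float]]  # grayscale pixels, row-major (an assumption)


def degrade(image: Image, factor: int) -> Image:
    """Simulate a low-resolution photo: average each factor x factor block,
    then repeat the averaged value so the output keeps the original size."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, factor):
        for left in range(0, width, factor):
            rows = range(top, min(top + factor, height))
            cols = range(left, min(left + factor, width))
            total = sum(image[r][c] for r in rows for c in cols)
            mean = total / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def mix_resolutions(
    images: list[Image], low_res_share: float, factor: int, seed: int = 0
) -> list[Image]:
    """Return a dataset where roughly low_res_share of the images are degraded."""
    rng = random.Random(seed)
    return [
        degrade(img, factor) if rng.random() < low_res_share else img
        for img in images
    ]


sharp = [[0.0, 2.0], [4.0, 6.0]]
print(degrade(sharp, 2))
```

Fixing the seed keeps the split reproducible, so training and evaluation runs see the same mix.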

The multimodal detour: Text-only LLM goes multimodal

Encouraged by recent experiments in combining image captioning with text-only LLMs, like the technique covered in The Batch, where captions are generated from images and then interpreted by a language model, we decided to give it a try.

Here’s how it works:

  • The LLM starts by generating several possible captions for an image.
  • Another model, called a multimodal embedding model, checks how well each caption fits the image. In this case, we used SigLIP to score the similarity between the image and the text.
  • The system keeps the top few captions based on these scores.
  • The LLM uses these top captions to write new ones, trying to get closer to what the image actually shows.
  • It repeats this process until the captions stop improving, or it hits a set limit.
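
The loop described in these steps can be sketched as follows. This is a schematic under assumptions: `propose` stands in for the caption-writing LLM and `score` for an image-text similarity model such as SigLIP, and both are replaced here by toy functions so the example runs without any model. The names `refine_captions`, `keep` and `max_rounds` are hypothetical.

```python
from typing import Callable


def refine_captions(
    image: object,
    propose: Callable[[object, list[str]], list[str]],
    score: Callable[[object, str], float],
    keep: int = 3,
    max_rounds: int = 5,
) -> list[str]:
    """Propose captions, keep the best-scoring few, and stop when the best
    score no longer improves or the round limit is reached."""
    best: list[str] = []
    best_score = float("-inf")
    for _ in range(max_rounds):
        candidates = set(best) | set(propose(image, best))
        # Sort by score (highest first); ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda c: (-score(image, c), c))[:keep]
        top_score = score(image, ranked[0]) if ranked else float("-inf")
        if top_score <= best_score:
            break
        best, best_score = ranked, top_score
    return best


# Toy stand-ins (assumptions): the "image" is ignored and the score is the
# word overlap with a hidden description.
TRUTH = {"laptop", "cracked", "screen"}


def toy_score(image: object, caption: str) -> float:
    words = set(caption.split())
    return len(words & TRUTH) / len(words | TRUTH)


def toy_propose(image: object, previous: list[str]) -> list[str]:
    if not previous:
        return ["a laptop", "a desk", "a screen"]
    return [caption + " cracked" for caption in previous]


print(refine_captions(None, toy_propose, toy_score, keep=2))
```

The loop only ever ranks what the proposer writes, so a caption that mentions damage the image does not show can still score well and be carried forward.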

While clever in theory, this approach introduced new problems for our use case:

  • Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.
  • Incomplete coverage: Even with multiple captions, some issues were missed entirely.
  • Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

It was an interesting experiment, but ultimately not a solution.

A creative use of agentic frameworks

This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered if breaking down the image interpretation task into smaller, specialized agents might help.

We built an agentic framework structured like this:

  • Orchestrator agent: It checked the image and identified which laptop components were visible (screen, keyboard, chassis, ports).
  • Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.
  • Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.

This modular, task-driven approach produced much more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent’s task was simple and focused enough to control quality well.
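
The three agent roles can be sketched as plain callables wired together by a small dispatcher. Everything named here (`Report`, `run_agents`, the toy agents and the dictionary that stands in for an image) is hypothetical, and running the junk check before the other agents is an assumption; the article does not state the execution order.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Report:
    is_laptop: bool
    findings: dict[str, list[str]] = field(default_factory=dict)


def run_agents(
    image: object,
    junk_agent: Callable[[object], bool],
    orchestrator: Callable[[object], list[str]],
    component_agents: dict[str, Callable[[object], list[str]]],
) -> Report:
    """Reject junk first, then send each visible component to its own agent."""
    if junk_agent(image):
        return Report(is_laptop=False)
    report = Report(is_laptop=True)
    for component in orchestrator(image):
        agent = component_agents.get(component)
        if agent is not None:  # components without an agent go unchecked
            report.findings[component] = agent(image)
    return report


# Toy stand-ins (assumptions): a dict plays the role of the image, and each
# agent reads a field where a real agent would call a model.
def toy_junk(image):
    return image["kind"] != "laptop"


def toy_orchestrator(image):
    return image["visible"]


toy_component_agents = {
    "screen": lambda image: image["damage"].get("screen", []),
    "keyboard": lambda image: image["damage"].get("keyboard", []),
}

photo = {
    "kind": "laptop",
    "visible": ["screen", "keyboard", "ports"],
    "damage": {"screen": ["crack"]},
}
print(run_agents(photo, toy_junk, toy_orchestrator, toy_component_agents))
```

Because each finding is stored under the component that produced it, a wrong result can be traced to a single agent and prompt.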

The blind spots: Trade-offs of an agentic approach

As effective as this was, it was not perfect. Two main limitations showed up:

  • Increased latency: Running multiple sequential agents added to the total inference time.
  • Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

We needed a way to balance precision with coverage.

The hybrid solution: Combining agentic and monolithic approaches

To bridge the gaps, we created a hybrid system:

  1. The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the number of agents to the most essential ones to improve latency.
  2. Then, a monolithic image LLM prompt scanned the image for anything else the agents might have missed.
  3. Finally, we fine-tuned the model using a curated set of images for high-priority use cases, like frequently reported damage scenarios, to further improve accuracy and reliability.

This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.
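
The first two steps of the hybrid flow can be sketched as a single function: targeted agents run first, then a catch-all pass reports only what they did not already find. The names `hybrid_inspect`, `toy_is_junk`, `toy_agents` and `toy_catch_all` are hypothetical, the de-duplication rule is an assumption, and the fine-tuning step is a training activity that is not shown.

```python
from typing import Callable


def hybrid_inspect(
    image: object,
    is_junk: Callable[[object], bool],
    agents: dict[str, Callable[[object], list[str]]],
    catch_all: Callable[[object], list[str]],
) -> dict:
    """Run the targeted agents, then a broad pass for anything they missed."""
    if is_junk(image):
        return {"is_laptop": False, "known": {}, "other": []}
    known = {name: agent(image) for name, agent in agents.items()}
    already_found = {d for found in known.values() for d in found}
    # Keep only catch-all findings that no targeted agent reported.
    other = [d for d in catch_all(image) if d not in already_found]
    return {"is_laptop": True, "known": known, "other": other}


# Toy stand-ins (assumptions): a dict plays the role of the image.
def toy_is_junk(image):
    return not image["laptop"]


toy_agents = {
    "screen": lambda image: [d for d in image["damage"] if "screen" in d],
}


def toy_catch_all(image):
    return list(image["damage"])


photo = {"laptop": True, "damage": ["cracked screen", "sticker residue"]}
print(hybrid_inspect(photo, toy_is_junk, toy_agents, toy_catch_all))
```

Keeping the `agents` dictionary small is the lever this sketch offers for the latency trade-off, since each entry adds one more call per image.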

What we learned

A few things became clear by the time we wrapped up this project:

  • Agentic frameworks are more versatile than they get credit for: While they’re usually associated with workflow management, we found they could meaningfully boost model performance when applied in a structured, modular way.
  • Mixing different approaches beats relying on just one: The combination of precise, agent-based detection alongside the broad coverage of LLMs, plus a bit of fine-tuning where it mattered most, gave us far more reliable outcomes than any single method on its own.
  • Visual models are prone to hallucinations: Even the more advanced setups can jump to conclusions or see things that aren’t there. It takes a thoughtful system design to keep those errors in check.
  • Image quality variety makes a difference: Training and testing with both clear, high-resolution images and everyday, lower-quality ones helped the model stay resilient when faced with unpredictable, real-world photos.
  • You need a way to catch junk images: A dedicated check for junk or unrelated pictures was one of the simplest changes we made, and it had an outsized impact on overall system reliability.

Final thoughts

What started as a simple idea, using an LLM prompt to detect physical damage in laptop images, quickly turned into a much deeper experiment in combining different AI techniques to tackle unpredictable, real-world problems. Along the way, we realized that some of the most useful tools were ones not originally designed for this type of work.

Agentic frameworks, often seen as workflow utilities, proved surprisingly effective when repurposed for tasks like structured damage detection and image filtering. With a bit of creativity, they helped us build a system that was not just more accurate, but easier to understand and manage in practice.

Shruti Tiwari is an AI product manager at Dell Technologies.

Vadiraj Kulkarni is a data scientist at Dell Technologies.
