ByteDance’s UI-TARS can take over your computer, outperforms GPT-4o and Claude

by Investor News Today
January 23, 2025
in Technology


A new AI agent has emerged from the parent company of TikTok to take control of your computer and perform complex workflows.

Much like Anthropic’s Computer Use, ByteDance’s new UI-TARS understands graphical user interfaces (GUIs), applies reasoning and takes autonomous, step-by-step action.

Trained on roughly 50B tokens and offered in 7B and 72B parameter versions, the PC/MacOS agents achieve state-of-the-art (SOTA) performance on 10-plus GUI benchmarks across performance, perception, grounding and overall agent capabilities, consistently beating out OpenAI’s GPT-4o, Claude and Google’s Gemini.

“Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention,” researchers from ByteDance and Tsinghua University write in a new research paper.

Source: arXiv

How UI-TARS explains its thinking

UI-TARS works across desktop, mobile and web applications, using multimodal inputs (text, images, interactions) to understand visual environments.

Its UI features two tabs: one on the left showing its step-by-step “thinking,” and a larger one on the right where it pulls up files, websites and apps and automatically takes action.

For example, in a demo video released today, the model is prompted to “Find round trip flights from SEA to NYC on the 5th and return on the 10th next month and filter by price in ascending order.”

In response, UI-TARS navigates to the website for Delta Air Lines, fills in the “from” and “to” fields, clicks on the relevant dates, then sorts and filters by price, explaining each step in its thinking box before taking action.

In another scenario, it is instructed to install an autoDocstring extension in VS Code. Here is its step-by-step thinking as it completes that task:

  • It first reports that “to proceed with installing the autoDocstring extension in VS Code, I first need to open the VS Code application.”
  • Recognizing that the VS Code window has not fully loaded, the model notes that “it is necessary to wait for the application to finish initializing before proceeding with the installation of the autoDocstring extension. This ensures that all required functionalities are available for the next steps.”
  • It then explains that it must access the Extensions view in VS Code, which can be done by clicking on a tab in the left sidebar.
  • Running into a small glitch, it reasons: “The previous attempt to click on the ‘Extensions’ tab may not have been precise enough, so I will try clicking on it again to ensure the Extensions view opens successfully.”
  • Once in the Extensions view, it types in ‘autoDocstring’ before concluding: “Since the installation process has already been initiated, there is no further action required at this moment. I will wait for the installation to complete and confirm its success once the process is finished.”
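
The click, check and retry pattern in that walkthrough can be sketched in a few lines of Python. This is only an illustration of the pattern, not ByteDance's implementation: `FakeApp`, `open_view` and the simulated missed click are all hypothetical names and behavior made up for the example.

```python
from dataclasses import dataclass


@dataclass
class FakeApp:
    """Hypothetical stand-in for a GUI; the first click on 'Extensions' misses."""
    view: str = "editor"
    clicks: int = 0

    def click(self, target: str) -> None:
        self.clicks += 1
        # Simulate an imprecise first click that does not register.
        if target == "Extensions" and self.clicks >= 2:
            self.view = "extensions"


def open_view(app: FakeApp, target: str, expected_view: str,
              max_attempts: int = 3) -> list[str]:
    """Click `target`, verify the resulting view, and retry on a miss.

    Returns the list of "thoughts" logged along the way.
    """
    thoughts = []
    for attempt in range(1, max_attempts + 1):
        thoughts.append(f"Attempt {attempt}: click '{target}'.")
        app.click(target)
        if app.view == expected_view:
            thoughts.append(f"The '{target}' view is open; continue.")
            return thoughts
        thoughts.append("The click may not have been precise enough; retry.")
    thoughts.append("Giving up after repeated misses.")
    return thoughts


app = FakeApp()
for line in open_view(app, "Extensions", "extensions"):
    print(line)
```

The point of the sketch is that each action is followed by a check of the resulting state, and a failed check produces a stated reason before the next attempt.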

Outperforming its rivals

Across a variety of benchmarks, researchers report that UI-TARS consistently outranked OpenAI’s GPT-4o; Anthropic’s Claude-3.5-Sonnet; Gemini-1.5-Pro and Gemini-2.0; four Qwen models; and numerous academic models.

For instance, in VisualWebBench, which measures a model’s ability to ground web elements including webpage quality assurance and optical character recognition, UI-TARS 72B scored 82.8%, outperforming GPT-4o (78.5%) and Claude 3.5 (78.2%).

It also did significantly better on WebSRC benchmarks (understanding of semantic content and layout in web contexts) and ScreenQA-short (comprehension of complex mobile screen layouts and web structure). UI-TARS-7B achieved leading scores of 93.6% on WebSRC, while UI-TARS-72B achieved 88.6% on ScreenQA-short, outperforming Qwen, Gemini, Claude 3.5 and GPT-4o.

“These results demonstrate the superior perception and comprehension capabilities of UI-TARS in web and mobile environments,” the researchers write. “Such perceptual ability lays the foundation for agent tasks, where accurate environmental understanding is crucial for task execution and decision-making.”

UI-TARS also showed impressive results in ScreenSpot Pro and ScreenSpot v2, which assess a model’s ability to understand and localize elements in GUIs. Further, researchers tested its capabilities in planning multi-step actions and low-level tasks in mobile environments, and benchmarked it on OSWorld (which assesses open-ended computer tasks) and AndroidWorld (which scores autonomous agents on 116 programmatic tasks across 20 mobile apps).

Source: arXiv

Under the hood

To help it take step-by-step actions and recognize what it is seeing, UI-TARS was trained on a large-scale dataset of screenshots with parsed metadata, including element description and type, visual description, bounding boxes (position information), element function and text from various websites, applications and operating systems. This allows the model to provide a comprehensive, detailed description of a screenshot, capturing not only elements but also spatial relationships and overall layout.
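
As a rough illustration of what such per-element metadata might look like, here is a minimal Python sketch. The `Element` fields mirror the metadata types named above, but the class itself, the `relation` and `caption` helpers, the pixel coordinate convention and the sample values are assumptions made for this example, not the paper's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical record for one annotated GUI element."""
    description: str
    element_type: str
    visual: str
    bbox: tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels
    function: str
    text: str


def relation(a: Element, b: Element) -> str:
    """Coarse spatial relation of `a` relative to `b`, using bbox centers."""
    ax, ay = (a.bbox[0] + a.bbox[2]) / 2, (a.bbox[1] + a.bbox[3]) / 2
    bx, by = (b.bbox[0] + b.bbox[2]) / 2, (b.bbox[1] + b.bbox[3]) / 2
    dx, dy = ax - bx, ay - by
    # Pick the dominant axis of separation.
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def caption(elements: list[Element]) -> list[str]:
    """List each element, then the relation between neighboring elements."""
    lines = [f"{e.element_type} '{e.text}' at {e.bbox}: {e.function}"
             for e in elements]
    for a, b in zip(elements, elements[1:]):
        lines.append(f"'{a.text}' is {relation(a, b)} '{b.text}'")
    return lines


search = Element("search box", "input", "white rounded field",
                 (100, 20, 400, 50), "enter a query", "Search")
go = Element("submit button", "button", "blue rectangle",
             (420, 20, 480, 50), "run the search", "Go")
print("\n".join(caption([search, go])))
```

A trained model would produce such descriptions directly from pixels; the sketch only shows how bounding boxes make spatial relationships computable.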

The model also uses state transition captioning to identify and describe the differences between two consecutive screenshots and determine whether an action, such as a mouse click or keyboard input, has occurred. Meanwhile, set-of-mark (SoM) prompting allows it to overlay distinct marks (letters, numbers) on specific regions of an image.
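
Both ideas can be approximated with a toy Python sketch. It makes two simplifying assumptions that are not from the paper: each "screenshot" is represented as a dictionary of element names and values rather than pixels, and marking only assigns labels to regions instead of drawing them on an image. The function names are hypothetical.

```python
import string


def diff_states(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Describe what changed between two UI states (element name -> value)."""
    changes = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue
        if old is None:
            changes.append(f"{name} appeared with value {new!r}")
        elif new is None:
            changes.append(f"{name} disappeared")
        else:
            changes.append(f"{name} changed from {old!r} to {new!r}")
    return changes


def action_occurred(before: dict[str, str], after: dict[str, str]) -> bool:
    """Treat any visible difference as evidence that an action took effect."""
    return bool(diff_states(before, after))


def assign_marks(regions: list[tuple[int, int, int, int]]) -> dict[str, tuple]:
    """Give each region a distinct mark: A..Z first, then 1, 2, 3, ..."""
    marks = {}
    for i, region in enumerate(regions):
        mark = string.ascii_uppercase[i] if i < 26 else str(i - 25)
        marks[mark] = region
    return marks


before = {"search": "", "results": "hidden"}
after = {"search": "autoDocstring", "results": "visible"}
print(diff_states(before, after))
print(assign_marks([(0, 0, 10, 10), (20, 0, 30, 10)]))
```

With marks assigned, a prompt can refer to "region B" instead of raw coordinates, which is the convenience that set-of-mark prompting provides.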

The model is equipped with both short-term and long-term memory to handle tasks at hand while also retaining historical interactions to improve later decision-making. Researchers trained the model to perform both System 1 (fast, automatic and intuitive) and System 2 (slow and deliberate) reasoning. This allows for multi-step decision-making, “reflection” thinking, milestone recognition and error correction.
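
One simple way to picture the two memory tiers is a bounded window of recent steps alongside a full history that can be queried later. The sketch below is an assumption-laden illustration: the `AgentMemory` class, its method names and the window size are hypothetical, and the article does not describe how UI-TARS actually stores memory.

```python
from collections import deque


class AgentMemory:
    """Hypothetical two-tier memory: a recent window plus a full history."""

    def __init__(self, short_term_size: int = 3):
        # Short-term: only the most recent steps, for the task at hand.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: every step ever recorded, for later decisions.
        self.long_term = []

    def record(self, observation: str, action: str, succeeded: bool) -> None:
        step = {"observation": observation, "action": action,
                "succeeded": succeeded}
        self.short_term.append(step)
        self.long_term.append(step)

    def recall_failures(self, action: str) -> list[dict]:
        """Look up past failures of the same action before trying it again."""
        return [s for s in self.long_term
                if s["action"] == action and not s["succeeded"]]


memory = AgentMemory(short_term_size=2)
memory.record("VS Code loading", "wait", True)
memory.record("sidebar visible", "click Extensions", False)
memory.record("sidebar visible", "click Extensions", True)
print(len(memory.short_term), len(memory.long_term))
print(memory.recall_failures("click Extensions"))
```

The `deque` with `maxlen` silently drops the oldest step once the window is full, while the long-term list keeps everything.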

Researchers emphasized that it is critical that the model be able to maintain consistent goals and engage in trial and error to hypothesize, test and evaluate potential actions before completing a task. They introduced two types of data to support this: error correction and post-reflection data. For error correction, they identified mistakes and labeled corrective actions; for post-reflection, they simulated recovery steps.
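
To make the distinction concrete, the following Python sketch builds one sample of each kind from a recorded trajectory. The sample layout is an assumption for illustration only: the error-correction sample pairs the history before the mistake with the corrected action, and the post-reflection sample pairs the history including the mistake with a recovery action. The function name, field names and example actions are hypothetical.

```python
def build_samples(trajectory: list[str], error_index: int,
                  corrected_action: str, recovery_action: str) -> tuple[dict, dict]:
    """Build one error-correction and one post-reflection sample.

    trajectory: recorded actions, one of which (at error_index) was a mistake.
    """
    if not 0 <= error_index < len(trajectory):
        raise ValueError("error_index must point at a step in the trajectory")

    # Error correction: what should have been done instead of the mistake.
    error_correction = {
        "history": trajectory[:error_index],
        "target": corrected_action,
    }
    # Post-reflection: the mistake has happened; what to do next to recover.
    post_reflection = {
        "history": trajectory[:error_index + 1],
        "target": recovery_action,
    }
    return error_correction, post_reflection


steps = ["open VS Code", "wait for load", "click Explorer tab"]
fix, recover = build_samples(
    steps,
    error_index=2,
    corrected_action="click Extensions tab",
    recovery_action="click Extensions tab again",
)
print(fix)
print(recover)
```

The first sample teaches avoiding the mistake; the second teaches getting back on track once the mistake is already part of the history.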

“This strategy ensures that the agent not only learns to avoid errors but also adapts dynamically when they occur,” the researchers write.

Clearly, UI-TARS exhibits impressive capabilities, and it will be interesting to see its evolving use cases in the increasingly competitive AI agents space. As the researchers note: “Looking ahead, while native agents represent a significant leap forward, the future lies in the integration of active and lifelong learning, where agents autonomously drive their own learning through continuous, real-world interactions.”

Researchers point out that Claude Computer Use “performs strongly in web-based tasks but significantly struggles with mobile scenarios, indicating that the GUI operation ability of Claude has not been well transferred to the mobile domain.”

By contrast, “UI-TARS exhibits excellent performance in both website and mobile domain.”
