QwenLong-L1 solves long-context reasoning challenge that stumps current LLMs

by Investor News Today
May 31, 2025
in Technology



Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The problem of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human “slow thinking,” where they develop sophisticated strategies to tackle complex tasks.

However, these improvements primarily appear when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the ability to perform multi-step analysis. “This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments,” the developers of QwenLong-L1 write in their paper.

The researchers formalize these challenges into the concept of “long-context reasoning RL.” Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to accurately retrieve and ground relevant information from lengthy inputs. Only then can they generate chains of reasoning based on this incorporated information.
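Stated more compactly, this can be framed as a standard RL objective conditioned on a long document. The formulation below is a generic sketch of that idea with illustrative notation (c for the long context, q for the question, a* for the reference answer, o for the model's output), not necessarily the notation used in the paper:

```latex
% Generic long-context reasoning RL objective (illustrative notation, not the paper's):
% maximize the expected reward of outputs o sampled from the policy conditioned on
% the long document c and the question q, scored against the reference answer a*.
J(\theta) = \mathbb{E}_{(c,\,q,\,a^{*}) \sim \mathcal{D},\; o \sim \pi_{\theta}(\cdot \mid c,\, q)}
            \big[\, r(o,\, a^{*}) \,\big]
```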

Training models for this through RL is tricky and often results in inefficient learning and unstable optimization. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: A multi-stage strategy

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process, sketched in code after the three stages below:

Warm-up Supervised Fine-Tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This stage establishes a solid foundation, enabling the model to ground information accurately from long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-Guided Phased RL: At this stage, the model is trained through multiple phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model stably adapt its reasoning strategies from shorter to progressively longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-Aware Retrospective Sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths.
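Taken together, the three stages form a single training loop. The sketch below is a simplified, hypothetical outline of that recipe based only on the description above: the model, the SFT and RL update steps, and the reward function are passed in as placeholders, and none of the names correspond to the released QwenLong-L1 code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical sketch of the multi-stage recipe described above; names and
# signatures are placeholders, not the released QwenLong-L1 API.

@dataclass
class Example:
    context: str       # the long document
    question: str
    reference: str     # gold answer

@dataclass
class RLPhase:
    data: Sequence[Example]
    max_context_tokens: int   # curriculum: grows phase by phase, e.g. 20K -> 60K -> 120K

def train_qwenlong_l1_sketch(
    model,
    sft_step: Callable,        # warm-up supervised fine-tuning
    rl_step: Callable,         # one RL update (e.g., a GRPO/PPO-style step)
    reward_fn: Callable,       # hybrid rule-based + LLM-judge reward (see below)
    sft_data: Sequence[Example],
    phases: Sequence[RLPhase],
):
    # Stage 1: warm-up SFT so the model can ground answers in long inputs before RL begins.
    model = sft_step(model, sft_data)

    hard_pool: List[Example] = []
    for phase in phases:
        # Stage 2: curriculum-guided phased RL; keep inputs within the phase's length
        # budget (word count used here as a crude stand-in for a token count).
        batch = [ex for ex in phase.data
                 if len(ex.context.split()) <= phase.max_context_tokens]

        # Stage 3: difficulty-aware retrospective sampling; replay hard examples
        # retained from earlier phases so they are not forgotten.
        batch = list(batch) + hard_pool

        outputs = [model.generate(ex.context, ex.question) for ex in batch]
        rewards = [reward_fn(out, ex.reference) for out, ex in zip(outputs, batch)]
        model = rl_step(model, batch, outputs, rewards)

        # Keep low-reward (difficult) examples for later phases.
        hard_pool += [ex for ex, r in zip(batch, rewards) if r < 0.5]
    return model
```

In this reading, the curriculum lives in the growing per-phase context budget, while the retrospective sampling lives in the pool of hard examples replayed into later phases.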

QwenLong-L1 process (source: arXiv)

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer in a math problem), QwenLong-L1 employs a hybrid reward mechanism. This combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an “LLM-as-a-judge.” This judge model compares the semantics of the generated answer with the ground truth, allowing for more flexibility and better handling of the diverse ways correct answers can be expressed when dealing with long, nuanced documents.
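A minimal sketch of what such a hybrid reward could look like is below. It is illustrative only: the rule-based check is a normalized exact match, the LLM judge is a hypothetical `judge_llm` callable assumed to return a score in [0, 1], and taking the maximum of the two signals is one plausible way to combine them rather than a confirmed detail of the paper.

```python
import re
from typing import Callable

def normalize(text: str) -> str:
    # Lowercase and strip punctuation/extra whitespace for the rule-based check.
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def hybrid_reward(answer: str, reference: str,
                  judge_llm: Callable[[str, str], float]) -> float:
    """Illustrative hybrid reward: rule-based verification plus LLM-as-a-judge.

    judge_llm is a hypothetical callable returning a [0, 1] score for how well
    `answer` matches `reference` semantically.
    """
    # Rule-based verification: strict correctness check (normalized exact match).
    rule_score = 1.0 if normalize(answer) == normalize(reference) else 0.0

    # LLM-as-a-judge: semantic equivalence, tolerant of paraphrased answers.
    judge_score = judge_llm(answer, reference)

    # Take the maximum so an answer counts as correct if either signal accepts it.
    return max(rule_score, judge_score)
```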

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.
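For concreteness, a long-context DocQA item pairs a lengthy document with a question and a reference answer; a hypothetical record could look like the following (field names are illustrative, not any benchmark's actual schema):

```python
# Hypothetical shape of one long-context DocQA evaluation item
# (field names are illustrative, not any benchmark's actual schema).
docqa_item = {
    "context": "<full text of a lengthy filing or contract, potentially 100K+ tokens>",
    "question": "What drove the year-over-year change in operating margin?",
    "reference_answer": "<short gold answer supported by the document>",
}
# The model must ground its reasoning in "context" before answering;
# the generated answer is then scored with a reward such as the hybrid one above.
```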

Experimental results across seven long-context DocQA benchmarks demonstrated QwenLong-L1’s capabilities. Notably, the QWENLONG-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic’s Claude-3.7 Sonnet Thinking, and outperformed models like OpenAI’s o3-mini and Qwen3-235B-A22B. The smaller QWENLONG-L1-14B model also outperformed Google’s Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important finding relevant to real-world applications is how RL training results in the model developing specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at “grounding” (linking answers to specific parts of a document), “subgoal setting” (breaking down complex questions), “backtracking” (recognizing and correcting their own errors mid-reasoning), and “verification” (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated an ability to engage in effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the utility of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities), and customer service (analyzing long customer interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.
