Meta proposes new scalable memory layers that improve knowledge, reduce hallucinations


January 8, 2025
by Investor News Today

As enterprises continue to adopt large language models (LLMs) in various applications, one of the key challenges they face is improving the models' factual knowledge and reducing hallucinations. In a new paper, researchers at Meta AI propose "scalable memory layers," which could be one of several possible solutions to this problem.

Scalable memory layers add extra parameters to LLMs to increase their learning capacity without requiring additional compute. The architecture is useful for applications where you can spare extra memory for factual knowledge but also want the inference speed of nimbler models.

Dense and memory layers

Traditional language models use "dense layers" to encode vast amounts of information in their parameters. In dense layers, all parameters are used at close to full capacity and are mostly activated at the same time during inference. Dense layers can learn complex functions, but increasing their size requires additional compute and energy.

In contrast, for simple factual knowledge, much simpler layers with associative memory architectures would be more efficient and interpretable. This is what memory layers do: they use simple sparse activations and key-value lookup mechanisms to encode and retrieve knowledge. Sparse layers take up more memory than dense layers but use only a small portion of their parameters at once, which makes them much more compute-efficient.
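The key-value lookup with sparse activation can be illustrated with a toy sketch. This is not Meta's implementation, just a minimal illustration of the idea: score a query against all stored keys, but activate (and pay compute for) only the top-k entries.

```python
import math

def memory_layer(query, keys, values, k=2):
    """Toy memory layer: score the query against every key, activate only
    the top-k entries, and return their softmax-weighted values.
    All names and shapes here are illustrative."""
    # Dot-product score between the query and each stored key.
    scores = [sum(q * kk for q, kk in zip(query, key)) for key in keys]
    # Sparse activation: keep only the k highest-scoring entries.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected values; untouched entries cost nothing.
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out, top

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, active = memory_layer([1.0, 0.2], keys, values, k=2)
```

However many key-value pairs the table holds, only `k` of them contribute to each forward pass, which is the source of the compute efficiency described above.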

Memory layers have existed for several years but are rarely used in modern deep learning architectures, in part because they are not optimized for current hardware accelerators.

Current frontier LLMs usually use some form of "mixture of experts" (MoE) architecture, which employs a mechanism vaguely similar to memory layers. MoE models are composed of many smaller expert components that specialize in specific tasks. At inference time, a routing mechanism determines which experts are activated based on the input sequence. PEER, an architecture recently developed by Google DeepMind, extends MoE to millions of experts, providing more granular control over which parameters become active during inference.
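The routing idea behind MoE can be sketched in a few lines. The gate and experts below are hypothetical stand-ins, not PEER or any production router; the point is only that the gating function selects a small subset of experts and the rest never execute.

```python
def route(gate, experts, x, k=1):
    """Toy MoE-style router (illustrative only): score every expert for
    input x, then run only the top-k."""
    scores = [gate(i, x) for i in range(len(experts))]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Only the selected experts execute; the rest cost no compute.
    return [experts[i](x) for i in chosen], chosen

# Hypothetical gate: expert i "specializes" in inputs near the value i.
def gate(i, x):
    return -abs(x - i)

# Four toy experts that each tag their output with their index.
experts = [lambda x, i=i: (i, x * 2) for i in range(4)]
outs, chosen = route(gate, experts, 2.2, k=1)
```

Scaling to millions of experts, as PEER does, keeps this per-token cost flat while growing total capacity, which is the same trade-off memory layers exploit.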

Upgrading memory layers

Memory layers are light on compute but heavy on memory, which presents particular challenges for current hardware and software frameworks. In their paper, the Meta researchers propose several modifications that resolve these challenges and make it possible to use memory layers at scale.

Figure: Memory layers can store knowledge in parallel across multiple GPUs without slowing down the model (source: arXiv)

First, the researchers configured the memory layers for parallelization, distributing them across several GPUs to store millions of key-value pairs without changing the other layers in the model. They also implemented a special CUDA kernel for handling high-memory-bandwidth operations. And they developed a parameter-sharing mechanism that supports a single set of memory parameters across multiple memory layers within a model, meaning that the keys and values used for lookups are shared across layers.
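The parameter-sharing mechanism can be pictured as several layers holding references to one key-value table rather than each keeping its own copy. The classes below are a toy sketch of that idea, with illustrative names, not the paper's actual code:

```python
class SharedKV:
    """A single table of key-value memory parameters, referenced (not
    copied) by several layers -- a toy version of the paper's
    parameter-sharing idea."""
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values

class MemoryLayerView:
    """A memory layer that holds a reference to the shared table instead
    of owning its own parameters."""
    def __init__(self, shared):
        self.shared = shared

shared = SharedKV(keys=[[1.0, 0.0]], values=[[0.5]])
layers = [MemoryLayerView(shared) for _ in range(3)]
# A single update to the shared parameters is seen by every layer.
shared.values[0][0] = 0.9
seen = [layer.shared.values[0][0] for layer in layers]
```

Because the table exists once, adding more memory layers grows lookup capacity without multiplying parameter storage.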

These modifications make it possible to implement memory layers within LLMs without slowing down the model.

“Memory layers with their sparse activations nicely complement dense networks, providing increased capacity for knowledge acquisition while being light on compute,” the researchers write. “They can be efficiently scaled, and provide practitioners with an attractive new direction to trade off memory with compute.”

To test memory layers, the researchers modified Llama models by replacing one or more dense layers with a shared memory layer. They compared the memory-enhanced models against dense LLMs as well as MoE and PEER models on several tasks, including factual question answering, scientific and commonsense world knowledge, and coding.

Figure: A 1.3B memory model (solid line) trained on 1 trillion tokens approaches the performance of a 7B model (dashed line) on factual question-answering tasks as it is given more memory parameters (source: arXiv)

Their findings show that memory models improve significantly over dense baselines and compete with models that use 2X to 4X more compute. They also match the performance of MoE models that have the same compute budget and parameter count. The models' performance is especially notable on tasks that require factual knowledge. For example, on factual question answering, a memory model with 1.3 billion parameters approaches the performance of Llama-2-7B, which was trained on twice as many tokens with 10X more compute.

Moreover, the researchers found that the benefits of memory models remain consistent with model size as they scaled their experiments from 134 million to 8 billion parameters.

“Given these findings, we strongly advocate that memory layers should be integrated into all next-generation AI architectures,” the researchers write, while adding that there is still a lot more room for improvement. “In particular, we hope that new learning methods can be developed to push the effectiveness of these layers even further, enabling less forgetting, fewer hallucinations and continual learning.”
