Saturday, May 17, 2025
quantann

Meta, OpenAI, Anthropic or Cohere

August 17, 2023
in Business


If the tech industry’s top AI models had superlatives, Microsoft-backed OpenAI’s GPT-4 would be best at math, Meta’s Llama 2 would be most middle of the road, Anthropic’s Claude 2 would be best at knowing its limits, and Cohere AI would receive the title of most hallucinations, as well as most confident wrong answers.

That’s all according to a Thursday report from researchers at Arthur AI, a machine learning monitoring platform.

The research comes at a time when misinformation stemming from artificial intelligence systems is more hotly debated than ever, amid a boom in generative AI ahead of the 2024 U.S. presidential election.

It’s the first report “to take a comprehensive look at rates of hallucination, rather than just sort of … provide a single number that talks about where they are on an LLM leaderboard,” Adam Wenchel, co-founder and CEO of Arthur, told CNBC.

AI hallucinations occur when large language models, or LLMs, fabricate information entirely, behaving as if they are spouting facts. One example: In June, news broke that ChatGPT cited “bogus” cases in a New York federal court filing, and the New York attorneys involved may face sanctions.

In one experiment, the Arthur AI researchers tested the AI models in categories such as combinatorial mathematics, U.S. presidents and Moroccan political leaders, asking questions “designed to contain a key ingredient that gets LLMs to blunder: they demand multiple steps of reasoning about information,” the researchers wrote.

Overall, OpenAI’s GPT-4 performed the best of all models tested, and researchers found it hallucinated less than its prior version, GPT-3.5. On math questions, for example, it hallucinated between 33% and 50% less, depending on the category.

Meta’s Llama 2, on the other hand, hallucinates more overall than GPT-4 and Anthropic’s Claude 2, researchers found.

In the math category, GPT-4 came in first place, followed closely by Claude 2, but in U.S. presidents, Claude 2 took the first place spot for accuracy, bumping GPT-4 to second place. When asked about Moroccan politics, GPT-4 came in first again, and Claude 2 and Llama 2 almost entirely chose not to answer.

In a second experiment, the researchers tested how much the AI models would hedge their answers with warning phrases to avoid risk (think: “As an AI model, I cannot provide opinions”).

When it comes to hedging, GPT-4 had a 50% relative increase compared to GPT-3.5, which “quantifies anecdotal evidence from users that GPT-4 is more frustrating to use,” the researchers wrote. Cohere’s AI model, on the other hand, did not hedge at all in any of its responses, according to the report. Claude 2 was most reliable in terms of “self-awareness,” the research showed, meaning accurately gauging what it does and doesn’t know, and answering only questions it had training data to support.
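
To make the "50% relative increase" figure concrete, here is a minimal Python sketch of how a hedging rate and its relative change could be computed. It is an illustration only: the marker phrases, function names and sample numbers are assumptions made for this example, and the article does not describe the method Arthur's researchers actually used.

```python
# Hypothetical marker phrases; the report's real detection method is not described here.
HEDGE_MARKERS = ("as an ai", "i cannot provide", "i am not able to")


def is_hedged(response: str) -> bool:
    """Return True if the response contains any of the assumed hedge phrases."""
    text = response.lower()
    return any(marker in text for marker in HEDGE_MARKERS)


def hedge_rate(responses: list[str]) -> float:
    """Fraction of responses that contain a hedge phrase (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_hedged(r) for r in responses) / len(responses)


def relative_increase(old_rate: float, new_rate: float) -> float:
    """Relative change from old_rate to new_rate, e.g. 0.5 means 50% more."""
    if old_rate == 0:
        raise ValueError("relative increase is undefined when the old rate is zero")
    return (new_rate - old_rate) / old_rate


# Made-up rates for illustration: 20% of older responses hedged, 30% of newer ones.
change = relative_increase(0.20, 0.30)
print(f"relative increase: {change:.0%}")
```

With these invented numbers, a rise from 20% to 30% of responses is a 10-point absolute increase but a 50% relative increase, which is the kind of comparison the researchers describe.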

The most important takeaway for users and businesses, Wenchel said, was to “test on your exact workload,” later adding, “It’s important to understand how it performs for what you’re trying to accomplish.”

“A lot of the benchmarks are just looking at some measure of the LLM by itself, but that’s not actually the way it’s getting used in the real world,” Wenchel said. “Making sure you really understand the way the LLM performs for the way it’s actually getting used is the key.”
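
One way to act on Wenchel's advice is to run a model over prompts drawn from your own use case and count how often it answers correctly, answers wrongly or declines to answer. The Python sketch below shows the idea under loud assumptions: `ask_model`, `fake_model`, the abstention phrases and the substring grading rule are all hypothetical stand-ins, not part of Arthur's report or of any vendor's API.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# Hypothetical abstention phrases; adjust to the wording your model actually uses.
ABSTAIN_MARKERS = ("i don't know", "i do not know", "i cannot answer")


def grade(answer: str, expected: str) -> str:
    """Label an answer as 'abstained', 'correct' or 'wrong'.

    Substring matching is a crude assumption made for brevity; a real
    workload would likely need task-specific grading.
    """
    text = answer.lower()
    if any(marker in text for marker in ABSTAIN_MARKERS):
        return "abstained"
    if expected.lower() in text:
        return "correct"
    return "wrong"


def evaluate(
    ask_model: Callable[[str], str],
    workload: Iterable[Tuple[str, str]],
) -> Counter:
    """Run every (prompt, expected) pair through ask_model and tally the labels."""
    tally: Counter = Counter()
    for prompt, expected in workload:
        tally[grade(ask_model(prompt), expected)] += 1
    return tally


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "What is the capital of France?": "The capital is Lyon.",
    }
    return canned.get(prompt, "I don't know.")


WORKLOAD = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("Who chaired the 1987 committee?", "unknown name"),
]

result = evaluate(fake_model, WORKLOAD)
print(dict(result))
```

Swapping `fake_model` for a function that calls the model you actually plan to deploy, and `WORKLOAD` for questions from your own domain, gives a rough per-workload picture instead of a single leaderboard number.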


Tags: Anthropic, Cohere, Meta, OpenAI
Copyright © 2022 Quantann.
Quantann is not responsible for the content of external sites.
