Chinese AI Tops Hugging Face's Revamped Chatbot Leaderboard

erek

[H]F Junkie
Joined
Dec 19, 2005
Messages
11,310
Fascinating.. also DeepSeek Coder outperforms GPT-4?

"Alibaba's Qwen models dominated Hugging Face's latest LLM leaderboard, securing three top-ten spots. The new benchmark, launched Thursday, tests open-source models on tougher criteria including long-context reasoning and complex math. Meta's Llama3-70B also ranked highly, but several Chinese models outperformed Western counterparts. (Closed-source AIs like ChatGPT were excluded.) The leaderboard replaces an earlier version deemed too easy to game."

Source: https://slashdot.org/story/24/06/27...ps-hugging-faces-revamped-chatbot-leaderboard
 
Microsoft's relatively small model scores well here, but excluding the new Claude, Gemini, and GPT models makes it hard to get a good view of how good they are.

DeepSeek seems to beat the older Claude 3 and GPT-4, which is really good, though the comparison does not include tests against Claude 3.5 or GPT-4o.

It can be tried easily here (much slower, but still fast enough):
https://chat.deepseek.com/coder
 
It’s been exciting to see the advancements in the health terminology leaderboards.

Twice the public models have leapfrogged our internal models… which I have had no end of fun pointing out to our data scientists…

😈
 
 