Chinese AI Tops Hugging Face's Revamped Chatbot Leaderboard

erek · Thursday at 3:50 PM

Fascinating.. also DeepSeek Coder outperforms GPT-4?

"Alibaba's Qwen models dominated Hugging Face's latest LLM leaderboard, securing three top-ten spots. The new benchmark, launched Thursday, tests open-source models on tougher criteria including long-context reasoning and complex math. Meta's Llama3-70B also ranked highly, but several Chinese models outperformed Western counterparts. (Closed-source AIs like ChatGPT were excluded.) The leaderboard replaces an earlier version deemed too easy to game."

Source: https://slashdot.org/story/24/06/27...ps-hugging-faces-revamped-chatbot-leaderboard

LukeTbk · Thursday at 5:19 PM

Microsoft's relatively small model scores well here, but excluding the new Claude, Gemini and GPT makes it hard to get a good view of how good they are.

DeepSeek seems to beat the older Claude 3 and GPT-4, which is really good, though the comparison does not include tests against 3.5 or GPT-4o.

It can be tried easily here (much slower but still fast enough):
https://chat.deepseek.com/coder

Thatguybil · Thursday at 7:22 PM

It’s been exciting to see the advancements in the health terminology leaderboards.

Twice the public models have leapfrogged our internal models… which I have no end of fun pointing out to our data scientists…

erek · Friday at 12:21 AM

Thatguybil said:
It’s been exciting to see the advancements in the health terminology leaderboards.

Twice the public models have leapfrogged our internal models… which I have no end of fun pointing out to our data scientists…

CookieFactory · 2024-06-29T07:49:29-0400

BABA most undervalued AI play IMO. Not to mention how good the rest of their business is looking.
