m-ric
posted an update about 15 hours ago
🔥 Qwen releases their 2.5 family of models: New SOTA for all sizes up to 72B!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

Key insights:

🌐 All models have 128k token context length

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship Qwen2.5-72B is ~competitive with Llama-3.1-405B, and has a 3-5% margin on Llama-3.1-70B on most benchmarks.

🇫🇷 On top of this, it takes the #1 spot on multilingual tasks, so it might become my standard for French

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models have the most permissive license, Apache 2.0, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e
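
If you want to try one of these from the Hub, here is a minimal sketch, not an official recipe. It assumes the checkpoints follow the `Qwen/<family>-<size>[-Instruct]` naming pattern (e.g. `Qwen/Qwen2.5-7B-Instruct`), which is not spelled out in this post, so check the collection page before relying on it. The helper names `repo_id` and `load` are hypothetical, and `load` assumes `transformers`, `torch` and `accelerate` are installed.

```python
# Sizes as listed in the post above (Coder 32B is announced but not out yet).
FAMILY_SIZES = {
    "Qwen2.5": ["0.5B", "1.5B", "3B", "7B", "14B", "32B", "72B"],
    "Qwen2.5-Coder": ["1.5B", "7B"],
    "Qwen2.5-Math": ["1.5B", "7B", "72B"],
}


def repo_id(family: str, size: str, instruct: bool = True) -> str:
    """Build a Hub repo id, assuming the 'Qwen/<family>-<size>[-Instruct]' pattern."""
    if size not in FAMILY_SIZES.get(family, []):
        raise ValueError(f"{family} has no {size} variant listed in the post")
    suffix = "-Instruct" if instruct else ""
    return f"Qwen/{family}-{size}{suffix}"


def load(repo: str):
    """Download and load a checkpoint (hypothetical helper, needs network access)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires the accelerate package
    )
    return tokenizer, model


if __name__ == "__main__":
    print(repo_id("Qwen2.5", "7B"))
```

Calling `load(repo_id("Qwen2.5", "7B"))` would then pull the weights, so start with a small size if you are just testing.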

The Math one is absolutely incredible, the demo is great :-)

Up to 2.0, Qwen's Japanese language performance was not very good, but with 2.5 it suddenly took a leap forward.
As far as I have tested it on 7B and 14B, I think it is at a level that can compete with Nemo. Even at 3B, the vocabulary is small but the output does not break down, making it comparable to the upper tier of the current 4B class.