Telegram Group & Telegram Channel
​​DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Громкая статья от китайцев про модели DeepSeek-R1-Zero и DeepSeek-R1. DeepSeek-R1-Zero обучена исключительно на RL без SFT и демонстрирует отличные способности к reasoning. Однако у неё есть проблемы: плохая читаемость предсказаний и language mixing (прям вот так - текст на двух языках). DeepSeek-R1 решает эти проблемы благодаря multi-stage training и использованию cold-start data перед RL и достигает результаты сравнимые с OpenAI-o1-1217.

Плюс авторы выложили обе модели и шесть дистиллированных в open-source.

Кстати, первый автор в прошлом выиграл много соревнований по ML - возможно это внесло свой вклад.

Paper
Project
Hugging Face page
Code

Мои обзоры:
Personal blog
Medium
Linkedin Pulse

#paperreview



group-telegram.com/datastorieslanguages/361
Create:
Last Update:

​​DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Громкая статья от китайцев про модели DeepSeek-R1-Zero и DeepSeek-R1. DeepSeek-R1-Zero обучена исключительно на RL без SFT и демонстрирует отличные способности к reasoning. Однако у неё есть проблемы: плохая читаемость предсказаний и language mixing (прям вот так - текст на двух языках). DeepSeek-R1 решает эти проблемы благодаря multi-stage training и использованию cold-start data перед RL и достигает результаты сравнимые с OpenAI-o1-1217.

Плюс авторы выложили обе модели и шесть дистиллированных в open-source.

Кстати, первый автор в прошлом выиграл много соревнований по ML - возможно это внесло свой вклад.

Paper
Project
Hugging Face page
Code

Мои обзоры:
Personal blog
Medium
Linkedin Pulse

#paperreview

BY Data, Stories and Languages




Share with your friend now:
group-telegram.com/datastorieslanguages/361

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

One thing that Telegram now offers to all users is the ability to “disappear” messages or set remote deletion deadlines. That enables users to have much more control over how long people can access what you’re sending them. Given that Russian law enforcement officials are reportedly (via Insider) stopping people in the street and demanding to read their text messages, this could be vital to protect individuals from reprisals. And indeed, volatility has been a hallmark of the market environment so far in 2022, with the S&P 500 still down more than 10% for the year-to-date after first sliding into a correction last month. The CBOE Volatility Index, or VIX, has held at a lofty level of more than 30. The fake Zelenskiy account reached 20,000 followers on Telegram before it was shut down, a remedial action that experts say is all too rare. Channels are not fully encrypted, end-to-end. All communications on a Telegram channel can be seen by anyone on the channel and are also visible to Telegram. Telegram may be asked by a government to hand over the communications from a channel. Telegram has a history of standing up to Russian government requests for data, but how comfortable you are relying on that history to predict future behavior is up to you. Because Telegram has this data, it may also be stolen by hackers or leaked by an internal employee. The last couple days have exemplified that uncertainty. On Thursday, news emerged that talks in Turkey between the Russia and Ukraine yielded no positive result. But on Friday, Reuters reported that Russian President Vladimir Putin said there had been some “positive shifts” in talks between the two sides.
from br


Telegram Data, Stories and Languages
FROM American