Tati's Wonderland | Telegram Webview: tatiwonderland/68 -

Telegram Group & Telegram Channel

Tati's Wonderland

Кстати, сейчас в bay area проходит mooc курс Advanced LLM agents
с лекциями на youtube, которые могут смотреть все (как мы любим, без регистрации и смс).

Сегодня как раз одна такая лекция "Learning to Self-Improve & Reason with LLMs", 4pm SF time, но посмотреть можно и потом. Они часто начинают позднее.

Перепосчу.
Our 2nd lecture will be happening today @4:00pm PST! You can find the livestream here: https://www.youtube.com/live/_MNlLhU33H0.

Today, our amazing guest speaker Jason Weston will be presenting, "Learning to Self-Improve & Reason with LLMs."

We describe some recent methods for LLMs whereby they can self-learn how to perform better at tasks relevant to human users, from reasoning or math tasks to creative tasks. In particular we describe the methods of Iterative DPO (https://arxiv.org/abs/2312.16682), Self-Rewarding LLMs (https://arxiv.org/abs/2401.10020), Iterative Reasoning Preference Optimization (https://arxiv.org/abs/2404.19733), Thinking LLMs (https://arxiv.org/abs/2410.10630), Meta-Rewarding LLMs (https://arxiv.org/abs/2407.19594), and more!

CS 194/294-280 (Advanced LLM Agents) - Lecture 2, Jason Weston

www.group-telegram.com/ye/tatiwonderland.com/68

1.2K viewsTanya, edited Feb 3 at 17:59

group-telegram.com/tatiwonderland/68

Create: 2025-02-03
Last Update: 2025-02-15 16:10:57

Кстати, сейчас в bay area проходит mooc курс Advanced LLM agents
с лекциями на youtube, которые могут смотреть все (как мы любим, без регистрации и смс).

Сегодня как раз одна такая лекция "Learning to Self-Improve & Reason with LLMs", 4pm SF time, но посмотреть можно и потом. Они часто начинают позднее.

Перепосчу.
Our 2nd lecture will be happening today @4:00pm PST! You can find the livestream here: https://www.youtube.com/live/_MNlLhU33H0.

Today, our amazing guest speaker Jason Weston will be presenting, "Learning to Self-Improve & Reason with LLMs."

We describe some recent methods for LLMs whereby they can self-learn how to perform better at tasks relevant to human users, from reasoning or math tasks to creative tasks. In particular we describe the methods of Iterative DPO (https://arxiv.org/abs/2312.16682), Self-Rewarding LLMs (https://arxiv.org/abs/2401.10020), Iterative Reasoning Preference Optimization (https://arxiv.org/abs/2404.19733), Thinking LLMs (https://arxiv.org/abs/2410.10630), Meta-Rewarding LLMs (https://arxiv.org/abs/2407.19594), and more!

BY Tati's Wonderland

Share with your friend now:
group-telegram.com/tatiwonderland/68

Open in Telegram

Telegram | DID YOU KNOW?

Date: 2025-02-15|

I want a secure messaging app, should I use Telegram? And while money initially moved into stocks in the morning, capital moved out of safe-haven assets. The price of the 10-year Treasury note fell Friday, sending its yield up to 2% from a March closing low of 1.73%. Messages are not fully encrypted by default. That means the company could, in theory, access the content of the messages, or be forced to hand over the data at the request of a government. False news often spreads via public groups, or chats, with potentially fatal effects. Emerson Brooking, a disinformation expert at the Atlantic Council's Digital Forensic Research Lab, said: "Back in the Wild West period of content moderation, like 2014 or 2015, maybe they could have gotten away with it, but it stands in marked contrast with how other companies run themselves today."
from ye

Telegram Tati's Wonderland
FROM American