Tasty AI Papers | 01-31 August 2024

Robotics.

🔘Body Transformer: Leveraging Robot Embodiment for Policy Learning

what: one transformer to control the whole body.
- proposes Body Transformer (BoT).
- a vanilla transformer with a special attention mask that reflects how the different body parts are connected.
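
To make the masking idea concrete, here is a minimal sketch, assuming the body is described as a graph whose nodes are body parts and whose edges are physical links. The 4-node chain, the helper names, and the single-head NumPy attention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def body_attention_mask(num_nodes, edges):
    """Boolean mask: entry [i, j] is True when node i may attend to node j."""
    mask = np.eye(num_nodes, dtype=bool)  # every node attends to itself
    for i, j in edges:
        mask[i, j] = True
        mask[j, i] = True  # links are treated as undirected
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # block non-neighbours
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical 4-node chain: torso(0) - hip(1) - knee(2) - ankle(3).
EDGES = [(0, 1), (1, 2), (2, 3)]
mask = body_attention_mask(4, EDGES)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # one 8-dim embedding per body part
out, weights = masked_attention(x, x, x, mask)
```

With this mask the torso puts zero attention weight on the knee and ankle in a single layer; information from distant parts can only arrive through stacked layers.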

🔘CrossFormer: Scaling Cross-Embodied Learning for Manipulation, Navigation, Locomotion, and Aviation

what: one transformer that can control various robot types.
- trained on 900K trajectories from 20 different robots.
- matches or beats specialized algorithms for each robot type.
- works on arms, wheeled bots, quadrupeds, and even drones.

Diffusion + AR Transformers

🟢Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

what: merge an AR decoder with vanilla diffusion.
- trains one model with two objectives: causal language-modeling loss + diffusion objective.
- handles discrete and continuous data in the same model.
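
The two-objective setup can be sketched as a weighted sum of a next-token cross-entropy on text positions and a noise-prediction MSE on image latents. The `weight` parameter, the noise-prediction form of the diffusion loss, and all function names are assumptions made for illustration.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean next-token cross-entropy over text positions."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def diffusion_mse(predicted_noise, true_noise):
    """Mean squared error between predicted and actual noise (assumed form)."""
    return np.mean((predicted_noise - true_noise) ** 2)


def combined_loss(text_logits, text_targets, predicted_noise, true_noise,
                  weight=1.0):
    """Language loss on discrete tokens + weighted diffusion loss on latents."""
    lm = cross_entropy(text_logits, text_targets)
    diff = diffusion_mse(predicted_noise, true_noise)
    return lm + weight * diff


# Toy inputs: 3 text positions over a 4-token vocabulary, 2 image latents.
text_logits = np.zeros((3, 4))            # uniform predictions
text_targets = np.array([0, 2, 1])
true_noise = np.ones((2, 5))
predicted_noise = np.ones((2, 5))         # perfect noise prediction

loss = combined_loss(text_logits, text_targets, predicted_noise, true_noise)
```

Both terms are computed from the same model's outputs, which is how discrete and continuous data can share one set of weights.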

🟡Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

what: proposes diffusion for discrete distributions.
- beats other diffusion approaches for text generation.
- outperforms GPT-2.
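
As a toy illustration of the "ratios of the data distribution" in the title, the sketch below computes p(y)/p(x) for a small categorical distribution and shows that one row of ratios is enough to recover p. The 3-state distribution and the function name are assumptions; this is not the paper's training objective.

```python
import numpy as np


def distribution_ratios(p):
    """ratios[x, y] = p[y] / p[x] for a strictly positive distribution p."""
    p = np.asarray(p, dtype=float)
    return p[None, :] / p[:, None]


# Hypothetical distribution over 3 discrete states.
p = np.array([0.5, 0.25, 0.25])
ratios = distribution_ratios(p)

# Normalising any single row of ratios gives back the distribution,
# because sum_y p[y] / p[x] = 1 / p[x].
recovered = ratios[1] / ratios[1].sum()
```

The ratios never require a normalising constant, which is part of what makes them a convenient learning target for discrete data.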

🟡Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

what: combine an AR transformer with MaskGIT.
- can both generate images and understand them.
- text tokenization + image tokenization; uses MaskGIT losses for image tokens.
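
A MaskGIT-style loss on image tokens can be sketched as: hide a random subset of tokens behind a mask id, then score predictions only at the hidden positions. The codebook size, mask ratio, mask id, and helper names below are assumptions for illustration.

```python
import numpy as np


def mask_tokens(tokens, mask_ratio, mask_id, rng):
    """Replace a random fraction of image tokens with `mask_id`."""
    num_masked = max(1, int(round(mask_ratio * len(tokens))))
    positions = rng.choice(len(tokens), size=num_masked, replace=False)
    is_masked = np.zeros(len(tokens), dtype=bool)
    is_masked[positions] = True
    corrupted = np.where(is_masked, mask_id, tokens)
    return corrupted, is_masked


def masked_token_loss(logits, targets, is_masked):
    """Cross-entropy averaged over masked positions only."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    token_nll = -log_probs[np.arange(len(targets)), targets]
    return token_nll[is_masked].mean()


CODEBOOK_SIZE = 16        # hypothetical image-token vocabulary
MASK_ID = CODEBOOK_SIZE   # extra id reserved for the mask token

rng = np.random.default_rng(0)
image_tokens = np.array([3, 1, 4, 1, 5, 9, 2, 6])
corrupted, is_masked = mask_tokens(image_tokens, 0.5, MASK_ID, rng)

# Stand-in for model outputs: uniform logits over the codebook.
logits = np.zeros((len(image_tokens), CODEBOOK_SIZE))
loss = masked_token_loss(logits, image_tokens, is_masked)
```

Text tokens would keep the usual next-token loss; only the image positions use this masked objective.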

group-telegram.com/neural_cell/179

BY the last neural cell



