Adam Yanxiao Zhao @sdpkjc_adam

🧑‍🎓 CS PhD Student @UCAS1978 | 🤖 Reinforcement Learner | 🏄‍♂️ Research Intern @Zai_org
sdpkjc.com
Joined June 2018

Tweets: 129 | Followers: 49 | Following: 291 | Likes: 351

Xiao Liu (Shaw) @ShawLiu12

a month ago

🚨 Thrilled to share our latest progress on Computer Use Agent: ComputerRL, an end-to-end RL method that achieves a 48.1% success rate on the OSWorld Benchmark with only a 9B open model, beating OpenAI Operator, Claude Sonnet 4.0, and other previous models, state-of-the-art performance.…

Z.ai @Zai_org

a month ago

10 104 578 70K 290

1 5 42 4K 16

Adam Yanxiao Zhao @sdpkjc_adam

a month ago

Lucky to have collaborated with an amazing team on this work! 🎉🚀😃

Z.ai @Zai_org

a month ago

10 104 578 70K 290

0 0 2 52 0

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

a month ago

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
"To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale…

3 24 116 9K 98

Z.ai @Zai_org

2 months ago

Introducing GLM-4.5 and GLM-4.5 Air: new flagship models designed to unify frontier reasoning, coding, and agentic capabilities.
GLM-4.5: 355B total / 32B active parameters
GLM-4.5-Air: 106B total / 12B active parameters
API Pricing (per 1M tokens):
GLM-4.5: $0.6 Input / $2.2…

268 643 3K 1.2M 1K

TNG Technology Consulting GmbH @tngtech

5 months ago

Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to @deepseek_ai V3-0324 with a novel construction method. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens. The Chimera is a child LLM, using V3s…

26 107 608 80K 269

will brown @willccbb

8 months ago

trying to make it really really easy to build LLM RL envs

8 22 357 45K 271

Joseph Suarez 🐡 @jsuarez5341

10 months ago

x.com/i/article/1863…

9 26 296 29K 175

Adam Yanxiao Zhao @sdpkjc_adam

11 months ago

🚀

Roger Creus Castanyer @creus_roger

11 months ago

5 11 43 7K 22

0 0 0 111 0

Jarek Liesen @JarekLiesen

a year ago

🥳 I'm releasing Rejax, a lightweight library of fully vectorizable RL algorithms!
⚡ Enjoy lightning-fast speed using jax.jit on the training function
🧬 Use vmap and pmap on hyperparameters
🔙 Log using flexible callbacks
🌐 Available @ github.com/kerajli/rejax
📸 Take a tour!

4 29 171 24K 88
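
The idea in the Rejax announcement above (jit-compiling a whole training function and vmapping it over hyperparameters) can be illustrated with a minimal JAX sketch. This does not use Rejax's API. The `train` function here is a hypothetical stand-in: plain gradient descent on a toy quadratic loss, not an RL algorithm. The names `train`, `sweep`, and `learning_rates` are assumptions made only for illustration.

```python
import jax
import jax.numpy as jnp


def train(learning_rate, key, num_steps=100):
    """Toy 'training function' (hypothetical, not Rejax's API).

    Runs gradient descent on a quadratic loss and returns the final loss.
    """
    target = jnp.array([1.0, -2.0, 3.0])
    params = jax.random.normal(key, (3,))

    def loss_fn(p):
        return jnp.sum((p - target) ** 2)

    def step(p, _):
        # One gradient-descent update; emit the loss after the update.
        p = p - learning_rate * jax.grad(loss_fn)(p)
        return p, loss_fn(p)

    # lax.scan keeps the whole loop inside one traced computation.
    params, losses = jax.lax.scan(step, params, None, length=num_steps)
    return losses[-1]


# vmap over the learning rate (the PRNG key is shared), then jit the sweep.
sweep = jax.jit(jax.vmap(train, in_axes=(0, None)))

learning_rates = jnp.array([0.001, 0.01, 0.1])
final_losses = sweep(learning_rates, jax.random.PRNGKey(0))
print(final_losses)  # one final loss per learning rate
```

Because the training loop is a pure function written with `jax.lax.scan`, a single `jax.vmap` call is enough to run all three learning rates in one compiled program. A real RL training function would presumably need the same property (no Python-side state) for this pattern to apply.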

Quentin Gallouédec @QGallouedec

a year ago

Sorry to hear that @jsuarez5341, Open RL Benchmark was also rejected from RLC, and we mostly feel the same way about review quality (LLM-generated?). Among other things, we read that "the meaning of "metrics" is never made clear", whereas we have a section dedicated to metrics,…

Joseph Suarez 🐡 @jsuarez5341

a year ago

4 2 29 9K 9

1 2 14 2K 2

Quentin Gallouédec @QGallouedec

a year ago

The Open RL Leaderboard now fully supports all Stable Baselines 3 models! 🚀 Thanks to this update, it now compares over 10,000 models! 📈🎉
🏆 Leaderboard: huggingface.co/spaces/open-rl…
🐙 RL Zoo 3: github.com/DLR-RM/rl-base…

2 8 42 6K 11

Quentin Gallouédec @QGallouedec

a year ago

🆕 LeRobot 🤖 github.com/huggingface/le…
📈 Pre-trained robotics models
💾 Datasets of human collected demos
🔩 Modular architecture
This is part of our efforts @huggingface to make 🤖 more accessible.
By @RemiCadene @asoare159 @alibert_s @Thom_Wolf @AdilZtn, @HaixuanT ...

1 3 18 2K 8

Quentin Gallouédec @QGallouedec

a year ago

Which is the best RL agent on the Hub? Now you can find out, thanks to the Open RL Leaderboard 🏆!
🧩 Features:
- Automatic evaluation of models on the 🤗 Hub
- Compatible with all torch-based RL libraries
- Supports 87 environments, with more to come 🔥
huggingface.co/spaces/open-rl…

3 19 56 12K 21

Machine Learning @Memoirs

2 years ago

Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency. arxiv.org/abs/2403.00673

0 1 2 116 1

RL Beyond Rewards Workshop @RLBRew_RLC

2 years ago

Announcing the Reinforcement Learning Beyond Rewards workshop at the first @RL_Conference. Think that rewards aren't enough for RL? Working on RLHF? Thinking of alternative ways of alignment? Creating a foundational model for RL? or have ideas on task-agnostic RL algo? Join us