Seungwook Han @seungwookh

phd-ing @MIT_CSAIL, prev @MITIBMLab @columbia · hanseungwook.github.io · Joined June 2017

Tweets: 139 · Followers: 411 · Following: 589 · Likes: 659

Jascha Sohl-Dickstein @jaschasd

3 days ago

Title: Advice for a young investigator in the first and last days of the Anthropocene Abstract: Within just a few years, it is likely that we will create AI systems that outperform the best humans on all intellectual tasks. This will have implications for your research and…

55 245 2K 304K 2K

Jeremy Bernstein @jxbz

5 days ago

I wrote this blog post that tries to go further toward design principles for neural nets and optimizers The post presents a visual intro to optimization on normed manifolds and a Muon variant for the manifold of matrices with unit condition number x.com/thinkymachines…

Thinking Machines @thinkymachines

5 days ago

114 459 3K 1.4M 2K

21 52 464 60K 150

Seungwook Han @seungwookh

4 weeks ago

Why do models forget less with RL than SFT?

Jyo Pari @jyo_pari

4 weeks ago

18 143 890 140K 772

0 0 2 413 0

Jyo Pari @jyo_pari

a month ago

We have a fun collaboration of @GPU_MODE x @scaleml coming up! We’re hosting a week-long online bootcamp that explores the core components of GPT-OSS while also diving into cutting-edge research that pushes beyond what’s currently in GPT-OSS! For example, how can MoE's power…

1 20 71 23K 28

Seungwook Han @seungwookh

2 months ago

uncertainty-aware reasoning, akin to how humans leverage our confidence

Mehul Damani @MehulDamani2

2 months ago

13 268 896 95K 611

0 1 3 363 1

Seungwook Han @seungwookh

2 months ago

was actually wondering with @hyundongleee the fundamental differences between diffusion and autoregressive modeling other than the structure imposed in the modeling of the sequential conditional distribution and how they manifest. a poignant paper that addresses this thought

Mihir Prabhudesai @mihirp98

2 months ago

127 195 1K 234K 939

0 1 13 1K 2

Seungwook Han @seungwookh

2 months ago

omw to trying this out 👀

Pika @pika_labs

2 months ago

234 205 1K 515K 722

0 0 0 270 0

Seungwook Han @seungwookh

2 months ago

how particles can act differently under different scales and conditions and how we can equip it as part of design is cool

MIT Architecture @MITarchitecture

9 months ago

0 0 2 887 0

0 0 4 483 0

Laker Newhouse @LakerNewhouse

2 months ago

[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.

14 78 585 140K 568

Seungwook Han @seungwookh

3 months ago

But actually this is the og way of doing it, and you should stop by E-2103 to see @jxbz and Laker Newhouse whiteboard the whole paper. https://t.co/NjV3qnxCaK

Jeremy Bernstein @jxbz

3 months ago

3 21 200 30K 99

1 6 75 8K 11

Jyo Pari @jyo_pari

3 months ago

If you are interested in questioning how we should pretrain models and create new architectures for general reasoning, then check out E606 @ ICML, our position by @seungwookh and me on potential directions for the next-generation reasoning models!

0 6 22 2K 7

Seungwook Han @seungwookh

3 months ago

At #ICML 🇨🇦 this week. I'm convinced that the core computations are shared across modalities (vision, text, audio, etc). The real question is the (synthetic) generative process that ties them. Reach out if you have thoughts or want to chat!

0 3 16 2K 3

Seungwook Han @seungwookh

3 months ago

wholeheartedly agree with this direction that games can be a good playground for learning reasoning. makes us think what other synthetic environments we can design and grow over complexity

Bo Liu (Benjamin Liu) @Benjamin_eecs

3 months ago
