Tech Dev Notes @techdevnotes, Twitter Profile

Tech Dev Notes @techdevnotes

7 months ago

Claude 3.7 Sonnet Extended Thinking vs Grok 3 Thinking

f1shy-dev @vishyfishy2

7 months ago

@techdevnotes hmm... GPQA/AIME: Sonnet 3.7's high scores use internal scoring with parallel test time compute, while o1 and Grok 3's high results use majority voting with N=64 samples.
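For readers unfamiliar with the term, majority voting over N samples (often written cons@N) means sampling the model N times on the same question and reporting the most frequent final answer. A minimal sketch, assuming the final answers have already been extracted as strings; the `majority_vote` helper and the sample data are hypothetical illustrations, not any lab's actual evaluation harness:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among the sampled answers.

    Ties are resolved in favor of the answer that appeared first.
    """
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical data: 64 sampled final answers to a single question.
samples = ["42"] * 30 + ["41"] * 20 + ["40"] * 14

print(majority_vote(samples))
```

A score computed this way spends N model calls per question, so it is not directly comparable to a single-sample (pass@1) score.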

aurelien @aurelien0012

7 months ago

@techdevnotes damn so grok 3 performs better across the board? holy fuck

Jeffrey @JefeMcOwnage

7 months ago

@techdevnotes Lol @Kr00ney when shown that Grok 3 is better than Claude 3.7

Nathan Spencer @NateSpencerWx

7 months ago

@techdevnotes Do better
