Thinh @thinhphp_vt

PhD student @VT_CS, supervised by @tuvllms. Interested in search-augmented LLMs. Ex AI resident @VinAI_Research

thinhphp.github.io

Blacksburg, VA

Joined July 2023

Tweets: 22

Followers: 81

Following: 522

Likes: 77

Sentient @SentientAGI

4 weeks ago

Announcing ROMA (Recursive Open Meta Agent): our new multi-agent framework that sets SOTA in reasoning + search. Seal-0: 45.6% FRAMES: 81.7% SimpleQA: 93.9% 🧵 Read more about how recursive coordination lets agents tackle complex queries.

714 640 3K 454K 208

Rohan Paul @rohanpaul_ai

4 weeks ago

OpenAI released a new paper, "Why language models hallucinate". Simple answer: LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty. The paper puts this on a statistical footing with simple, test-like incentives that reward confident…

97 344 2K 368K 2K

Ken Liu @kenziyuliu

a month ago

New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:

15 75 368 64K 201

Thinh @thinhphp_vt

a month ago

DeepSeek achieved a strong result on SEAL0, a challenging benchmark for reasoning with conflicting search results. 🎊

DeepSeek @deepseek_ai

a month ago

11 53 852 189K 65

0 1 5 230 0

Tu Vu @tuvllms

a month ago

Excited to share that our paper on efficient model development has been accepted to #EMNLP2025 Main conference @emnlpmeeting. Congratulations to my students @linusdd44804 and @Sub_RBala on their first PhD paper! 🎉

Tu Vu @tuvllms

6 months ago

14 93 445 47K 278

0 10 49 5K 10

basvanopheusden @basvanopheusden

2 months ago

A few weeks ago, I started a new job at @OpenAI. I wrote a document about my interview process and recommendations for anyone on the job market for AI research positions. I hope it's helpful! docs.google.com/document/d/1ZV…

63 353 4K 325K 7K

Sheryl Hsu @SherylHsu02

2 months ago

1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻

208 288 3K 2.4M 449

Intelligent Internet @ii_posts

2 months ago

Most search models need the cloud. II-Search-4B doesn’t. 4B model tuned for reasoning with search tools, built for local use. Performance of models 10x its size. Search that is small, smart, and open.

22 116 670 499K 521

Thinh @thinhphp_vt

2 months ago

🥳Congrats @ii_posts for an impressive result on SEAL-0, a challenging benchmark for search-augmented LLMs. 🤩Looking forward to the evaluation standards it shapes in this field. 📚Read more: arxiv.org/abs/2506.01062

Intelligent Internet @ii_posts

2 months ago

1 2 23 4K 5

0 0 6 173 0

Peter H. Diamandis, MD @PeterDiamandis

2 months ago

. @EMostaque came back on the show to chat about: --how we can't compete against AI agents --his solution for a POSITIVE AI world --Why UBI won't work but UBAI might.. --we need to be focused on incentivizing the right outcomes -- Nations need sovereign AI stacks or…

24 74 262 27K 71

Jasper Dekoninck @j_dekoninck

3 months ago

We just released the evaluation of LLMs on the 2025 IMO on MathArena! Gemini scores best, but is still unlikely to achieve the bronze medal with its 31% score (13/42). 🧵(1/4)

13 40 221 37K 61

Thinh @thinhphp_vt

3 months ago

We just evaluated Grok 4 on our SEAL-0 dataset 👍Try it: huggingface.co/datasets/vtllm…

0 2 14 3K 1

Sukjun (June) Hwang @sukjun_hwang

3 months ago

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

98 744 5K 746K 4K

Thinh @thinhphp_vt

3 months ago

🔥 SEAL-0 Leaderboard 📈 Our results on SEAL-0 show substantial room for improvement in LLMs' ability to reason over conflicting evidence. 🤯 👉Check out our paper: arxiv.org/abs/2506.01062 👉Dataset: huggingface.co/datasets/vtllm…

0 5 14 2K 5

Thinh @thinhphp_vt

4 months ago

My first work done during my PhD 🥳🥳🥳

Tu Vu @tuvllms

4 months ago

4 40 147 17K 67

3 1 21 3K 2

Tu Vu @tuvllms

4 months ago

✨ New paper ✨ 🚨 Scaling test-time compute can lead to inverse or flattened scaling!! We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways: ➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…