PhD student @VT_CS, supervised by @tuvllms. Interested in search-augmented LLMs. Ex AI resident @VinAI_Research
thinhphp.github.io
Blacksburg, VA
Joined July 2023
Announcing ROMA (Recursive Open Meta Agent): our new multi-agent framework that sets SOTA in reasoning + search.
Seal-0: 45.6%
FRAMES: 81.7%
SimpleQA: 93.9%
🧵 Read more about how recursive coordination lets agents tackle complex queries.
OpenAI released a new paper:
"Why language models hallucinate"
Simple answer: LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty.
The paper puts this on a statistical footing with simple, test-like incentives that reward confident…
New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions.
Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:
Excited to share that our paper on efficient model development has been accepted to #EMNLP2025 Main conference @emnlpmeeting. Congratulations to my students @linusdd44804 and @Sub_RBala on their first PhD paper! 🎉
A few weeks ago, I started a new job at @OpenAI. I wrote a document about my interview process and recommendations for anyone on the job market for AI research positions. I hope it's helpful!
docs.google.com/document/d/1ZV…
1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨💻👨💻
Most search models need the cloud.
II-Search-4B doesn’t.
4B model tuned for reasoning with search tools, built for local use.
Performance of models 10x its size.
Search that is small, smart, and open.
🥳Congrats @ii_posts for an impressive result on SEAL-0, a challenging benchmark for search-augmented LLMs.
🤩Looking forward to the evaluation standards it shapes in this field.
📚Read more: arxiv.org/abs/2506.01062
. @EMostaque came back on the show to chat about:
--how we can't compete against AI agents
--his solution for a POSITIVE AI world
--Why UBI won't work but UBAI might..
--we need to be focused on incentivizing the right outcomes
-- Nations need sovereign AI stacks or…
We just released the evaluation of LLMs on the 2025 IMO on MathArena! Gemini scores best, but is still unlikely to achieve the bronze medal with its 31% score (13/42). 🧵(1/4)
Tokenization has been the final barrier to truly end-to-end language models.
We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
🔥 SEAL-0 Leaderboard 📈
Our results on SEAL-0 show substantial room for improvement in LLMs' ability to reason over conflicting evidence. 🤯
👉Check out our paper: arxiv.org/abs/2506.01062
👉Dataset: huggingface.co/datasets/vtllm…
✨ New paper ✨
🚨 Scaling test-time compute can lead to inverse or flattened scaling!!
We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:
➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…
101 Followers 540 Following
UG Research Assistant at MURGe-Lab w/ @mohitban47, undergrad at @unccs. Interested in LLM Compression, interpretability, and embedded applications
95 Followers 4K Following
From Belarus, currently living in Dallas, single, looking for a new relationship or friends, I do not allow pornographic content here
17K Followers 6K Following
Neurodivergent physics student with a keen interest in multisensory integration and emergent perception. Exploring research on a proposed ‘sixth sense’. Δ
97K Followers 8K Following
Compiling in real-time, the race towards AGI.
The Largest Show on X for AI.
🗞️ Get my daily AI analysis newsletter to your email 👉 https://t.co/6LBxO8215l
454 Followers 4K Following
AI Enthusiast | Trying to launch own AI app, currently @ Building something cool, Be Kind and Curious #AGI #ASI
#Bitcoin #NRG Forever
539 Followers 7K Following
Founder @Setica
🌐 https://t.co/k41rINekVX. Alien on planet Earth. AI researcher and indie developer (web & apps). Building AI models, AI agents, and SaaS apps
428 Followers 1K Following
Ph.D. Candidate at @umasscs. Prev @genentech @Google @IIITDelhi.
Dabbling with Interpretability, Retrieval and some Bioinformatics.
1K Followers 210 Following
Assistant Professor at 東京大学 | UTokyo 🇯🇵, PhD @Yale 🇺🇸, MSc @ucddublin 🇮🇪, proud alumni from @yalenlp || Beginner for Oil painting 🖼️ and Piano 🎹
612K Followers 38 Following
To ensure that Artificial General Intelligence is open-source and not controlled by any single entity. @SentientEco @OpenAGISummit
244K Followers 21 Following
We’ll help you make it like nobody’s business. Multimodal media generation and editing tools to get your idea to production. Self-deploy? 👍 Need a partner? 🤝
23K Followers 87 Following
Mathematician, @UCBerkeley professor, author of LOVE & MATH (published in 20 languages), host of AfterMath series on YouTube, music explorer as DJ Moonstein
55K Followers 0 Following
We are building a world class AI R&D company in Tokyo. We want to develop AI solutions for Japan’s needs, and democratize AI in Japan. https://t.co/1q07mb3TzE
15K Followers 146 Following
Cofounder/CEO @Genspark_ai | Serial entrepreneur, built business from 0 to $5.5B | Ex-CPO @Baidu Search, Ex-Principal Dev Mgr @Microsoft Bing
3K Followers 912 Following
Startups, emotions & intuition, AI.
Learn this communication skill to be a better partner/friend/collaborator: https://t.co/HCagOrKpud
19K Followers 11 Following
Bot. I tweet daily progress towards machine learning and computer vision conference deadlines. Maintained by @chriswolfvision.
19K Followers 1K Following
Agents @Meta MSL TBD Lab. previously posttraining research @OpenAI train LLMs to do things: deep research, chatgpt agent, etc. CS PhD @LTIatCMU