Mian Zhang @_Guuuuuuuu_

CS PhD in UTD · mianzhang.github.io · Dallas, TX · Joined May 2019

Tweets: 33 · Followers: 122 · Following: 382 · Likes: 450

Mian Zhang @_Guuuuuuuu_

3 days ago

CWM shows that reasoning can benefit from step-by-step simulation of code execution. 🔹 Our latest evaluation results show that CWM achieves 47% accuracy on LogicIFEval, ranking #1 among all tested public models! 📄 LogicIF Paper: arxiv.org/pdf/2508.09125 This result suggests…

AI at Meta @AIatMeta

6 days ago


KaiqiangSong @SongKaiqiang

a month ago

LiveMCP-101: a rigorous benchmark for MCP-enabled agents on real-world, multi-tool tasks. 101 queries, 41 servers / 260 tools, ground-truth execution-plan eval. Even frontier LLMs have <60% success—lots to improve. Paper: arxiv.org/abs/2508.15760 #LLM #Agents #MCP


Kaiyu He @KaiyuHeHe

4 months ago

(1/6) A pathway for an LLM to become a great scientist like Isaac Newton!✨ 🚨 New survey out! We explore how LLMs can be used for hypothesis discovery—uncovering new knowledge via reasoning. This is the first survey to present a unified framework connecting Abduction, Induction,…


Zhiyu Zoey Chen @ZhiyuChen4

4 months ago

Check out our new work investigating how RAG deals with retrieved info vs. parametric knowledge under different user instructions. We conduct systematic analysis to showcase LLM performances under a spectrum of real world use cases. 📄preprint: arxiv.org/abs/2502.19779…

Peilin Wu @qualidea1217

4 months ago


Mian Zhang @_Guuuuuuuu_

4 months ago

We find suboptimal agentic searches are often caused by LLMs’ limited awareness of their own knowledge boundaries and propose an uncertainty-aware variant of GRPO to help mitigate suboptimal searches. Check out the paper for more analysis and results!