terminal-bench: A benchmark for LLMs on complicated tasks in the terminal. github.com/laude-institut…