In a Training Loop 🔄

Wanghan Xu

black-yt

https://black-yt.github.io/

AI & ML interests

LLMs, Agents, AI Scientists, Generative AI

Recent Activity

updated a dataset 1 day ago

InternScience/ResearchClawBench

updated a Space 1 day ago

InternScience/ResearchHarness

replied to their post 2 days ago

Hey all, our ResearchClawBench leaderboard just updated 🔥 We let AI do real science: 40 tasks across 10 disciplines, compared to human papers.

Hard example? 🏔️ Glacier mass change: AI must integrate 233 datasets from 35 teams, 4 methods, reproduce 6542±387 Gt ice loss vs IPCC. No toy problems.

Latest leaderboard (2026-06-09) 📊

Agents:
🥇 Claude Code 21.5 (50 = match human), $5.3
🥈 EvoScientist 18.8, $4.1
🥉 Codex CLI 18.4, just $2.0

LLMs+Harness:
🥇 Claude-Opus-4.8 21.1, $4.0
🥈 Claude-Opus-4.7 20.7
🥉 MiniMax-M3 19.8, only $0.45
Qwen3.7-Max 18.7, $0.42, 11min

💥 Claude still king, but MiniMax/Qwen/DeepSeek are crazy cheap and competitive. Expensive isn't always better.

📎 Code & star: https://github.com/InternScience/ResearchClawBench
🏠 Website: https://internscience.github.io/ResearchClawBench-Home/
🤗 Upvote paper: https://huggingface.co/papers/2606.07591

black-yt's datasets (1)

black-yt/Manalyzer

Viewer • Updated 16 days ago • 6.66k • 306