In a Training Loop
Wanghan Xu
black-yt
8 followers · 3 following
https://black-yt.github.io/
wanghan_xu
black-yt
wanghan-xu
AI & ML interests
LLMs, Agents, AI Scientists, Generative AI
Recent Activity
Updated a dataset 1 day ago: InternScience/ResearchClawBench
Updated a Space 1 day ago: InternScience/ResearchHarness
Replied to their post 2 days ago:
Hey all, our ResearchClawBench leaderboard just updated. We let AI do real science: 40 tasks across 10 disciplines, compared to human papers. Hard example? Glacier mass change: AI must integrate 233 datasets from 35 teams, 4 methods, reproduce 6542±387 Gt ice loss vs IPCC. No toy problems.

Latest leaderboard (2026-06-09):
Agents: Claude Code 21.5 (50 = match human), $5.3; EvoScientist 18.8, $4.1; Codex CLI 18.4, just $2.0
LLMs+Harness: Claude-Opus-4.8 21.1, $4.0; Claude-Opus-4.7 20.7; MiniMax-M3 19.8, only $0.45; Qwen3.7-Max 18.7, $0.42, 11min

Claude still king, but MiniMax/Qwen/DeepSeek are crazy cheap and competitive. Expensive isn't always better.

Code & star: https://github.com/InternScience/ResearchClawBench
Website: https://internscience.github.io/ResearchClawBench-Home/
Upvote paper: https://huggingface.co/papers/2606.07591
black-yt's datasets (1)
black-yt/Manalyzer • Viewer • Updated 16 days ago • 6.66k • 306