Seungyeop Yi

devpotatopotato

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 5 days ago • 52

liked a dataset 8 days ago

amphora/ResearchMath-14k

Viewer • Updated 6 days ago • 14.1k • 1.74k • 46

authored a paper 8 days ago

ResearchMath-14K: Scaling Research-Level Mathematics via Agents

Paper • 2605.28003 • Published 10 days ago • 49

upvoted a paper 9 days ago

ResearchMath-14K: Scaling Research-Level Mathematics via Agents

Paper • 2605.28003 • Published 10 days ago • 49

upvoted a paper 11 days ago

Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback

Paper • 2605.17448 • Published 20 days ago • 19

authored 2 papers 22 days ago

CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

Paper • 2511.20216 • Published Nov 25, 2025

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 28 days ago • 80

upvoted a paper 25 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 28 days ago • 80

updated a model 2 months ago

pi-research/qwen3-8b-op-tr-20260326

8B • Updated Apr 5 • 1

published a model 2 months ago

pi-research/qwen3-8b-op-tr-20260326

8B • Updated Apr 5 • 1

upvoted a paper 3 months ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Paper • 2602.06291 • Published Feb 6 • 24