Ryan Marten

ryanmarten

https://ryanmarten.com

AI & ML interests

None yet

Recent Activity

published a dataset 1 day ago

harborframework/terminal-bench-3.0-lfs

Updated 1 day ago • 4

New activity in harborframework/parity-experiments about 2 months ago

SpreadsheetBench adapter parity (claude-code + Haiku 4.5, 400 tasks × 3 trials)

#106 opened about 2 months ago by

ryanmarten

New activity in harborframework/terminal-bench-2.0 2 months ago

Define 'harbor' as eval framework 🎉

#3 opened 2 months ago by

burtenshaw

updated a dataset 2 months ago

harborframework/terminal-bench-2.0

Benchmark • Updated Feb 17 • 3.74k • 26

New activity in harborframework/terminal-bench-2.0 2 months ago

Add an eval yaml to integrate this benchmark into Community Evals.

#1 opened 2 months ago by

burtenshaw

published a dataset 2 months ago

harborframework/terminal-bench-2.0

Benchmark • Updated Feb 17 • 3.74k • 26

liked a dataset 2 months ago

zai-org/terminal-bench-2-verified

Updated Feb 27 • 2.81k • 68

liked a dataset 4 months ago

open-thoughts/OpenThoughts-Agent-v1-SFT

Viewer • Updated Jan 27 • 15.2k • 2.53k • 87

updated a Space 5 months ago

README

🦀

liked a dataset 6 months ago

jupyter-agent/jupyter-agent-dataset

Viewer • Updated Sep 10, 2025 • 95.8k • 1.19k • 166

updated 2 datasets 8 months ago

ryanmarten/OpenThoughts-1k-sample

Viewer • Updated Aug 31, 2025 • 2k • 518k • 4

open-thoughts/OpenThoughts-114k

Viewer • Updated Aug 31, 2025 • 228k • 155k • 832

published a dataset 8 months ago

ryanmarten/OpenThoughts-1k-sample

Viewer • Updated Aug 31, 2025 • 2k • 518k • 4

liked a dataset 8 months ago

SWE-bench/SWE-smith-trajectories

Viewer • Updated Jul 19, 2025 • 76k • 2.91k • 58

liked a Space 10 months ago

OpenThoughts Benchmark Explorer

📊

Explore benchmark correlations and model performance

liked a model 11 months ago

open-thoughts/OpenThinker3-7B

Text Generation • 8B • Updated Jun 9, 2025 • 4.57k • 135

updated 2 collections 11 months ago

Reasoning Models

Collection

53 items • Updated Jun 8, 2025 • 1

Reasoning Datasets

Collection

50 items • Updated Jun 8, 2025 • 11

liked a dataset 11 months ago

open-thoughts/OpenThoughts3-1.2M

Viewer • Updated Jun 9, 2025 • 1.2M • 16k • 224

authored a paper 11 months ago

OpenThoughts: Data Recipes for Reasoning Models

Paper • 2506.04178 • Published Jun 4, 2025 • 54