CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Abstract
CurveBench is a benchmark for hierarchical topological reasoning from visual input; its results show that exact topology-aware visual reasoning remains difficult even for frontier models.
We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves spanning easy, polygonal, topography-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate the benchmark's utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model raises tree-generation accuracy on CurveBench-Easy from 2.8% (Qwen3-VL-8B-Thinking) to 33.3%, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.
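A minimal sketch of exact-match tree scoring follows. It assumes (our assumption, not stated in the abstract) that trees are given as parent arrays and that a prediction counts as correct when it is isomorphic to the annotated tree as an unordered rooted tree, so node numbering and sibling order do not matter. The names `canonical_form`, `tree_match`, `reward`, and `accuracy` are hypothetical and are not taken from the CurveBench evaluation code. The canonical string is the classic AHU encoding; the same 0/1 check could serve as a verifiable reward for RLVR-style training.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def canonical_form(parents: Sequence[int]) -> str:
    """AHU-style canonical string of an unordered rooted tree.

    parents[i] is the parent of node i; the root is the single node with parent -1.
    Two valid trees get the same string iff they are isomorphic as unordered
    rooted trees.
    """
    roots = [i for i, p in enumerate(parents) if p == -1]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    children: Dict[int, List[int]] = defaultdict(list)
    for node, parent in enumerate(parents):
        if parent != -1:
            children[parent].append(node)

    def encode(node: int) -> str:
        # Sorting child encodings makes the string independent of sibling order.
        return "(" + "".join(sorted(encode(c) for c in children[node])) + ")"

    code = encode(roots[0])
    # Each node reachable from the root contributes exactly "(" and ")"; a shorter
    # string means some nodes are unreachable (cycles or out-of-range parents).
    if len(code) != 2 * len(parents):
        raise ValueError("parents does not describe a single connected tree")
    return code


def tree_match(pred: Sequence[int], gold: Sequence[int]) -> bool:
    """Exact match up to isomorphism (assumed scoring rule)."""
    return canonical_form(pred) == canonical_form(gold)


def reward(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Binary verifiable reward; malformed predictions score 0."""
    try:
        return 1.0 if tree_match(pred, gold) else 0.0
    except ValueError:
        return 0.0


def accuracy(preds: Sequence[Sequence[int]], golds: Sequence[Sequence[int]]) -> float:
    """Fraction of images whose predicted tree exactly matches the gold tree."""
    return sum(reward(p, g) for p, g in zip(preds, golds)) / len(golds)
```

For example, `tree_match([-1, 0, 1, 0], [-1, 0, 0, 1])` is `True`: both describe a root with two children, one of which has a single child, just numbered differently. Exact match is deliberately strict; a single misplaced region makes the whole tree wrong, which fits the "full rooted containment tree" framing of the task.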
Community
We introduce CurveBench, a benchmark for testing whether vision-language models can recover hierarchical region-containment trees from images of non-intersecting Jordan curves. The task targets visual topology and structured reasoning beyond simple object recognition, counting, or OCR.
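To make the target structure concrete, the sketch below builds the region-containment tree for a special case: circles instead of general Jordan curves, given as hypothetical `(cx, cy, r)` tuples rather than pixels. This is an illustrative assumption, not the benchmark's ground-truth generation pipeline, and `region_tree` and `contains` are hypothetical names. With n pairwise non-intersecting closed curves there are n + 1 regions: node 0 is the unbounded outer region, and node i + 1 is the region just inside circle i, whose parent is the region of the smallest circle enclosing it.

```python
import math
from typing import List, Tuple

# Hypothetical input format: each circle is (center_x, center_y, radius).
Circle = Tuple[float, float, float]


def contains(outer: Circle, inner: Circle) -> bool:
    """True if `inner` lies strictly inside `outer` (circles assumed non-intersecting)."""
    (x1, y1, r1), (x2, y2, r2) = outer, inner
    return math.hypot(x1 - x2, y1 - y2) + r2 < r1


def region_tree(circles: List[Circle]) -> List[int]:
    """Parent array of the region-containment tree; node 0 is the outer region."""
    parents = [-1]  # node 0: unbounded outer region, the root
    for i, c in enumerate(circles):
        # Circles enclosing circle i form a nested chain, so the one with the
        # smallest radius is its immediate container.
        enclosing = [j for j, d in enumerate(circles) if j != i and contains(d, c)]
        if enclosing:
            parent_circle = min(enclosing, key=lambda j: circles[j][2])
            parents.append(parent_circle + 1)  # region inside circle j is node j + 1
        else:
            parents.append(0)  # only the outer region encloses circle i
    return parents
```

For example, `region_tree([(0, 0, 10), (0, 0, 5), (20, 0, 3), (1, 0, 2)])` returns `[-1, 0, 1, 0, 2]`: two circles sit directly in the outer region, and the small circle at (1, 0) is nested two levels deep. The pairwise test is O(n^2), which is fine for illustration; a model working from images has to infer the same nesting from pixels alone.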
The Hugging Face collection includes the paper, the CurveBench and CurveBench-Easy datasets, evaluation code, ground-truth generation resources, and fine-tuning artifacts. Our results show that even strong frontier VLMs struggle substantially on the harder settings, while fine-tuned open models improve but remain far from solving the task.
Similar papers recommended by the Librarian Bot (via the Semantic Scholar API):
- SketchVLM: Vision language models can annotate images to explain thoughts and guide users (2026)
- Fine-tuning a vision-language model for fracture-surface morphology recognition (2026)
- From Pixels to BFS: High Maze Accuracy Does Not Imply Visual Planning (2026)
- 3D Primitives are a Spatial Language for VLMs (2026)
- TraversalBench: Challenging Paths to Follow for Vision Language Models (2026)
- Chain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QA (2026)
- LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches (2026)
Get this paper in your agent: hf papers read 2605.14068
Don't have the latest CLI? Install it with: curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 3
AmirMohseni/curvebench-gemma-3-12b
Datasets citing this paper: 3
AmirMohseni/CurveBench
AmirMohseni/CurveBench-Easy
Spaces citing this paper: 0