- dataset:
    id: allenai/olmOCR-bench
    task_id: overall
  value: 83.2
  notes: "H&F (headers_footers) rewards omission rather than transcription, so a model that outputs nothing scores perfectly on it. It is excluded so that Overall reflects real OCR quality."
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: arxiv_math
  value: 89.6
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: old_scans_math
  value: 85.6
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: table_tests
  value: 89.0
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: old_scans
  value: 42.2
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: multi_column
  value: 84.8
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: long_tiny_text
  value: 91.4
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: headers_footers
  value: 19.7
  notes: "Rather than removing headers and footers, our model is trained for full-page transcription, and its training explicitly rewards their presence (via flipped RLVR tests). This lowers the score under the original benchmark scoring."
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: staghado
- dataset:
    id: allenai/olmOCR-bench
    task_id: baseline
  value: 99.6
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: staghado
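# Arithmetic cross-check of the Overall value, using only the scores listed above.
# Assumption (not stated in this file): Overall is the unweighted mean of the
# per-category scores. The numbers are consistent with that reading.
#
#   Excluding headers_footers (7 categories):
#     (89.6 + 85.6 + 89.0 + 42.2 + 84.8 + 91.4 + 99.6) / 7 = 582.2 / 7 ≈ 83.17, which rounds to 83.2
#
#   Including headers_footers (19.7) for comparison (8 categories):
#     (582.2 + 19.7) / 8 = 601.9 / 8 ≈ 75.24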