Tiny models used for testing
Inference Optimization
community
Qwen3.6-35B-A3B mixed-precision HIGGS model variants, plus base FP16/FP8/NVFP4 references.
inference-optimization/Qwen3.6-35B-A3B-5.0-bits-mode-heuristic
Image-Text-to-Text • 24B • Updated • 98
inference-optimization/Qwen3.6-35B-A3B-5.0-bits-mode-hybrid
Image-Text-to-Text • 24B • Updated • 97
inference-optimization/Qwen3.6-35B-A3B-5.0-bits-mode-noise
Image-Text-to-Text • 24B • Updated • 61
inference-optimization/Qwen3.6-35B-A3B-5.5-bits-mode-heuristic
Image-Text-to-Text • 26B • Updated • 43
models 381
inference-optimization/Qwen3-8B-speculator.dflash.swa.non-qwen3-step56712
2B • Updated • 264
inference-optimization/Laguna-XS.2-speculator.dflash-Qwen235B-500k-ckpt3
0.6B • Updated • 84
inference-optimization/Laguna-XS.2-speculator.dflash-Qwen235B-500k-ckpt2
0.6B • Updated • 44
inference-optimization/Qwen3-8B-speculator.dflash.swa.non-qwen3-step21k
2B • Updated • 67
inference-optimization/Qwen3-8B-from-Qwen3-8B_regen-speculators.eagle3-qwen3arch-ckpt1
1B • Updated • 10
inference-optimization/Laguna-XS.2-speculator.dflash-Qwen235B-500k-ckpt1-20260609-0052
0.6B • Updated • 5
inference-optimization/Qwen3-8B-speculator.dflash.swa.non-qwen3-ep0p11
2B • Updated • 108
inference-optimization/Laguna-XS.2-speculator.dflash-Qwen235B-500k-ckpt1
0.6B • Updated • 221
inference-optimization/Laguna-XS.2-speculator.dflash-Qwen235B-500k-ckpt0.5
0.6B • Updated • 14
inference-optimization/Qwen3-8B-speculator.dflash.swa.unified-ep0p28
2B • Updated
datasets 24
inference-optimization/every-eval-ever-demo
Updated • 17
inference-optimization/Qwen3.5-4B-responses
Viewer • Updated • 7.47k • 39
inference-optimization/Qwen3.5-0.8B-responses
Viewer • Updated • 7.47k • 72
inference-optimization/Qwen3.5-9B-responses
Viewer • Updated • 7.67k • 45
inference-optimization/Qwen3-8B-Regenerated-Collection
Preview • Updated • 189
inference-optimization/Qwen3-30B-A3B-responses
Preview • Updated • 62
inference-optimization/Qwen3-32B-responses
Preview • Updated • 40
inference-optimization/ctest-Qwen3.6-27B-speculator-dataset
Viewer • Updated • 5.61k • 34
inference-optimization/Gemma4-Responses-Nemotron
Viewer • Updated • 762k • 64 • 1
inference-optimization/Longbench_Samples_Specdec
Viewer • Updated • 160 • 67