Tiny models used for testing
Inference Optimization
Stitched HIGGS Llama3 8B mixed-precision model variants.
- meta-llama/Llama-3.1-8B-Instruct (Text Generation • 8B • 9.89M downloads • 6.05k likes)
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic (Text Generation • 8B • 40.2k downloads • 9 likes)
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4 (Text Generation • 5B • 15.3k downloads • 1 like)
- Qwen/Qwen3-8B (Text Generation • 8B • 10.8M downloads • 1.13k likes)
Mixed Precision Models
- meta-llama/Llama-3.1-8B-Instruct (Text Generation • 8B • 9.89M downloads • 6.05k likes)
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic (Text Generation • 8B • 40.2k downloads • 9 likes)
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4 (Text Generation • 5B • 15.3k downloads • 1 like)
- inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid (6B • 1 download)
Qwen3.6-35B-A3B mixed-precision HIGGS model variants, plus base FP16/FP8/NVFP4 references.
- inference-optimization/Qwen3.6-35B-A3B-5.0-bits-mode-heuristic (Image-Text-to-Text • 24B • 98 downloads)
- inference-optimization/Qwen3.6-35B-A3B-5.0-bits-mode-hybrid (Image-Text-to-Text • 24B • 99 downloads)
- inference-optimization/Qwen3.6-35B-A3B-5.0-bits-mode-noise (Image-Text-to-Text • 24B • 62 downloads)
- inference-optimization/Qwen3.6-35B-A3B-5.5-bits-mode-heuristic (Image-Text-to-Text • 26B • 45 downloads)

- meta-llama/Llama-3.2-1B-Instruct (Text Generation • 1B • 7.4M downloads • 1.47k likes)
- inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic (1B • 5 downloads)
- inference-optimization/Llama-3.2-1B-Instruct-NVFP4 (0.8B • 86 downloads)
- inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor (1B • 1 download)

- inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3 (2B • 1 download)
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch1 (2B • 5 downloads)
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch2 (2B • 1 download)
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-GSM8K-MTP-finetuned (81B • 3 downloads)