Support this work: donate.sybilsolutions.ai

REAP surfaces: GLM | MiniMax | Qwen | Gemma | Paper | Code | PR17 | Cerebras Collection

Qwen3.5-122B-A10B-REAP-20

20% expert-pruned variant of Qwen3.5-122B-A10B using REAP (Router-weighted Expert Activation Pruning).

Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3.5-122B-A10B |
| Architecture | Qwen3.5 MoE (GDN + full attention) |
| Original Experts | 256 per layer |
| Pruned Experts | 205 per layer (20% removed) |
| Total Parameters | ~99B (BF16) |
| Active Parameters | ~10B per token |
| Pruning Method | REAP with targeted refusal preservation |
| Preserve Threshold | 80% (super-expert protection) |
| Calibration | reap-calibration-data-v1 (23k benchmark-free samples) |
| Maintainer | 0xSero |
| Organization | Sybil Solutions |
| Project | REAP PR17 |

Benchmark Results

Code Generation (EvalPlus)

| Benchmark | Pass@1 |
|-----------|--------|
| HumanEval (base) | 81.1% |
| HumanEval+ (base + extra) | 76.8% |
| MBPP (base) | 86.2% |
| MBPP+ (base + extra) | 73.0% |

Knowledge & Reasoning (lm-eval, 0-shot)

| Task | Baseline | REAP-20 | Retained |
|------|----------|---------|----------|
| arc_challenge | 63.4% | 63.7% | 100.5% |
| boolq | 86.4% | 82.7% | 95.8% |
| hellaswag | 85.9% | 84.1% | 97.9% |
| mathqa | 68.5% | 67.3% | 98.1% |
| mmlu_world_religions | 91.2% | 86.0% | 94.2% |
| openbookqa | 46.4% | 45.6% | 98.3% |
| piqa | 83.8% | 82.3% | 98.1% |
| truthfulqa_mc2 | 51.9% | 52.4% | 100.8% |
| winogrande | 75.6% | 75.5% | 99.9% |

Average capability retained: 97.9% after removing 20% of experts.
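The lm-eval numbers above should be reproducible with EleutherAI's lm-evaluation-harness. A command along these lines (exact harness version, batch sizing, and parallelism settings are assumptions, not the settings used to produce the table):

```shell
pip install lm-eval

lm_eval --model hf \
  --model_args pretrained=0xSero/Qwen3.5-122B-A10B-REAP-20,dtype=bfloat16,trust_remote_code=True \
  --tasks arc_challenge,boolq,hellaswag,mathqa,openbookqa,piqa,truthfulqa_mc2,winogrande \
  --num_fewshot 0 \
  --batch_size auto
```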

Usage

```shell
vllm serve 0xSero/Qwen3.5-122B-A10B-REAP-20 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --max-model-len 8192 \
  --trust-remote-code \
  --language-model-only \
  --dtype bfloat16
```

Important: pass the --language-model-only flag. This is a text-only checkpoint pruned from the multimodal base model.
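Once the server is up, vLLM exposes an OpenAI-compatible endpoint. A minimal smoke test (assumes the default port 8000 on localhost):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "0xSero/Qwen3.5-122B-A10B-REAP-20",
        "messages": [{"role": "user", "content": "Write a one-line Python hello world."}],
        "max_tokens": 64
      }'
```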

What is REAP?

REAP (Router-weighted Expert Activation Pruning) removes the least-salient experts from MoE models while preserving critical capabilities. It scores experts using router activation patterns collected on a calibration dataset, prunes the lowest-scoring ones, and applies special protection to experts tied to safety-critical behaviors such as refusals.
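As an illustration only, the selection step can be sketched as follows. This is not the released implementation; the exact saliency formula, the calibration statistics used, and the `protect` mechanism here are assumptions based on the description above.

```python
import numpy as np

def reap_prune(router_weights, expert_output_norms, prune_frac=0.20, protect=None):
    """Toy REAP-style selection: score each expert by its router-weighted
    activation magnitude over a calibration set, then drop the
    lowest-scoring fraction.

    router_weights:      (tokens, experts) gate probabilities per token
    expert_output_norms: (tokens, experts) magnitude of each expert's output
    protect:             expert indices that must never be pruned
                         (analogous to super-expert protection)
    """
    # Saliency: mean over calibration tokens of gate weight * output magnitude.
    saliency = (router_weights * expert_output_norms).mean(axis=0)

    n_experts = saliency.shape[0]
    n_prune = int(round(n_experts * prune_frac))

    order = np.argsort(saliency)            # lowest saliency first
    protected = set(protect or [])
    pruned = [e for e in order if e not in protected][:n_prune]
    kept = sorted(set(range(n_experts)) - set(pruned))
    return kept, pruned

# Toy example: 8 experts, 1000 calibration tokens, prune 25% (2 experts).
rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(8), size=1000)    # router softmax outputs
norms = rng.uniform(0.5, 2.0, size=(1000, 8))
kept, pruned = reap_prune(w, norms, prune_frac=0.25)
print(len(kept), len(pruned))               # 6 2
```

The actual method operates per layer on real router statistics and merges nothing back; pruned expert weights are simply deleted, which is why active parameters per token stay at ~10B while total parameters shrink.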

License

Same license as the base model (Qwen).

Sponsors

Thank you to our kind sponsors; this work wouldn't be possible without them:

  • Nvidia
  • TNG Technology
  • Lambda
  • Prime Intellect
  • HotAisle