Support this work: donate.sybilsolutions.ai

REAP surfaces: GLM | MiniMax | Qwen | Gemma | Paper | Code | PR17 | Cerebras Collection

Qwen3.5-122B-A10B-REAP-20

20% expert-pruned variant of Qwen3.5-122B-A10B using REAP (Router-weighted Expert Activation Pruning).

Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3.5-122B-A10B |
| Architecture | Qwen3.5 MoE (GDN + full attention) |
| Original Experts | 256 per layer |
| Pruned Experts | 205 per layer (20% removed) |
| Total Parameters | ~99B (BF16) |
| Active Parameters | ~10B per token |
| Pruning Method | REAP with targeted refusal preservation |
| Preserve Threshold | 80% (super-expert protection) |
| Calibration | reap-calibration-data-v1 (23k benchmark-free samples) |
| Maintainer | 0xSero |
| Organization | Sybil Solutions |
| Project | REAP PR17 |

Benchmark Results

Code Generation (EvalPlus)

| Benchmark | Pass@1 |
|-----------|--------|
| HumanEval (base) | 81.1% |
| HumanEval+ (base + extra) | 76.8% |
| MBPP (base) | 86.2% |
| MBPP+ (base + extra) | 73.0% |

Knowledge & Reasoning (lm-eval, 0-shot)

| Task | Baseline | REAP-20 | Retained |
|------|----------|---------|----------|
| arc_challenge | 63.4% | 63.7% | 100.5% |
| boolq | 86.4% | 82.7% | 95.8% |
| hellaswag | 85.9% | 84.1% | 97.9% |
| mathqa | 68.5% | 67.3% | 98.1% |
| mmlu_world_religions | 91.2% | 86.0% | 94.2% |
| openbookqa | 46.4% | 45.6% | 98.3% |
| piqa | 83.8% | 82.3% | 98.1% |
| truthfulqa_mc2 | 51.9% | 52.4% | 100.8% |
| winogrande | 75.6% | 75.5% | 99.9% |

Average capability retained: 97.9% after removing 20% of experts.
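The lm-eval numbers above should be reproducible with EleutherAI's lm-evaluation-harness. A command along these lines (exact harness version, batch sizing, and parallelism settings are assumptions, not the settings used to produce the table):

```shell
pip install lm-eval

lm_eval --model hf \
  --model_args pretrained=0xSero/Qwen3.5-122B-A10B-REAP-20,dtype=bfloat16,trust_remote_code=True \
  --tasks arc_challenge,boolq,hellaswag,mathqa,openbookqa,piqa,truthfulqa_mc2,winogrande \
  --num_fewshot 0 \
  --batch_size auto
```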

Usage

```shell
vllm serve 0xSero/Qwen3.5-122B-A10B-REAP-20 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --max-model-len 8192 \
  --trust-remote-code \
  --language-model-only \
  --dtype bfloat16
```

Important: pass the --language-model-only flag. This is a text-only checkpoint pruned from the multimodal base model.
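Once the server is up, vLLM exposes an OpenAI-compatible endpoint. A minimal smoke test (assumes the default port 8000 on localhost):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "0xSero/Qwen3.5-122B-A10B-REAP-20",
        "messages": [{"role": "user", "content": "Write a one-line Python hello world."}],
        "max_tokens": 64
      }'
```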

What is REAP?

REAP (Router-weighted Expert Activation Pruning) removes the least-salient experts from MoE models while preserving critical capabilities. It scores experts using router activation patterns collected on a calibration dataset, prunes the lowest-scoring ones, and applies special protection to experts tied to safety-critical behaviors such as refusals.
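As an illustration only, the selection step can be sketched as follows. This is not the released implementation; the exact saliency formula, the calibration statistics used, and the `protect` mechanism here are assumptions based on the description above.

```python
import numpy as np

def reap_prune(router_weights, expert_output_norms, prune_frac=0.20, protect=None):
    """Toy REAP-style selection: score each expert by its router-weighted
    activation magnitude over a calibration set, then drop the
    lowest-scoring fraction.

    router_weights:      (tokens, experts) gate probabilities per token
    expert_output_norms: (tokens, experts) magnitude of each expert's output
    protect:             expert indices that must never be pruned
                         (analogous to super-expert protection)
    """
    # Saliency: mean over calibration tokens of gate weight * output magnitude.
    saliency = (router_weights * expert_output_norms).mean(axis=0)

    n_experts = saliency.shape[0]
    n_prune = int(round(n_experts * prune_frac))

    order = np.argsort(saliency)            # lowest saliency first
    protected = set(protect or [])
    pruned = [e for e in order if e not in protected][:n_prune]
    kept = sorted(set(range(n_experts)) - set(pruned))
    return kept, pruned

# Toy example: 8 experts, 1000 calibration tokens, prune 25% (2 experts).
rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(8), size=1000)    # router softmax outputs
norms = rng.uniform(0.5, 2.0, size=(1000, 8))
kept, pruned = reap_prune(w, norms, prune_frac=0.25)
print(len(kept), len(pruned))               # 6 2
```

The actual method operates per layer on real router statistics and merges nothing back; pruned expert weights are simply deleted, which is why active parameters per token stay at ~10B while total parameters shrink.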

License

Same license as the base model (Qwen).

Sponsors

Thank you to our kind sponsors; this work wouldn't be possible without them:

  • Nvidia
  • TNG Technology
  • Lambda
  • Prime Intellect
  • HotAisle