🪐 Qwen3.5-27B-GLM5.1-Distill-v1


📌 Model Overview

Model Name: Jackrong/Qwen3.5-27B-GLM5.1-Distill-v1
Base Model: Qwen3.5-27B
Training Type: Supervised Fine-Tuning
Parameter Scale: 27B
Training Framework: Unsloth

This model is a distilled variant of Qwen3.5-27B, trained on high-quality reasoning data derived from GLM-5.1.

The primary goals are to:

  • Improve structured reasoning ability
  • Enhance instruction-following consistency
  • Activate latent knowledge via better reasoning structure

📊 Training Data

Main Dataset

  • Jackrong/GLM-5.1-Reasoning-1M-Cleaned
  • Cleaned from the original Kassadin88/GLM-5.1-1000000x dataset
  • Generated by a GLM-5.1 teacher model
  • Approximately 700× the scale of the auxiliary Qwen3.5-reasoning-700x dataset
  • Training used a quality-filtered subset, not the full source dataset

Auxiliary Dataset

  • Jackrong/Qwen3.5-reasoning-700x

Training used Jackrong/GLM-5.1-Reasoning-1M-Cleaned, a cleaned derivative of Kassadin88/GLM-5.1-1000000x. Special thanks to Kassadin88 ❤️ for the original dataset; please support the original author with a follow and a like. Only a quality-filtered subset was used for distillation, rather than the full original dataset.
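The quality-filtering step described above can be sketched as a simple pass over raw records. The field names (`prompt`, `reasoning`, `answer`) and thresholds below are illustrative assumptions for demonstration only, not the actual cleaning pipeline used for the dataset.

```python
def keep_sample(sample, min_reasoning_chars=200, max_total_chars=32000):
    """Illustrative quality filter for teacher-generated reasoning traces.

    Field names and thresholds are assumptions; the actual cleaning
    pipeline behind GLM-5.1-Reasoning-1M-Cleaned may differ.
    """
    prompt = sample.get("prompt", "")
    reasoning = sample.get("reasoning", "")
    answer = sample.get("answer", "")
    # Drop records with any missing field.
    if not (prompt and reasoning and answer):
        return False
    # Drop traces too short to carry a real reasoning structure.
    if len(reasoning) < min_reasoning_chars:
        return False
    # Drop pathologically long records, which tend to be degenerate loops.
    if len(prompt) + len(reasoning) + len(answer) > max_total_chars:
        return False
    return True

raw = [
    {"prompt": "p", "reasoning": "r" * 300, "answer": "a"},
    {"prompt": "p", "reasoning": "too short", "answer": "a"},
    {"prompt": "", "reasoning": "r" * 300, "answer": "a"},
]
filtered = [s for s in raw if keep_sample(s)]
print(len(filtered))  # 1
```

In practice such filters trade recall for precision: dropping short or degenerate traces loses some data but keeps the reasoning structure the student is supposed to learn.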


🗺️ Training Pipeline Overview

Base Model (Qwen3.5-27B)
 │
 ▼
Qwen3.5-27B fine-tuned with Unsloth
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
Distillation from GLM-5.1 reasoning data
 │
 ▼
Jackrong/Qwen3.5-27B-GLM5.1-Distill-v1
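As a rough sanity check on the SFT + LoRA stage, the number of trainable adapter parameters for one weight matrix is r × (d_in + d_out). The rank and dimensions below are illustrative numbers, not the actual training configuration of this release.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters added by a LoRA adapter on one d_out x d_in
    weight matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Illustrative numbers: a square 4096-dim projection with rank-16 adapters.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 16)
print(lora, f"{lora / full:.4%}")  # 131072 0.7812%
```

This is why LoRA-based SFT is feasible on modest hardware: the adapters touch well under one percent of the parameters of each targeted matrix, while the frozen base weights are untouched.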

🧠 Example of Learned Reasoning Scaffold

This model learns a reasoning structure distilled from GLM-5.1 traces, rather than the Qwopus / Claude-style scaffold used in previous releases.

From the GLM-5.1 distillation data, the reasoning pattern is usually more task-first and structure-driven:

  • identify the core topic and task type
  • extract key constraints from the prompt
  • break the problem into smaller reasoning steps
  • connect mechanisms, formulas, or domain concepts
  • verify important assumptions before the final answer
  • produce a clear and organized response

A typical abstract scaffold looks like:

Example:

The user is asking about [Topic / Problem] under [Specific Constraints].
This is mainly a [reasoning / coding / math / STEM / instruction-following] task.

  1. Understand the task

    • What is being asked?
    • What constraints or conditions must be satisfied?
  2. Break down the problem

    • Identify the key concepts, variables, or mechanisms.
    • Separate the problem into smaller steps.
  3. Reason step by step

    • Apply the relevant principles or methods.
    • Compare possible interpretations when needed.
    • Check whether the assumptions are consistent.
  4. Construct the final answer

    • Present the result clearly.
    • Keep the response organized and aligned with the user’s request.
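The abstract scaffold above can also be rendered programmatically, for example when constructing few-shot prompts or inspecting traces. The template below is a hypothetical illustration of that structure, not the model's actual chat template.

```python
# Hypothetical rendering of the GLM-5.1-style reasoning scaffold.
SCAFFOLD = """The user is asking about {topic} under {constraints}.
This is mainly a {task_type} task.

1. Understand the task
2. Break down the problem
3. Reason step by step
4. Construct the final answer"""

def render_scaffold(topic, constraints, task_type):
    """Fill the abstract scaffold with task-specific slots."""
    return SCAFFOLD.format(topic=topic, constraints=constraints,
                           task_type=task_type)

print(render_scaffold("matrix inversion", "no external libraries", "math"))
```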

Compared with the previous Claude-style reasoning scaffold, this GLM-5.1 distillation data is more focused on structured task decomposition, domain-aware reasoning, and final-answer organization.
For a 27B student model, the goal is not to copy the teacher perfectly, but to learn a cleaner reasoning procedure and produce more stable outputs.


✨ Data Advantages

Compared to typical SFT datasets:

  • High-quality chain-of-thought structure
  • Strong problem decomposition patterns
  • Wide domain coverage
  • Multilingual reasoning capability
  • Consistent instruction → reasoning → answer alignment
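The instruction → reasoning → answer alignment can be checked mechanically on formatted training text. The `<think>` delimiters below are an assumed trace format for illustration; the dataset's actual markup may differ.

```python
import re

def is_aligned(text):
    """Check that a training string contains exactly one non-empty
    reasoning block (assumed <think>...</think> format) followed by a
    non-empty final answer."""
    blocks = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if len(blocks) != 1 or not blocks[0].strip():
        return False
    answer = text.split("</think>", 1)[1]
    return bool(answer.strip())

good = "<think>Break the task into steps...</think>The answer is 42."
bad = "The answer is 42.<think></think>"
print(is_aligned(good), is_aligned(bad))  # True False
```

Simple structural checks like this are cheap to run over millions of records and catch the most common alignment failures (empty traces, answers embedded inside the reasoning block, duplicated blocks).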

📈 Expected Improvements

This model is intended to deliver incremental but meaningful improvements in practical use:

  • Better multi-step reasoning stability
  • More structured and readable outputs
  • Improved instruction adherence
  • Slight improvements in complex problem solving

For 27B-scale models, gains from SFT are typically gradual rather than dramatic. The main benefit is usually better consistency, clearer reasoning, and stronger answer organization, rather than a sudden jump in raw capability.


🧩 Distillation Philosophy

This model treats distillation as more than simple output imitation.

The goal is not to make a 27B model copy the teacher token by token, but to transfer a stronger reasoning structure and problem-solving style into Qwen3.5-27B.

In this project, high-quality teacher data is valuable because it provides:

  • clearer reasoning organization
  • more consistent instruction-following behavior
  • better task decomposition patterns
  • cleaner reasoning-to-answer alignment

High-quality reasoning supervision can help the student model better use its existing knowledge, rather than simply replacing it with teacher outputs.

In practice, the expected gain is not necessarily a dramatic capability jump, but improved stability, structure, and consistency in complex reasoning tasks.

🔬 Supporting Evidence

Recent work:

Ren et al., 2026 — Rethinking Generalization in Reasoning SFT (arXiv:2604.06628)

Short-epoch reasoning SFT can underestimate generalization — in-domain gains may appear early, while out-of-domain improvements often require sufficient optimization.

This paper shows that generalization in reasoning SFT is not fixed, but conditional — depending on optimization, data quality, and model capability.

Key takeaways:

  • Reasoning SFT can generalize when sufficiently trained (often showing a dip → recovery pattern)
  • High-quality long-CoT data enables cross-domain transfer
  • Stronger models learn reasoning structure, not just longer outputs (14B/27B/32B)
  • Gains are asymmetric — reasoning improves, while safety may degrade
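The dip → recovery pattern mentioned above can be detected with a simple heuristic over an out-of-domain eval curve. The function and the sample scores below are purely illustrative, not results from this model.

```python
def shows_dip_recovery(scores):
    """Heuristic dip -> recovery check on an eval curve: the minimum
    occurs strictly inside the run, and the final score exceeds both
    the starting score and the dip."""
    if len(scores) < 3:
        return False
    low = min(scores)
    i = scores.index(low)
    return 0 < i < len(scores) - 1 and scores[-1] > scores[0] > low

# Illustrative out-of-domain accuracy across checkpoints.
print(shows_dip_recovery([0.50, 0.44, 0.41, 0.47, 0.56]))  # True
print(shows_dip_recovery([0.50, 0.55, 0.60]))              # False
```

The practical point is that stopping at the dip would misread a run that was still converging, which is exactly the "patient interpretation" argued for below.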

For this project, that evidence matters because it supports a more patient interpretation of distillation-style SFT. If reasoning supervision is clean and sufficiently optimized, the resulting gain is not necessarily immediate or linear, but it can still be real and transferable.

This aligns closely with the philosophy of this release:

  • use clean, high-quality teacher data
  • avoid over-reading short training runs
  • treat reasoning SFT as a dynamic optimization process, not a static one-shot outcome
  • focus on whether the student learns better reasoning structure, not just longer outputs

This suggests that the improvement is not simply memorization or dataset overlap. Instead, sufficiently optimized reasoning SFT can help the student model:

  • 🧠 Better utilize existing knowledge
  • 🔍 Activate latent knowledge through structured reasoning
  • 🏗️ Learn reasoning procedures, not just output format

📚 Resources & Guides

👉 GitHub Repository: Jackrong-llm-finetuning-guide. Visit the repo to dive into the codebase and reproduce the results locally or on Colab.

📥 Core Technical Document

🔗 Qwopus3.5-27b Complete Fine-Tuning Guide (PDF)

  • The Full Pipeline: A step-by-step walkthrough—from downloading the base model and unifying heterogeneous data, to configuring trainer hyperparameters and publishing to Hugging Face.
  • Beginner Friendly: Includes an introductory guide to getting started with Google Colab and Unsloth.

A Note: My goal isn't just to detail a workflow, but to demystify LLM training. Beyond the social media hype, fine-tuning isn't an unattainable ritual—often, all you need is a Google account, a standard laptop, and relentless curiosity. All training and testing for this project were self-funded. If you find this model or guide helpful, a Star ⭐️ on GitHub would be the greatest encouragement. Thank you! 🙏


⚠️ Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts cited during the thinking sequence may occasionally be hallucinated, especially when verifying real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • This model is a test release intended solely for learning, demonstration, academic research, and technical exploration.
  • Developer Disclaimer: This is an independent, personal project. Since the developer lacks the specialized technical resources and infrastructure of a large-scale industrial lab, the model's reasoning chain (CoT) may occasionally exhibit instability, logic loops, or reasoning drift. Users are advised to use this model with these experimental limitations in mind.

🙏 Acknowledgements

This project would not have been possible without the support and contributions of the open-source community.

Special thanks to the Unsloth AI team for making efficient fine-tuning of large language models more accessible. This Qwen3.5-based model was trained with Unsloth and Hugging Face's TRL library, enabling a significantly faster and more practical fine-tuning workflow.

I would also like to acknowledge:

  • The GLM-5.1 team for inspiring this distillation direction and providing a strong teacher-model reference.
  • Special thanks to Kassadin88 ❤️ for creating the original GLM-5.1-1000000x dataset that this training pipeline ultimately builds upon.
  • Jackrong/GLM-5.1-Reasoning-1M-Cleaned for making the source data more consistent and practical for distillation training.
  • Qwen for providing the strong base model foundation.
  • Kyle @KyleHessling1 for testing, feedback, and community support.
  • The broader open-source community for continuously sharing tools, datasets, evaluation methods, and technical discussions.

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_qwen35_27b_glm51_distill_v1,
  title        = {Jackrong/Qwen3.5-27B-GLM5.1-Distill-v1},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwen3.5-27B-GLM5.1-Distill-v1}}
}