Instructions for using IQuestLab/Fleming-R1-32B with libraries and local apps.

Libraries

How to use IQuestLab/Fleming-R1-32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="IQuestLab/Fleming-R1-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("IQuestLab/Fleming-R1-32B")
model = AutoModelForCausalLM.from_pretrained("IQuestLab/Fleming-R1-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
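
When the `text-generation` pipeline is called with a list of chat messages, as in the first snippet above, it returns the conversation with the model's reply appended as the last message. The following is a minimal sketch of extracting that reply. `last_reply` is a hypothetical helper name, and `fake_result` only imitates the shape of the pipeline output so the example runs without downloading the model.

```python
def last_reply(pipeline_output):
    """Return the assistant's reply text from a chat-style pipeline result."""
    # The pipeline returns one dict per input; "generated_text" holds the
    # full message list, with the newly generated assistant turn last.
    conversation = pipeline_output[0]["generated_text"]
    return conversation[-1]["content"]


# Stand-in for the value returned by pipe(messages); the reply text is a placeholder.
fake_result = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "(model reply goes here)"},
]}]
print(last_reply(fake_result))
```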

Local Apps

vLLM

How to use IQuestLab/Fleming-R1-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "IQuestLab/Fleming-R1-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IQuestLab/Fleming-R1-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
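
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, model="IQuestLab/Fleming-R1-32B",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


# Usage, once the server is up:
#   print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000 instead.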

Use Docker

docker model run hf.co/IQuestLab/Fleming-R1-32B

SGLang

How to use IQuestLab/Fleming-R1-32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "IQuestLab/Fleming-R1-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IQuestLab/Fleming-R1-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "IQuestLab/Fleming-R1-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IQuestLab/Fleming-R1-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use IQuestLab/Fleming-R1-32B with Docker Model Runner:
```
docker model run hf.co/IQuestLab/Fleming-R1-32B
```



	---
	library_name: transformers
	license: apache-2.0
	license_link: https://huggingface.co/UbiquantAI/Fleming-R1-32B/blob/main/LICENSE
	pipeline_tag: text-generation
	---

	# Fleming-R1-32B
	<p align="center" style="margin: 0;">
	<a href="https://github.com/UbiquantAI/Fleming-R1" aria-label="GitHub Repository" style="text-decoration:none;">
	<span style="display:inline-flex;align-items:center;gap:.35em;">
	<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"
	width="16" height="16" aria-hidden="true"
	style="vertical-align:text-bottom;fill:currentColor;">
	<path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8Z"/>
	</svg>
	<span>GitHub</span>
	</span>
	</a>
	<span style="margin:0 .75em;opacity:.6;">•</span>
	<a href="https://arxiv.org/abs/2509.15279" aria-label="Paper">📑 Paper</a>
	</p>

	## Highlights

	## 📖 Model Overview

	Fleming-R1 is a reasoning model for medical scenarios that analyzes complex problems step by step and produces reliable answers. It is trained with a “chain-of-thought cold start” followed by large-scale reinforcement learning. On multiple medical benchmarks, the 7B version achieves state-of-the-art results among models of similar size; the 32B version performs close to the much larger GPT-OSS-120B and shows stronger results on Chinese tasks.

	Model Features:

	* Reasoning-oriented data strategy: combines public medical datasets with knowledge graphs to improve coverage of rare diseases, medications, and multi-hop reasoning chains;
	* Chain-of-thought cold start: uses high-quality reasoning traces distilled from teacher models to guide the model in learning basic reasoning patterns;
	* Two-stage reinforcement learning: employs adaptive hard-negative mining to strengthen the model’s reasoning on difficult problems.

	## 📦 Releases

	- Fleming-R1-7B: trained on Qwen2.5-7B
	🤗 [`UbiquantAI/Fleming-R1-7B`](https://huggingface.co/UbiquantAI/Fleming-R1-7B)
	- Fleming-R1-32B: trained on Qwen3-32B
	🤗 [`UbiquantAI/Fleming-R1-32B`](https://huggingface.co/UbiquantAI/Fleming-R1-32B)

	## 📊 Performance

	### Main Benchmark Results

	<div align="center">
	<img src="images/exp_result.png" alt="Benchmark Results" width="60%">
	</div>

	### Reasoning Ability Comparison

	On the MedXpertQA benchmark, which evaluates medical reasoning ability, Fleming-R1 surpasses models of similar and even larger sizes and is on par with certain closed-source models.

	<div align="center">
	<img src="images/size_compare.png" alt="Size comparison" width="60%">
	</div>

	## 🔧 Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "UbiquantAI/Fleming-R1-32B"

	# load the tokenizer and the model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)

	# prepare the model input
	prompt = "What should I do if I suddenly develop a fever?"
	messages = [
	    {"role": "user", "content": prompt},
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# generate the completion (reasoning trace plus final answer)
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=32768,
	)
	# keep only the newly generated tokens, dropping the prompt
	output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

	# split the reasoning trace from the final answer
	try:
	    # position just past the last occurrence of token 151668 (</think>)
	    index = len(output_ids) - output_ids[::-1].index(151668)
	except ValueError:
	    # no </think> token was generated: treat the whole output as the answer
	    index = 0

	thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
	content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

	print("thinking content:", thinking_content)
	print("content:", content)
	```
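
The parsing step in the Quick Start code can be checked without loading the model. The sketch below wraps the same reverse-index logic in a function and runs it on toy token IDs. `split_thinking` is a hypothetical helper name; `151668` is the `</think>` token ID used in the snippet above.

```python
def split_thinking(output_ids, end_think_id=151668):
    """Split generated token IDs at the last end-of-thinking token.

    Returns (thinking_ids, answer_ids). The end-of-thinking token itself
    stays at the end of thinking_ids. If the token is absent, thinking_ids
    is empty and every token is treated as part of the answer.
    """
    try:
        # Searching the reversed list finds the last occurrence.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy token IDs, chosen only to exercise the function.
print(split_thinking([11, 22, 151668, 33, 44]))  # ([11, 22, 151668], [33, 44])
print(split_thinking([33, 44]))                  # ([], [33, 44])
```

If generation stops at `max_new_tokens` before `</think>` is emitted, this logic places the entire (unfinished) reasoning trace in the answer part, so callers may want to check for an empty thinking part.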

	## ⚠️ Safety Statement

	This project is for research and non-clinical reference only; it must not be used for actual diagnosis or treatment decisions.
	The generated reasoning traces are an auditable intermediate process and do not constitute medical advice.
	In medical scenarios, results must be reviewed and approved by qualified professionals, and all applicable laws, regulations, and privacy compliance requirements in your region must be followed.

	## 📚 Citation

	```bibtex
	@misc{flemingr1,
	  title={Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning},
	  author={Chi Liu and Derek Li and Yan Shu and Robin Chen and Derek Duan and Teng Fang and Bryan Dai},
	  year={2025},
	  eprint={2509.15279},
	  archivePrefix={arXiv},
	  primaryClass={cs.LG},
	  url={https://arxiv.org/abs/2509.15279},
	}
	```