---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/UbiquantAI/Fleming-R1-32B/blob/main/LICENSE
pipeline_tag: text-generation
---

# Fleming-R1-32B

<p align="center" style="margin: 0;">
  <a href="https://github.com/UbiquantAI/Fleming-R1" aria-label="GitHub Repository" style="text-decoration:none;">
    <span style="display:inline-flex;align-items:center;gap:.35em;">
      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"
           width="16" height="16" aria-hidden="true"
           style="vertical-align:text-bottom;fill:currentColor;">
        <path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8Z"/>
      </svg>
      <span>GitHub</span>
    </span>
  </a>
  <span style="margin:0 .75em;opacity:.6;">•</span>
  <a href="https://arxiv.org/abs/2509.15279" aria-label="Paper">📑 Paper</a>
</p>

## Highlights

## 📖 Model Overview

Fleming-R1 is a reasoning model for medical scenarios that analyzes complex problems step by step and produces reliable answers. It is trained with a “chain-of-thought cold start” followed by large-scale reinforcement learning. On multiple medical benchmarks, the 7B version achieves state-of-the-art results among models of similar size; the 32B version performs close to the much larger GPT-OSS-120B and shows stronger results on Chinese tasks.

**Model Features:**

* **Reasoning-oriented data strategy:** Combines public medical datasets with knowledge graphs to improve coverage of rare diseases, medications, and multi-hop reasoning chains.
* **Chain-of-thought cold start:** Uses high-quality reasoning traces distilled from teacher models to teach the model basic reasoning patterns.
* **Two-stage reinforcement learning:** Employs adaptive hard-negative mining to strengthen the model’s reasoning on difficult problems.
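
The card does not spell out how the adaptive hard-negative mining works, so the following is only a generic sketch of the idea and not the authors' procedure. It assumes each training problem has an observed pass rate from earlier rollouts and samples the next batch with weights that favor problems the model still fails. The function names, the `floor` parameter, and the sample pass rates are all hypothetical.

```python
import random


def failure_weights(pass_rates: dict[str, float], floor: float = 0.05) -> dict[str, float]:
    """Turn per-problem pass rates into sampling weights that favor failures.

    `floor` (an assumed parameter) keeps already-solved problems in rotation.
    """
    return {pid: max(1.0 - rate, floor) for pid, rate in pass_rates.items()}


def sample_hard_batch(pass_rates: dict[str, float], batch_size: int, seed: int = 0) -> list[str]:
    """Draw problem IDs with replacement, weighted toward low pass rates."""
    rng = random.Random(seed)
    weights = failure_weights(pass_rates)
    ids = list(weights)
    return rng.choices(ids, weights=[weights[pid] for pid in ids], k=batch_size)


# Made-up pass rates for three problems: easy, medium, hard.
pass_rates = {"q1": 0.95, "q2": 0.40, "q3": 0.05}
batch = sample_hard_batch(pass_rates, batch_size=1000)
print({pid: batch.count(pid) for pid in pass_rates})
```

Recomputing the pass rates after each training round and resampling is what would make such a scheme "adaptive": problems drop in weight as the model learns to solve them.
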

## 📦 Releases

- **Fleming-R1-7B**: trained on Qwen2.5-7B
  🤗 [`UbiquantAI/Fleming-R1-7B`](https://huggingface.co/UbiquantAI/Fleming-R1-7B)
- **Fleming-R1-32B**: trained on Qwen3-32B
  🤗 [`UbiquantAI/Fleming-R1-32B`](https://huggingface.co/UbiquantAI/Fleming-R1-32B)

## 📊 Performance

### Main Benchmark Results

<div align="center">
  <img src="images/exp_result.png" alt="Benchmark Results" width="60%">
</div>

### Reasoning Ability Comparison

On the MedXpertQA benchmark, which evaluates medical reasoning ability, Fleming-R1 surpasses models of similar and even larger size, and is on par with certain closed-source models.

<div align="center">
  <img src="images/size_compare.png" alt="Size comparison" width="60%">
</div>

## 🔧 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "UbiquantAI/Fleming-R1-32B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "What should I do if I suddenly develop a fever?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Split the reasoning trace from the final answer
try:
    # Reverse search for the last occurrence of token ID 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: treat the whole output as the answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
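
The parsing step in the Quick Start relies on the hardcoded token ID 151668 for `</think>`. As a minimal sketch, the same split can be done on the decoded string instead, which avoids depending on a specific vocabulary ID. The helper name `split_reasoning` is hypothetical, the sample text is made up for illustration, and the approach assumes the `</think>` tag is still present in the decoded text (whether it survives `skip_special_tokens=True` depends on how the tokenizer registers the tag).

```python
def split_reasoning(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (thinking, answer) at the last end_tag."""
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", decoded.strip()
    # Drop the opening tag, if the decoded text contains one.
    thinking = head.replace("<think>", "", 1).strip()
    return thinking, tail.strip()


# Made-up output string standing in for a decoded generation.
sample = "<think>\nFever with no other symptoms listed.\n</think>\n\nRest and drink fluids."
thinking, answer = split_reasoning(sample)
print("thinking content:", thinking)
print("content:", answer)
```

Unlike the token-level version, this variant drops the tags themselves from both parts, which is usually what is wanted when only the final answer is shown to users.
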

## ⚠️ Safety Statement

This project is for research and non-clinical reference only; it must not be used for actual diagnosis or treatment decisions.
The generated reasoning traces are an auditable intermediate process and do not constitute medical advice.
In medical scenarios, results must be reviewed and approved by qualified professionals, and all applicable laws, regulations, and privacy compliance requirements in your region must be followed.

## 📚 Citation

```bibtex
@misc{flemingr1,
  title={Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning},
  author={Chi Liu and Derek Li and Yan Shu and Robin Chen and Derek Duan and Teng Fang and Bryan Dai},
  year={2025},
  eprint={2509.15279},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.15279},
}
```