# Nemotron-Orchestrator-8B-Qwen3-BF16-NOESIS

BF16 reference checkpoint of nvidia/Nemotron-Orchestrator-8B, cast from the original FP32 release. The cast preserves FP32's full exponent range; only mantissa precision is reduced (see the precision notes below).
Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).
- Founder: Ilia Bolotnikov
- Organization: AMAImedia.com
- X (Twitter): @AMAImediacom
- LinkedIn: Ilia Bolotnikov
- Telegram: @djbionicl
- NOESIS version: v14.6
- Release date: 2026-04
## ⚠️ License notice
This model inherits the NVIDIA Open Model License from the upstream
nvidia/Nemotron-Orchestrator-8B. The base model is designated by NVIDIA as
"for research and development only".
This BF16 derivative is published as a bandwidth-friendly reference checkpoint
for the broader research and development community. Users are responsible
for compliance with NVIDIA's license terms — see the LICENSE file in
this repository for the full text.
## Why this BF16 release exists

The original NVIDIA release ships in FP32 (~32 GB on disk). Most modern inference and quantization tooling (Hugging Face Transformers, vLLM, SGLang, AutoAWQ, AutoGPTQ, llama.cpp BF16 conversion) typically casts weights down to BF16 at load time anyway. Publishing a pre-cast BF16 checkpoint:
- Halves download bandwidth (16 GB vs 32 GB)
- Halves disk footprint
- Skips a slow load-time cast for users
- Provides a clean BF16 baseline for downstream quantization recipes
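The halving claims above follow from simple arithmetic; a back-of-envelope sketch (assuming a nominal 8B parameter count — the exact count differs slightly, hence the "~" figures):

```python
# Nominal parameter count; the real checkpoint is slightly different,
# which is why the README quotes approximate sizes.
params = 8_000_000_000

fp32_gb = params * 4 / 1e9  # 4 bytes per FP32 weight
bf16_gb = params * 2 / 1e9  # 2 bytes per BF16 weight

print(fp32_gb, bf16_gb)  # 32.0 16.0
```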
The cast is performed via `torch.Tensor.to(dtype=torch.bfloat16)` with
IEEE 754 round-to-nearest-even (the PyTorch default). BF16 keeps FP32's full
8-bit exponent range but shortens the mantissa from 23 bits to 7, so the cast
is not bit-exact. The resulting rounding (relative error at most 2^-8 per
weight) is the same one BF16 inference stacks apply at load time, and is
negligible for inference-time use of weight tensors.
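The rounding behavior can be illustrated without PyTorch: BF16 is simply the top 16 bits of an FP32 value, rounded to nearest-even. A minimal pure-Python sketch (the helper names are illustrative, not part of this repo):

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Round an FP32 value to BF16 (its top 16 bits) with round-to-nearest-even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    round_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even result
    return (bits + round_bias) >> 16

def bf16_to_float(b16: int) -> float:
    """Re-expand BF16 bits to FP32 by zero-padding the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", b16 << 16))
    return x

x = 3.14159265
rt = bf16_to_float(fp32_to_bf16_bits(x))
# Exponent is preserved; mantissa drops to 7 bits, so relative error <= 2**-8.
assert abs(rt - x) / x <= 2 ** -8
```

Values exactly representable in BF16 (e.g. 1.0) round-trip unchanged; everything else picks up at most one unit in the 7-bit mantissa.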
## Model summary
| Property | Value |
|---|---|
| Base model | nvidia/Nemotron-Orchestrator-8B |
| Underlying architecture | Qwen3-8B (decoder-only transformer, dense, NOT MoE) |
| Source precision | FP32 |
| This release precision | BF16 |
| Vocab size | 151936 |
| Language | English (per base model) |
| Disk footprint | ~16 GB |
| Inference VRAM | ~17 GB BF16 (full-resident on 24 GB+ GPU) |
For low-VRAM (6-12 GB) inference, see the AWQ INT4 sibling release: amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS.
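The ~17 GB VRAM figure can be sanity-checked with rough arithmetic. The layer/head numbers below are the published Qwen3-8B geometry, used here as an assumption about this checkpoint rather than read from its config:

```python
# Weights: nominal 8B parameters * 2 bytes each in BF16.
weights_gb = 8e9 * 2 / 1e9  # ~16 GB

# KV cache per token: 2 tensors (K and V) * layers * kv_heads * head_dim * 2 bytes.
# Assumed Qwen3-8B geometry: 36 layers, 8 KV heads (GQA), head_dim 128.
layers, kv_heads, head_dim = 36, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2

kv_gb_8k_ctx = kv_bytes_per_token * 8192 / 1e9  # ~1.2 GB at an 8K context
total_gb = weights_gb + kv_gb_8k_ctx  # ~17 GB before activation/runtime overhead
```

Weights plus an 8K-token KV cache land at roughly 17 GB, consistent with the table's full-resident estimate for a 24 GB GPU.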
## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "amaimedia/Nemotron-Orchestrator-8B-Qwen3-BF16-NOESIS"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Plan a multi-step task: find recent AWQ papers, summarize the top three."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## NOESIS context
This BF16 checkpoint is the source artifact for the AWQ INT4 quantization used as the English orchestration teacher for NOESIS Specialist M9-ORCH-4B during knowledge distillation.
NOESIS is a 9-specialist dubbing automation platform — see the NOESIS collection for the full specialist family.
## Acknowledgements & citation

Base model: ToolOrchestra by NVIDIA & University of Hong Kong.

```bibtex
@misc{toolorchestra,
  title         = {ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration},
  author        = {Hongjin Su and Shizhe Diao and Ximing Lu and others},
  year          = {2025},
  eprint        = {2511.21689},
  archivePrefix = {arXiv}
}
```
NOESIS:

```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.6: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```