Instructions to use MultiverseComputingCAI/Hypernova-60B-2605 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use MultiverseComputingCAI/Hypernova-60B-2605 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MultiverseComputingCAI/Hypernova-60B-2605")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MultiverseComputingCAI/Hypernova-60B-2605")
model = AutoModelForCausalLM.from_pretrained("MultiverseComputingCAI/Hypernova-60B-2605")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MultiverseComputingCAI/Hypernova-60B-2605 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MultiverseComputingCAI/Hypernova-60B-2605"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultiverseComputingCAI/Hypernova-60B-2605",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
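For readers who prefer Python to curl, the same request can be sketched with the standard library only. This is a minimal sketch, assuming the server is reachable at the default address used in the curl example; the helper names `build_chat_request` and `send_chat_request` are hypothetical, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="MultiverseComputingCAI/Hypernova-60B-2605"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply text. Passing `base_url="http://localhost:30000"` targets the SGLang port used further down.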

Use Docker

docker model run hf.co/MultiverseComputingCAI/Hypernova-60B-2605

SGLang

How to use MultiverseComputingCAI/Hypernova-60B-2605 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MultiverseComputingCAI/Hypernova-60B-2605" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultiverseComputingCAI/Hypernova-60B-2605",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MultiverseComputingCAI/Hypernova-60B-2605" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultiverseComputingCAI/Hypernova-60B-2605",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use MultiverseComputingCAI/Hypernova-60B-2605 with Docker Model Runner:
```
docker model run hf.co/MultiverseComputingCAI/Hypernova-60B-2605
```

HyperNova 60B 2605

Powered by CompactifAI

Optimized for Efficient Inference · Reduced Memory Footprint · Native Tool Calling Support

Highlights
Model Overview
Key Characteristics
Quick Start
What's New in HyperNova 60B 2605
Tool Calling
Training & Fine-Tuning
Architecture
Evaluation & Benchmarks
Languages
Intended Use
Safety & Limitations
Model Information
Citation

Model Overview

HyperNova 60B 2605, developed by Multiverse Computing, is an open-weight model designed for powerful general reasoning, coding, and versatile developer use.

The model is instruction-tuned and supports native tool calling (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2605 is intended for code generation, RAG, and tool-augmented applications.

Technical Deep Dive

For a detailed explanation of the compression architecture and process, and of the benchmark results behind HyperNova 60B, see the full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing.

Key Characteristics

Characteristic	Description
🛠️ Tool calling	Native support; OpenAI-style function / tool calling schemas; suited to coding agents and structured outputs
🧠 Parameters	60B total parameters
📐 Architecture	Decoder-only Transformer
Primary language	English
Other languages	Not formally evaluated

Quick Start

This model can be loaded with the Transformers API. Use trust_remote_code=True (required for the gpt-oss architecture). Recommended approach: AutoModelForCausalLM with apply_chat_template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2605"

# Load the tokenizer and the model; device_map="auto" places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Render the conversation with the model's chat template and tokenize it.
messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
)
inputs = inputs.to(model.device)

# A single unpadded prompt, so every position is attended to.
attention_mask = torch.ones_like(inputs, dtype=torch.long, device=inputs.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    attention_mask=attention_mask,
)

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)

Alternatively, you can use the pipeline API with trust_remote_code=True. The pipeline returns the full conversation structure, so extract the assistant message from outputs[0]["generated_text"] as needed.
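Extracting the assistant message from the pipeline result can be sketched as follows. The helper name `extract_assistant_reply` is hypothetical, and `example_outputs` is a hand-written stand-in for what `pipe(messages)` returns for chat input, so the snippet runs without loading the model.

```python
def extract_assistant_reply(outputs):
    """Return the content of the last assistant message in a chat pipeline result."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Stand-in for the result of pipe(messages); the reply text is illustrative only.
example_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "What is a Hypernova?"},
            {"role": "assistant", "content": "A hypernova is a very energetic supernova."},
        ]
    }
]
print(extract_assistant_reply(example_outputs))
```

Searching from the end of the conversation keeps the helper usable for multi-turn inputs, where earlier assistant turns are also present.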

What’s New in HyperNova 60B 2605

HyperNova 60B 2605 is an improved version of HyperNova 60B 2602. This release focuses on coding and general capability and scores higher on several benchmarks.

Summary

Improvement focus vs. HyperNova 60B 2602: stronger performance on coding tasks and general benchmarks.
Tool use: Retains native support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
Reasoning: Compatible with configurable reasoning effort (e.g. low / medium / high in system prompt) where the format is preserved; full chain-of-thought available for debugging and analysis.
Evaluated on coding and tool-heavy benchmarks (e.g. Tau2-bench, Terminal-Bench) alongside general intelligence benchmarks.
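Setting the reasoning effort through the system prompt might look like the sketch below. The exact wording `Reasoning: <level>` follows the gpt-oss convention and is an assumption here, as is the helper name `build_messages`; check the model's chat template before relying on it.

```python
VALID_EFFORTS = ("low", "medium", "high")


def build_messages(user_prompt, reasoning_effort="medium"):
    """Build a chat message list whose system prompt requests a reasoning effort."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
    return [
        # Assumed system-prompt wording (gpt-oss style), not confirmed by this card.
        {"role": "system", "content": f"Reasoning: {reasoning_effort}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Explain binary search in two sentences.", "high")
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` or to a chat-completions endpoint in place of a user-only message list.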

Tool Calling

HyperNova 60B 2605 supports native tool use and is well-suited for:

Function calling with defined schemas
Structured outputs
Coding-oriented tool workflows (e.g. browser tasks, code execution where supported)

The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows OpenAI-style schemas; compatibility refers to format and structure—exact parity with the base or other models is not guaranteed. Compared with HyperNova 60B 2602, this release improves on coding and general evaluation tracks—including IFBench, Tau2-bench, Terminal Bench, and AA-LCR under the high-reasoning setup reported below.

Example Tool Call

{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
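To show how a call like the one above fits into an application, here is a minimal sketch of an OpenAI-style tool definition plus a dispatcher that executes the call and wraps the result as a tool message. The `get_weather` tool, its canned return value, and the names `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical illustrations, not part of the model or of any library.

```python
import json

# OpenAI-style tool definition for the hypothetical get_weather tool.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"},
            },
            "required": ["city", "date"],
        },
    },
}


def get_weather(city, date):
    """Hypothetical local implementation that returns canned data."""
    return {"city": city, "date": date, "forecast": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(raw_call):
    """Parse a JSON tool call, run the matching function, and build a tool message."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    result = TOOL_REGISTRY[name](**call["arguments"])
    return {"role": "tool", "name": name, "content": json.dumps(result)}


raw_call = '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
print(dispatch_tool_call(raw_call))
```

The returned message would be appended to the conversation so the model can continue generation from the tool output.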

Architecture

Model Specifications

Specification	Value
Total parameters	60B (MoE, 4.8B active parameters)

Evaluation

Benchmark Results

HyperNova 60B Benchmark Comparison

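Models named in the comparison: GPT-OSS-120B, Gemma4-31B, HyperNova 60B 2602, Gemma4-26BA4B, HyperNova 60B 2605, Qwen3.6-35BA3B. Scores are listed below for GPT-OSS-120B and the two HyperNova releases.

Benchmark	GPT-OSS-120B	HyperNova 60B 2602	HyperNova 60B 2605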
Knowledge & Reasoning
HLE	18.5	7.3	15.0
MMLU-Pro	79.6	74.3	76.8
AIME25	93.7	86.0	90.0
GPQA:d	74.6	65.6	71.9
IFBench	67.0	59.4	66.6
AA-LCR	49.0	35.7	40.3
Agent & Tool Use
Tau2-bench (Telecom)	63.7	60.5	61.7
Coding
SciCode	41.5	33.5	36.0
LiveCodeBench	62.8	51.5	68.7
Terminal Bench	24.2	12.1	15.9
AIDER	43.6	26.2	34.2

Evaluation Methodology

Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.

Inference:

Backend: vLLM 0.13.0
Decoding: temperature 1.0, top_p 1.0
Reasoning Effort: high

Benchmark	Framework	Repeats	Other
HLE	NeMo-Skills	1	Judge: `openai/gpt-4o`
MMLU-Pro	NeMo-Skills	1
AIME25	NeMo-Skills	10
GPQA:d	NeMo-Skills	5
LiveCodeBench	NeMo-Skills	3	Split: `test_v5_2407_2412` (Jul–Dec 2024)
IFBench	NeMo-Skills	5
AA-LCR	NeMo-Skills	3	Judge: `Qwen/Qwen3-235B-A22B-Instruct-2507` (judge temp 0.7, top_p 0.8).
SciCode	NeMo-Skills	3
Tau2-bench (Telecom)	EvalScope 1.4.1	3	Judge / user simulator: temperature 0.7, timeout 600. Subset: telecom (default). Max steps: 100. Tool-call parser: openai (agent), hermes (judge).
Terminal-Bench Hard	laude-institute/harbor 0.1.43	3	max-model-len 131072. Subset: Artificial Analysis. Agent: terminus-2. Max episodes: 100
Aider polyglot	Aider-AI/aider	2	Dataset: polyglot-benchmark (225 exercises). Edit format: whole. Leaderboard-aligned; `--tries=2`.
StereoSet	inspect-ai 0.3.205 + inspect_evals 0.3.106	1	Multiple-choice / logprob; no external judge. Dataset: 2,115 examples (gender, profession, race, religion). Metrics: stereotype_score (50 = ideal), language_model_score, ICAT.
BBQ	inspect-ai 0.3.205 + inspect_evals 0.3.106	1	Multiple-choice; no external judge. Full dataset: 58,492 MCQ across 11 bias dimensions. Metric: accuracy.
StrongREJECT	inspect-ai 0.3.205 + inspect_evals 0.3.106	1	Dataset: 313 forbidden prompts. Judge: `openrouter/openai/gpt-4o`. Metrics: jailbreak_rate, strong_reject_metric (0.0 = ideal). `max_retries`: 3.
XSTest	inspect-ai 0.3.205 + inspect_evals 0.3.106	1	Dataset: safe (250) + unsafe (200); one subset per run. Judge: `openai/gpt-4o`. Metric: refusal_rate (low on safe, high on unsafe).

Inference Performance

Metrics reported

System Output Throughput (higher is better): Mean output tokens per second across all concurrent requests over the benchmarking phase.
Time to first token (TTFT) (lower is better): Median time to first token.
Model weights (lower is better): Size of the model weights, in GB.
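The first two metrics can be sketched from per-request measurements. This is one plausible reading of the definitions above, not the benchmark tool's actual implementation: throughput is taken as total output tokens divided by the phase duration, and TTFT as the median over requests. The function names and the sample numbers are hypothetical.

```python
from statistics import median


def system_output_throughput(output_token_counts, phase_duration_s):
    """Total output tokens from all requests divided by the phase duration (tok/s)."""
    return sum(output_token_counts) / phase_duration_s


def median_ttft(ttfts_s):
    """Median time to first token, in seconds, over all requests."""
    return median(ttfts_s)


# Hypothetical measurements for three requests in a 2-second phase.
output_token_counts = [1000, 1000, 1000]
ttfts_s = [4.0, 6.0, 5.0]

print(system_output_throughput(output_token_counts, phase_duration_s=2.0))
print(median_ttft(ttfts_s))
```

Using the median for TTFT keeps a few slow requests from dominating the reported value.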

Metric	GPT-OSS-120B	HyperNova 60B 2605
Concurrency	128	128
Throughput (tok/s)	3,821	5,210
TTFT (s)	7.04	4.85
Model weights (GB)	65	32
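The relative differences implied by the table can be worked out directly. The sketch below only restates the table's numbers; the helper name `relative_change` is hypothetical.

```python
def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# (GPT-OSS-120B, HyperNova 60B 2605) pairs copied from the table above.
rows = {
    "Throughput (tok/s)": (3821, 5210),
    "TTFT (s)": (7.04, 4.85),
    "Model weights (GB)": (65, 32),
}

for metric, (baseline, candidate) in rows.items():
    print(f"{metric}: {relative_change(baseline, candidate):+.1f}%")
```

This works out to roughly 36.4% higher throughput, 31.1% lower TTFT, and 50.8% smaller weights at concurrency 128.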

Performance evaluation conditions

Our performance evaluation follows the spirit of the Artificial Analysis methodology.

Inference library: vLLM 0.18.0
Monitoring libraries: GuideLLM, nvidia-ml-py
Hardware: 1× NVIDIA H200 Tensor Core GPU
Concurrency phases: 128
Phase duration: Each phase lasts 3 minutes (excluding ramp-up and cool-down periods).
Workload shape: 1k input / 1k output
Decode: temperature: 0.0, top_p: 1.0

[Figure: side-by-side comparison at concurrency = 128]

Languages

Primary language: English
Other languages: Not formally evaluated

The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.

Intended Use

Recommended Use Cases

Reasoning and analysis (with configurable reasoning effort where supported)
Tool-augmented applications, with emphasis on coding and general assistant use (function calling, web browsing, code execution, structured outputs)
Code generation and reasoning
Chatbots and virtual assistants
Retrieval-augmented generation (RAG)

Out-of-Scope Uses

Harmful, illegal, or deceptive content generation
Impersonation of real individuals without consent
High-risk decision-making without human oversight
Surveillance or tracking of individuals
Any use that violates applicable laws or regulations

Safety & Limitations

Known Limitations

English-centric training data.
Format: For best results, use the same harmony response format as gpt-oss-120b where applicable; behavior may differ otherwise.
Tool calling depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.
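When inspecting raw generations for debugging, the harmony channels can be separated with a small parser. The sketch below assumes the gpt-oss harmony markers (`<|channel|>`, `<|message|>`, `<|end|>`, `<|return|>`, `<|call|>`), which this card does not spell out, and it only sees them if the output is decoded with `skip_special_tokens=False`. The names `CHANNEL_PATTERN` and `split_channels` and the sample string are hypothetical.

```python
import re

# Assumed harmony markers, taken from the gpt-oss format rather than from this card.
CHANNEL_PATTERN = re.compile(
    r"<\|channel\|>(\w+).*?<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>|<\|call\|>|\Z)",
    re.DOTALL,
)


def split_channels(raw_text):
    """Return (channel, text) pairs in the order they appear in the raw output."""
    return [(channel, text.strip()) for channel, text in CHANNEL_PATTERN.findall(raw_text)]


# Hand-written sample of a raw generation with an analysis and a final channel.
raw = (
    "<|channel|>analysis<|message|>Simple arithmetic.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>"
)
for channel, text in split_channels(raw):
    print(f"{channel}: {text}")
```

Only the text in the `final` channel is meant for end users; the `analysis` channel carries the chain-of-thought.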

Recommendations

Validate tool outputs before execution
Use human oversight for critical applications
Perform task-specific evaluation prior to deployment
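One way to act on the first recommendation is to check a model-emitted tool call against its schema before running anything. The following is a minimal sketch using only the standard library; it covers required fields, unexpected fields, and basic JSON types, and is not a full JSON Schema validator. The names `JSON_TYPES`, `validate_arguments`, and the sample schema are hypothetical.

```python
# Mapping from JSON Schema type names to Python types (basic types only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_arguments(arguments, parameters_schema):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    properties = parameters_schema.get("properties", {})
    for name in parameters_schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        type_name = properties[name].get("type")
        expected = JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if expected is not None and (bool_as_number or not isinstance(value, expected)):
            problems.append(f"wrong type for {name}: expected {type_name}")
    return problems


schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
    "required": ["city", "date"],
}
print(validate_arguments({"city": "Paris", "date": "2026-02-10"}, schema))
print(validate_arguments({"city": 5}, schema))
```

A call would be executed only when the returned list is empty; otherwise the problems can be sent back to the model or logged for review.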

Safety and bias benchmark results (setups are listed under Evaluation Methodology):

Metric	GPT-OSS-120B	HyperNova 60B 2605	Comment
StereoSet (stereotype score)	55.0	56.0	Share of examples where the model prefers a stereotypical completion. Closer to 50 is better.
StereoSet (language model score)	96.9	97.3	Share of examples where the model assigns higher probability to a meaningful vs. meaningless completion. Higher is better.
StereoSet (ICAT)	87.3	85.6	Idealized CAT score combining language-model quality and stereotype bias. Higher is better.
StrongREJECT (jailbreak rate)	0.003	0	Fraction of adversarial jailbreak prompts that elicit a harmful response. Lower is better; 0 is best.
StrongREJECT (metric)	0.013	0	Overall harmful-response rate on the StrongREJECT benchmark. Lower is better; 0 is best.
XSTest (safe refusal)	29.6	30.4	Rate at which benign prompts are incorrectly refused (over-refusal). Lower is better.
XSTest (unsafe refusal)	99.5	99.0	Rate at which harmful prompts are correctly refused. Higher is better.
BBQ	96.9	96.4	Accuracy on ambiguous QA pairs designed to surface social bias. Higher is better.
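The ICAT row can be related to the other two StereoSet rows through the standard StereoSet definition, ICAT = lms × min(ss, 100 − ss) / 50, where lms is the language model score and ss the stereotype score. The sketch below applies that formula to the rounded values in the table; the function name `icat` is hypothetical, and small mismatches with the reported ICAT are expected because the inputs are rounded.

```python
def icat(language_model_score, stereotype_score):
    """Idealized CAT score: lms scaled by how close the stereotype score is to 50."""
    balance = min(stereotype_score, 100.0 - stereotype_score) / 50.0
    return language_model_score * balance


# Rounded table values for HyperNova 60B 2605 and GPT-OSS-120B.
print(round(icat(97.3, 56.0), 1))
print(round(icat(96.9, 55.0), 1))
```

A stereotype score of exactly 50 leaves the language model score unchanged, which is why 50 is the ideal value for that metric.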

Model Information

Field	Value
Model name	HyperNova 60B 2605
Version	2605
Release date	26/02/2026
Developed by	Multiverse Computing
License	Apache 2.0
Contact	business@multiversecomputing.com

Citation

If you use this model, please cite the base model and this variant:

@misc{openai2025gptoss120b,
  title         = {gpt-oss-120b \& gpt-oss-20b Model Card},
  author        = {OpenAI},
  year          = {2025},
  eprint        = {2508.10925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.10925}
}
@misc{hypernova60b2605,
  title = {HyperNova 60B 2605: Model developed based on gpt-oss-120b},
  author = {Multiverse Computing},
  year = {2026},
  url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605},
  note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}


Downloads last month: 6,401

Safetensors
Model size: 59B params
Tensor type: BF16

Model tree for MultiverseComputingCAI/Hypernova-60B-2605
Base model: openai/gpt-oss-120b
Quantized: MultiverseComputingCAI/HyperNova-60B