CosmicFish-HRM

Paper: CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

CosmicFish-HRM is a compact, 82.77M-parameter causal language model built around a Hierarchical Reasoning Module (HRM) that dynamically allocates reasoning compute during inference. Rather than applying a fixed number of layers to every input, the model iterates through high-level and low-level reasoning cycles and uses a learned halting head to decide when to stop. Harder inputs trigger deeper reasoning trajectories, while simpler ones halt early.

Built at Mistyoz AI, Hyderabad.

Architecture

Input Blocks (Transformer) -> HRM Core (H + L levels, variable steps) -> Output Blocks (Transformer) -> LM Head

The HRM core maintains two interacting recurrent states that operate at different levels of abstraction. The high-level module captures slower, more abstract reasoning, while the low-level module handles finer-grained local computation. After each reasoning step, a lightweight halting head decides whether to continue or stop, conditioned on the mean-pooled high-level state.
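The control flow described above can be illustrated with a toy loop in plain Python. This is only a sketch under stated assumptions, not the model's implementation: `run_adaptive_steps`, `toy_update`, and `toy_q_head` are hypothetical names, the update function stands in for the real H and L modules, and the rule "halt when the halt score exceeds the continue score" is an assumed reading of the halt/continue Q-head.

```python
from typing import Callable, List, Tuple

Vector = List[float]
States = List[Vector]  # one vector per token position


def mean_pool(states: States) -> Vector:
    """Average the per-token vectors into a single vector."""
    return [sum(column) / len(states) for column in zip(*states)]


def run_adaptive_steps(
    h_state: States,
    l_state: States,
    update_fn: Callable[[States, States], Tuple[States, States]],
    q_head: Callable[[Vector], Tuple[float, float]],
    max_steps: int = 16,
) -> Tuple[States, States, int]:
    """Apply update_fn until the Q-head prefers halting or max_steps is hit."""
    steps_used = 0
    for step in range(1, max_steps + 1):
        h_state, l_state = update_fn(h_state, l_state)
        steps_used = step
        # The halting decision is conditioned on the mean-pooled high-level state.
        q_halt, q_continue = q_head(mean_pool(h_state))
        if q_halt > q_continue:  # assumed halting rule
            break
    return h_state, l_state, steps_used


def toy_update(h: States, l: States) -> Tuple[States, States]:
    """Hypothetical stand-in for one reasoning step: add 1 to every value."""
    return [[x + 1.0 for x in row] for row in h], [[x + 1.0 for x in row] for row in l]


def toy_q_head(pooled: Vector) -> Tuple[float, float]:
    """Hypothetical Q-head: the halt score grows with the pooled state."""
    return pooled[0], 3.0


zeros = [[0.0, 0.0], [0.0, 0.0]]
_, _, steps = run_adaptive_steps(zeros, zeros, toy_update, toy_q_head)
print(steps)  # 4: the pooled value first exceeds 3.0 after the fourth step
```

With a Q-head that never prefers halting, the same loop runs for exactly `max_steps` iterations, which corresponds to the 16-step cap listed in the model specs.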

Key components:

Grouped-Query Attention (GQA) with 8 query heads and 4 KV heads
Rotary Positional Embeddings (RoPE)
SwiGLU feedforward layers
RMSNorm (pre-norm for I/O blocks, post-norm inside HRM)
Learned halt/continue Q-head controlling per-input reasoning depth
Step penalty in the training loss encouraging efficient halting
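To make the GQA entry concrete, the sketch below shows how 8 query heads can share 4 key/value heads. It assumes the common convention of assigning contiguous query heads to the same KV head; whether this model groups heads in exactly this order is not stated here, and `kv_head_for_query_head` is a hypothetical helper name.

```python
def kv_head_for_query_head(q_head: int, n_head: int = 8, n_kv_head: int = 4) -> int:
    """Return the index of the KV head that a given query head reads from."""
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be a multiple of n_kv_head")
    if not 0 <= q_head < n_head:
        raise ValueError("q_head out of range")
    group_size = n_head // n_kv_head  # query heads per KV head (2 here)
    return q_head // group_size


mapping = [kv_head_for_query_head(q) for q in range(8)]
print(mapping)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Under this grouping, each KV head serves two query heads, so the key and value projections are half the size they would be with one KV head per query head.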

Model Specs

Parameter	Value
Total parameters	82.77M
Embedding dimension	448
Vocabulary size	50,304
Context length	512
Input transformer layers	6
Output transformer layers	6
HRM H-layers	4
HRM L-layers	4
Max HRM steps	16
Attention heads	8 (4 KV, GQA)
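A few derived quantities follow directly from the table. The arithmetic below is a sketch based only on the listed values; the split of the remaining parameters across blocks, and whether the LM head shares weights with the token embedding, are not stated here and are not assumed.

```python
n_embd = 448
n_head = 8
n_kv_head = 4
vocab_size = 50_304
total_params = 82.77e6

# Per-head width, and the combined width of the key (or value) projection under GQA.
head_dim = n_embd // n_head      # 56
kv_dim = head_dim * n_kv_head    # 224

# Size of one vocabulary-by-embedding matrix (the token embedding table).
embedding_params = vocab_size * n_embd  # 22,536,192
embedding_share = embedding_params / total_params

print(head_dim, kv_dim, embedding_params)
print(f"{embedding_share:.1%}")  # roughly 27% of the reported total
```

At this scale a single embedding matrix accounts for roughly a quarter of the reported parameter count, which helps explain why the budget left for the transformer and HRM blocks is tight.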

Evaluation

Zero-shot benchmark results:

Model	HellaSwag	PIQA	WinoGrande
CosmicFish-HRM (82M)	26.2	58.1	50.7
GPT-2 Small (117M)	29.7	62.5	50.7
OPT-125M	30.6	62.6	52.9
Pythia-160M	29.4	62.1	52.8

At this scale, part of the parameter budget goes to the HRM reasoning infrastructure rather than raw language-modeling capacity, which accounts for the gap of roughly 3 to 4.5 points on HellaSwag and PIQA versus fixed-depth baselines of similar size. On WinoGrande the model matches GPT-2 Small and trails OPT-125M and Pythia-160M by about 2 points. The paper argues this tradeoff becomes more favorable as model scale increases.
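The size of the gap can be read off the table with a short calculation. The snippet below only restates the reported scores; `gaps_to_baselines` is a hypothetical helper name.

```python
from typing import Dict

SCORES: Dict[str, Dict[str, float]] = {
    "CosmicFish-HRM": {"HellaSwag": 26.2, "PIQA": 58.1, "WinoGrande": 50.7},
    "GPT-2 Small": {"HellaSwag": 29.7, "PIQA": 62.5, "WinoGrande": 50.7},
    "OPT-125M": {"HellaSwag": 30.6, "PIQA": 62.6, "WinoGrande": 52.9},
    "Pythia-160M": {"HellaSwag": 29.4, "PIQA": 62.1, "WinoGrande": 52.8},
}


def gaps_to_baselines(
    scores: Dict[str, Dict[str, float]], model: str = "CosmicFish-HRM"
) -> Dict[str, Dict[str, float]]:
    """Return baseline score minus model score, per baseline and task."""
    own = scores[model]
    return {
        name: {task: round(value - own[task], 1) for task, value in tasks.items()}
        for name, tasks in scores.items()
        if name != model
    }


gaps = gaps_to_baselines(SCORES)
for baseline, per_task in gaps.items():
    print(baseline, per_task)
```

Positive values mean the baseline scores higher than CosmicFish-HRM on that task.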

Adaptive Reasoning Behavior

The primary contribution of CosmicFish-HRM is not benchmark accuracy but adaptive compute allocation. The model uses different numbers of reasoning steps depending on input complexity:

Prompt	Mean HRM Steps
"The capital of France is"	2.78
"Photosynthesis is the process by which plants"	4.77
"If all roses are flowers and some flowers fade quickly..."	7.03
"A bat and a ball cost $1.10 in total..."	8.40
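To put these step counts in context, the snippet below expresses each one as a fraction of the 16-step maximum and relative to the shallowest prompt. It uses only the values in the table; the prompt strings are copied as shown, including their truncation.

```python
MAX_STEPS = 16

PROMPT_STEPS = {
    "The capital of France is": 2.78,
    "Photosynthesis is the process by which plants": 4.77,
    "If all roses are flowers and some flowers fade quickly...": 7.03,
    "A bat and a ball cost $1.10 in total...": 8.40,
}

shallowest = min(PROMPT_STEPS.values())

summary = {
    prompt: {
        "budget_fraction": steps / MAX_STEPS,       # share of the step cap used
        "relative_to_shallowest": steps / shallowest,
    }
    for prompt, steps in PROMPT_STEPS.items()
}

for prompt, row in summary.items():
    print(f"{row['budget_fraction']:.3f}  {row['relative_to_shallowest']:.2f}x  {prompt}")
```

On these four prompts, the bat-and-ball prompt uses about three times as many steps as the factual-recall prompt, and slightly more than half of the available step budget.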

Mean steps across benchmarks stay well below the 16-step maximum, and the standard deviation exceeds the mean on every benchmark, indicating that the halting mechanism is input-sensitive rather than collapsing to a fixed depth.

Benchmark	Mean Steps	Std Dev
HellaSwag	3.03	6.26
PIQA	1.87	5.13
WinoGrande	0.95	3.78
Overall	2.68	5.95
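The spread in this table can be quantified a little further. The sketch below computes the coefficient of variation (standard deviation divided by mean) and compares each reported standard deviation with the largest value possible for a quantity bounded between 0 and 16 with the same mean, which by the Bhatia-Davis inequality is sqrt(mean * (16 - mean)). The lower bound of 0 steps is an assumption, suggested by the WinoGrande mean being below 1.

```python
import math

MAX_STEPS = 16.0
MIN_STEPS = 0.0  # assumption: a sample may use zero HRM steps

# (mean steps, standard deviation) as reported in the table
STEP_STATS = {
    "HellaSwag": (3.03, 6.26),
    "PIQA": (1.87, 5.13),
    "WinoGrande": (0.95, 3.78),
    "Overall": (2.68, 5.95),
}


def max_std(mean: float, low: float = MIN_STEPS, high: float = MAX_STEPS) -> float:
    """Bhatia-Davis upper bound on the std of a variable bounded in [low, high]."""
    return math.sqrt((high - mean) * (mean - low))


report = {
    name: {"cv": std / mean, "std": std, "max_std": max_std(mean)}
    for name, (mean, std) in STEP_STATS.items()
}

for name, row in report.items():
    print(f"{name:<11} cv={row['cv']:.2f}  std={row['std']:.2f}  bound={row['max_std']:.2f}")
```

Under that assumption, the reported standard deviations sit very close to the bound. That would be consistent with most samples halting almost immediately while a minority run to, or near, the step cap, although the per-sample distribution is not given here.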

Usage

This model uses a custom architecture. The model code is included in this repo as modeling_hrm_cosmicfish.py.

Standalone chat script (downloads the model files automatically):

pip install torch safetensors huggingface-hub transformers termcolor
python chat.py

Load manually (requires tiktoken in addition to the packages above):

import torch
import json
import tiktoken
from safetensors.torch import load_file
from huggingface_hub import snapshot_download
from modeling_hrm_cosmicfish import HRMCosmicFish, HRMCosmicFishConfig

# Download the repository (weights and config) into the local Hugging Face cache.
cache_dir = snapshot_download("MistyozAI/CosmicFish-HRM")

with open(f"{cache_dir}/config.json") as f:
    cfg = json.load(f)

# Rebuild the model configuration from the stored hyperparameters.
config = HRMCosmicFishConfig(
    vocab_size=cfg["vocab_size"],
    n_embd=cfg["n_embd"],
    block_size=cfg["block_size"],
    n_head=cfg["n_head"],
    n_kv_head=cfg["n_kv_head"],
    n_input_layers=cfg["n_input_layers"],
    n_output_layers=cfg["n_output_layers"],
    hrm_H_layers=cfg["hrm_H_layers"],
    hrm_L_layers=cfg["hrm_L_layers"],
    hrm_H_cycles=cfg["hrm_H_cycles"],
    hrm_L_cycles=cfg["hrm_L_cycles"],
    hrm_max_steps=cfg["hrm_max_steps"],
    dropout=0.0,  # no dropout at inference time
)

# Load the weights and switch to evaluation mode.
state_dict = load_file(f"{cache_dir}/model.safetensors")
model = HRMCosmicFish(config)
model.load_state_dict(state_dict)
model.eval()

# Tokenize the prompt with the GPT-2 BPE vocabulary and add a batch dimension.
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "Artificial intelligence is"
tokens = tokenizer.encode(prompt)
idx = torch.tensor(tokens, dtype=torch.long).unsqueeze(0)

# Sample up to 100 new tokens without tracking gradients.
with torch.no_grad():
    output = model.generate(idx, max_new_tokens=100, temperature=0.7, top_k=40)

print(tokenizer.decode(output[0].tolist()))
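The `temperature` and `top_k` arguments passed to `generate` above refer to a standard sampling scheme. The plain-Python sketch below illustrates the general technique for a single decoding step; it is not taken from `modeling_hrm_cosmicfish.py`, whose `generate` may differ in detail, and `sample_next_token` is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 0.7,
    top_k: int = 40,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token id using temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Keep only the k highest-scoring token ids.
    k = max(1, min(top_k, len(logits)))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Temperatures below 1 sharpen the distribution; above 1 flatten it.
    scaled = [logits[i] / temperature for i in top_ids]

    # Softmax weights, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    return rng.choices(top_ids, weights=weights, k=1)[0]


# With top_k=1 the choice is deterministic: the highest-scoring token wins.
print(sample_next_token([0.1, 2.0, -1.0], top_k=1))  # 1
```

Lower temperatures and smaller `top_k` values make the output more repeatable, while higher values increase diversity at the cost of coherence.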

PyTorch file: CF.pt

PyTorch file: Base.pt

Weights: Safetensors, F16 tensors, 0.1B params

Paper ID: 2605.28919