CosmicFish-HRM

Paper: CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

GitHub: MistyozAI/CosmicFish-HRM

CosmicFish-HRM is a compact 82.77M-parameter causal language model built around a Hierarchical Reasoning Module (HRM) that allocates reasoning compute dynamically at inference time. Instead of passing every input through a fixed number of layers, the model iterates through high-level and low-level reasoning cycles and uses a learned halting head to decide when to stop. Harder inputs trigger longer reasoning trajectories, while simpler ones halt early.

Built at Mistyoz AI, Hyderabad.


Architecture

Input Blocks (Transformer) -> HRM Core (H + L levels, variable steps) -> Output Blocks (Transformer) -> LM Head

The HRM core maintains two interacting recurrent states operating at different abstraction levels. The high-level module captures slower, more abstract reasoning while the low-level module handles finer-grained local computation. After each reasoning step a lightweight halting head decides whether to continue or stop, conditioned on the mean-pooled high-level state.
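
The control flow described above can be sketched in plain Python so that it runs without dependencies. This is only an illustration: the update functions, the decision rule (stop once the halt score exceeds the continue score), and every name below are assumptions, not code taken from modeling_hrm_cosmicfish.py. The nested H/L cycle counts that appear in the config are left out for brevity.

```python
from typing import Callable, List, Tuple

Vector = List[float]
State = List[Vector]  # one vector per token position


def mean_pool(state: State) -> Vector:
    """Average the per-position vectors into a single vector."""
    n = len(state)
    return [sum(col) / n for col in zip(*state)]


def adaptive_reasoning(
    high: State,
    low: State,
    update_low: Callable[[State, State], State],
    update_high: Callable[[State, State], State],
    halt_head: Callable[[Vector], Tuple[float, float]],
    max_steps: int = 16,
) -> Tuple[State, int]:
    """Alternate low/high updates until the halting head says stop or max_steps is hit."""
    steps = 0
    for steps in range(1, max_steps + 1):
        low = update_low(low, high)    # fine-grained update, conditioned on the high-level state
        high = update_high(high, low)  # slower update, conditioned on the low-level state
        q_halt, q_continue = halt_head(mean_pool(high))
        if q_halt > q_continue:        # assumed decision rule
            break
    return high, steps


# Toy stand-ins (hypothetical): the low level copies the high level, the high
# level halves it, and the head votes to halt once the pooled state is small.
def toy_low(low: State, high: State) -> State:
    return [row[:] for row in high]


def toy_high(high: State, low: State) -> State:
    return [[0.5 * x for x in row] for row in low]


def toy_head(pooled: Vector) -> Tuple[float, float]:
    size = sum(abs(x) for x in pooled)
    return 1.0 - size, size  # (q_halt, q_continue)


easy = [[0.5, 0.5]]
hard = [[8.0, 8.0]]
_, easy_steps = adaptive_reasoning(easy, easy, toy_low, toy_high, toy_head)
_, hard_steps = adaptive_reasoning(hard, hard, toy_low, toy_high, toy_head)
print(easy_steps, hard_steps)  # the larger starting state takes more steps
```

In this toy setup the step count depends only on how large the initial state is, which mirrors the idea that the halting decision is driven by the pooled high-level state rather than by a fixed schedule.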

Key components:

  • Grouped-Query Attention (GQA) with 8 query heads and 4 KV heads
  • Rotary Positional Embeddings (RoPE)
  • SwiGLU feedforward layers
  • RMSNorm (pre-norm for I/O blocks, post-norm inside HRM)
  • Learned halt/continue Q-head controlling per-input reasoning depth
  • Step penalty in the training loss encouraging efficient halting
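
To make the grouped-query attention entry in the list above concrete: with 8 query heads and 4 KV heads, each KV head is shared by 2 query heads. The sketch below assumes the common convention that consecutive query heads share a KV head; the grouping used in the actual model code may differ, and kv_head_for is a hypothetical helper.

```python
N_HEAD = 8     # query heads (from the component list)
N_KV_HEAD = 4  # key/value heads (from the component list)


def kv_head_for(query_head: int, n_head: int = N_HEAD, n_kv_head: int = N_KV_HEAD) -> int:
    """Map a query head index to the KV head it reads from."""
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be a multiple of n_kv_head")
    group_size = n_head // n_kv_head  # query heads per KV head
    return query_head // group_size


mapping = {q: kv_head_for(q) for q in range(N_HEAD)}
print(mapping)
```

Sharing keys and values this way halves the KV projection width and the KV cache size relative to standard multi-head attention with 8 KV heads.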

Model Specs

Parameter                  Value
Total parameters           82.77M
Embedding dimension        448
Vocabulary size            50,304
Context length             512 tokens
Input transformer layers   6
Output transformer layers  6
HRM H-layers               4
HRM L-layers               4
Max HRM steps              16
Attention heads            8 (4 KV, GQA)
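
A few derived quantities follow directly from the specs above. The arithmetic below is a quick sketch; it counts a single vocabulary-by-embedding table and makes no assumption about whether the LM head shares those weights, since the specs do not say.

```python
n_embd = 448
n_head = 8
n_kv_head = 4
vocab_size = 50_304
total_params = 82.77e6

head_dim = n_embd // n_head             # 56 dimensions per attention head
kv_dim = n_kv_head * head_dim           # 224: width of each K or V projection under GQA
embedding_params = vocab_size * n_embd  # 22,536,192 parameters in one embedding table
embedding_share = embedding_params / total_params

print(head_dim, kv_dim, embedding_params)
print(f"{embedding_share:.1%} of the total parameter count")
```

Roughly a quarter of the parameter budget sits in the token embedding table, which is typical for models of this size with a GPT-2 sized vocabulary.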

Evaluation

Zero-shot benchmark results (accuracy, %):

Model                  HellaSwag  PIQA  WinoGrande
CosmicFish-HRM (82M)   26.2       58.1  50.7
GPT-2 Small (117M)     29.7       62.5  50.7
OPT-125M               30.6       62.6  52.9
Pythia-160M            29.4       62.1  52.8

At this compact scale, part of the parameter budget goes to the HRM reasoning machinery rather than to raw language-modeling capacity, which accounts for the gap relative to fixed-depth baselines of similar size. The paper argues that this tradeoff becomes more favorable as model scale increases.

Adaptive Reasoning Behavior

The primary contribution of CosmicFish-HRM is not benchmark accuracy but adaptive compute allocation. The model uses different numbers of reasoning steps depending on input complexity:

Prompt                                                        Mean HRM Steps
"The capital of France is"                                    2.78
"Photosynthesis is the process by which plants"               4.77
"If all roses are flowers and some flowers fade quickly..."   7.03
"A bat and a ball cost $1.10 in total..."                     8.40

Mean step counts on every benchmark stay well below the 16-step maximum, and the standard deviation across samples is large relative to the mean, which indicates that the halting mechanism responds to the input rather than collapsing to a fixed depth.

Benchmark   Mean Steps  Std Dev
HellaSwag   3.03        6.26
PIQA        1.87        5.13
WinoGrande  0.95        3.78
Overall     2.68        5.95
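
One way to read the table above, offered as a sanity check rather than a result from the paper: for values confined to an interval [lo, hi] with mean m, the standard deviation cannot exceed sqrt((m - lo) * (hi - m)) (the Bhatia-Davis inequality). The sketch below assumes per-sample step counts lie in [0, 16]. The lower bound of 0 is an assumption suggested by the WinoGrande mean of 0.95, not something the text states.

```python
import math

MAX_STEPS = 16  # from the model specs
MIN_STEPS = 0   # assumption: a sample may halt before any HRM step

# (mean steps, std dev) as reported in the table
reported = {
    "HellaSwag": (3.03, 6.26),
    "PIQA": (1.87, 5.13),
    "WinoGrande": (0.95, 3.78),
    "Overall": (2.68, 5.95),
}


def max_std(mean: float, lo: float = MIN_STEPS, hi: float = MAX_STEPS) -> float:
    """Largest standard deviation possible for values in [lo, hi] with this mean."""
    return math.sqrt((mean - lo) * (hi - mean))


for name, (mean, std) in reported.items():
    bound = max_std(mean)
    print(f"{name:<11} std={std:.2f}  bound={bound:.2f}  ratio={std / bound:.3f}")
```

Under that assumption the reported standard deviations sit within about 1% of the bound, which is the pattern produced when most samples fall near the two ends of the range. Per-sample step histograms would be needed to confirm how the counts are actually distributed.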

Usage

This model uses a custom architecture. The model code is included in this repo as modeling_hrm_cosmicfish.py.

Standalone chat script (downloads the model automatically):

pip install torch safetensors huggingface-hub transformers termcolor
python chat.py

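Load manually (this path also requires tiktoken: pip install tiktoken):

import json
import sys

import tiktoken
import torch
from huggingface_hub import snapshot_download
from safetensors.torch import load_file

# Download the repo snapshot: weights, config.json, and the model code.
cache_dir = snapshot_download("MistyozAI/CosmicFish-HRM")

# modeling_hrm_cosmicfish.py lives inside the snapshot, so put that
# directory on the import path before importing from it.
sys.path.insert(0, cache_dir)
from modeling_hrm_cosmicfish import HRMCosmicFish, HRMCosmicFishConfig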

with open(f"{cache_dir}/config.json") as f:
    cfg = json.load(f)

config = HRMCosmicFishConfig(
    vocab_size=cfg["vocab_size"],
    n_embd=cfg["n_embd"],
    block_size=cfg["block_size"],
    n_head=cfg["n_head"],
    n_kv_head=cfg["n_kv_head"],
    n_input_layers=cfg["n_input_layers"],
    n_output_layers=cfg["n_output_layers"],
    hrm_H_layers=cfg["hrm_H_layers"],
    hrm_L_layers=cfg["hrm_L_layers"],
    hrm_H_cycles=cfg["hrm_H_cycles"],
    hrm_L_cycles=cfg["hrm_L_cycles"],
    hrm_max_steps=cfg["hrm_max_steps"],
    dropout=0.0,
)

state_dict = load_file(f"{cache_dir}/model.safetensors")
model = HRMCosmicFish(config)
model.load_state_dict(state_dict)
model.eval()

tokenizer = tiktoken.get_encoding("gpt2")
prompt = "Artificial intelligence is"
tokens = tokenizer.encode(prompt)
idx = torch.tensor(tokens, dtype=torch.long).unsqueeze(0)

with torch.no_grad():
    output = model.generate(idx, max_new_tokens=100, temperature=0.7, top_k=40)

print(tokenizer.decode(output[0].tolist()))
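
The generate call above takes temperature and top_k. As a rough illustration of what those two parameters conventionally do, here is a dependency-free sketch of temperature scaling followed by top-k filtering over a single logit vector. It is a generic sketch, not the repository's generate implementation, and sample_next_token is a hypothetical helper.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 0.7,
    top_k: Optional[int] = 40,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample a token id from logits after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()

    # Lower temperatures sharpen the distribution; higher ones flatten it.
    scaled = [x / temperature for x in logits]

    # Keep only the k highest-scoring token ids (all of them if top_k is None).
    ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        ids = ids[: max(1, top_k)]

    # Softmax over the kept ids, shifted by the max for numerical stability.
    peak = scaled[ids[0]]
    weights = [math.exp(scaled[i] - peak) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# With top_k=1 the sampler always returns the highest-scoring token id.
print(sample_next_token([0.1, 5.0, -2.0], top_k=1))
```

Setting top_k=40 as in the call above limits sampling to the 40 most likely tokens at each position, and a temperature of 0.7 concentrates probability on the higher-ranked ones.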

PyTorch file: CF.pt

PyTorch file: Base.pt


