amewebstudio/sparseflow-chat-v8


SparseFlow v8

Efficient language model with sparse attention and persistent memory.

📊 Measured Metrics

Metric	Value
Parameters	71,359,746
Perplexity	14.77
Attention Sparsity	87.5%
Channel Sparsity	75.0%
Peak Memory	3.67 GB
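
For readers unfamiliar with the perplexity figure, here is a minimal sketch of how perplexity is conventionally derived from per-token log-probabilities. The card does not state which evaluation set or script produced 14.77, so the `perplexity` helper and the example values below are illustrative assumptions, not the author's evaluation code.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical log-probabilities for three tokens (not taken from the model).
example = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(perplexity(example))

# Inverting the formula: a perplexity of 14.77 corresponds to this mean loss per token.
print(math.log(14.77))
```

Under this definition, the reported perplexity of 14.77 corresponds to a mean cross-entropy of roughly 2.69 nats per token.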

🏗️ Architecture

Sparse Token Attention: Attends to top-64 tokens per position
Sparse Channel FFN: Activates top-128 channels
Persistent Memory: 20,000 memory vectors
8 Transformer layers with a hidden dimension of 512
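
The top-k selection described above can be sketched in plain Python. This is a generic illustration of top-k sparse attention for a single query position, not the model's actual implementation, and the helper names (`top_k_indices`, `sparse_attention_weights`, `sparsity`) are hypothetical. The sparsity check at the end assumes a context length of 512 and an FFN width of 512. The card does not state either value, but both are consistent with the reported 87.5% and 75.0% figures.

```python
import math

def top_k_indices(values, k):
    """Indices of the k largest values (ties broken by lower index)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(values))
    return sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]

def sparse_attention_weights(scores, k):
    """Softmax over the top-k scores only; all other positions get weight 0."""
    keep = top_k_indices(scores, k)
    peak = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

def sparsity(k, n):
    """Fraction of entries zeroed out when k of n are kept."""
    return 1.0 - k / n

# Toy example: keep the 2 highest of 5 attention scores.
weights = sparse_attention_weights([0.1, 2.0, -1.0, 1.5, 0.3], k=2)
print(weights)

# Assumed sizes (not stated in the card): 512-token context, 512 FFN channels.
print(sparsity(64, 512))   # top-64 tokens kept per position
print(sparsity(128, 512))  # top-128 channels kept
```

The same top-k selection applies to the channel-sparse FFN: only the 128 largest activations would be kept and the rest set to zero.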

📚 Training Data

Open source datasets only:

GSM8K, MATH (mathematics)
ARC, OpenBookQA, SciQ (science & reasoning)
CommonsenseQA, PIQA (common sense)
TriviaQA, Natural Questions (factual)
WikiText-103 (language modeling)

👨‍💻 Author

Logo (Mike Amega) — Ame Web Studio
