AI & ML interests

Create the best 1M-1B parameter thinking model (all on CPU)

Recent Activity

AxionLab-official  updated a Space about 1 month ago
AxionLab-Co/README
AxionLab-official  updated a model about 2 months ago
AxionLab-Co/DogeAI2.5-0.2B
AxionLab-official  published a model about 2 months ago
AxionLab-Co/DogeAI2.5-0.2B

AxionLab-official 
posted an update 6 days ago
# An Open Letter from SupraLabs.

Over the past few days, SupraLabs has been mentioned in a public discussion regarding small language models, scaling laws, and training methodology. We'd like to clarify our position.

Before anything else, we want to make one thing absolutely clear: we have great respect for Lane and the work being done at Glint Research. At no point was our intention to disrespect Lane, Glint Research, or their research. What began as a technical discussion about model scaling and training methodology unfortunately became much more personal than we ever intended. From our perspective, it was simply an exchange of technical opinions, and we sincerely hope it remains that way.
We'd also like to acknowledge that one of our own comments during the discussion was poorly worded. Referring to a benchmark as "fake" was imprecise. What we intended to criticize was the comparison methodology, not the integrity of the evaluation itself. Comparing a merged checkpoint against a single checkpoint is, in our view, not an apples-to-apples comparison.

That said, this was never the core of the discussion.

Our disagreement was not about SLERP, model merging, or whether training a small model on massive amounts of data is an interesting research direction. We support experimentation and unconventional ideas.

The actual point of disagreement was much simpler.

The statement that a 1M parameter model trained on 1 trillion tokens will become a "100M killer" is, today, a prediction, not an experimental result.
Could it happen? Perhaps.
Would it be exciting if it did? Absolutely.

But until benchmark results, reproducible evaluations, and independent validation exist, we believe such statements should be presented as hypotheses rather than established conclusions.
Research advances by testing ideas, not by assuming their outcomes.

We sincerely wish Lane and everyone at Glint Research success in their experiments.

Thank you to everyone who read it.
AxionLab-official 
posted an update 19 days ago
THIS IS CRAZY! THE MODEL IN THE IMAGE (Supra-50M-Reasoning) answered correctly, and it's QUANTIZED TO 2-BIT! THE RESPONSE IS CORRECT, FROM A 15 MB FILE!
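
As a rough sanity check on that file size, here is a minimal sketch of the arithmetic. It assumes "50M" means exactly 50 million parameters and that 1 MB is 10^6 bytes; the function names are hypothetical and not part of any SupraLabs tooling. Pure 2-bit weights give a lower bound, and the gap up to the reported 15 MB would plausibly come from quantization scales, metadata, and any tensors kept at higher precision.

```python
def quantized_size_mb(n_params: int, bits_per_weight: float) -> float:
    """Size in MB (10^6 bytes) of the weights alone, with no format overhead."""
    return n_params * bits_per_weight / 8 / 1_000_000


def effective_bits_per_weight(file_size_mb: float, n_params: int) -> float:
    """Average bits per parameter implied by a given file size."""
    return file_size_mb * 1_000_000 * 8 / n_params


# Assumption: "50M" is taken as exactly 50 million parameters.
N_PARAMS = 50_000_000

print(quantized_size_mb(N_PARAMS, 2))    # pure 2-bit lower bound, in MB
print(quantized_size_mb(N_PARAMS, 16))   # 16-bit weights for comparison, in MB
print(effective_bits_per_weight(15, N_PARAMS))  # bits per weight implied by a 15 MB file
```

Under these assumptions the weights alone come to 12.5 MB at 2 bits, so a 15 MB file corresponds to roughly 2.4 bits per parameter on average.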
AxionLab-official 
posted an update about 1 month ago
Someone ran Supra-50M-Instruct ON A 1 GHz CPU FROM 1999

https://www.reddit.com/r/LocalLLM/comments/1tm21ar/i_see_your_strix_halo_and_raise_you_a_vintage/

"As a fun experiment, I decided to try running the recently released Supra-50m on a 26-year-old machine I keep for retro Windows 9.X games. Although the model was somewhat silly and inconsistent, the performance wasn't bad, reaching around 1.3 tok/s with CPU inference alone.

Since this CPU doesn't have SSE2, I changed from llama.cpp to llama2.c and asked Claude to write a custom tokenizer.

It's crazy to think that with the right file size of 200 MB, we could have experienced this magic back in 1999." - u/drone_stonks, r/LocalLLM
AxionLab-official 
posted an update about 1 month ago
We RELEASED!

SupraLabs just released our 50M model!
The Base and Instruct weights are available for you to use!

You can check the blog for more information! (We're still writing the blog post!)