Nicholas
nickoo004
16 followers · 75 following
NursultanMRX
nursultan-koshekbaev
AI & ML interests
ML and NLP, and also DL, NN
Recent Activity
reacted to anakin87's post with ❤️
1 day ago
How does LLM training with RL environments work?

It all starts with Reinforcement Learning with Verifiable Rewards:
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s).

Consider a more complex tic-tac-toe env. It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions (envs can also include tools)

---

What happens at training? We use Group Relative Policy Optimization with a tic-tac-toe env. No critic model needed, the group is the baseline. Simpler than PPO.

1. Rollout generation: from the same board, model plays N games via sampling
2. Each game scored with deterministic rewards (win, format, ...)
3. Mean score computed across the group
4. Each rollout's advantage = its score minus the group mean
5. Model updated to favor trajectories above baseline

Repeat.

For a deep dive, check out https://github.com/anakin87/llm-rl-environments-lil-course, a free hands-on course on RL environments for LLMs.
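Steps 3 and 4 of the post (group mean as baseline, advantage as score minus that mean) can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the post describes it, not code from the linked course. The function name `group_relative_advantages` and the reward values are hypothetical, and it leaves out anything a real trainer might add, such as further normalization of the advantages.

```python
from statistics import mean


def group_relative_advantages(scores):
    """Return each rollout's score minus the group mean (steps 3 and 4)."""
    baseline = mean(scores)  # step 3: the group itself acts as the baseline
    return [s - baseline for s in scores]  # step 4: per-rollout advantage


# Hypothetical rewards for N = 4 games sampled from the same board.
# The values are made up for illustration only.
scores = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(scores)
print(advantages)
```

Rollouts with a positive advantage scored above the group mean and are the ones the update in step 5 favors. The advantages within a group always sum to zero, which is why no separate critic model is needed to supply a baseline.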
liked a dataset
3 days ago
tencent/MegaStyle-1.4M
published
a dataset
6 days ago
nickoo004/kaa-parallel-corpus
Organizations
models
3
nickoo004/karakalpak-gpt2-v3 • Text Generation • 97M • Updated 21 days ago • 381 • 1
nickoo004/gemma-2b-reasoning-keras • Updated Jan 11 • 3
nickoo004/gpt2_karakalpak • Text Generation • 0.1B • Updated Jun 6, 2025 • 6 • 4
datasets
4
nickoo004/kaa-parallel-corpus • Viewer • Updated 6 days ago • 14.1k • 33
nickoo004/gemma-reasoning-gold-15k • Viewer • Updated Jan 9 • 27.1k • 19
nickoo004/FeruzaSpeech_to_fine_tuning • Viewer • Updated Sep 2, 2025 • 13k • 101 • 2
nickoo004/uzbekdata • Viewer • Updated Feb 23, 2025 • 7.27k • 5 • 3