AI & ML interests
RL, Planning
Text Generation • 8B • Updated • 1
movefast/Qwen2.5-1.5B-Open-R1-GRPO
2B • Updated • 46
movefast/qwen3_8b_orm_step_20
8B • Updated
movefast/qwen3_8b_orm_step_35
8B • Updated
movefast/OpenR1-Distill-7B
Text Generation • 8B • Updated • 2
movefast/Qwen2.5-7B-mult-task-sft-v2-2.5e-6
8B • Updated
movefast/Qwen2.5-7B-mult-task-sft-v2-5e-6
8B • Updated
movefast/Qwen2.5-7B-mult-task-sft-v2-1e-5
Text Generation • 8B • Updated • 2
movefast/Qwen2.5-7B-mult-task-sft-v1-1e-5
Text Generation • 2B • Updated • 8
movefast/Qwen2.5-7B-mult-task-sft-v1
Text Generation • 2B • Updated • 2
movefast/Qwen2.5-7B-Instruct-GRPO-Distill
Updated
movefast/Qwen2.5-7B-Open-R1-Distill
Text Generation • 8B • Updated • 4
movefast/Qwen2.5-7B-Instruct-GRPO-Multi-Task
Updated
movefast/Qwen-2.5-7B-Simple-RL
Text Generation • 8B • Updated • 1
movefast/Qwen2.5-7B-Instruct-GRPO
Text Generation • 8B • Updated • 2
movefast/Qwen2.5-7B-Open-R1-GRPO-BLEU
Updated
movefast/Qwen2.5-7B-Open-R1-GRPO-Combined-v2-1e-6
Text Generation • Updated • 1
movefast/Qwen2.5-7B-Open-R1-GRPO-Rule-Based
Updated
movefast/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Updated
movefast/Qwen2.5-1.5B-Open-R1-Distill
Updated
movefast/Qwen2.5-7B-Open-R1-GRPO-Combined-1e06
Updated
movefast/Qwen2.5-7B-Open-R1-GRPO-Combined
movefast/Qwen2.5-7B-Open-R1-GRPO
Text Generation • Updated • 125
movefast/peft_last_checkpoint_full_model
Updated
movefast/gpt2_peft_stack-exchange-paired_rmts__100000_2e-05_peft_last_checkpoint
Updated
movefast/rlbench-finetune-0601
Updated