-
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 93 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 93 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76
Collections
Discover the best community collections!
Collections including paper arxiv:2511.22570
-
Small Language Models are the Future of Agentic AI
Paper • 2506.02153 • Published • 24 -
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper • 2512.02556 • Published • 265 -
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 93 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 93
-
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper • 2511.16334 • Published • 96 -
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper • 2509.07980 • Published • 105 -
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper • 2509.04475 • Published • 3 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106
-
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 6 -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 134 -
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 146 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106
-
meta-llama/Llama-Guard-3-8B
Text Generation • 8B • Updated • 129k • • 290 -
Jailbroken: How Does LLM Safety Training Fail?
Paper • 2307.02483 • Published • 16 -
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 4 -
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Paper • 2312.06674 • Published • 8
-
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 229 -
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 24 -
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
Paper • 2511.22176 • Published • 5 -
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Paper • 2511.22265 • Published • 2
-
Scaling Agent Learning via Experience Synthesis
Paper • 2511.03773 • Published • 83 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 126 -
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42
-
Language Models Model Language
Paper • 2510.12766 • Published • 26 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 229
-
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 93 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 93 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76
-
meta-llama/Llama-Guard-3-8B
Text Generation • 8B • Updated • 129k • • 290 -
Jailbroken: How Does LLM Safety Training Fail?
Paper • 2307.02483 • Published • 16 -
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 4 -
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Paper • 2312.06674 • Published • 8
-
Small Language Models are the Future of Agentic AI
Paper • 2506.02153 • Published • 24 -
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper • 2512.02556 • Published • 265 -
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 93 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 93
-
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 229 -
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 24 -
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
Paper • 2511.22176 • Published • 5 -
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Paper • 2511.22265 • Published • 2
-
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper • 2511.16334 • Published • 96 -
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper • 2509.07980 • Published • 105 -
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper • 2509.04475 • Published • 3 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106
-
Scaling Agent Learning via Experience Synthesis
Paper • 2511.03773 • Published • 83 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 126 -
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42
-
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 6 -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 134 -
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 146 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106
-
Language Models Model Language
Paper • 2510.12766 • Published • 26 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 229