Article Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP • ariG23498, ror, sergiopaniego, pcuenq, sayakpaul • 7 days ago • 42
Article Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler • ariG23498, sayakpaul, sergiopaniego, ror, pcuenq • 20 days ago • 118
Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring Paper • 2604.05854 • Published Apr 7 • 1
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 90
AdaGC: Improving Training Stability for Large Language Model Pretraining Paper • 2502.11034 • Published Feb 16, 2025 • 1
Pioneer Agent: Continual Improvement of Small Language Models in Production Paper • 2604.09791 • Published Apr 10 • 13
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization Paper • 2605.19330 • Published about 1 month ago • 9
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering Paper • 2604.09757 • Published Apr 10 • 1
A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping Paper • 2605.06200 • Published May 7 • 15
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes Paper • 2605.05724 • Published May 7 • 16
HyLRA: Hybrid Layer Reuse Attention for Efficient Long-Context Inference Paper • 2602.00777 • Published Jan 31 • 1