arxiv:2604.16111
Michal Valko
AI & ML interests
large language models, reasoning, fine-tuning, test-time computation, reinforcement learning with human feedback, world models
Recent Activity
authored a paper about 2 hours ago
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
authored a paper about 2 hours ago
Best of both worlds: Stochastic & adversarial best-arm identification
authored a paper about 2 hours ago
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning