Michal Valko
AI & ML interests
large language models, reasoning, fine-tuning, test-time computation, reinforcement learning with human feedback, world models
Recent Activity
updated a dataset about 5 hours ago
misovalko/my-research-papers
authored a paper about 8 hours ago
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
authored a paper about 8 hours ago
Best of both worlds: Stochastic & adversarial best-arm identification