LLM Optimization - a gaspardthrl Collection

LLM Optimization

Updated Oct 28, 2024

A Survey on Efficient Inference for Large Language Models

Paper • 2404.14294 • Published Apr 22, 2024 • 4

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Paper • 2310.19102 • Published Oct 29, 2023 • 11

Reducing Activation Recomputation in Large Transformer Models

Paper • 2205.05198 • Published May 10, 2022