local inference, llama.cpp, vLLM, quantisation, GGUF, Blackwell architecture, workstation GPU, single-GPU workflows