10.1 Self-Hosting LLMs
Covers Ollama, vLLM, and TGI; hardware sizing; GPU vs. CPU inference; and quantization levels (Q4/Q8) for self-hosted LLMs.
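As a quick orientation, the sketch below queries a model served locally by Ollama through its HTTP API (POST /api/generate on port 11434, Ollama's default endpoint). It uses only the Python standard library. The model tag llama3.1:8b-instruct-q4_K_M is an assumption chosen to illustrate a Q4-quantized variant; substitute any model you have already pulled with `ollama pull`.

    import json
    import urllib.request

    # Ollama serves a local HTTP API on port 11434 by default.
    OLLAMA_URL = "http://localhost:11434/api/generate"

    payload = {
        # Assumed tag for illustration: a Q4-quantized 8B instruct model.
        "model": "llama3.1:8b-instruct-q4_K_M",
        "prompt": "Summarize the tradeoff between Q4 and Q8 quantization.",
        # With stream=False, Ollama returns a single JSON object
        # instead of a stream of token chunks.
        "stream": False,
    }

    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)

    # The generated text is returned in the "response" field.
    print(body["response"])

The same pattern applies to vLLM and TGI, which expose their own HTTP endpoints (vLLM offers an OpenAI-compatible API); only the URL and request schema change, not the overall client structure.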