4.1 Why RAG?
AI-generated content may contain errors. Always verify against official sources.
Key Concepts: Hallucination · Knowledge cutoff · Grounding in facts
Official Docs: LangChain RAG Tutorial · Anthropic — Contextual Retrieval
The Two Core Problems RAG Solves
1. Hallucination
LLMs are trained to produce plausible text, not true text. When asked about information not well represented in training data, they may confidently generate incorrect details.
User: What was Acme Corp's Q3 2025 revenue?
LLM (no context): "Acme Corp reported Q3 2025 revenue of $2.3 billion..." ❌ fabricated
2. Knowledge Cutoff
Every model has a training data cutoff. Events, documents, and updates after that date are unknown to the model unless you provide them in the prompt.
What is RAG?
Retrieval-Augmented Generation (RAG) grounds the model in real documents by retrieving relevant context at query time and injecting it into the prompt.
┌───────────────────────────────────────────┐
│ USER QUERY │
│ "What was Q3 2025 revenue?" │
└───────────────────────────────────────────┘
↓
┌───────────────────────────────────────────┐
│ RETRIEVER │
│ Embed query → search vector store │
│ Return top-k relevant document chunks │
└───────────────────────────────────────────┘
↓
┌───────────────────────────────────────────┐
│ LLM (context injected into prompt) │
│ "Based on the Q3 report: revenue = ..." │
└───────────────────────────────────────────┘
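The three boxes above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the `embed` function is a bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a vector store, and the document names and figures are made up for the example. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; sources and figures are invented for illustration.
DOCS = [
    {"source": "q3_2025_report.txt", "text": "Acme Corp Q3 2025 revenue was 1.8 billion dollars."},
    {"source": "hr_policy.txt", "text": "Employees accrue 25 days of paid leave per year."},
    {"source": "q2_2025_report.txt", "text": "Acme Corp Q2 2025 revenue was 1.6 billion dollars."},
]


def embed(text: str) -> Counter:
    """Toy 'embedding': a word-count vector. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)  # Counter returns 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[dict], k: int = 2) -> list[dict]:
    """Embed the query, score every document, and return the top-k matches."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[dict]) -> str:
    """Inject the retrieved chunks, labelled with their sources, into the prompt."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


query = "What was Acme Corp's Q3 2025 revenue?"
print(build_prompt(query, retrieve(query, DOCS, k=2)))
```

The prompt tells the model to answer only from the supplied context, which is what grounds the answer in the retrieved documents rather than in training data.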
RAG vs Fine-Tuning
| | RAG | Fine-Tuning |
|---|---|---|
| Knowledge update | Real-time (re-index) | Requires re-training |
| Cost | Lower (indexing and retrieval) | Higher (training compute and data) |
| Citable sources | Yes | No |
| New facts | ✅ | ❌ |
| New style/format | ❌ | ✅ |
Use RAG for factual, up-to-date knowledge. Use fine-tuning to teach the model a new style or specialised reasoning pattern.
Common Mistakes
- Using RAG for style/format tasks: RAG adds knowledge, not behaviour. If you want the model to always respond formally, use a system prompt or fine-tune instead.
- No source citation: without cited sources, users can’t verify RAG answers. Include source metadata (such as document title, section, or URL) in responses.
- Retrieval without quality filtering: if your vector search returns irrelevant chunks, the model can still hallucinate despite having “context”. Evaluate retrieval quality separately from answer quality.
- Outdated embeddings: when your source documents change, you must re-embed them. Otherwise the index keeps serving chunks from the old content.
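Three of the mistakes above (no citations, unfiltered retrieval, stale embeddings) can be guarded against with small checks. The sketch below is one possible approach, not a prescribed method: the hit dictionaries with `source` and `score` keys, the `min_score` threshold of 0.5, and the idea of storing a content hash alongside each embedding are all assumptions made for illustration. A real threshold would need tuning against your own retrieval evaluation.

```python
import hashlib


def filter_hits(hits: list[dict], min_score: float = 0.5) -> list[dict]:
    """Drop retrieved chunks whose similarity score is below the threshold."""
    return [hit for hit in hits if hit["score"] >= min_score]


def format_sources(hits: list[dict]) -> str:
    """Build a citation line from the distinct sources of the chunks actually used."""
    return "Sources: " + ", ".join(sorted({hit["source"] for hit in hits}))


def content_hash(text: str) -> str:
    """Fingerprint a document so later changes can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_ids(documents: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """Return ids of documents that are new or changed since they were last embedded.

    documents: id -> current text.
    indexed_hashes: id -> hash recorded when the document was embedded.
    """
    return [
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    ]


# Hypothetical retrieval results for illustration.
hits = [
    {"source": "leave_policy.md", "score": 0.82, "text": "Employees accrue 25 days of leave."},
    {"source": "cafeteria_menu.md", "score": 0.12, "text": "Soup of the day: tomato."},
]
kept = filter_hits(hits)
print(format_sources(kept))

# Detect which documents need re-embedding.
indexed = {"policy": content_hash("old policy text"), "faq": content_hash("faq text")}
current = {"policy": "new policy text", "faq": "faq text"}
print(stale_ids(current, indexed))
```

If `filter_hits` returns an empty list, it is usually safer to have the application say it found no relevant documents than to send the question to the model without context.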
Quick Quiz
Q1. Why can’t fine-tuning replace RAG for factual knowledge?
A1. Fine-tuning does not reliably teach new facts, and a fine-tuned model can still hallucinate. RAG injects the relevant source text at inference time, grounding answers in real documents.
Q2. What is the key benefit of RAG for time-sensitive information?
A2. RAG retrieves from a knowledge base that can be updated continuously. The LLM’s knowledge cutoff doesn’t affect the accuracy of retrieved content.
Q3. What is “contextual retrieval” as described by Anthropic?
A3. Prepending a short, chunk-specific context (generated from the full document) to each chunk before embedding and indexing, so that each chunk is self-contained and retrieval accuracy improves.
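As a rough illustration of the idea in A3, the sketch below prepends a situating context to each chunk before it would be embedded. The function names and the `make_context` callable are hypothetical; in practice the context would be generated by an LLM that sees the full document and the chunk, whereas here a hand-written stub stands in for that call.

```python
from typing import Callable


def contextualize(chunk: str, context: str) -> str:
    """Prepend a short situating context to a chunk before embedding or indexing."""
    return f"{context.strip()}\n\n{chunk.strip()}"


def contextualize_all(chunks: list[str], make_context: Callable[[str], str]) -> list[str]:
    """Apply contextualize to every chunk, using make_context to produce each context."""
    return [contextualize(chunk, make_context(chunk)) for chunk in chunks]


# Stub standing in for an LLM call that would summarise where the chunk sits in the document.
def stub_context(chunk: str) -> str:
    return "This chunk is from Acme Corp's Q3 2025 financial report."


chunks = ["Revenue grew compared with the previous quarter."]
for text in contextualize_all(chunks, stub_context):
    print(text)
```

On its own, the chunk "Revenue grew compared with the previous quarter." does not say which company or quarter it refers to; the prepended context carries that information into the embedding.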
Student Exercise
Exercise 4.1 — When to use RAG
For each scenario, decide: RAG, fine-tuning, or plain prompting?
(1) A company HR chatbot that answers questions from internal policy documents.
(2) A chatbot that always responds in formal legal English.
(3) Answering questions about yesterday’s news.