4.1 Why RAG?

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: Hallucination · Knowledge cutoff · Grounding in facts

Official Docs: LangChain RAG Tutorial · Anthropic — Contextual Retrieval


The Two Core Problems RAG Solves

1. Hallucination

LLMs are trained to produce plausible text, not true text. When asked about information not well represented in training data, they may confidently generate incorrect details.

User: What was Acme Corp's Q3 2025 revenue?
LLM (no context): "Acme Corp reported Q3 2025 revenue of $2.3 billion..." ❌ fabricated

2. Knowledge Cutoff

Every model has a training data cutoff. Events, documents, and updates after that date are unknown to the model unless you provide them in the prompt.


What is RAG?

Retrieval-Augmented Generation (RAG) grounds the model in real documents by retrieving relevant context at query time and injecting it into the prompt.

┌───────────────────────────────────────────┐
│ USER QUERY                                │
│ "What was Q3 2025 revenue?"               │
└───────────────────────────────────────────┘
                      ↓
┌───────────────────────────────────────────┐
│ RETRIEVER                                 │
│ Embed query → search vector store         │
│ Return top-k relevant document chunks     │
└───────────────────────────────────────────┘
                      ↓
┌───────────────────────────────────────────┐
│ LLM (context injected into prompt)        │
│ "Based on the Q3 report: revenue = ..."   │
└───────────────────────────────────────────┘

RAG vs Fine-Tuning

                   RAG                    Fine-Tuning
Knowledge update   Real-time (re-index)   Requires re-training
Cost               Low                    High
Citable sources    Yes                    No
New facts          Yes                    No
New style/format   No                     Yes

Rule of Thumb

Use RAG for factual, up-to-date knowledge. Use fine-tuning to teach the model a new style or specialised reasoning pattern.


Common Mistakes

  1. Using RAG for style/format tasks: RAG adds knowledge, not behaviour. If you want the model to always respond formally, fine-tune instead.
  2. No source citation: without cited sources, users can’t verify RAG answers. Always include source metadata in responses.
  3. Retrieval without quality filtering: if your vector search returns irrelevant chunks, the model may still hallucinate or answer from the wrong passage despite having “context”. Evaluate retrieval quality separately from answer quality.
  4. Outdated embeddings: when your source documents change, you must re-embed them. With stale embeddings, the retriever keeps returning outdated content.
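
Mistakes 2 to 4 each have a simple guard, sketched below under stated assumptions. The hit format (`source`, `score`, `text`), the `0.3` threshold, and all function names are hypothetical choices for illustration; a real threshold should be tuned against your own retrieval evaluation set. Staleness is detected here by comparing a content hash stored at indexing time with the hash of the current document text.

```python
import hashlib


def filter_by_score(hits, min_score=0.3):
    """Drop retrieved chunks below a similarity threshold (0.3 is an arbitrary example)."""
    return [h for h in hits if h["score"] >= min_score]


def format_sources(hits):
    """Build a citation line from the source metadata of the chunks actually used."""
    seen = []
    for h in hits:
        if h["source"] not in seen:
            seen.append(h["source"])
    return "Sources: " + ", ".join(seen)


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(documents, index_hashes):
    """Return names of documents whose current text differs from what was embedded."""
    return sorted(
        name
        for name, text in documents.items()
        if index_hashes.get(name) != content_hash(text)
    )


# Hypothetical retriever output: one relevant hit, one weak hit.
hits = [
    {"source": "policy.txt", "score": 0.82, "text": "..."},
    {"source": "faq.txt", "score": 0.12, "text": "..."},
]
kept = filter_by_score(hits)
print(format_sources(kept))  # Sources: policy.txt

# Hashes recorded when the index was built vs. the documents as they are now.
documents = {"policy.txt": "v2 text", "faq.txt": "unchanged"}
index_hashes = {
    "policy.txt": content_hash("v1 text"),
    "faq.txt": content_hash("unchanged"),
}
print(find_stale(documents, index_hashes))  # ['policy.txt']
```

Any document returned by `find_stale` needs to be re-chunked and re-embedded before the index can be trusted again.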

Quick Quiz

Test Your Understanding

Q1. Why can’t fine-tuning replace RAG for factual knowledge?
A1. Fine-tuning does not reliably teach new facts, and a fine-tuned model can still hallucinate details. RAG injects the exact source text at inference time, grounding answers in real documents.

Q2. What is the key benefit of RAG for time-sensitive information?
A2. RAG retrieves from a knowledge base that can be updated continuously. The LLM’s knowledge cutoff doesn’t affect the accuracy of retrieved content.

Q3. What is “contextual retrieval” as described by Anthropic?
A3. Prepending a short, chunk-specific context (generated by an LLM from the full document) to each chunk before embedding it, so the chunk still makes sense on its own and retrieval accuracy improves.
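
The idea in A3 can be sketched as follows. This is an illustration only: `make_context` is a hypothetical stub that uses a fixed template, whereas in Anthropic's description the context is written by an LLM that sees the whole document. The document title and chunk text are invented placeholders.

```python
def make_context(document_title, chunk):
    """Hypothetical stub: a fixed template standing in for an LLM-written context."""
    return f"This chunk is from the document '{document_title}'."


def contextualize(document_title, chunks):
    """Prepend context to each chunk; the combined text is what gets embedded."""
    return [f"{make_context(document_title, c)}\n{c}" for c in chunks]


# Placeholder chunk: on its own it does not say which company or quarter it refers to.
chunks = ["Revenue grew over the previous quarter."]
contextualized = contextualize("Example Co Q3 report", chunks)
print(contextualized[0])
```

The original chunk text is kept intact after the prepended line, so the answer passage itself is unchanged; only the text used for indexing gains the extra context.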


Student Exercise

Exercise 4.1 — When to use RAG
For each scenario, decide: RAG, Fine-Tuning, or plain prompting? (1) Company HR chatbot that answers questions from internal policy documents. (2) A chatbot that always responds in legal formal English. (3) Answering questions about yesterday’s news.

