4.4 RAG Pipeline Architecture
Key Concepts: Query → retrieve → augment → generate
Official Docs: LangChain RAG Tutorial · LlamaIndex Starter Example
The Complete RAG Flow
OFFLINE (index time)
Documents → Load → Chunk → Embed → Store in Vector DB
ONLINE (query time)
User query → Embed → Search Vector DB → Top-k chunks
→ Build prompt [system + context + query]
→ LLM generates grounded answer
→ Return answer + source citations
End-to-End with LangChain LCEL
pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# ---- 1. Load & Chunk ----
loader = PyPDFLoader("company_report.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)  # sizes are in characters, not tokens
chunks = splitter.split_documents(docs)
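# Optional sanity check: chunk counts vary with the source PDF
print(f"Loaded {len(docs)} pages, split into {len(chunks)} chunks")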
# ---- 2. Embed & Store ----
vector_store = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./rag_db",
)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # return top-4 chunks per query
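# Optional: probe the retriever on its own before wiring the chain
# (the query string here is just an illustrative probe)
for d in retriever.invoke("Q3 revenue"):
    print(d.metadata, d.page_content[:80])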
# ---- 3. Build Prompt ----
RAG_PROMPT = """
You are an analyst assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know".
Context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # temperature=0 minimises sampling randomness
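# Optional: render the prompt with placeholder values to inspect
# exactly what the LLM will receive
print(prompt.invoke({"context": "<retrieved chunks>", "question": "<user question>"}).to_string())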
# ---- 4. Chain (LCEL) ----
def format_docs(docs):
    return "\n\n".join(
        f"[Source: {d.metadata.get('source', '?')}, p.{d.metadata.get('page', '-')}]\n{d.page_content}"
        for d in docs
    )
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# ---- 5. Query ----
answer = chain.invoke("What was the company's Q3 revenue?")
print(answer)
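Every LCEL runnable also exposes .batch() for answering several questions in one call. A minimal sketch reusing the chain above (the questions are illustrative):
questions = [
    "What was the company's Q3 revenue?",
    "What were the main operating expenses?",
]
for q, a in zip(questions, chain.batch(questions)):
    print(f"Q: {q}\nA: {a}\n")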
Adding Source Citations
from langchain_core.runnables import RunnableParallel
chain_with_sources = RunnableParallel(
    answer=chain,
    sources=retriever,
).invoke("What was Q3 revenue?")
print(chain_with_sources["answer"])
for doc in chain_with_sources["sources"]:
    print(f" • {doc.metadata['source']} p.{doc.metadata.get('page', '?')}")
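Note that this pattern runs retrieval twice: once inside chain and once for sources. A minimal single-retrieval variant, reusing the components defined above, is one way to avoid the duplicate search:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

chain_single = (
    {"docs": retriever, "question": RunnablePassthrough()}
    | RunnablePassthrough.assign(context=lambda x: format_docs(x["docs"]))
    | RunnableParallel(
        answer=prompt | llm | StrOutputParser(),
        sources=lambda x: x["docs"],
    )
)
result = chain_single.invoke("What was Q3 revenue?")  # result["answer"], result["sources"]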
Streaming Responses
for chunk in chain.stream("Summarise the executive highlights."):
    print(chunk, end="", flush=True)
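In async contexts (e.g. a web handler) the same chain exposes .astream():
import asyncio

async def main():
    async for token in chain.astream("Summarise the executive highlights."):
        print(token, end="", flush=True)

asyncio.run(main())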
Key Takeaways
- The core RAG loop is straightforward; LangChain handles the plumbing
- Always include source metadata in chunk formatting for citations
- Use RunnableParallel to return both the answer and the source documents
- Use .stream() for responsive UX