4.4 RAG Pipeline Architecture

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: Query → retrieve → augment → generate

Official Docs: LangChain RAG Tutorial · LlamaIndex Starter Example


The Complete RAG Flow

OFFLINE (index time)
Documents → Load → Chunk → Embed → Store in Vector DB

ONLINE (query time)
User query → Embed → Search Vector DB → Top-k chunks
→ Build prompt [system + context + query]
→ LLM generates grounded answer
→ Return answer + source citations
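
Before reaching for a framework, the online flow can be sketched in plain Python with stand-in components: a bag-of-words "embedding", Jaccard overlap in place of cosine similarity, and a prompt builder. All names here (`embed`, `retrieve`, `build_prompt`) are illustrative, not library APIs; a real system swaps in an embedding model and an LLM call.

```python
def embed(text: str) -> set[str]:
    """Stand-in embedding: lowercase word set (real systems use dense vectors)."""
    return set(word.strip(".,?!$%") for word in text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap as a stand-in for cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(context_chunks: list[str], question: str) -> str:
    """Augment: splice retrieved context and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Q3 revenue was $4.2M, up 12% year over year.",
    "The engineering team grew to 45 people in Q3.",
    "Our office relocated to Berlin in 2019.",
]
top = retrieve("What was Q3 revenue?", chunks, k=2)
prompt = build_prompt(top, "What was Q3 revenue?")
# `prompt` would now go to the LLM; the revenue chunk ranks first.
```

The rest of this section builds the same four steps with production components.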

End-to-End with LangChain LCEL

pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# ---- 1. Load & Chunk ----
loader = PyPDFLoader("company_report.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_documents(docs)

# ---- 2. Embed & Store ----
vector_store = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./rag_db",
)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

# ---- 3. Build Prompt ----
RAG_PROMPT = """
You are an analyst assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know".

Context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# ---- 4. Chain (LCEL) ----
def format_docs(docs):
    return "\n\n".join(
        f"[Source: {d.metadata.get('source','?')}, p.{d.metadata.get('page','-')}]\n{d.page_content}"
        for d in docs
    )

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# ---- 5. Query ----
answer = chain.invoke("What was the company's Q3 revenue?")
print(answer)
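
The chunk_size=512 / chunk_overlap=64 settings in step 1 control how much text lands in each chunk and how much adjacent chunks share. A simplified character-window splitter shows the mechanics (this is a sketch, not RecursiveCharacterTextSplitter's actual algorithm, which prefers splitting on paragraph and sentence boundaries):

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size sliding window; each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghijklmnopqrstuvwxyz" * 4   # 104 characters
chunks = split_text(text, chunk_size=40, chunk_overlap=8)
# Windows start every 32 characters: 0, 32, 64, 96
print([len(c) for c in chunks])  # → [40, 40, 40, 8]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; too much overlap just duplicates embeddings and storage.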

Adding Source Citations

from langchain_core.runnables import RunnableParallel

chain_with_sources = RunnableParallel(
    answer=chain,
    sources=retriever,
).invoke("What was Q3 revenue?")

print(chain_with_sources["answer"])
for doc in chain_with_sources["sources"]:
    print(f" • {doc.metadata['source']} p.{doc.metadata.get('page', '?')}")
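
Conceptually, RunnableParallel fans the same input out to each named branch and collects the results in a dict. A plain-Python sketch (illustrative only, not LangChain internals; `run_parallel` is a made-up helper):

```python
def run_parallel(branches: dict, inp):
    """Apply every branch to the same input; key results by branch name."""
    return {name: fn(inp) for name, fn in branches.items()}

result = run_parallel(
    {
        "answer": lambda q: f"(answer for: {q})",      # stands in for the RAG chain
        "sources": lambda q: ["report.pdf p.3"],       # stands in for the retriever
    },
    "What was Q3 revenue?",
)
# result["answer"] and result["sources"] mirror the keys used above
```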

Streaming Responses

for chunk in chain.stream("Summarise the executive highlights."):
    print(chunk, end="", flush=True)
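
The consumption pattern is ordinary Python generator iteration: `.stream()` yields pieces of the answer as they are generated instead of blocking until the end. A stub generator makes that concrete (`fake_stream` is illustrative, not LangChain's implementation):

```python
from typing import Iterator

def fake_stream(answer: str) -> Iterator[str]:
    """Yield the answer a few characters at a time, like an LLM token stream."""
    for i in range(0, len(answer), 4):
        yield answer[i:i + 4]

received = []
for chunk in fake_stream("Revenue grew 12% in Q3."):
    received.append(chunk)              # a UI would render each piece immediately
    print(chunk, end="", flush=True)

full = "".join(received)                # the pieces concatenate to the full answer
```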

Key Takeaways

  • The core RAG loop is straightforward; LangChain handles the plumbing
  • Always include source metadata in chunk formatting for citations
  • Use RunnableParallel to return both answer and source documents
  • Use .stream() for responsive UX

Next → Chapter 6: Agents