4.4 RAG Pipeline Architecture
Key Concepts: Query → retrieve → augment → generate
Official Docs: LangChain RAG Tutorial · LlamaIndex Starter Example
The Complete RAG Flow
OFFLINE (index time)
Documents → Load → Chunk → Embed → Store in Vector DB
ONLINE (query time)
User query → Embed → Search Vector DB → Top-k chunks
→ Build prompt [system + context + query]
→ LLM generates grounded answer
→ Return answer + source citations
End-to-End with LangChain LCEL
pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# ---- 1. Load & Chunk ----
loader = PyPDFLoader("company_report.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)  # sizes are in characters, not tokens
chunks = splitter.split_documents(docs)
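# Optional sanity check: chunk counts vary with the source PDF
print(f"Loaded {len(docs)} pages, split into {len(chunks)} chunks")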
# ---- 2. Embed & Store ----
vector_store = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./rag_db",
)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # return top-4 chunks per query
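# Optional: probe the retriever on its own before wiring the chain
# (the query string here is just an illustrative probe)
for d in retriever.invoke("Q3 revenue"):
    print(d.metadata, d.page_content[:80])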
# ---- 3. Build Prompt ----
RAG_PROMPT = """
You are an analyst assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know".
Context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # temperature=0 minimises sampling randomness
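# Optional: render the prompt with placeholder values to inspect
# exactly what the LLM will receive
print(prompt.invoke({"context": "<retrieved chunks>", "question": "<user question>"}).to_string())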
# ---- 4. Chain (LCEL) ----
def format_docs(docs):
    return "\n\n".join(
        f"[Source: {d.metadata.get('source', '?')}, p.{d.metadata.get('page', '-')}]\n{d.page_content}"
        for d in docs
    )
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# ---- 5. Query ----
answer = chain.invoke("What was the company's Q3 revenue?")
print(answer)
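Every LCEL runnable also exposes .batch() for answering several questions in one call. A minimal sketch reusing the chain above (the questions are illustrative):
questions = [
    "What was the company's Q3 revenue?",
    "What were the main operating expenses?",
]
for q, a in zip(questions, chain.batch(questions)):
    print(f"Q: {q}\nA: {a}\n")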
Adding Source Citations
from langchain_core.runnables import RunnableParallel
chain_with_sources = RunnableParallel(
    answer=chain,
    sources=retriever,
).invoke("What was Q3 revenue?")
print(chain_with_sources["answer"])
for doc in chain_with_sources["sources"]:
    print(f" • {doc.metadata['source']} p.{doc.metadata.get('page', '?')}")
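Note that this pattern runs retrieval twice: once inside chain and once for sources. A minimal single-retrieval variant, reusing the components defined above, is one way to avoid the duplicate search:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

chain_single = (
    {"docs": retriever, "question": RunnablePassthrough()}
    | RunnablePassthrough.assign(context=lambda x: format_docs(x["docs"]))
    | RunnableParallel(
        answer=prompt | llm | StrOutputParser(),
        sources=lambda x: x["docs"],
    )
)
result = chain_single.invoke("What was Q3 revenue?")  # result["answer"], result["sources"]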
Streaming Responses
for chunk in chain.stream("Summarise the executive highlights."):
    print(chunk, end="", flush=True)
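In async contexts (e.g. a web handler) the same chain exposes .astream():
import asyncio

async def main():
    async for token in chain.astream("Summarise the executive highlights."):
        print(token, end="", flush=True)

asyncio.run(main())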
Key Takeaways
- The core RAG loop is straightforward; LangChain handles the plumbing
- Always include source metadata in chunk formatting for citations
- Use RunnableParallel to return both the answer and the source documents
- Use .stream() for responsive UX