7.4 Building a Research Agent
Key Concepts: Web search · Retrieval · Summarisation · Multi-step reasoning
Official Docs: OpenAI Agents SDK · Tavily Search API
What is a Research Agent?
A research agent autonomously:
- Accepts an open-ended question
- Decides what information to look up
- Calls search and retrieval tools
- Synthesises findings into a structured report
This is the canonical multi-step agent use case.
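The four steps above can be sketched as a plain Python loop. This only illustrates the control flow: `plan_queries`, `fake_search`, and `synthesise` are hypothetical stand-ins invented for this sketch. In a real agent the LLM makes these decisions and the search goes to a live API.

```python
from typing import Callable


def plan_queries(question: str) -> list[str]:
    """Hypothetical planner: derive a few search angles from the question."""
    angles = ["overview", "limitations", "recent improvements"]
    return [f"{question} {angle}" for angle in angles]


def fake_search(query: str) -> str:
    """Hypothetical search tool: a real agent would call a search API here."""
    return f"(stub result for: {query})"


def synthesise(question: str, findings: list[str]) -> str:
    """Hypothetical synthesiser: a real agent would have the LLM summarise."""
    bullet_list = "\n".join(f"- {finding}" for finding in findings)
    return f"# Research Report: {question}\n\n{bullet_list}"


def research(question: str, search: Callable[[str], str] = fake_search) -> str:
    queries = plan_queries(question)          # decide what to look up
    findings = [search(q) for q in queries]   # call the search tool per query
    return synthesise(question, findings)     # combine into a structured report


report = research("RAG")
print(report)
```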
Setup
pip install openai-agents tavily-python python-dotenv
Get a free Tavily API key at tavily.com. Tavily is built for LLM agents and returns clean, LLM-ready results.
export TAVILY_API_KEY="tvly-..."
export OPENAI_API_KEY="sk-..."
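Before running the agent, it can help to confirm that both keys are visible to Python. The sketch below is an assumption about how the setup might be checked, not part of the original tutorial: `missing_keys` is a hypothetical helper, and the optional `load_dotenv()` call assumes you may prefer keeping the keys in a local `.env` file (python-dotenv is in the install command above).

```python
import os

REQUIRED_KEYS = ("TAVILY_API_KEY", "OPENAI_API_KEY")

try:
    # Optional: load_dotenv() reads a local .env file if one exists.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    # python-dotenv is not installed; rely on exported variables only.
    pass


def missing_keys(env=None) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
print("Missing keys:", ", ".join(missing) if missing else "none")
```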
Building the Agent
import os

from agents import Agent, Runner, function_tool
from tavily import TavilyClient

# Reads the key exported in the Setup step; raises KeyError if it is missing.
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])


@function_tool
def web_search(query: str) -> str:
    """
    Search the web for up-to-date information on a topic.
    Returns the URL and content of each of the top search results.
    """
    results = tavily.search(query=query, max_results=3)
    # One "Source/Content" block per result, so the model can cite URLs.
    summaries = [
        f"Source: {r['url']}\nContent: {r['content']}"
        for r in results.get("results", [])
    ]
    return "\n\n".join(summaries) if summaries else "No results found."


@function_tool
def format_report(topic: str, findings: str) -> str:
    """
    Format research findings into a structured Markdown report.
    Call this tool as the final step after gathering all information.
    """
    return f"# Research Report: {topic}\n\n{findings}"


research_agent = Agent(
    name="Research Agent",
    model="gpt-4o-mini",
    instructions="""
You are a diligent research assistant.
For every research request:
1. Identify 2–3 specific search queries to cover the topic comprehensively
2. Use web_search for EACH query
3. Synthesise the findings: do NOT copy-paste; summarise in your own words
4. Call format_report to produce the final structured output
5. Always cite the sources you found
""",
    tools=[web_search, format_report],
)

# Run the agent
result = Runner.run_sync(
    research_agent,
    "Research the current state of Retrieval-Augmented Generation (RAG) in 2025: "
    "main approaches, limitations, and recent improvements.",
    max_turns=10,  # cap the tool-calling loop (see Common Mistakes)
)
print(result.final_output)
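The formatting logic inside web_search can be pulled into a plain function and exercised offline, without an API key. The sketch below assumes the response shape the tool already relies on (a "results" list of dicts with "url" and "content" keys). The `format_results` name, the `max_chars` truncation, and the sample payload are illustrative additions, not part of the original tool.

```python
def format_results(response: dict, max_chars: int = 500) -> str:
    """Turn a search response into 'Source/Content' blocks, truncating long content."""
    blocks = []
    for item in response.get("results", []):
        content = item.get("content", "")
        if len(content) > max_chars:
            # Keep tool output bounded so one long page cannot flood the context.
            content = content[:max_chars].rstrip() + "..."
        blocks.append(f"Source: {item.get('url', 'unknown')}\nContent: {content}")
    return "\n\n".join(blocks) if blocks else "No results found."


# Hand-written sample payload (not real search output).
sample = {
    "results": [
        {"url": "https://example.com/a", "content": "x" * 600},
        {"url": "https://example.com/b", "content": "short text"},
    ]
}
formatted = format_results(sample, max_chars=20)
print(formatted)
print(format_results({}))
```

With a helper like this, the body of web_search would reduce to calling the search client and returning `format_results(...)` on its response.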
Adding a Validation Step
To ensure the report meets quality standards, add a validation agent:
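from agents import Agent, Runner

validation_agent = Agent(
    name="Report Validator",
    model="gpt-4o-mini",
    instructions="""
Review the research report for quality.
Check:
1. Does it answer the original question?
2. Are claims specific (not vague)?
3. Are sources cited?
4. Is it at least 200 words?
Respond with JSON: {"pass": true/false, "issues": [list of issues if any]}
""",
)

# Use the same question that was given to the research agent,
# so the validator checks the report against the right request.
question = (
    "Research the current state of Retrieval-Augmented Generation (RAG) in 2025: "
    "main approaches, limitations, and recent improvements."
)
report = result.final_output
validation = Runner.run_sync(
    validation_agent,
    f"Original question: {question}\n\nReport:\n{report}",
)
print(validation.final_output)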
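The validator replies with free text that should contain JSON, so the calling code still has to parse it defensively. The sketch below is one possible approach, assuming the `{"pass": ..., "issues": [...]}` shape requested in the instructions. `parse_verdict` is a hypothetical helper, and the example replies are hand-written, not real model output.

```python
import json


def parse_verdict(text: str) -> tuple[bool, list[str]]:
    """Extract (passed, issues) from the validator's reply.

    Replies that cannot be parsed are treated as failures.
    """
    # Tolerate prose or code fences around the JSON object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return False, ["Validator reply contained no JSON object."]
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return False, ["Validator reply was not valid JSON."]
    passed = data.get("pass") is True
    raw_issues = data.get("issues")
    issues = [str(i) for i in raw_issues] if isinstance(raw_issues, list) else []
    return passed, issues


print(parse_verdict('{"pass": true, "issues": []}'))
print(parse_verdict('Review: {"pass": false, "issues": ["No sources cited"]} Done.'))
```

In the pipeline, `validation.final_output` would be passed to `parse_verdict`, and a failed verdict could trigger another research run with the listed issues appended to the request.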
Common Mistakes
- Searching only once: a single query rarely gives complete coverage. Instruct the agent to run several searches from different angles.
- Not citing sources: without citations, readers cannot verify the report's claims. Instruct the agent explicitly to cite its sources.
- Using a general-purpose web search API: these typically return links, short snippets, or raw HTML that you must fetch and clean yourself. Use LLM-optimised search APIs like Tavily that return clean text.
- Not setting a max_turns limit: without a deliberate cap, the agent can over-research (calling search many times) before deciding to write. Pass max_turns to Runner.run_sync to bound the loop.
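On the last point, `Runner.run_sync` accepts a `max_turns` keyword argument that caps the agent loop. A complementary guard is to cap the number of searches inside the tool itself. The sketch below shows that second idea only. `SearchBudget` and `stub_search` are hypothetical names invented here, not part of the Agents SDK or Tavily.

```python
from typing import Callable


class SearchBudget:
    """Wrap a search function and refuse calls once a fixed budget is spent."""

    def __init__(self, search: Callable[[str], str], max_calls: int = 5) -> None:
        self._search = search
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, query: str) -> str:
        if self.calls >= self.max_calls:
            # Tell the model to stop searching instead of raising an error.
            return "Search budget exhausted. Write the report from the findings so far."
        self.calls += 1
        return self._search(query)


def stub_search(query: str) -> str:
    """Stand-in for a real search call."""
    return f"results for {query}"


limited = SearchBudget(stub_search, max_calls=2)
outputs = [limited(q) for q in ("rag overview", "rag limits", "rag 2025")]
print(outputs)
```

Returning a message rather than raising lets the model read the refusal as a tool result and move on to writing, while `max_turns` remains the hard stop for the whole run.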
Quick Quiz
Q1. Why is Tavily preferred over a generic search API for agent use?
A1. Tavily is designed for LLM agents and returns clean, concise, LLM-ready text results without HTML noise or ads.
Q2. Why should the agent make multiple search queries rather than one?
A2. A single query rarely covers a topic comprehensively. Multiple queries from different angles ensure broader, more accurate coverage.
Q3. What is the role of the validation agent in the two-agent pipeline?
A3. To independently verify that the research report meets quality standards before returning it to the user.
Student Exercise
Exercise 7.4 — Domain-specific research agent
Build a research agent specialised for academic topics. Add a tool that formats citations in APA style. Test it on "Summarise the key findings of the Attention Is All You Need paper (Vaswani et al., 2017)."