7.4 Building a Research Agent

Key Concepts: Web search · Retrieval · Summarisation · Multi-step reasoning

Official Docs: OpenAI Agents SDK · Tavily Search API


What is a Research Agent?

A research agent autonomously:

  1. Accepts an open-ended question
  2. Decides what information to look up
  3. Calls search and retrieval tools
  4. Synthesises findings into a structured report

This is the canonical multi-step agent use case.
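
The four steps above can be sketched as a plain Python loop with no LLM involved. This only illustrates the control flow: `plan_queries`, `fake_search`, and `synthesise` are hypothetical stand-ins for decisions that a model and a real search tool would make in the actual agent.

```python
from typing import Callable


def plan_queries(question: str) -> list[str]:
    """Hypothetical planner: a real agent would ask the model for queries."""
    return [f"{question} overview", f"{question} limitations"]


def fake_search(query: str) -> str:
    """Hypothetical search tool that returns canned text, not web results."""
    return f"(stub result for '{query}')"


def synthesise(question: str, findings: list[str]) -> str:
    """Hypothetical synthesis step: a real agent would summarise with the model."""
    bullets = "\n".join(f"- {finding}" for finding in findings)
    return f"# Report: {question}\n\n{bullets}"


def research(question: str, search: Callable[[str], str] = fake_search) -> str:
    queries = plan_queries(question)           # step 2: decide what to look up
    findings = [search(q) for q in queries]    # step 3: call the search tool
    return synthesise(question, findings)      # step 4: write the report


report = research("RAG")
print(report)
```

In the real agent built later in this section, the model itself performs the planning and synthesis; only the search step is ordinary Python.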


Setup

pip install openai-agents tavily-python python-dotenv

Get a free Tavily API key at tavily.com. Tavily is a search API built for LLM agents and returns clean, LLM-ready results.

export TAVILY_API_KEY="tvly-..."
export OPENAI_API_KEY="sk-..."
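
A missing key otherwise only surfaces as an error on the first API call, so it can help to check the environment up front. The sketch below is one way to do that; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here, not part of either SDK.

```python
import os

# The two variables exported above.
REQUIRED_KEYS = ("TAVILY_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=os.environ) -> list[str]:
    """Return the names of required keys that are unset or empty in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All API keys are set.")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.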

Building the Agent

import os
from agents import Agent, Runner, function_tool
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@function_tool
def web_search(query: str) -> str:
    """
    Search the web for up-to-date information on a topic.
    Returns a summary of the top search results.
    """
    results = tavily.search(query=query, max_results=3)
    summaries = [
        f"Source: {r['url']}\nContent: {r['content']}"
        for r in results.get("results", [])
    ]
    return "\n\n".join(summaries) if summaries else "No results found."

@function_tool
def format_report(topic: str, findings: str) -> str:
    """
    Format research findings into a structured Markdown report.
    Call this tool as the final step after gathering all information.
    """
    return f"# Research Report: {topic}\n\n{findings}"

research_agent = Agent(
    name="Research Agent",
    model="gpt-4o-mini",
    instructions="""
You are a diligent research assistant.

For every research request:
1. Identify 2–3 specific search queries to cover the topic comprehensively
2. Use web_search for EACH query
3. Synthesise the findings: do NOT copy-paste; summarise in your own words
4. Call format_report to produce the final structured output
5. Always cite the sources you found
""",
    tools=[web_search, format_report],
)

# Run the agent
result = Runner.run_sync(
    research_agent,
    "Research the current state of Retrieval-Augmented Generation (RAG) in 2025: "
    "main approaches, limitations, and recent improvements.",
)
print(result.final_output)
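
Because the agent is told to run several searches, the same page can come back more than once. One option is to merge the result lists and drop duplicate URLs before formatting them. The sketch below works on plain dictionaries with the `url` and `content` keys that `web_search` reads; `merge_results` and `format_sources` are hypothetical helper names, and the sample data is invented so the example runs offline.

```python
def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Combine several result lists, keeping the first hit for each URL."""
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_lists:
        for item in results:
            url = item.get("url", "")
            if url and url not in seen:
                seen.add(url)
                merged.append(item)
    return merged


def format_sources(results: list[dict], max_chars: int = 200) -> str:
    """Render results as numbered Source/Content blocks, truncating long text."""
    blocks = []
    for i, item in enumerate(results, start=1):
        content = item.get("content", "")
        if len(content) > max_chars:
            content = content[:max_chars].rstrip() + "..."
        blocks.append(f"[{i}] Source: {item['url']}\nContent: {content}")
    return "\n\n".join(blocks) if blocks else "No results found."


# Invented sample data standing in for two search responses.
first = [{"url": "https://example.com/a", "content": "Overview of RAG."}]
second = [
    {"url": "https://example.com/a", "content": "Overview of RAG."},
    {"url": "https://example.com/b", "content": "Known limitations."},
]
merged = merge_results(first, second)
print(format_sources(merged))
```

Numbering the sources also gives the model a short handle such as [1] to cite in the report.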

Adding a Validation Step

To ensure the report meets quality standards, add a validation agent:

from agents import Agent, Runner

validation_agent = Agent(
    name="Report Validator",
    model="gpt-4o-mini",
    instructions="""
Review the research report for quality.
Check:
1. Does it answer the original question?
2. Are claims specific (not vague)?
3. Are sources cited?
4. Is it at least 200 words?

Respond with JSON: {"pass": true/false, "issues": [list of issues if any]}
""",
)

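# Pass the same question the research agent was given, not a paraphrase,
# so the validator can judge whether the report actually answers it.
question = (
    "Research the current state of Retrieval-Augmented Generation (RAG) in 2025: "
    "main approaches, limitations, and recent improvements."
)
report = result.final_output

validation = Runner.run_sync(
    validation_agent,
    f"Original question: {question}\n\nReport:\n{report}",
)
print(validation.final_output)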
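
The validator is asked to reply with JSON, but its output is still free text, so the calling code should parse it defensively. The sketch below is one way to do that using only the standard library; `parse_verdict` is a hypothetical helper, and treating any unparseable reply as a failure is a design assumption rather than something the SDK prescribes.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes add


def parse_verdict(raw: str) -> dict:
    """Parse the validator's reply into {'pass': bool, 'issues': list[str]}.

    Any reply that is not JSON with a boolean 'pass' field counts as a failure.
    """
    text = raw.strip()
    # Strip a surrounding Markdown code fence and optional "json" tag.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"pass": False, "issues": ["Validator reply was not valid JSON."]}
    if not isinstance(data, dict) or not isinstance(data.get("pass"), bool):
        return {"pass": False, "issues": ["Validator reply had no boolean 'pass' field."]}
    issues = data.get("issues") or []
    if not isinstance(issues, list):
        issues = [issues]
    return {"pass": data["pass"], "issues": [str(issue) for issue in issues]}


verdict = parse_verdict('{"pass": false, "issues": ["No sources cited"]}')
print(verdict)
```

In the pipeline above, this would be called as `parse_verdict(validation.final_output)`, and a failing verdict could trigger another research pass.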

Common Mistakes

  1. Only searching once: one search query rarely gives complete coverage. Instruct the agent to search several times from different angles.
  2. Not citing sources: without citations, a reader cannot verify the report's claims. Instruct the agent explicitly to cite sources.
  3. Using a general web search API: general-purpose search APIs usually return links and short snippets, so you still have to fetch each page and strip HTML and ads yourself. Use LLM-oriented search APIs such as Tavily, which return clean text.
  4. Not setting a max_turns limit: an agent left to its own judgment can over-research (calling search 20+ times) before deciding to write. Set max_turns explicitly to a value that fits the task instead of relying on the default.
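
For mistake 4, the Agents SDK runner takes a `max_turns` argument (worth confirming against the current docs for its default and failure behaviour). A complementary, library-independent guard is to cap calls inside the search tool itself. The sketch below shows that idea in plain Python; `SearchBudget` and `stub_search` are hypothetical names, and the stub replaces a real search call so the example runs without network access.

```python
from typing import Callable


class SearchBudget:
    """Wrap a search function and refuse calls once a fixed budget is spent."""

    def __init__(self, search: Callable[[str], str], max_calls: int = 5) -> None:
        self._search = search
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, query: str) -> str:
        if self.calls >= self.max_calls:
            # Return a message instead of raising, so the agent can read it
            # and move on to writing the report.
            return "Search budget exhausted. Write the report with what you have."
        self.calls += 1
        return self._search(query)


def stub_search(query: str) -> str:
    """Stand-in for a real search call."""
    return f"results for {query}"


limited = SearchBudget(stub_search, max_calls=2)
outputs = [limited(q) for q in ("rag approaches", "rag limitations", "rag 2025")]
print(outputs[-1])
```

Inside `web_search`, the wrapped function would be the Tavily call; the third query above is refused because the budget of two calls is already used.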

Quick Quiz

Test Your Understanding

Q1. Why is Tavily preferred over a generic search API for agent use?
A1. Tavily is designed for LLM agents and returns clean, concise, LLM-ready text results without HTML noise or ads.

Q2. Why should the agent make multiple search queries rather than one?
A2. A single query rarely covers a topic comprehensively. Multiple queries from different angles ensure broader, more accurate coverage.

Q3. What is the role of the validation agent in the two-agent pipeline?
A3. To independently verify that the research report meets quality standards before returning it to the user.


Student Exercise

Exercise 7.4 — Domain-specific research agent
Build a research agent specialised for academic topics. Add a tool that formats citations in APA style. Test it on "Summarise the key findings of the Attention Is All You Need paper (Vaswani et al., 2017)."

