7.4 Building a Research Agent

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: Web search · Retrieval · Summarisation · Multi-step reasoning

Official Docs: OpenAI Agents SDK · Tavily Search API

What is a Research Agent?

A research agent autonomously:

Accepts an open-ended question
Decides what information to look up
Calls search and retrieval tools
Synthesises findings into a structured report

This is a canonical multi-step agent use case: a good answer requires planning, several tool calls, and a final synthesis step.

Setup

pip install openai-agents tavily-python python-dotenv

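Get a free Tavily API key at tavily.com. Tavily is a search API built for LLM agents and returns clean, LLM-ready results. Export both keys in your shell (python-dotenv, installed above, can load them from a .env file instead):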

export TAVILY_API_KEY="tvly-..."
export OPENAI_API_KEY="sk-..."
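
The agent code below reads both keys from os.environ, so a missing key shows up as a bare KeyError. A minimal sketch of a startup check using only the standard library; the helper name missing_keys is hypothetical and not part of either SDK.

```python
import os

REQUIRED_KEYS = ("TAVILY_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All API keys are set.")
```

Running this once before constructing TavilyClient gives a readable message instead of a traceback.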

Building the Agent

import os
from agents import Agent, Runner, function_tool
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@function_tool
def web_search(query: str) -> str:
    """
    Search the web for up-to-date information on a topic.
    Returns a summary of the top search results.
    """
    results = tavily.search(query=query, max_results=3)
    summaries = [
        f"Source: {r['url']}\nContent: {r['content']}"
        for r in results.get("results", [])
    ]
    return "\n\n".join(summaries) if summaries else "No results found."

@function_tool
def format_report(topic: str, findings: str) -> str:
    """
    Format research findings into a structured Markdown report.
    Call this tool as the final step after gathering all information.
    """
    return f"# Research Report: {topic}\n\n{findings}"

research_agent = Agent(
    name="Research Agent",
    model="gpt-4o-mini",
    instructions="""
    You are a diligent research assistant.
    
    For every research request:
    1. Identify 2–3 specific search queries to cover the topic comprehensively
    2. Use web_search for EACH query
    3. Synthesise the findings — do NOT copy-paste; summarise in your own words
    4. Call format_report to produce the final structured output
    5. Always cite the sources you found
    """,
    tools=[web_search, format_report],
)

# Run the agent
result = Runner.run_sync(
    research_agent,
    "Research the current state of Retrieval-Augmented Generation (RAG) in 2025: "
    "main approaches, limitations, and recent improvements."
)
print(result.final_output)
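
The formatting step inside web_search can be tried without an API key by pulling it out into a pure function and feeding it a hand-written response. The sketch below assumes the response shape the tool above relies on: a "results" list whose items carry "url" and "content". The function name format_results, the max_chars cap, and the sample data are all hypothetical.

```python
def format_results(response: dict, max_chars: int = 500) -> str:
    """Turn a search response into the text block handed back to the model."""
    blocks = []
    for item in response.get("results", []):
        content = item.get("content", "").strip()
        if len(content) > max_chars:
            # Truncate long snippets so one result cannot dominate the context.
            content = content[:max_chars].rstrip() + "..."
        blocks.append(f"Source: {item.get('url', 'unknown')}\nContent: {content}")
    return "\n\n".join(blocks) if blocks else "No results found."


# Hand-written sample in the assumed shape; not real search output.
sample = {
    "results": [
        {"url": "https://example.com/a", "content": "First snippet."},
        {
            "url": "https://example.com/b",
            "content": "A much longer snippet that keeps going well past the cap.",
        },
    ]
}
formatted = format_results(sample, max_chars=20)
print(formatted)
```

Keeping the source URL next to each snippet is what lets the agent follow the "cite the sources" instruction later.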

Adding a Validation Step

To check that the report meets basic quality standards, pass it to a second, validation agent:

from agents import Agent, Runner

validation_agent = Agent(
    name="Report Validator",
    model="gpt-4o-mini",
    instructions="""
    Review the research report for quality.
    Check:
    1. Does it answer the original question?
    2. Are claims specific (not vague)?
    3. Are sources cited?
    4. Is it at least 200 words?
    
    Respond with JSON: {"pass": true/false, "issues": [list of issues if any]}
    """,
)

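# `result` is the output of the research run above
report = result.final_output

# Pass the same question the research agent was given, so the validator
# checks the report against the actual request.
original_question = (
    "Research the current state of Retrieval-Augmented Generation (RAG) in 2025: "
    "main approaches, limitations, and recent improvements."
)

validation = Runner.run_sync(
    validation_agent,
    f"Original question: {original_question}\n\nReport:\n{report}",
)
print(validation.final_output)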
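
The validator's reply is free text that is only supposed to be JSON. Models sometimes wrap it in a Markdown code fence or add a sentence around it. A minimal sketch of a tolerant parser using only the standard library; parse_verdict is a hypothetical helper, and treating unparseable output as a failed validation is a design assumption rather than anything the SDK prescribes.

```python
import json
import re


def parse_verdict(text: str) -> dict:
    """Extract {"pass": bool, "issues": [...]} from the validator's reply."""
    # Take the span from the first "{" to the last "}", skipping fences or extra prose.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return {"pass": False, "issues": ["validator returned no JSON object"]}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"pass": False, "issues": ["validator returned invalid JSON"]}
    issues = data.get("issues")
    if not isinstance(issues, list):
        issues = []
    return {
        # Only a literal JSON true counts as a pass.
        "pass": data.get("pass") is True,
        "issues": [str(issue) for issue in issues],
    }


print(parse_verdict('{"pass": true, "issues": []}'))
```

With this in place, parse_verdict(validation.final_output) yields a dictionary the calling code can branch on, for example to rerun the research agent with the listed issues appended to the prompt.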

Common Mistakes

Only searching once: a single query rarely gives complete coverage. Instruct the agent to run several searches that approach the topic from different angles.
Not citing sources: a report without citations cannot be checked by the reader. Instruct the agent explicitly to cite the URLs it used.
Using a general web search API: general-purpose search APIs tend to return links and short snippets, leaving you to fetch the pages and strip HTML and ads yourself. LLM-oriented search APIs such as Tavily return cleaned text.
Not setting a max_turns limit: without a deliberate limit, the agent can over-research (calling search 20+ times) before deciding to write. Pass max_turns to Runner.run_sync to cap the number of turns.
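
Besides a runner-level turn limit, the tool itself can refuse further calls once a budget is spent, and can avoid paying twice for a repeated query. The sketch below uses only the standard library. SearchBudget and its parameters are hypothetical names, and the stub lambda stands in for the Tavily-backed search function.

```python
from typing import Callable, Dict


class SearchBudget:
    """Wrap a search function with a call cap and a cache for repeated queries."""

    def __init__(self, search: Callable[[str], str], max_calls: int = 5) -> None:
        self._search = search
        self._max_calls = max_calls
        self._cache: Dict[str, str] = {}
        self.calls = 0

    def __call__(self, query: str) -> str:
        key = " ".join(query.lower().split())  # normalise case and whitespace
        if key in self._cache:
            return self._cache[key]  # repeated query: no new call is spent
        if self.calls >= self._max_calls:
            # Return a message rather than raising, so the model can move on to writing.
            return "Search budget exhausted. Write the report from the findings so far."
        self.calls += 1
        self._cache[key] = self._search(query)
        return self._cache[key]


# Stub search so the sketch runs offline.
budgeted = SearchBudget(lambda q: f"results for {q}", max_calls=2)
print(budgeted("RAG limitations"))
print(budgeted("rag  limitations"))  # served from the cache
print(budgeted("RAG improvements"))
print(budgeted("RAG benchmarks"))    # over budget
```

Returning a plain message instead of raising keeps the run alive and nudges the agent toward the report step. Whether that is preferable to a hard stop depends on the application.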

Quick Quiz

Test Your Understanding

Q1. Why is Tavily preferred over a generic search API for agent use?
A1. Tavily is designed for LLM agents and returns clean, concise, LLM-ready text results without HTML noise or ads.

Q2. Why should the agent make multiple search queries rather than one?
A2. A single query rarely covers a topic comprehensively. Multiple queries from different angles ensure broader, more accurate coverage.

Q3. What is the role of the validation agent in the two-agent pipeline?
A3. To independently verify that the research report meets quality standards before returning it to the user.

Student Exercise

Exercise 7.4 — Domain-specific research agent
Build a research agent specialised for academic topics. Add a tool that formats citations in APA style. Test it on "Summarise the key findings of the Attention Is All You Need paper (Vaswani et al., 2017)."
