2.3 Structured Output

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: JSON mode · XML tags · Schema enforcement · Pydantic validation

Official Docs: OpenAI Structured Outputs · Anthropic Tool Use · Pydantic Docs


Why Structured Output?

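When your code needs to parse the model's response (to extract a score, populate a database, or call a downstream API), free-form text is fragile: a small change in wording can break the parser. Structured output constrains the response to a machine-readable format such as JSON or tagged fields, so your code can parse it reliably. How strong that guarantee is depends on the method.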
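
To make the fragility concrete, here is a minimal sketch that compares a regex over free-form text with parsing a JSON reply. The three reply strings and both helper functions are invented for illustration; they are not real model output.

```python
import json
import re
from typing import Optional

# Hypothetical model replies, invented for illustration.
free_form_a = "I'd rate this review 4 out of 5."
free_form_b = "Out of 5 stars, this one earns a solid 4."
structured = '{"score": 4, "max_score": 5}'

def score_from_text(reply: str) -> Optional[int]:
    """Brittle: assumes the score is the first number in the reply."""
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else None

def score_from_json(reply: str) -> int:
    """The field name, not the word order, locates the value."""
    return json.loads(reply)["score"]

print(score_from_text(free_form_a))  # 4
print(score_from_text(free_form_b))  # 5 (wrong: it picked up the scale, not the score)
print(score_from_json(structured))   # 4
```

The regex works until the model rephrases its answer, while the JSON lookup keeps working as long as the field name stays the same.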


Method 1 — OpenAI Structured Outputs

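Passing a Pydantic model as response_format to OpenAI's parse helper makes the API constrain decoding to that model's JSON schema, so a completed response matches the schema. The model can still refuse a request, in which case message.refusal is set and message.parsed is None.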

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class SentimentResult(BaseModel):
    sentiment: str         # "positive" | "negative" | "neutral"
    confidence: float      # 0.0 to 1.0
    key_phrases: list[str]

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyse the sentiment of the review."},
        {"role": "user", "content": "Absolutely loved it — fast shipping, great quality!"},
    ],
    response_format=SentimentResult,
)

message = response.choices[0].message
if message.parsed is None:
    # The model declined to answer; message.refusal holds the explanation.
    raise RuntimeError(f"No parsed output; refusal: {message.refusal}")

result: SentimentResult = message.parsed
print(result.sentiment)   # e.g. positive
print(result.confidence)  # e.g. 0.97
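
The idea behind constrained decoding can be illustrated with a toy sketch: at every step the decoder discards the tokens the schema does not allow, then picks the best of what remains. Everything here (the tiny vocabulary, the `ALLOWED_NEXT` table, the `fake_model_scores` function and its numbers) is invented for illustration and is not a description of how OpenAI implements the feature.

```python
# Toy grammar for the output {"sentiment":"positive"} or {"sentiment":"negative"}.
# Maps the previous token to the set of tokens allowed next.
ALLOWED_NEXT = {
    "<start>": {'{"sentiment":'},
    '{"sentiment":': {'"positive"', '"negative"'},
    '"positive"': {"}"},
    '"negative"': {"}"},
    "}": set(),  # nothing may follow the closing brace
}

def constrained_decode(score_fn) -> str:
    """Greedy decoding restricted to tokens the grammar allows."""
    output, previous = [], "<start>"
    while ALLOWED_NEXT[previous]:
        scores = score_fn(previous)
        # Mask step: only tokens allowed after `previous` are candidates.
        candidates = ALLOWED_NEXT[previous]
        previous = max(candidates, key=lambda tok: scores.get(tok, float("-inf")))
        output.append(previous)
    return "".join(output)

def fake_model_scores(previous: str) -> dict:
    """Stand-in for model logits; unconstrained, it would prefer chatty text."""
    return {"Sure!": 9.0, '{"sentiment":': 2.0, '"positive"': 3.0, '"negative"': 1.0, "}": 0.5}

decoded = constrained_decode(fake_model_scores)
print(decoded)  # {"sentiment":"positive"}
```

Although the fake model scores "Sure!" highest, that token is never allowed by the grammar, so the output is always valid against the toy schema.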

Method 2 — JSON Mode (Wider Compatibility)

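JSON mode guarantees syntactically valid JSON but not any particular schema, so validate the result yourself.

from openai import OpenAI
from pydantic import BaseModel, Field, ValidationError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class Review(BaseModel):
    sentiment: str
    score: int = Field(ge=1, le=5)  # enforce the 1-5 range the prompt asks for

raw = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # JSON mode requires the word "JSON" to appear somewhere in the messages.
        {"role": "system", "content": 'Respond ONLY with JSON: {"sentiment": str, "score": int 1-5}'},
        {"role": "user", "content": "The product broke on day one."},
    ],
    response_format={"type": "json_object"},
).choices[0].message.content

try:
    review = Review.model_validate_json(raw)
    print(review)
except ValidationError as e:
    # Valid JSON can still have missing fields, wrong types, or out-of-range values.
    print("Validation error:", e)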
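
Because JSON mode does not enforce a schema, one common pattern is to retry when validation fails. The sketch below shows that loop under stated assumptions: `validate_review` is a stdlib stand-in for the Pydantic check, `parse_with_retry` and its `call_model` parameter are hypothetical names, and the model is faked with canned replies so the example runs offline.

```python
import json
from typing import Callable

def validate_review(raw: str) -> dict:
    """Stdlib stand-in for Pydantic validation; raises ValueError on bad data."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("sentiment"), str):
        raise ValueError("sentiment must be a string")
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be an integer from 1 to 5")
    return data

def parse_with_retry(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until its reply validates, feeding the error back each time."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate_review(raw)
        except ValueError as err:
            last_error = err
            prompt = f"{prompt}\nYour previous reply was invalid ({err}). Reply with valid JSON only."
    raise RuntimeError(f"no valid reply after {max_attempts} attempts: {last_error}")

# Fake model for demonstration: the first reply has an out-of-range score.
replies = iter([
    '{"sentiment": "negative", "score": 9}',
    '{"sentiment": "negative", "score": 1}',
])
review = parse_with_retry(lambda prompt: next(replies), "The product broke on day one.")
print(review)  # {'sentiment': 'negative', 'score': 1}
```

In real use, `call_model` would wrap the chat completion request, and the attempt limit keeps a persistently malformed reply from looping forever.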

Method 3 — Anthropic XML Tags

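Nothing enforces the tag format on the API side, so parse defensively.

import re

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = """
Analyse the sentiment and respond using EXACTLY this format:
<sentiment>positive|negative|neutral</sentiment>
<confidence>0.0 to 1.0</confidence>

Review: 'Fantastic product, exceeded expectations!'
"""

text = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=128,
    messages=[{"role": "user", "content": prompt}],
).content[0].text

# re.search returns None when a tag is missing, so check before calling .group().
sentiment_match = re.search(r"<sentiment>(.*?)</sentiment>", text, re.DOTALL)
confidence_match = re.search(r"<confidence>(.*?)</confidence>", text, re.DOTALL)
if sentiment_match is None or confidence_match is None:
    raise ValueError(f"Response did not follow the expected format: {text!r}")

sentiment = sentiment_match.group(1).strip()
confidence = float(confidence_match.group(1))  # convert the captured string to a number
print(sentiment, confidence)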
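
Tag extraction can also be wrapped in a small reusable helper that checks the values as well as the tags. This is a minimal sketch: `extract_tag` and `parse_sentiment_reply` are hypothetical names, and the two reply strings are made up so the example runs without an API call.

```python
import re
from typing import Optional

def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", text, re.DOTALL)
    return match.group(1).strip() if match else None

def parse_sentiment_reply(text: str) -> dict:
    """Validate both fields; raise ValueError instead of crashing on a missing tag."""
    sentiment = extract_tag(text, "sentiment")
    confidence_raw = extract_tag(text, "confidence")
    if sentiment not in {"positive", "negative", "neutral"}:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    if confidence_raw is None:
        raise ValueError("missing <confidence> tag")
    confidence = float(confidence_raw)  # raises ValueError if not numeric
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return {"sentiment": sentiment, "confidence": confidence}

# Made-up replies standing in for model output.
good = "Here you go:\n<sentiment>positive</sentiment>\n<confidence> 0.95 </confidence>"
bad = "The review is clearly positive."

print(parse_sentiment_reply(good))  # {'sentiment': 'positive', 'confidence': 0.95}
try:
    parse_sentiment_reply(bad)
except ValueError as err:
    print("Rejected:", err)  # Rejected: unexpected sentiment: None
```

Raising a single exception type for every failure lets the caller decide whether to retry, log, or fall back.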

Key Takeaways

  • Use OpenAI Structured Outputs (client.beta.chat.completions.parse) for the strictest schema guarantees
  • Use JSON mode + Pydantic for broader model compatibility; JSON mode guarantees valid JSON, not your schema
  • Use XML tags with Anthropic Claude, and handle missing tags when parsing
  • Always validate with Pydantic before using LLM output in production
