3.5 Error Handling & Retries

Key Concepts: Rate limits · Timeouts · Exponential backoff · Fallback models

Official Docs: OpenAI Error Codes · OpenAI Rate Limits

Common API Errors

| Error | HTTP Code | Cause | Fix |
|---|---|---|---|
| `RateLimitError` | 429 | Too many requests or tokens per minute/day | Retry with exponential backoff |
| `AuthenticationError` | 401 | Invalid or missing API key | Check the `OPENAI_API_KEY` env var |
| `BadRequestError` | 400 | Invalid request (malformed JSON, bad params) | Fix the request structure |
| `NotFoundError` | 404 | Model doesn't exist or wrong endpoint | Check the model name |
| `InternalServerError` | 5xx | Provider-side outage | Retry with backoff |
| `APITimeoutError` | none (no response) | Request took too long | Increase the timeout or retry |
| `BadRequestError` (code `context_length_exceeded`) | 400 | Prompt + response exceeds the context limit | Reduce prompt length or `max_tokens` |
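The retry guidance in the table can be condensed into a single predicate. The sketch below is a minimal illustration that works on bare HTTP status codes rather than SDK exception objects. The function name `is_retryable` and the convention of passing `None` for a timeout (no HTTP response received) are assumptions made for this example, not part of any SDK.

```python
from typing import Optional

def is_retryable(status_code: Optional[int]) -> bool:
    """Return True if a failed request is worth retrying unchanged.

    `None` stands for a timeout, where no HTTP response was received.
    """
    if status_code is None:
        return True                      # timeout: the next attempt may succeed
    if status_code == 429:
        return True                      # rate limited: back off, then retry
    return status_code >= 500            # provider-side error: transient

for code in (400, 401, 404, 429, 500, None):
    verdict = "retry with backoff" if is_retryable(code) else "fix the request"
    print(code, "->", verdict)
```

Everything in the 4xx range other than 429 is treated as a caller error here, which matches the table: the request has to change before it can succeed.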

Automatic Retries with the OpenAI SDK

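The OpenAI Python SDK retries some failures on its own. By default it retries connection errors, timeouts, 429 responses, and 5xx responses twice, with a short exponential backoff between attempts. Use `max_retries` to change the count:

from openai import OpenAI

# Retry rate-limit and server errors up to 3 times (the default is 2)
client = OpenAI(max_retries=3)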

Manual Exponential Backoff

For more control, implement your own retry wrapper:

import time
import random
from openai import OpenAI, RateLimitError, InternalServerError

client = OpenAI(max_retries=0)   # disable auto-retries to use our own

def with_backoff(fn, *args, max_attempts: int = 5, base_delay: float = 1.0, **kwargs):
    """Call fn, retrying rate-limit and server errors with exponential backoff.

    max_attempts counts every call, including the first one.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except (RateLimitError, InternalServerError) as e:
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 0.5s of jitter to desynchronise clients
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Attempt {attempt + 1} failed ({type(e).__name__}). Retrying in {delay:.1f}s...")
            time.sleep(delay)

response = with_backoff(
    client.chat.completions.create,
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the speed of light?"}],
)
print(response.choices[0].message.content)
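The same retry loop can also be packaged as a decorator, which keeps the retry policy next to the function it protects. The following is a minimal sketch, not the SDK's own mechanism: `TransientError` is a hypothetical stand-in for `RateLimitError` / `InternalServerError`, and the `sleep` parameter is an assumption added so the demo can record delays instead of actually waiting.

```python
import functools
import random
import time

class TransientError(Exception):
    """Hypothetical stand-in for RateLimitError / InternalServerError."""

def retry_with_backoff(max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(TransientError,), sleep=time.sleep):
    """Decorator: retry the wrapped function with capped exponential backoff."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Exponential growth, capped, plus up to 0.5 s of jitter.
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    sleep(delay + random.uniform(0, 0.5))
        return wrapper
    return decorator

# Demo: a function that fails twice, then succeeds. Delays are appended to
# a list instead of being slept, so the example runs instantly.
calls = {"count": 0}
slept = []

@retry_with_backoff(max_attempts=5, base_delay=1.0, sleep=slept.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated 429")
    return "ok"

result = flaky()
print(result, "after", calls["count"], "calls; delays:", [f"{d:.2f}" for d in slept])
```

The `max_delay` cap is a design choice worth keeping in real code: without it, the delay doubles indefinitely and a long outage can leave callers waiting for minutes between attempts.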

Using `tenacity` for Retry Logic

`tenacity` is a general-purpose retry library that lets you declare the retry policy (what to retry, how long to wait, when to stop) in a decorator:

pip install tenacity

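from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
from openai import OpenAI, RateLimitError

client = OpenAI(max_retries=0)   # let tenacity own the retry policy; avoids stacked retries

@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_random_exponential(multiplier=1, max=60),   # exponential backoff with jitter
    stop=stop_after_attempt(6),
    reraise=True,   # surface the original RateLimitError instead of tenacity.RetryError
)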
def chat(messages: list[dict]) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return resp.choices[0].message.content

print(chat([{"role": "user", "content": "Hello!"}]))

Fallback Models

If the primary model is rate limited or unavailable, degrade gracefully by trying cheaper or less loaded models in order of preference:

from openai import APITimeoutError, InternalServerError, OpenAI, RateLimitError

client = OpenAI()
MODEL_FALLBACKS = ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"]   # in order of preference

def chat_with_fallback(messages: list[dict]) -> str:
    last_error = None
    for model in MODEL_FALLBACKS:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=30,
            )
            return resp.choices[0].message.content
        except (RateLimitError, InternalServerError, APITimeoutError) as e:
            # Only transient errors trigger a fallback; 400/401/404 propagate immediately.
            last_error = e
            print(f"{model} unavailable: {e}. Trying fallback...")
    raise RuntimeError("All models failed.") from last_error

Setting Timeouts

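Set `timeout` on the client so a stalled request fails fast instead of waiting out the SDK default of 10 minutes:

from openai import OpenAI

client = OpenAI(
    timeout=30.0,      # seconds per attempt (SDK default: 10 minutes)
    max_retries=2,     # each retry gets its own 30-second timeout
)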
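Timeouts and retries interact: assuming the timeout applies to each attempt separately, the worst-case wait is several times the configured value. The arithmetic can be sketched as below. The helper `worst_case_seconds` is hypothetical, and the backoff pauses passed to it are illustrative numbers, not the SDK's actual schedule.

```python
def worst_case_seconds(timeout: float, max_retries: int, backoff_delays: list[float]) -> float:
    """Upper bound on wall-clock time if every attempt times out.

    backoff_delays[i] is the assumed pause before retry i + 1.
    """
    if len(backoff_delays) != max_retries:
        raise ValueError("need exactly one backoff delay per retry")
    attempts = max_retries + 1          # the first call plus each retry
    return attempts * timeout + sum(backoff_delays)

# timeout=30.0 and max_retries=2 as in the client above; the 1 s and 2 s
# pauses are assumed values chosen for illustration.
budget = worst_case_seconds(30.0, 2, [1.0, 2.0])
print(f"worst case: {budget:.0f} s")   # 3 * 30 + 1 + 2 = 93
```

If a user-facing request cannot tolerate that total, lower either the per-attempt timeout or the retry count rather than only one of them.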

Common Mistakes

Infinite retry loops — always set a maximum retry count. Infinite loops burn API credits and can cascade into larger outages.
Retrying BadRequestError (400) — 400 errors indicate a malformed request. Retrying won’t fix them; fix the request instead.
No jitter in backoff — without random jitter, all clients retry simultaneously after an outage, creating a “thundering herd” that re-triggers the rate limit.
Not logging failures — always log the error type, attempt number, and delay so you can debug production issues.
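The thundering-herd effect is easy to see in a small simulation. The sketch below makes a simplifying assumption: 1,000 clients all fail at the same instant and schedule their first retry using the same backoff formula. The helper names `retry_times` and `peak_per_100ms` are hypothetical, and the numbers are illustrative rather than measurements of any real API.

```python
import random

def retry_times(n_clients, attempt, base_delay, jitter, rng):
    """Time (seconds after the failure) at which each client sends its retry."""
    backoff = base_delay * 2 ** attempt
    return [backoff + rng.uniform(0, jitter) for _ in range(n_clients)]

def peak_per_100ms(times):
    """Largest number of retries that land in the same 100 ms window."""
    buckets = {}
    for t in times:
        key = int(t * 10)               # index of the 100 ms window
        buckets[key] = buckets.get(key, 0) + 1
    return max(buckets.values())

rng = random.Random(0)  # seeded so the run is repeatable
no_jitter = retry_times(1000, attempt=0, base_delay=1.0, jitter=0.0, rng=rng)
with_jitter = retry_times(1000, attempt=0, base_delay=1.0, jitter=1.0, rng=rng)

# Without jitter every client retries at exactly t = 1.0 s, so the peak is 1000.
print("peak without jitter:", peak_per_100ms(no_jitter))
# With up to 1 s of jitter the same retries spread over roughly ten windows.
print("peak with jitter:   ", peak_per_100ms(with_jitter))
```

The total number of retries is the same in both cases. Jitter only changes when they arrive, which is what keeps the burst below the rate limit.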

Quick Quiz

Test Your Understanding

Q1. What does HTTP 429 mean in the context of LLM APIs?
A1. Rate limit exceeded — too many requests per minute (or tokens per minute) for your tier.

Q2. Why should exponential backoff include random jitter?
A2. Without jitter, all clients retry at the same time after a rate-limit window resets, causing another burst that triggers the rate limit again.

Q3. Should you retry a BadRequestError (400)?
A3. No — a 400 error means your request is malformed. Retrying without changing the request will always fail.

Q4. What is the purpose of a fallback model strategy?
A4. To degrade gracefully to a cheaper or different model when the primary model is unavailable, ensuring service continuity.

Student Exercise

Exercise 3.5 — Resilient API client
Build a ResilientClient class that wraps the OpenAI SDK. It should: retry on rate-limit errors (max 5 attempts, exponential backoff with jitter), fall back to gpt-4o-mini if gpt-4o fails, log each retry attempt, and raise after all retries are exhausted.
