3.1 API Fundamentals

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: REST calls · Authentication · Request/response anatomy

Official Docs: OpenAI API Reference · Anthropic API Reference


What is an LLM API?

An LLM API is an HTTP service that accepts your prompt as a JSON payload and returns the model’s response as JSON. You never run the model locally — you send data to the provider’s infrastructure and receive the result.

Your app  ─── HTTPS POST /v1/chat/completions ──→  Provider servers
          ←── JSON response (choices, usage) ─────  (GPU cluster)
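
To make the round trip concrete, here is a minimal sketch of the JSON that travels in each direction. No network call is made: the response string is a hand-written stand-in shaped like a chat completion, and the helper name build_payload is an assumption for illustration only.

```python
import json


def build_payload(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Serialize a single-turn chat request body to a JSON string."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)


# What your app sends (the HTTP request body).
request_body = build_payload("What is 2 + 2?")

# Hand-written stand-in for what the provider sends back (not a real reply).
response_body = (
    '{"choices": [{"message": {"role": "assistant", "content": "2 + 2 = 4"}}]}'
)

# What your app does with the answer: decode the JSON and pick out the text.
reply = json.loads(response_body)["choices"][0]["message"]["content"]
print(reply)
```

Both directions are plain JSON text, so any language with an HTTP client and a JSON parser can talk to the API.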

Authentication

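LLM APIs authenticate each request with an API key sent in an HTTP header. OpenAI expects the key as a Bearer token in the Authorization header, as in the request below. Anthropic expects it in an x-api-key header instead, together with an anthropic-version header.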

POST /v1/chat/completions HTTP/1.1
Host: api.openai.com
Authorization: Bearer sk-...
Content-Type: application/json
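
The same headers can be attached from Python's standard library. The sketch below only builds the request object and never sends it; the function name build_request and the placeholder key are assumptions for illustration.

```python
import urllib.request


def build_request(api_key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={
            # The API key travels as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key: a real one should come from the environment.
req = build_request("sk-placeholder", b"{}")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Printing the request before sending it is a cheap way to confirm that the header is present and correctly spelled when a call fails with an authentication error.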

Storing API Keys Safely

# .env file (never commit this to git)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Load with python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env into the process environment
api_key = os.environ["OPENAI_API_KEY"]

Never hardcode API keys

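Never put api_key="sk-..." in your source code. A key committed to git stays in the repository history even after you delete it from the file, and anyone who finds a leaked key can run requests that are billed to you. If a key does leak, revoke it and create a new one.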
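
One way to make a missing key fail early, and to keep the full secret out of logs, is sketched below. It uses only the standard library; load_api_key and mask are hypothetical helper names, not part of any SDK.

```python
import os


def load_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell")
    return key


def mask(key: str) -> str:
    """Shorten a key for log output so the full secret is never printed."""
    return key[:3] + "..." + key[-4:] if len(key) > 8 else "***"
```

Calling load_api_key() once at startup surfaces a configuration problem immediately instead of as an authentication error on the first request, and mask() lets you log which key is in use without exposing it.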


Raw HTTP Request (curl)

Understanding the raw HTTP format helps debug SDK issues:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is 2 + 2?"}
    ],
    "temperature": 0,
    "max_tokens": 64
  }'

Response Anatomy

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 = 4"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 9,
    "total_tokens": 36
  }
}

| Field | Meaning |
| --- | --- |
| choices[0].message.content | The model’s reply |
| choices[0].finish_reason | Why generation stopped: stop (normal), length (hit max_tokens), tool_calls |
| usage.prompt_tokens | Tokens in your input |
| usage.completion_tokens | Tokens in the response |
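
The fields in the table can be read with ordinary dictionary and list indexing once the JSON is decoded. The sketch below works on a trimmed copy of the sample response rather than a live call; summarize is a hypothetical helper name.

```python
import json

# Trimmed copy of the sample response shown above.
raw = """
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "2 + 2 = 4"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 27, "completion_tokens": 9, "total_tokens": 36}
}
"""


def summarize(response: dict) -> dict:
    """Pull out the reply text, stop reason, and token counts."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "reply": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }


summary = summarize(json.loads(raw))
print(summary["reply"], summary["finish_reason"])
```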

Key Request Parameters

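| Parameter | Type | Effect |
| --- | --- | --- |
| model | string | Which model to use |
| messages | array | Conversation history |
| temperature | float 0–2 | Randomness (0 = most deterministic; repeated calls can still differ slightly) |
| max_tokens | int | Maximum number of tokens to generate |
| top_p | float 0–1 | Nucleus sampling cutoff |
| stream | bool | Stream tokens as they arrive |
| response_format | object | Request structured output such as JSON |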

Always consult the official API reference for the full, up-to-date parameter list.
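
As an illustration of how these parameters fit into a request body, the sketch below assembles one and rejects values outside the ranges listed in the table. The function build_body and its checks are assumptions for this example; the provider performs its own validation and may accept different ranges for some models.

```python
def build_body(model, messages, *, temperature=None, max_tokens=None,
               top_p=None, stream=None):
    """Assemble a request body, leaving out any parameter that was not set."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    body = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stream": stream,
    }
    # Parameters left as None are omitted so the provider's defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_body(
    "gpt-4o-mini",
    [{"role": "user", "content": "What is 2 + 2?"}],
    temperature=0,
    max_tokens=64,
)
print(sorted(body))
```

Checking against None rather than truthiness matters here: temperature=0 is a legitimate setting and must not be dropped from the body.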


Common Mistakes

  1. Hardcoding API keys — use environment variables always.
  2. Ignoring finish_reason — if finish_reason == "length", your response was cut off. Increase max_tokens.
  3. Not reading usage — you cannot optimise costs without knowing your token consumption.
  4. Not setting max_tokens — without a limit the model may generate thousands of tokens unexpectedly.
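
Mistakes 2 and 3 can be guarded against with two small checks, sketched below. The helper names are hypothetical, and the per-million-token prices are made-up placeholders, not real rates; substitute the figures from your provider's pricing page.

```python
def was_truncated(response: dict) -> bool:
    """True if generation stopped because it hit the max_tokens limit."""
    return response["choices"][0]["finish_reason"] == "length"


def estimate_cost(usage: dict, input_price: float, output_price: float) -> float:
    """Estimate the cost of one call; prices are per one million tokens."""
    return (
        usage["prompt_tokens"] * input_price
        + usage["completion_tokens"] * output_price
    ) / 1_000_000


# Minimal stand-in for a decoded response (same numbers as the sample above).
response = {
    "choices": [{"finish_reason": "stop"}],
    "usage": {"prompt_tokens": 27, "completion_tokens": 9},
}

if was_truncated(response):
    print("Reply was cut off; consider raising max_tokens.")

# Placeholder prices (assumed, not real): 1.00 input, 2.00 output per million.
cost = estimate_cost(response["usage"], input_price=1.00, output_price=2.00)
print(f"Estimated cost: {cost:.8f}")
```

Logging these two values for every call gives you the data needed to spot truncated replies and to see which prompts dominate your spend.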

Quick Quiz

Test Your Understanding

Q1. How does an LLM API authenticate your request?
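A1. With an API key sent in an HTTP header. OpenAI expects a Bearer token in the Authorization header; Anthropic expects the key in an x-api-key header.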

Q2. What does finish_reason: "length" mean?
A2. The model stopped generating because it hit the max_tokens limit — the response was likely truncated.

Q3. Where in the response JSON do you find the model’s reply text?
A3. choices[0].message.content.

Q4. What parameter controls the maximum length of the generated response?
A4. max_tokens.


Student Exercise

Exercise 3.1 — Raw API call
Using curl or Python requests (not the OpenAI SDK), make a raw POST request to https://api.openai.com/v1/chat/completions. Parse the JSON response manually and print the reply text, prompt tokens, and completion tokens.


Further Reading

Next → 3.2 OpenAI & Anthropic SDKs