3.1 API Fundamentals
AI-generated content may contain errors. Always verify against official sources.
Key Concepts: REST calls · Authentication · Request/response anatomy
Official Docs: OpenAI API Reference · Anthropic API Reference
What is an LLM API?
An LLM API is an HTTP service that accepts your prompt as a JSON payload and returns the model’s response as JSON. You never run the model locally — you send data to the provider’s infrastructure and receive the result.
Your app ─── HTTPS POST /v1/chat/completions ──→ Provider servers (GPU cluster)
Your app ←── JSON response (choices, usage) ──── Provider servers (GPU cluster)
Authentication
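Most LLM APIs authenticate with a secret API key sent in an HTTP header, but the header differs by provider. OpenAI expects the key as a Bearer token in the Authorization header, as in the request below. Anthropic expects it in an x-api-key header, together with an anthropic-version header.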
POST /v1/chat/completions HTTP/1.1
Host: api.openai.com
Authorization: Bearer sk-...
Content-Type: application/json
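The headers above can be assembled in code before any request is sent. The following is a minimal sketch for the OpenAI-style Bearer scheme only; the function name `build_headers` and the placeholder key are illustrative assumptions, not part of any SDK.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP headers for an OpenAI-style Bearer-token request."""
    if not api_key:
        # Fail early: an empty key would otherwise produce a confusing 401.
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Placeholder value for illustration only; real keys come from the environment.
headers = build_headers("sk-example-not-a-real-key")
print(headers["Authorization"])
```

Keeping header construction in one small function makes it easy to swap the scheme for a provider that uses a different header.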
Storing API Keys Safely
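# .env file (never commit this to git; add .env to .gitignore)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Python: load the keys with python-dotenv (pip install python-dotenv)
import os
from dotenv import load_dotenv
load_dotenv()  # reads .env into the process environment
api_key = os.environ["OPENAI_API_KEY"]  # raises KeyError if the variable is missing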
Never put api_key="sk-..." in your source code. Once a key is pushed, it is hard to scrub from Git history, and anyone who finds it can make requests billed to your account. If a key leaks, revoke it and create a new one.
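A small helper can make a missing key fail with a clear message and keep the full key out of logs. This is a sketch using only the standard library; the helper names `get_api_key` and `mask_key` are hypothetical, and it assumes the variable has already been exported or loaded from .env.

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment, with a readable error if absent."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it to .env or export it in your shell"
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

For example, logging `mask_key(get_api_key())` confirms which key is in use without exposing it.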
Raw HTTP Request (curl)
Understanding the raw HTTP format helps debug SDK issues:
curl https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2 + 2?"}
],
"temperature": 0,
"max_tokens": 64
}'
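The same request can be written in Python without an SDK. The sketch below uses only the standard library (`urllib.request`) and mirrors the curl command field by field; the function names `build_request` and `send` are illustrative assumptions. Building the request is separated from sending it so the payload can be inspected without a network call.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_request(api_key: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0,
        "max_tokens": 64,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # request body must be bytes
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(build_request(api_key, "What is 2 + 2?"))` performs the real, billable network call; a non-2xx status raises `urllib.error.HTTPError`.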
Response Anatomy
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1714000000,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "2 + 2 = 4"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 27,
"completion_tokens": 9,
"total_tokens": 36
}
}
| Field | Meaning |
|---|---|
| choices[0].message.content | The model’s reply |
| choices[0].finish_reason | Why generation stopped: stop (normal), length (hit max_tokens), tool_calls |
| usage.prompt_tokens | Tokens in your input |
| usage.completion_tokens | Tokens in the response |
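The fields in the table can be pulled out of the raw JSON with the standard library. This is a minimal sketch that assumes the response shape shown above; the `Reply` dataclass and `parse_reply` function are hypothetical names for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Reply:
    text: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int


def parse_reply(raw: str) -> Reply:
    """Extract the reply text, stop reason, and token counts from a response."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data["usage"]
    return Reply(
        # content can be null (for example on a tool call), so fall back to "".
        text=choice["message"].get("content") or "",
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
    )


# Trimmed copy of the sample response from this section.
SAMPLE = """
{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "2 + 2 = 4"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 27, "completion_tokens": 9, "total_tokens": 36}
}
"""

reply = parse_reply(SAMPLE)
print(reply.text)  # prints: 2 + 2 = 4
```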
Key Request Parameters
| Parameter | Type | Effect |
|---|---|---|
| model | string | Which model to use |
| messages | array | Conversation history, as a list of role/content messages |
| temperature | float 0–2 | Randomness (0 = least random; output can still vary slightly between calls) |
| max_tokens | int | Max response length, in tokens |
| top_p | float 0–1 | Nucleus sampling cutoff |
| stream | bool | Stream tokens as they arrive |
| response_format | object | Request JSON output |
Always consult the official API reference for the full, up-to-date parameter list.
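Checking parameter ranges locally catches mistakes before they become API errors. The sketch below validates against the ranges listed in the table above and omits unset options so the provider's defaults apply; `build_payload` is a hypothetical helper, and the actual limits enforced by a given provider may differ.

```python
from typing import Any, Optional


def build_payload(
    model: str,
    messages: list[dict[str, str]],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Build a chat request body, validating ranges from the table above."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    payload: dict[str, Any] = {"model": model, "messages": messages}
    # Only include options the caller set explicitly.
    if temperature is not None:
        payload["temperature"] = temperature
    if top_p is not None:
        payload["top_p"] = top_p
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stream:
        payload["stream"] = True
    return payload


payload = build_payload(
    "gpt-4o-mini",
    [{"role": "user", "content": "What is 2 + 2?"}],
    temperature=0,
    max_tokens=64,
)
```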
Common Mistakes
- Hardcoding API keys: use environment variables always.
- Ignoring finish_reason: if finish_reason == "length", your response was cut off. Increase max_tokens.
- Not reading usage: you cannot optimise costs without knowing your token consumption.
- Not setting max_tokens: without a limit the model may generate thousands of tokens unexpectedly.
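Two of these mistakes can be guarded against with a few lines run on every response. This is a sketch assuming the response has already been decoded into a dict; the function names are hypothetical, and the per-million-token prices are made-up placeholders, so substitute the current rates from your provider's pricing page.

```python
def is_truncated(response: dict) -> bool:
    """True when generation stopped because it hit the max_tokens limit."""
    return response["choices"][0]["finish_reason"] == "length"


def estimate_cost(
    usage: dict,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one call from its usage block and per-token prices."""
    return (
        usage["prompt_tokens"] * input_price_per_million
        + usage["completion_tokens"] * output_price_per_million
    ) / 1_000_000


# Decoded response with the same stop reason and usage as the sample above.
response = {
    "choices": [{"finish_reason": "stop"}],
    "usage": {"prompt_tokens": 27, "completion_tokens": 9, "total_tokens": 36},
}

if is_truncated(response):
    print("Reply was cut off; consider raising max_tokens.")

# Placeholder prices (currency units per million tokens), NOT real rates.
cost = estimate_cost(response["usage"], 1.0, 2.0)
print(f"Estimated cost: {cost:.8f}")
```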
Quick Quiz
Q1. How does an LLM API authenticate your request?
A1. A Bearer token in the Authorization HTTP header.
Q2. What does finish_reason: "length" mean?
A2. The model stopped generating because it hit the max_tokens limit — the response was likely truncated.
Q3. Where in the response JSON do you find the model’s reply text?
A3. choices[0].message.content.
Q4. What parameter controls the maximum length of the generated response?
A4. max_tokens.
Student Exercise
Exercise 3.1 — Raw API call
Using curl or Python requests (not the OpenAI SDK), make a raw POST request to https://api.openai.com/v1/chat/completions. Parse the JSON response manually and print the reply text, prompt tokens, and completion tokens.