1.5 Running Your First LLM
AI-generated content may contain errors. Always verify against official sources.
Key Concepts: Ollama local setup · API call to OpenAI/Anthropic · Comparing outputs
Official Docs: Ollama · OpenAI Quickstart · Anthropic Quickstart
Option A — Run a Model Locally with Ollama (Free)
Ollama lets you run open-weight models locally. It exposes an OpenAI-compatible API on localhost:11434.
1. Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows — download from https://ollama.com/download
2. Pull and Run a Model
# Pull a small model (~2 GB)
ollama pull llama3.2
# Interactive chat in terminal
ollama run llama3.2
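Before calling Ollama from code, it can help to confirm that the local server is reachable and to see which models have been pulled. The sketch below assumes Ollama's native /api/tags endpoint on the default port. The helper name list_local_models is made up for this example and is not part of any SDK.

```python
import json
import urllib.request


def list_local_models(base_url="http://localhost:11434", timeout=2.0):
    """Return names of locally pulled models, or None if the server is unreachable.

    Hypothetical helper: it assumes Ollama's /api/tags endpoint, which
    responds with JSON shaped like {"models": [{"name": ...}, ...]}.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError covers a response body that is not valid JSON.
        return None
    return [m.get("name") for m in payload.get("models", [])]


models = list_local_models()
if models is None:
    print("Ollama does not appear to be running on localhost:11434.")
else:
    print("Local models:", models)
```

If the result is None, starting the Ollama app (or running ollama serve) is the usual first thing to try.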
3. Call Ollama from Python
# Requires the openai package: pip install openai
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required field, not validated locally
)
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain gradient descent in 3 sentences."}],
    temperature=0.5,
)
print(response.choices[0].message.content)
Option B — OpenAI API
1. Get an API Key
- Sign up at platform.openai.com
- Go to API Keys → Create new secret key
- Set environment variable:
export OPENAI_API_KEY="sk-..."
2. Install and Call
pip install openai
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the transformer architecture?"},
    ],
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
Option C — Anthropic Claude API
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment
message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "What is the transformer architecture?"},
    ],
)
print(message.content[0].text)
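One structural difference between the two SDKs is where the system prompt goes. The OpenAI chat format carries it as a message with role "system", while Anthropic's Messages API takes it as a separate top-level system argument and requires max_tokens. The sketch below converts an OpenAI-style message list into keyword arguments for client.messages.create. The function name to_anthropic_kwargs is hypothetical and only illustrates the mapping.

```python
def to_anthropic_kwargs(messages, model, max_tokens=256):
    """Split OpenAI-style messages into Anthropic's `system` + `messages` arguments.

    Hypothetical helper for illustration; it only handles plain string content.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    kwargs = {"model": model, "max_tokens": max_tokens, "messages": chat}
    if system_parts:
        # Anthropic expects the system prompt outside the messages list.
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs


openai_style = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the transformer architecture?"},
]
kwargs = to_anthropic_kwargs(openai_style, model="claude-3-5-haiku-20241022")
print(kwargs)
# With a configured client: message = client.messages.create(**kwargs)
```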
Key Takeaways
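- Ollama is a quick, free way to run open-weight models locally
- The OpenAI and Anthropic SDKs follow a similar pattern (create a client, send a list of messages), but Anthropic requires max_tokens and returns the text in message.content[0].text
- Always check the usage field to monitor costs: usage.total_tokens in the OpenAI SDK, usage.input_tokens and usage.output_tokens in the Anthropic SDK
- Start with smaller, cheaper models (gpt-4o-mini, claude-3-5-haiku) before moving up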
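The two SDKs report token usage under different field names: the OpenAI chat completions response uses prompt_tokens and completion_tokens, while the Anthropic message response uses input_tokens and output_tokens. A minimal sketch of reading both through one function is shown below. The function name token_counts is made up, and the usage objects are stand-ins with invented numbers rather than real API responses.

```python
from types import SimpleNamespace


def token_counts(usage):
    """Return (input_tokens, output_tokens) from either SDK's usage object."""
    if hasattr(usage, "prompt_tokens"):  # OpenAI chat completions naming
        return usage.prompt_tokens, usage.completion_tokens
    return usage.input_tokens, usage.output_tokens  # Anthropic messages naming


# Stand-in objects with made-up numbers, not real API output.
openai_usage = SimpleNamespace(prompt_tokens=24, completion_tokens=130, total_tokens=154)
anthropic_usage = SimpleNamespace(input_tokens=15, output_tokens=200)

for name, usage in [("openai", openai_usage), ("anthropic", anthropic_usage)]:
    inp, out = token_counts(usage)
    print(f"{name}: input={inp} output={out} total={inp + out}")
```

In real code you would pass response.usage (OpenAI) or message.usage (Anthropic) instead of the stand-ins.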
Common Mistakes
- Hardcoding API keys: never put api_key="sk-..." directly in your source code. Always read from environment variables or a secrets manager.
- Not handling API errors: network issues and rate limits are common. Always wrap API calls in try/except.
- Forgetting the system message: without a system prompt the model defaults to a generic assistant persona. Always set context explicitly.
- Not checking usage in the response: ignoring token usage makes it impossible to monitor costs or debug token limit errors.
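The advice on error handling can be sketched as a small retry wrapper with exponential backoff. This is one possible pattern, not something prescribed by either SDK. The name call_with_retries and its parameters are hypothetical, and the demo uses a simulated failure instead of a real API call so that it runs offline.

```python
import random
import time


def call_with_retries(fn, retryable=(Exception,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Demo with a simulated flaky call: fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


# sleep is replaced with a no-op so the demo does not actually wait.
result = call_with_retries(flaky, retryable=(ConnectionError,), sleep=lambda s: None)
print(result, "after", attempts["n"], "attempts")
```

With the OpenAI Python SDK (v1 and later), fn would typically be a lambda wrapping client.chat.completions.create(...), and retryable could be (openai.RateLimitError, openai.APIConnectionError). The anthropic package exposes exception classes with the same names.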
Quick Quiz
Q1. What port does Ollama expose its OpenAI-compatible API on by default?
A1. Port 11434 — base URL http://localhost:11434/v1.
Q2. Why can you use the openai Python SDK to call Ollama?
A2. Ollama implements the OpenAI API format (same /v1/chat/completions endpoint), so the SDK works with a custom base_url.
Q3. Where should you store your OPENAI_API_KEY?
A3. In an environment variable (e.g., export OPENAI_API_KEY="..." or a .env file loaded with python-dotenv). Never hardcode it in source files.
Q4. Which model should a student use to minimise API costs while learning?
A4. gpt-4o-mini (OpenAI) or claude-3-5-haiku (Anthropic) — significantly cheaper than the flagship models.
Student Exercise
Exercise 1.8 — Your first LLM call
Set up Ollama locally and run llama3.2. Send it a prompt asking it to explain gradient descent in 3 sentences. Then send the same prompt to gpt-4o-mini via the OpenAI API. Compare the two responses.
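When comparing the two responses, a few rough numbers can supplement reading them side by side, for example whether each model actually kept to three sentences. The sketch below is one way to do that. The helper name summarize is hypothetical, the sentence splitter is a naive regular expression, and the sample text is a placeholder written for this example, not real model output.

```python
import re


def summarize(label, text):
    """Return rough word and sentence counts for one model response."""
    words = text.split()
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {"model": label, "words": len(words), "sentences": len(sentences)}


# Placeholder text standing in for a real response.
sample = (
    "Gradient descent minimizes a loss. "
    "It follows the negative gradient. "
    "Steps shrink near a minimum."
)
print(summarize("placeholder", sample))
```

In the exercise you would pass response.choices[0].message.content from each call in place of the placeholder.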
Exercise 1.9 — Token awareness
Modify the OpenAI example above to print: the number of prompt tokens, completion tokens, and the estimated cost assuming $0.15/1M input and $0.60/1M output tokens (gpt-4o-mini pricing). Verify your calculation against OpenAI pricing.
Further Reading
- 📘 Ollama Model Library
- 📘 OpenAI Quickstart
- 📘 Anthropic Quickstart
- 📘 OpenAI API Pricing
- 📘 python-dotenv — manage .env files