1.5 Running Your First LLM

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: Ollama local setup · API call to OpenAI/Anthropic · Comparing outputs

Official Docs: Ollama · OpenAI Quickstart · Anthropic Quickstart


Option A — Run a Model Locally with Ollama (Free)

Ollama lets you run open-weight models on your own machine. Once running, it exposes an OpenAI-compatible API at http://localhost:11434/v1.

1. Install Ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS / Windows: download the installer from https://ollama.com/download

2. Pull and Run a Model

# Pull a small model (~2 GB)
ollama pull llama3.2

# Interactive chat in terminal
ollama run llama3.2

3. Call Ollama from Python

from openai import OpenAI

# Point the OpenAI SDK at the local Ollama server instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required field, not validated locally
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain gradient descent in 3 sentences."}],
    temperature=0.5,
)
print(response.choices[0].message.content)
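Because the endpoint is plain HTTP, the SDK is a convenience rather than a requirement. The sketch below uses only the standard library to show roughly what the call above sends over the wire. The helper names build_chat_request and send_chat_request are illustrative (they are not part of Ollama or the OpenAI SDK), and the code assumes Ollama is running on its default port.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model, prompt, temperature=0.5):
    """Return the JSON-serialisable body for a single-turn chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def send_chat_request(body, url=OLLAMA_CHAT_URL, timeout=60):
    """POST the body to the endpoint and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Same shape the SDK exposes as response.choices[0].message.content
    return payload["choices"][0]["message"]["content"]


# Usage (needs a running Ollama server with llama3.2 pulled):
# body = build_chat_request("llama3.2", "Explain gradient descent in 3 sentences.")
# print(send_chat_request(body))
```

Seeing the raw request makes it clearer why swapping base_url is enough to redirect the SDK to a local model.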

Option B — OpenAI API

1. Get an API Key

  1. Sign up at platform.openai.com
  2. Go to API Keys → Create new secret key
  3. Set environment variable: export OPENAI_API_KEY="sk-..."
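A missing or misspelled environment variable is a common first stumbling block. As a minimal sketch, a script can check for the key up front and fail with a readable message; require_api_key is a hypothetical helper name, not something the OpenAI SDK provides.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the key stored in the environment variable `name`, or raise."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before running this script."
        )
    return key


# Usage: call once at the top of a script, before creating the client.
# require_api_key()                      # checks OPENAI_API_KEY
# require_api_key("ANTHROPIC_API_KEY")   # same check for another provider
```

The check only confirms that the variable exists and is non-empty; whether the key is valid is still decided by the API on the first request.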

2. Install and Call

# Install the SDK (shell)
pip install openai

# Call the API (Python)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the transformer architecture?"},
    ],
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Option C — Anthropic Claude API

# Install the SDK and set your key (shell)
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Call the API (Python)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=256,  # required by the Messages API
    messages=[
        {"role": "user", "content": "What is the transformer architecture?"},
    ],
)
print(message.content[0].text)
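One difference from Option B is worth seeing in code: the OpenAI example passes the system prompt as a message with role "system", while Anthropic's Messages API takes it as a separate top-level system parameter. The sketch below converts an OpenAI-style message list into keyword arguments for client.messages.create; to_anthropic_kwargs is an illustrative helper, not part of either SDK.

```python
def to_anthropic_kwargs(messages, model, max_tokens=256):
    """Split OpenAI-style messages into Anthropic-style keyword arguments."""
    # Anthropic expects the system prompt outside the messages list
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat_messages = [m for m in messages if m["role"] != "system"]
    kwargs = {"model": model, "max_tokens": max_tokens, "messages": chat_messages}
    if system_parts:
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs


# Usage (needs the anthropic package and ANTHROPIC_API_KEY):
# import anthropic
# client = anthropic.Anthropic()
# openai_style = [
#     {"role": "system", "content": "You are a helpful assistant."},
#     {"role": "user", "content": "What is the transformer architecture?"},
# ]
# message = client.messages.create(
#     **to_anthropic_kwargs(openai_style, "claude-3-5-haiku-20241022")
# )
# print(message.content[0].text)
```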

Key Takeaways

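  • Ollama is a quick, free way to run open-weight models locally
  • The OpenAI and Anthropic SDKs follow the same pattern (create a client, send a list of messages, read the reply) but differ in details: Anthropic requires max_tokens and returns the text at message.content[0].text
  • Check the usage field of every response to monitor costs (usage.total_tokens in the OpenAI SDK; usage.input_tokens and usage.output_tokens in the Anthropic SDK)
  • Start with smaller, cheaper models (gpt-4o-mini, claude-3-5-haiku) before moving up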

Common Mistakes

  1. Hardcoding API keys — never put api_key="sk-..." directly in your source code. Always read from environment variables or a secrets manager.
  2. Not handling API errors — network issues and rate limits are common. Always wrap API calls in try/except.
  3. Forgetting the system message — without a system prompt the model defaults to a generic assistant persona. Always set context explicitly.
  4. Not checking usage in the response — ignoring token usage makes it impossible to monitor costs or debug token limit errors.
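For mistake 2, one possible shape for the try/except is a small retry wrapper with exponential backoff. This is a sketch under assumptions: call_with_retry and its parameters are hypothetical names, and which exceptions are worth retrying depends on the SDK. With the openai package, openai.RateLimitError and openai.APIConnectionError would be reasonable candidates to pass as retry_on.

```python
import time


def call_with_retry(fn, retry_on=(Exception,), attempts=3, base_delay=1.0,
                    sleep=time.sleep):
    """Call fn(); on a listed exception, wait and retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Usage (needs the openai package and OPENAI_API_KEY):
# import openai
# client = openai.OpenAI()
# response = call_with_retry(
#     lambda: client.chat.completions.create(
#         model="gpt-4o-mini",
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retry_on=(openai.RateLimitError, openai.APIConnectionError),
# )
```

Errors that will not fix themselves, such as an invalid API key, are deliberately left out of retry_on so they fail immediately.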

Quick Quiz

Test Your Understanding

Q1. What port does Ollama expose its OpenAI-compatible API on by default?
A1. Port 11434 — base URL http://localhost:11434/v1.

Q2. Why can you use the openai Python SDK to call Ollama?
A2. Ollama implements the OpenAI API format (same /v1/chat/completions endpoint), so the SDK works with a custom base_url.

Q3. Where should you store your OPENAI_API_KEY?
A3. In an environment variable (e.g., export OPENAI_API_KEY="..." or a .env file loaded with python-dotenv). Never hardcode it in source files.

Q4. Which model should a student use to minimise API costs while learning?
A4. gpt-4o-mini (OpenAI) or claude-3-5-haiku (Anthropic) — significantly cheaper than the flagship models.


Student Exercise

Exercise 1.8 — Your first LLM call
Set up Ollama locally and run llama3.2. Send it a prompt asking it to explain gradient descent in 3 sentences. Then send the same prompt to gpt-4o-mini via the OpenAI API. Compare the two responses.
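Since both backends speak the same chat-completions format, one helper can serve both; only the client and model name change. The sketch below is a possible starting point for the comparison, with ask and compare as illustrative names rather than SDK functions.

```python
def ask(client, model, prompt):
    """Send one user prompt through an OpenAI-compatible client; return the text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def compare(backends, prompt):
    """Return {label: reply} for each (label, client, model) triple in backends."""
    return {label: ask(client, model, prompt) for label, client, model in backends}


# Usage (needs the openai package, a running Ollama server, and OPENAI_API_KEY):
# from openai import OpenAI
# backends = [
#     ("ollama", OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"), "llama3.2"),
#     ("openai", OpenAI(), "gpt-4o-mini"),
# ]
# replies = compare(backends, "Explain gradient descent in 3 sentences.")
# for label, reply in replies.items():
#     print(f"--- {label} ---\n{reply}\n")
```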

Exercise 1.9 — Token awareness
Modify the OpenAI example above to print: the number of prompt tokens, completion tokens, and the estimated cost assuming $0.15/1M input and $0.60/1M output tokens (gpt-4o-mini pricing). Verify your calculation against OpenAI pricing.
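The arithmetic behind the estimate can be sketched as a small helper to check your own answer against. The function name is hypothetical, and the default prices are simply the figures quoted in the exercise, so treat them as assumptions and confirm current pricing. In the OpenAI SDK the two counts are available as response.usage.prompt_tokens and response.usage.completion_tokens.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_price_per_million=0.15,
                      output_price_per_million=0.60):
    """Estimate request cost in USD from token counts and per-million prices."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Example with made-up counts: 1,000 prompt tokens and 500 completion tokens
# input:  1,000 / 1,000,000 * 0.15 = 0.00015
# output:   500 / 1,000,000 * 0.60 = 0.00030
print(f"${estimate_cost_usd(1000, 500):.5f}")
```

Input and output tokens are priced separately, which is why total_tokens alone is not enough for a cost estimate.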

