1.5 Running Your First LLM

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.


Key Concepts: Ollama local setup · API call to OpenAI/Anthropic · Comparing outputs

Official Docs: Ollama · OpenAI Quickstart · Anthropic Quickstart


Option A — Run a Model Locally with Ollama (Free)

Ollama lets you run open-weight models locally. It exposes an OpenAI-compatible API on localhost:11434.

1. Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download from https://ollama.com/download

2. Pull and Run a Model

# Pull a small model (~2 GB)
ollama pull llama3.2

# Interactive chat in terminal
ollama run llama3.2

3. Call Ollama from Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required field, not validated locally
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain gradient descent in 3 sentences."}],
    temperature=0.5,
)
print(response.choices[0].message.content)
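The chat completions endpoint is stateless: the model only "remembers" earlier turns if you resend the full message history with every request. A minimal sketch of maintaining that history (pure Python, no server needed; `add_turn` is a hypothetical helper, not part of any SDK):

```python
# The chat API is stateless: each request carries the whole conversation.
# In a real script you would pass `history` as the `messages` argument to
# client.chat.completions.create and append the model's reply afterwards.

def add_turn(history, role, content):
    """Append one message in the format the chat endpoint expects."""
    history.append({"role": role, "content": content})
    return history

history = []
add_turn(history, "user", "Explain gradient descent in 3 sentences.")
# ...send `history` to the model, then record its reply:
add_turn(history, "assistant", "Gradient descent iteratively updates parameters...")
add_turn(history, "user", "Now give a concrete example.")

print(len(history), history[-1]["role"])  # → 3 user
```

Because the entire history is re-sent each turn, long conversations consume more input tokens per request, which matters for the cost monitoring discussed below.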

Option B — OpenAI API

1. Get an API Key

  1. Sign up at platform.openai.com
  2. Go to API Keys → Create new secret key
  3. Set environment variable: export OPENAI_API_KEY="sk-..."
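The SDK reads `OPENAI_API_KEY` from the environment on its own, but a forgotten key surfaces as an authentication error deep inside a request. A small sketch of checking for the key up front (the `require_key` helper is our own, not part of the SDK):

```python
import os

def require_key(name: str) -> str:
    """Return the named environment variable, or fail early with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'{name} is not set. Run: export {name}="..." first.')
    return value

# Example: require_key("OPENAI_API_KEY") before constructing the client.
```

Failing fast like this is especially helpful in scripts run by others, where a missing key is the most common setup mistake.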

2. Install and Call

pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the transformer architecture?"},
    ],
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
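The `usage` object also reports `prompt_tokens` and `completion_tokens` separately, which providers price differently. A sketch of turning those counts into a dollar estimate; the per-token prices here are illustrative placeholders, so check your provider's current pricing page before relying on them:

```python
# ILLUSTRATIVE prices only -- real rates change; see the provider's pricing page.
PRICE_PER_1M_INPUT = 0.15   # assumed USD per 1M input tokens
PRICE_PER_1M_OUTPUT = 0.60  # assumed USD per 1M output tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request under the assumed prices above."""
    return (prompt_tokens * PRICE_PER_1M_INPUT
            + completion_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

# e.g. for usage.prompt_tokens=120 and usage.completion_tokens=250:
print(f"${estimate_cost(120, 250):.6f}")  # → $0.000168
```

At these scales a single request costs fractions of a cent, but the same arithmetic makes it easy to see how batch jobs or long conversations add up.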

Option C — Anthropic Claude API

pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=256,  # required by the Messages API
    messages=[
        {"role": "user", "content": "What is the transformer architecture?"}
    ],
)
print(message.content[0].text)
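The two SDKs use nearly the same message shape, with one visible difference: Anthropic takes the system prompt as a top-level `system` argument rather than a `{"role": "system"}` message. A sketch of a helper (our own, not from either SDK) that converts an OpenAI-style message list into Anthropic's shape:

```python
# OpenAI puts the system prompt in the messages list; Anthropic takes it as a
# separate `system` argument. This splits one format into the other.

def to_anthropic(openai_messages):
    """Split OpenAI-style messages into (system_text, remaining_messages)."""
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    rest = [m for m in openai_messages if m["role"] != "system"]
    return " ".join(system_parts), rest

system, messages = to_anthropic([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the transformer architecture?"},
])
# Then: client.messages.create(model=..., system=system, messages=messages, max_tokens=256)
print(system)  # → You are a helpful assistant.
```

Wrapping this difference in one function is a common first step toward code that can switch providers without rewriting every call site.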

Key Takeaways

  • Ollama is the fastest way to run models locally for free
  • OpenAI and Anthropic SDKs have a nearly identical structure
  • Always check usage.total_tokens to monitor costs
  • Start with smaller, cheaper models (gpt-4o-mini, claude-3-5-haiku) before moving up

Next Chapter → Chapter 2: Prompt Engineering