Skip to main content

1.4 Model Landscape

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

1.4 Model Landscape

Key Concepts: Closed models · Open-weight models · Size trade-offs · Choosing the right model

Official Sources: OpenAI Models · Anthropic Models · Hugging Face Model Hub


Closed / Proprietary Models

These are hosted by providers — you access them via API. You cannot run them locally or inspect their weights.

Model FamilyProviderStrengths
GPT-4o / o3OpenAIGeneral reasoning, multimodal, tool use
Claude 3.x / 3.5AnthropicLong-context, coding, safety-focused
Gemini 1.5 / 2.0GoogleVery large context windows, multimodal

Always check the provider's official docs for the latest available models and context windows — these change frequently.


Open-Weight Models

These models have publicly available weights. You can run them locally, fine-tune them, and deploy them on your own infrastructure.

Model FamilyProviderLicense
LLaMA 3.xMetaMeta Community License
Mistral / MixtralMistral AIApache 2.0
DeepSeekDeepSeekMIT
Qwen 2.5AlibabaApache 2.0
Phi-3 / Phi-4MicrosoftMIT
Gemma 2GoogleGemma Terms

Closed vs Open — How to Choose

┌──────────────────────────────────────────────┐
│ Use Closed Models when: │
│ • Highest quality output needed │
│ • Fast time-to-market matters │
│ • Multimodal input required │
├──────────────────────────────────────────────┤
│ Use Open Models when: │
│ • Data privacy / on-premise required │
│ • Fine-tuning on your own data needed │
│ • High-volume inference cost control │
│ • Offline / edge deployment │
└──────────────────────────────────────────────┘

Model Size & Hardware

SizeVRAM needed (fp16)Notes
7B~14 GBSingle consumer GPU
13B~26 GBSingle professional GPU
70B~140 GBMulti-GPU or quantised
400B+800 GB+Multi-node cluster
Quantisation

4-bit quantisation (GGUF/AWQ/GPTQ) reduces VRAM by approximately compared to fp16, with a modest quality trade-off. A 70B model in 4-bit (~35 GB) can fit on 2× consumer GPUs.


Common Mistakes

Common Mistakes
  1. Always using the largest model — GPT-4o is ~10–20× more expensive than GPT-4o-mini per token. For simple classification or short-text tasks, the mini model is usually sufficient.
  2. Not checking the model cutoff date — every model has a knowledge cutoff. Asking Claude 3.5 about events after its training will yield hallucinations or "I don't know".
  3. Confusing open-weight with open-source — LLaMA 3's weights are publicly available, but the Meta Community License restricts commercial use for companies with >700M monthly active users.
  4. Running large models without quantisation — trying to run a 70B model on a single 24 GB GPU without quantisation will fail. Always check VRAM requirements.

Quick Quiz

Test Your Understanding

Q1. Name two key advantages of open-weight models over closed/API models.
A1. Data stays on-premise (privacy), and you can fine-tune them on custom data.

Q2. What does 4-bit quantisation roughly do to VRAM requirements?
A2. Reduces it by approximately 4× compared to fp16 (16-bit). A 70B model needs ~140 GB fp16 but only ~35 GB in 4-bit.

Q3. Which leaderboard provides human-preference rankings of LLMs via blind A/B testing?
A3. LMSYS Chatbot Arena.

Q4. You need to process 500-page legal documents in a single prompt. Which model family has the largest context window for this use case?
A4. Google Gemini 1.5 Pro / 2.0 (up to 2,000,000 tokens context). Always verify at the official docs.


Student Exercise

Exercise 1.6 — Compare models on the same task
Choose one task (e.g., summarise a news article). Send the same prompt to two different models via their playground/API (e.g., GPT-4o-mini and a local LLaMA 3.2 via Ollama). Compare output quality, latency, and cost.

Exercise 1.7 — Leaderboard research
Visit the LMSYS Chatbot Arena leaderboard. Find the top 3 open-weight models by Elo score. What are their parameter sizes? Could you run any of them on a laptop?


Further Reading

Next → 1.5 Running Your First LLM