1.4 Model Landscape
AI-generated content may contain errors. Always verify against official sources.
1.4 Model Landscape
Key Concepts: Closed models · Open-weight models · Size trade-offs · Choosing the right model
Official Sources: OpenAI Models · Anthropic Models · Hugging Face Model Hub
Closed / Proprietary Models
These are hosted by providers — you access them via API. You cannot run them locally or inspect their weights.
| Model Family | Provider | Strengths |
|---|---|---|
| GPT-4o / o3 | OpenAI | General reasoning, multimodal, tool use |
| Claude 3.x / 3.5 | Anthropic | Long-context, coding, safety-focused |
| Gemini 1.5 / 2.0 | Very large context windows, multimodal |
Always check the provider's official docs for the latest available models and context windows — these change frequently.
Open-Weight Models
These models have publicly available weights. You can run them locally, fine-tune them, and deploy them on your own infrastructure.
| Model Family | Provider | License |
|---|---|---|
| LLaMA 3.x | Meta | Meta Community License |
| Mistral / Mixtral | Mistral AI | Apache 2.0 |
| DeepSeek | DeepSeek | MIT |
| Qwen 2.5 | Alibaba | Apache 2.0 |
| Phi-3 / Phi-4 | Microsoft | MIT |
| Gemma 2 | Gemma Terms |
Closed vs Open — How to Choose
┌──────────────────────────────────────────────┐
│ Use Closed Models when: │
│ • Highest quality output needed │
│ • Fast time-to-market matters │
│ • Multimodal input required │
├──────────────────────────────────────────────┤
│ Use Open Models when: │
│ • Data privacy / on-premise required │
│ • Fine-tuning on your own data needed │
│ • High-volume inference cost control │
│ • Offline / edge deployment │
└──────────────────────────────────────────────┘
Model Size & Hardware
| Size | VRAM needed (fp16) | Notes |
|---|---|---|
| 7B | ~14 GB | Single consumer GPU |
| 13B | ~26 GB | Single professional GPU |
| 70B | ~140 GB | Multi-GPU or quantised |
| 400B+ | 800 GB+ | Multi-node cluster |
4-bit quantisation (GGUF/AWQ/GPTQ) reduces VRAM by approximately 4× compared to fp16, with a modest quality trade-off. A 70B model in 4-bit (~35 GB) can fit on 2× consumer GPUs.
Common Mistakes
- Always using the largest model — GPT-4o is ~10–20× more expensive than GPT-4o-mini per token. For simple classification or short-text tasks, the mini model is usually sufficient.
- Not checking the model cutoff date — every model has a knowledge cutoff. Asking Claude 3.5 about events after its training will yield hallucinations or "I don't know".
- Confusing open-weight with open-source — LLaMA 3's weights are publicly available, but the Meta Community License restricts commercial use for companies with >700M monthly active users.
- Running large models without quantisation — trying to run a 70B model on a single 24 GB GPU without quantisation will fail. Always check VRAM requirements.
Quick Quiz
Q1. Name two key advantages of open-weight models over closed/API models.
A1. Data stays on-premise (privacy), and you can fine-tune them on custom data.
Q2. What does 4-bit quantisation roughly do to VRAM requirements?
A2. Reduces it by approximately 4× compared to fp16 (16-bit). A 70B model needs ~140 GB fp16 but only ~35 GB in 4-bit.
Q3. Which leaderboard provides human-preference rankings of LLMs via blind A/B testing?
A3. LMSYS Chatbot Arena.
Q4. You need to process 500-page legal documents in a single prompt. Which model family has the largest context window for this use case?
A4. Google Gemini 1.5 Pro / 2.0 (up to 2,000,000 tokens context). Always verify at the official docs.
Student Exercise
Exercise 1.6 — Compare models on the same task
Choose one task (e.g., summarise a news article). Send the same prompt to two different models via their playground/API (e.g., GPT-4o-mini and a local LLaMA 3.2 via Ollama). Compare output quality, latency, and cost.
Exercise 1.7 — Leaderboard research
Visit the LMSYS Chatbot Arena leaderboard. Find the top 3 open-weight models by Elo score. What are their parameter sizes? Could you run any of them on a laptop?
Further Reading
- 📘 OpenAI Model Overview
- 📘 Anthropic Model Overview
- 📘 Hugging Face Model Hub
- 🏆 LMSYS Chatbot Arena Leaderboard
- 📄 LLaMA 3 Technical Report (Meta, 2024)
- 📄 Mistral 7B (Jiang et al., 2023)
Next → 1.5 Running Your First LLM