1.1 What is a Large Language Model?

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.


Key Concepts: Neural networks → Transformers → LLMs · Parameters · Training vs Inference

Official Docs: OpenAI — What are LLMs? · Hugging Face NLP Course

What is a Large Language Model?

A Large Language Model (LLM) is a neural network trained on very large amounts of text. Its pre-training objective is simple: predict the next token (a word or word fragment) in a sequence. Capabilities such as answering questions, writing code, and summarising documents emerge largely from learning this single task at scale, and are then refined by fine-tuning and alignment.

Input text  →  Tokenizer  →  Embedding  →  Transformer Layers  →  Output probabilities  →  Next token
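To make the pipeline concrete, here is a minimal sketch of one prediction step in plain Python. Everything in it is a toy assumption: a hypothetical six-word vocabulary (real LLMs use subword tokenizers), random embeddings instead of learned ones, an LM head that reuses the embedding matrix (weight tying; many models use a separate output matrix), and a hypothetical `transformer_layers` stand-in that simply averages each position with the earlier ones instead of running real attention and feed-forward layers. It shows only how data flows from text to next-token probabilities.

```python
import math
import random

# Hypothetical toy vocabulary; real LLMs use subword tokenizers (e.g. BPE).
VOCAB = ["<unk>", "the", "cat", "sat", "on", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
D_MODEL = 4  # embedding width, tiny for illustration

random.seed(0)
# Embedding table: one dense vector per token ID (random here, learned in a real model).
EMBEDDINGS = [[random.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in VOCAB]


def tokenize(text):
    """Map words to token IDs; unknown words become <unk> (ID 0)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]


def embed(token_ids):
    """Look up one embedding vector per token ID."""
    return [EMBEDDINGS[i] for i in token_ids]


def transformer_layers(vectors):
    """Hypothetical stand-in for the Transformer stack: each position becomes the
    average of itself and all earlier positions (a crude, causal mix of context).
    Real layers use learned self-attention and feed-forward networks instead."""
    out = []
    for t in range(len(vectors)):
        window = vectors[: t + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def lm_head(vector):
    """Project a hidden vector onto the vocabulary (assumes weights tied to the embeddings)."""
    return [sum(h * e for h, e in zip(vector, emb)) for emb in EMBEDDINGS]


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(text):
    hidden = transformer_layers(embed(tokenize(text)))
    return softmax(lm_head(hidden[-1]))  # only the last position predicts the next token


probs = next_token_distribution("the cat sat on the")
next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
print(VOCAB[next_id], round(probs[next_id], 3))
```

Because the weights are random, the predicted token is meaningless; the point is the shape of the computation. Generating longer text repeats this step, appending each chosen token to the input.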

The Transformer Architecture

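Modern LLMs are built on the Transformer architecture. Its key innovation is self-attention, which lets each token relate directly to other tokens in the sequence, regardless of how far apart they are. In decoder-only LLMs (the GPT-style models used for text generation), a causal mask restricts each token to attending to itself and earlier tokens.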
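Self-attention can be sketched as single-head scaled dot-product attention, softmax(QKᵀ / √d)·V, in plain Python. To keep it short, this sketch assumes the queries, keys and values are the input vectors themselves; a real layer first multiplies the inputs by learned matrices (often written W_Q, W_K, W_V) and runs several heads in parallel. The `causal` flag applies the mask used in decoder-only LLMs.

```python
import math


def softmax(xs):
    m = max(xs)  # finite, because a token can always attend to itself
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0 for masked positions
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=True):
    """Single-head scaled dot-product self-attention over a list of vectors.

    Simplifying assumption: Q = K = V = x (no learned projections, one head).
    Returns the output vectors and the attention weights for each position.
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for i, q in enumerate(x):
        # Similarity of token i to every token j, scaled by sqrt(d).
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future tokens
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights = softmax(scores)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
        all_weights.append(weights)
    return outputs, all_weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x)
for row in weights:
    print([round(w, 3) for w in row])
```

With the causal mask, the first token can only attend to itself (weights `[1.0, 0.0, 0.0]`), while the last token mixes information from all three positions. The nested loops make the quadratic cost in sequence length easy to see.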

Component	Role
Embedding layer	Converts token IDs to dense vectors
Positional encoding	Adds position information to each token
Multi-head self-attention	Each token attends to other tokens (only earlier ones in causal LLMs)
Feed-forward network	Per-token non-linear transformation
Layer norm + residuals	Stabilises training
LM head	Projects to vocabulary → next-token probabilities
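The positional-encoding row can be illustrated with the sinusoidal scheme from the original Transformer paper, sketched below. This is one option among several: many modern LLMs instead use learned position embeddings or rotary position embeddings, so treat this as an example of the idea rather than what any particular model does. The helper names are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal encodings (original Transformer):
    PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            i = j // 2  # dimensions 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def add_positions(embeddings):
    """Add position information to token embeddings element-wise."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]


for row in sinusoidal_positional_encoding(3, 4):
    print([round(v, 3) for v in row])
```

Each position gets a distinct pattern of values, so two identical tokens at different positions end up with different input vectors.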

Parameters

A parameter is a learnable number (a weight or bias) stored in the network. Models range from millions to hundreds of billions of parameters, and some exceed a trillion. During training, parameters come to encode patterns such as grammar, facts, and reasoning styles.
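Where do these parameter counts come from? A common back-of-the-envelope estimate for a GPT-style decoder-only Transformer is about 12·d² parameters per layer (4·d² for the attention projections, 8·d² for a feed-forward network 4× wider than the model), plus the token embedding matrix. The sketch below is that approximation, not an exact count: `approx_decoder_params` is a hypothetical helper that ignores biases, layer norms and positional embeddings, and assumes the embedding matrix is shared with the LM head.

```python
def approx_decoder_params(n_layers, d_model, vocab_size, ffn_mult=4):
    """Rough parameter count for a GPT-style decoder-only Transformer.

    Per layer: attention has 4 * d_model^2 weights (Q, K, V and output projections);
    the feed-forward network has 2 * ffn_mult * d_model^2 (up and down projections).
    Assumptions: biases, layer norms and positional embeddings are ignored, and the
    token embedding matrix (vocab_size * d_model) is counted once (shared with the LM head).
    """
    attention = 4 * d_model ** 2
    feed_forward = 2 * ffn_mult * d_model ** 2
    return n_layers * (attention + feed_forward) + vocab_size * d_model


# Hyperparameters of GPT-2 small: 12 layers, d_model = 768, 50,257-token vocabulary.
print(f"{approx_decoder_params(12, 768, 50257):,}")  # → 123,532,032
```

The result, roughly 124 million, is close to the figure usually quoted for GPT-2 small. Because the per-layer term grows with d², widening a model increases its size much faster than adding layers.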

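📌 Rule of thumb: a model with N billion parameters needs roughly 2N GB of GPU memory just to hold its weights at 16-bit precision (2 bytes per parameter). Inference also needs memory for activations and the KV cache, and training needs several times more for gradients and optimiser state.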
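The rule of thumb is just parameters × bytes per parameter. A minimal sketch, assuming 1 GB = 10⁹ bytes and counting weights only; `weight_memory_gb` is a hypothetical helper, and real int8/int4 quantised formats add a little overhead for scaling factors that this ignores.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params_billion, dtype="fp16"):
    """Memory needed just to store the weights, in GB (1 GB = 10**9 bytes).

    Excludes activations, the KV cache, and (for training) gradients and optimiser state.
    """
    return n_params_billion * BYTES_PER_PARAM[dtype]


for dtype in ("fp32", "fp16", "int8", "int4"):
    print(f"7B model, {dtype}: ~{weight_memory_gb(7, dtype):.1f} GB")
```

For a 7-billion-parameter model this gives about 28 GB at 32-bit, 14 GB at 16-bit (the 2N rule), 7 GB at 8-bit and 3.5 GB at 4-bit, which is why quantisation is popular for running models on a single GPU.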

Training vs Inference

┌──────────────────────────────────────────────┐
│  PRE-TRAINING                                │
│  Objective: predict next token               │
│  Data: large text corpora                    │
│  Cost: very high (weeks, large GPU clusters) │
├──────────────────────────────────────────────┤
│  FINE-TUNING / ALIGNMENT                     │
│  Supervised training on instruction pairs    │
│  Alignment with human feedback               │
├──────────────────────────────────────────────┤
│  INFERENCE                                   │
│  Single forward pass per token               │
│  Cost: low (milliseconds per token)          │
└──────────────────────────────────────────────┘
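The contrast between training and inference can be sketched with a deliberately tiny stand-in: a bigram model that predicts the next word from the current word only. This swaps in counting (the maximum-likelihood fit of a bigram model) for the gradient-descent training a real LLM uses, so it illustrates the objective and the generation loop, not how neural networks are trained. The corpus and helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat sat down .".split()


def train_bigram(tokens):
    """'Training': estimate P(next | current) by counting adjacent pairs.
    Stand-in for gradient-descent training of a neural network."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {tok: c / sum(nexts.values()) for tok, c in nexts.items()}
        for cur, nexts in counts.items()
    }


def cross_entropy(model, tokens):
    """Average next-token loss, -log P(actual next token): the quantity pre-training minimises."""
    losses = [-math.log(model[cur][nxt]) for cur, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)


def generate(model, start, max_new_tokens):
    """'Inference': one model call per new token, picking the most likely one (greedy)."""
    out = [start]
    for _ in range(max_new_tokens):
        nexts = model.get(out[-1])
        if not nexts:
            break  # no known continuation
        out.append(max(nexts, key=nexts.get))
    return out


model = train_bigram(corpus)
print(f"training loss: {cross_entropy(model, corpus):.3f}")
print(" ".join(generate(model, "the", 2)))  # → the cat sat
```

Training processes the whole corpus to fit the model once; inference then reuses the fitted model, doing a small amount of work for each generated token, which mirrors the one-forward-pass-per-token pattern in the diagram.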

Key Takeaways

LLMs are pre-trained as next-token predictors; their capabilities emerge largely from this objective and are refined by fine-tuning
The Transformer (self-attention + feed-forward) is the standard building block of modern LLMs
Parameters store learned patterns; data quality matters as much as size
Pre-training is expensive; inference is cheap per token, though costs add up at scale
