5.1 Why Single-Prompt Fails

AI-Generated Content

AI-generated content may contain errors. Always verify against official sources.

Key Concepts: Context overload · Instruction drift · Quality degradation

Official Docs: LangChain — Why Chains? · Anthropic — Long Context Tips


The Single-Prompt Trap

A natural first instinct is to cram everything into one prompt:

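# Anti-pattern: seven sub-tasks and ~50,000 tokens of input in one prompt
PROMPT_TEMPLATE = """
You are an expert analyst.

1. Read the following 50-page report
2. Extract all financial figures
3. Check each figure against the reference data
4. Identify discrepancies
5. Write an executive summary
6. Suggest corrective actions
7. Format everything as a structured JSON report

REPORT: {report}
REFERENCE: {reference}
"""

# report_text is ~40,000 tokens; reference_text is ~10,000 tokens
prompt = PROMPT_TEMPLATE.format(report=report_text, reference=reference_text)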

This approach fails in predictable ways.


Failure Mode 1: Context Overload

Model quality tends to degrade as the context grows. Liu et al. (2023), "Lost in the Middle", found that models use information placed at the beginning or end of a long context much more reliably than information buried in the middle.

Prompt length: 50,000 tokens

Model attention:
██████████ start (strong)
░░░░░░░░░░░░ middle (weak — "lost in the middle")
██████ end (strong)
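The "lost in the middle" effect is typically measured by hiding one relevant fact at different depths inside otherwise irrelevant filler and checking whether the model can still answer a question about it. Below is a minimal sketch of the prompt-building half of such a probe. The function `build_probe`, the filler notes, and the fact are all hypothetical, and the actual model call is left out.

```python
def build_probe(fact: str, filler: list[str], depth: float, question: str) -> str:
    """Return a prompt with `fact` placed at relative `depth` among `filler` documents.

    depth=0.0 puts the fact first; depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))  # insertion point among the filler documents
    docs = filler[:index] + [fact] + filler[index:]
    context = "\n\n".join(f"Document {i + 1}:\n{doc}" for i, doc in enumerate(docs))
    return f"{context}\n\nQuestion: {question}"


# Hypothetical filler and fact, for illustration only
filler = [f"Quarterly note {n}: no material changes." for n in range(1, 21)]
fact = "The audited Q3 revenue was 4.2 million."
probes = {
    depth: build_probe(fact, filler, depth, "What was the audited Q3 revenue?")
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
}
# Send each probe to the model and compare answer accuracy across depths;
# a dip around the middle depths is the "lost in the middle" pattern.
```

Keeping the filler and question fixed while only the fact's position changes is what isolates position as the variable being measured.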

Failure Mode 2: Instruction Drift

With many sub-tasks in one prompt, the model loses track of earlier instructions as it generates a long response.

The formatting instructions for Task 7 sit at the top of the prompt. By the time
the model reaches Task 7 in its output, it has already generated thousands of
tokens and may no longer follow the exact formatting requirement.

Failure Mode 3: Quality Degradation on Later Tasks

When a model is asked to perform several sequential tasks in one prompt, quality often drops for the tasks later in the sequence: they are generated furthest from their instructions and after the most accumulated output.


Failure Mode 4: No Intermediate Validation

With a single prompt, you cannot check or correct intermediate results:

Single prompt:  Extract → Validate → Summarise → Format  (all in one)

If extraction fails silently, everything downstream is wrong,
and you only discover the error at the very end.
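A validation gate turns a silent failure into a loud one. The sketch below assumes the extraction step was asked to return a JSON list of objects with `label` and `value` fields; that output shape, `validate_extraction`, and `StepValidationError` are illustrative assumptions, not part of any library.

```python
import json


class StepValidationError(ValueError):
    """Raised when a step's output is not what the next step expects."""


def validate_extraction(raw_output: str) -> list[dict]:
    """Parse the extraction step's JSON output and reject anything malformed."""
    try:
        figures = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise StepValidationError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(figures, list) or not figures:
        raise StepValidationError("expected a non-empty JSON list of figures")
    for i, item in enumerate(figures):
        # Each figure must be an object carrying both required fields
        if not isinstance(item, dict) or not {"label", "value"}.issubset(item):
            raise StepValidationError(f"figure {i} is missing 'label' or 'value'")
        value = item["value"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise StepValidationError(f"figure {i} has a non-numeric value: {value!r}")
    return figures


# Usage: only validated figures are passed on to the comparison step
figures = validate_extraction('[{"label": "Revenue", "value": 4200000}]')
```

Raising at this point means a malformed extraction stops the pipeline before the comparison, summary, or report steps run on bad data.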

The Solution: Multi-Step Chains

Break complex tasks into discrete, verifiable steps. Each step has a single responsibility.

Step 1: Extract financial figures  →  validate output schema
Step 2: Compare with reference → validate discrepancy list
Step 3: Generate summary → validate word count / format
Step 4: Format as JSON report → validate against Pydantic schema
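The four steps above can be wired together as a list of (name, step, validator) triples. In this minimal sketch the step functions are hypothetical stubs standing in for LLM calls, and `run_chain` and `ChainError` are illustrative names. The runner stops at the first failed validation and reports which step broke. In a real pipeline the Step 4 check would be a Pydantic model rather than the plain dictionary check used here.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]        # takes the previous step's output
Validator = Callable[[Any], bool]  # True if the output may be passed on


class ChainError(RuntimeError):
    """Raised with the name of the first step whose output failed validation."""

    def __init__(self, step_name: str, output: Any):
        super().__init__(f"step {step_name!r} produced invalid output: {output!r}")
        self.step_name = step_name


def run_chain(steps: list[tuple[str, Step, Validator]], initial_input: Any) -> Any:
    """Run each step in order and validate its output before the next step sees it."""
    data = initial_input
    for name, step, is_valid in steps:
        data = step(data)
        if not is_valid(data):
            raise ChainError(name, data)  # fail fast: bad data never propagates
    return data


# Hypothetical stubs standing in for LLM calls
def extract(report: str) -> list[dict]:
    return [{"label": "Revenue", "value": 4_200_000}]


def compare(figures: list[dict]) -> list[dict]:
    reference = {"Revenue": 4_000_000}  # would come from the reference data
    return [f for f in figures if reference.get(f["label"]) != f["value"]]


def summarise(discrepancies: list[dict]) -> str:
    return f"Discrepancies found: {len(discrepancies)}."


def to_report(summary: str) -> dict:
    return {"summary": summary}


steps = [
    ("extract", extract, lambda out: isinstance(out, list) and len(out) > 0),
    ("compare", compare, lambda out: isinstance(out, list)),
    ("summarise", summarise, lambda out: isinstance(out, str) and len(out.split()) <= 150),
    ("format", to_report, lambda out: isinstance(out, dict) and "summary" in out),
]

result = run_chain(steps, "placeholder report text")
```

Because each stub is a plain function, each one can also be unit-tested on its own before it is placed in the chain.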

Benefits:

  • Each step can be tested independently
  • Errors are caught early before they propagate
  • Independent steps can run in parallel
  • Easier to debug (you know exactly which step failed)
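Parallelism only applies to steps that do not depend on each other's output. For example, once the discrepancies are known, the executive summary and the corrective actions could be generated concurrently. Here is a minimal sketch using `asyncio.gather`, where `write_summary` and `suggest_actions` are hypothetical async stubs in place of real model calls.

```python
import asyncio


async def write_summary(discrepancies: list[str]) -> str:
    await asyncio.sleep(0.1)  # stands in for model latency
    return f"{len(discrepancies)} discrepancies found."


async def suggest_actions(discrepancies: list[str]) -> list[str]:
    await asyncio.sleep(0.1)  # stands in for model latency
    return [f"Re-check {item}" for item in discrepancies]


async def fan_out(discrepancies: list[str]) -> tuple[str, list[str]]:
    # Both steps read the same input and neither needs the other's output,
    # so they run concurrently: total wait is ~0.1 s instead of ~0.2 s.
    summary, actions = await asyncio.gather(
        write_summary(discrepancies),
        suggest_actions(discrepancies),
    )
    return summary, actions


summary, actions = asyncio.run(fan_out(["Q3 revenue", "Q3 headcount"]))
```

Steps that consume a previous step's output, such as the comparison that needs the extracted figures, still have to run sequentially.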

Common Mistakes

  1. Adding more instructions when it fails — if a single prompt fails, adding more instructions makes the context overload worse. Break it into steps instead.
  2. Not validating intermediate outputs — always validate each step’s output with a schema or assertion before passing to the next step.
  3. Over-chaining simple tasks — chains add latency and complexity. Use a single prompt for truly simple tasks.

Quick Quiz

Test Your Understanding

Q1. What does the paper "Lost in the Middle" (Liu et al., 2023) show about LLM context handling?
A1. Models attend more strongly to the beginning and end of long contexts, and often miss important information placed in the middle.

Q2. Name two benefits of breaking a complex task into a chain of steps.
A2. Any two of: early error detection, independent testing of each step, ability to run steps in parallel, easier debugging.

Q3. When is it NOT worth using a chain?
A3. For simple, single-responsibility tasks where a single prompt clearly works. Over-engineering adds latency and complexity.


Next → 5.2 LangChain Basics