Few-shot learning (FSL) refers to a model's ability to adapt to new tasks or classes given only a handful (usually 1-32) of annotated examples. In large language models this commonly takes the form of providing several input-output demonstrations directly inside the prompt so the model can infer the task pattern without parameter updates.
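As a concrete illustration, here is a minimal sketch of assembling such a prompt in Python. The sentiment task, the demonstrations, and build_few_shot_prompt are invented for illustration; the returned string would be passed to whatever LLM client you use, with no parameter updates involved.

```python
# Minimal sketch of assembling a few-shot prompt. The task, demos, and
# build_few_shot_prompt are hypothetical; send the string to your LLM client.

DEMOS = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("Decent visuals, but the pacing dragged badly.", "negative"),
]

def build_few_shot_prompt(query: str) -> str:
    """Concatenate input-output demonstrations, then append the new input."""
    parts = ["Classify the sentiment of each review as positive or negative.\n"]
    for text, label in DEMOS:
        parts.append(f"Review: {text}\nSentiment: {label}\n")
    parts.append(f"Review: {query}\nSentiment:")  # the model completes the label
    return "\n".join(parts)

print(build_few_shot_prompt("An instant classic."))
```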
Definition and Scope
The "shots" in few-shot denote the count of task examples available at inference time. Few-shot prompt engineering differs from zero-shot prompting (no examples) and fine-tuning (weight updates). In computer vision FSL may involve meta-learning algorithms such as MAML that update a small classifier head; in LLMs, context-based in-prompt learning dominates.
Mechanisms Enabling FSL in LLMs
- In-context learning: the transformer conditions on the demonstrations in its context window and infers the task's input-output mapping at inference time.
- Attention reuse: keys and values from demo pairs bias decoder states toward analogous outputs.
- Gradient-free adaptation: no weights change; the number of demonstrations the model can exploit is bounded by its context window.
Effect of Shot Count on Accuracy
Accuracy generally rises as the shot count grows, but with diminishing returns: once the demonstrations cover the task's main variations, additional shots tend to add token cost and latency more than accuracy.
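A common way to check this on your own task is a shot-count sweep over a fixed evaluation set. The sketch below assumes a hypothetical call_model wrapper around your LLM API and (text, label) tuples for demo_pool and eval_set; only the sweep logic is the point.

```python
# Sketch of a shot-count sweep. `call_model` is a hypothetical placeholder for
# an actual LLM API call; `demo_pool` and `eval_set` hold (text, label) tuples.
import random

def call_model(prompt: str) -> str:
    """Placeholder: return the model's completion for `prompt`."""
    raise NotImplementedError("plug in your LLM client here")

def accuracy_at_k(demo_pool, eval_set, k: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    shots = rng.sample(demo_pool, k) if k else []  # fixed random subset of k demos
    header = "\n".join(f"Input: {t}\nLabel: {l}" for t, l in shots)
    correct = 0
    for text, gold in eval_set:
        prompt = f"{header}\nInput: {text}\nLabel:" if header else f"Input: {text}\nLabel:"
        correct += call_model(prompt).strip() == gold
    return correct / len(eval_set)

# Example sweep (requires a real call_model and data):
# for k in (0, 1, 2, 4, 8, 16, 32):
#     print(k, accuracy_at_k(demo_pool, eval_set, k))
```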
Design Trade-offs
- Prompt Length: More shots boost accuracy but consume context tokens and increase latency.
- Example Selection: Diverse, prototypical examples generalize better than random ones.
- Order Effects: Place harder or longer examples last so they sit closest to the query, where the model's recency bias gives them the most weight (see the selection-and-ordering sketch after this list).
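The sketch below combines the three trade-offs under a simple token budget. The 4-characters-per-token estimate and the greedy label-coverage heuristic are assumptions for illustration, not prescriptions from any particular paper.

```python
# Sketch of budget-aware selection and ordering of demonstrations.
# approx_tokens is a crude stand-in for a real tokenizer (~4 chars per token).

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def total_tokens(demos) -> int:
    return sum(approx_tokens(t) + approx_tokens(l) for t, l in demos)

def select_and_order(demos, budget_tokens: int = 512):
    # Diversity first: cover each label at least once.
    by_label = {}
    for text, label in demos:
        by_label.setdefault(label, []).append((text, label))
    picked = [examples[0] for examples in by_label.values()]
    # Spend the remaining token budget on whatever else fits.
    for demo in demos:
        if demo in picked:
            continue
        if total_tokens(picked + [demo]) > budget_tokens:
            break
        picked.append(demo)
    # Order effects: put longer (often harder) demos last, closest to the query.
    return sorted(picked, key=lambda d: approx_tokens(d[0]))

demos = [
    ("ok", "positive"),
    ("a long, mixed review that praises the cast but pans the pacing and ending", "negative"),
    ("loved it", "positive"),
]
print(select_and_order(demos, budget_tokens=64))
```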
Current Trends (2025)
- Automatic Example Retrieval: Embedding search selects the top-k demos for each user query (sketched after this list).
- Synthetic Few-Shot Data: Pipelines such as Self-Instruct have the model generate its own demonstrations offline, cutting human labeling costs.
- Contrastive Decoding: Combine few-shot prompts with a smaller contrast model whose token likelihoods are subtracted from the main model's, down-weighting generic or hallucinated continuations.
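As an illustration of retrieval-based selection, the sketch below embeds a demo pool with a toy bag-of-words embedding and picks the top-k nearest demos per query; in practice you would swap in a real embedding model and a vector index.

```python
# Sketch of retrieval-based demo selection. The bag-of-words `embed` and cosine
# similarity are toy stand-ins for a real embedding model and vector index.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k_demos(query: str, demo_pool, k: int = 3):
    q = embed(query)
    return sorted(demo_pool, key=lambda d: cosine(q, embed(d[0])), reverse=True)[:k]

pool = [
    ("refund took three weeks to arrive", "billing"),
    ("the app crashes on launch", "bug"),
    ("how do I reset my password", "account"),
    ("I was charged twice this month", "billing"),
]
print(top_k_demos("why was I double charged this month", pool, k=2))
```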
Implementation Tips
- Cache the tokenized demonstration prefix (or its key-value cache, where the serving stack supports it) so it is not reprocessed on every request.
- Remove personally identifiable info from demos to prevent leakage.
- Use system messages to describe the output schema, then supply demos; models pick up on the consistency between the schema and the demonstrations (see the sketch below).
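Tying the last tip together with the demo format, here is a sketch in the common role/content chat format: a system message states the output schema, each demonstration is a user/assistant pair, and the live query comes last. The order-extraction schema and demos are invented; the actual client call depends on your provider.

```python
# Sketch of a chat-style few-shot prompt: the system message describes the
# schema, demonstrations are user/assistant pairs, and the live query comes
# last. The schema and demos below are invented for illustration.

SYSTEM = (
    "You extract order data. Reply with JSON only, matching this schema: "
    '{"item": string, "quantity": integer}.'
)

DEMOS = [
    ("Send me two boxes of staples.", '{"item": "staples", "quantity": 2}'),
    ("I need a single desk lamp.",    '{"item": "desk lamp", "quantity": 1}'),
]

def build_messages(query: str) -> list:
    messages = [{"role": "system", "content": SYSTEM}]
    for user_text, assistant_json in DEMOS:  # each demo reinforces the schema
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_json})
    messages.append({"role": "user", "content": query})
    return messages

print(build_messages("Please order three ergonomic chairs."))
```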