Follow the decision framework from Chae & Davidson (2025), which maps document characteristics and available resources to the appropriate approach:
Zero-shot prompting: Use when classifying short documents with a large decoder model (GPT-4o, Llama3-70B+) and no labeled training data. Best for rapid prototyping and tasks where constructs are well-defined. GPT-4o achieves the best zero-shot performance across tasks (Chae & Davidson 2025).
Few-shot prompting: Add labeled examples to the prompt. Results are inconsistent: adding examples improves performance for some models but degrades it for others (Chae & Davidson 2025). Always compare few-shot against zero-shot on a held-out sample before committing. Select diverse examples that cover edge cases, not just prototypical instances.
Fine-tuning: Train a model on labeled data. Effective with as few as 100 hand-coded examples for smaller models (Chae & Davidson 2025). Fine-tuned smaller models (Llama3-8B, GPT-3 Davinci) can match GPT-4o zero-shot performance. Prefer this when you have labeled data and need cost-effective classification at scale.
Instruction-tuning: Combine detailed prompting with fine-tuning on paired instruction-output examples. Most powerful regime for complex tasks: instruction-tuned Llama3-70B surpasses GPT-4o zero-shot on stance detection (Chae & Davidson 2025). Requires more technical infrastructure but yields the highest accuracy.
Encoder-only fine-tuning: A distinct regime, often omitted from generative-LLM discussions. Fine-tuning a smaller encoder-only model (BERT, DeBERTa, SBERT; ~86–110M parameters, runs on personal-computer hardware) on modest labeled data can match or exceed zero-shot generative LLMs on many classification tasks at a fraction of the cost and with fully reproducible (deterministic) output (Chae & Davidson 2025, Table 1). Ziems et al. (2024) find that fine-tuned RoBERTa rarely underperforms larger generative models across 20 tasks. Prefer encoder fine-tuning when the label set is fixed, labeled data exists, and reproducibility matters more than generative flexibility.
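The choice among these regimes can be sketched as a small rule-based function. This is one illustrative reading of the guidance above, not a procedure taken from Chae & Davidson (2025): the `Task` fields, the order of the checks, and the fallback message are all assumptions made for the example. Only the 100-example threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a classification project (illustrative fields)."""
    n_labeled: int                 # hand-coded examples available
    large_decoder_available: bool  # access to a large decoder (e.g. GPT-4o, Llama3-70B)
    can_fine_tune: bool            # training infrastructure is available
    complex_task: bool             # e.g. stance detection
    fixed_label_set: bool          # label scheme will not change
    needs_reproducibility: bool    # deterministic output is a priority


# From the text: fine-tuning is effective with as few as 100 hand-coded examples.
MIN_LABELED = 100


def recommend(task: Task) -> str:
    """Return a suggested regime; the precedence of the rules is an assumption."""
    can_train = task.can_fine_tune and task.n_labeled >= MIN_LABELED

    # Fixed labels + labeled data + reproducibility: encoder-only fine-tuning.
    if can_train and task.fixed_label_set and task.needs_reproducibility:
        return "encoder-only fine-tuning"
    # Complex tasks with labeled data: instruction-tuning.
    if can_train and task.complex_task:
        return "instruction-tuning"
    # Labeled data and a need for cost-effective classification at scale.
    if can_train:
        return "fine-tuning"
    # A few labeled examples but no training: try few-shot, validated on held-out data.
    if task.large_decoder_available and task.n_labeled > 0:
        return "few-shot prompting (compare against zero-shot first)"
    # No labeled data at all: zero-shot with a large decoder model.
    if task.large_decoder_available:
        return "zero-shot prompting"
    # Assumed fallback; the text does not cover this case.
    return "no regime fits; label data or obtain model access"


if __name__ == "__main__":
    prototype = Task(
        n_labeled=0,
        large_decoder_available=True,
        can_fine_tune=False,
        complex_task=False,
        fixed_label_set=True,
        needs_reproducibility=False,
    )
    print(recommend(prototype))
```

Treat the output as a starting point. The text's advice to validate few-shot against zero-shot on a held-out sample still applies whichever branch is taken.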