| name | prompt-engineering |
| description | Prompt design techniques for LLMs: structure, examples, reasoning patterns, and optimization. Invoke whenever task involves any interaction with AI instructions ā crafting, debugging, improving, or evaluating prompts for skills, agents, output styles, or system configurations. |
Prompt Engineering
Every prompt is an interface contract ā clarity of intent determines quality of output. Apply when crafting skills,
agents, output styles, system prompts, or any AI instructions.
Read first when
- You are writing a prompt for another model (skill, subagent, system prompt, agent instruction, output style) ā
load [
${CLAUDE_SKILL_DIR}/references/agent-authored-prompts.md] BEFORE drafting. Agent-authored prompts have
distinct failure modes (over-specification, context leakage, ambiguous output contracts, silent degradation across
pipeline stages) that the diagnostic table below does NOT cover. The summary in
Writing Prompts as an Agent is incomplete ā the reference holds the workflow.
What's Wrong With Your Prompt?
- Wrong format ā add explicit format + example. See Output Format
- Missing information ā be more specific about what to include. See Be Specific
- Hallucination ā add context, request citations. See Provide Context
- Ignores instructions ā place critical rules at top and end, use XML tags. See
Persistent Context
- Complex reasoning fails ā use extended thinking or CoT. See Reasoning
- Inconsistent results ā add 3-5 examples. See Examples
- Too verbose ā specify word/sentence limits. See Be Specific
- Security concerns ā validate input, filter output. See [
${CLAUDE_SKILL_DIR}/references/security.md]
References
- Reasoning techniques ā [
${CLAUDE_SKILL_DIR}/references/reasoning-techniques.md] CoT variants (zero-shot,
few-shot, auto), Tree-of-Thoughts, Self-Consistency, extended thinking (adaptive + manual), reasoning models
(o3/o4-mini), CRANE constrained reasoning, academic citations
- Learning paradigms ā [
${CLAUDE_SKILL_DIR}/references/learning-paradigms.md] ICL theory, zero/few-shot
techniques, example selection research, generated knowledge prompting, active prompting
- Workflow patterns ā [
${CLAUDE_SKILL_DIR}/references/workflow-patterns.md] Prompt chaining topologies, iterative
refinement, meta prompting, APE, automated optimization survey
- Prompt security ā [
${CLAUDE_SKILL_DIR}/references/security.md] OWASP Top 10 for LLM 2025, injection defense,
agentic pipeline security, threat modeling, defense patterns
- Optimization strategies ā [
${CLAUDE_SKILL_DIR}/references/optimization-strategies.md] Promptware engineering
lifecycle, DSPy declarative optimization, RAG integration, manual iteration discipline
- Claude-specific ā [
${CLAUDE_SKILL_DIR}/references/claude-specific.md] Adaptive thinking, effort parameter,
prefilling, prompt caching (automatic + explicit, 1-hour TTL), structured outputs, context windows, technique
combinations
- Long context ā [
${CLAUDE_SKILL_DIR}/references/long-context.md] Document organization patterns, XML structuring
for multi-doc, chunking strategies, context rot mitigation
- Agent & tool patterns ā [
${CLAUDE_SKILL_DIR}/references/agent-patterns.md] ReAct, PAL, Reflexion, ART, ACE
implementation patterns, failure modes, pattern selection
- Agent-authored prompts ā [
${CLAUDE_SKILL_DIR}/references/agent-authored-prompts.md] Agents writing prompts:
decomposition workflow, quality dimensions, failure modes, SPL pattern, pipeline rules
- Persistent context ā [
${CLAUDE_SKILL_DIR}/references/persistent-context.md] Technique transfer to skills/system
prompts, instruction degradation research, format sensitivity, declarative vs procedural, U-shaped attention,
minimalism principle
- Structured data formats ā [
${CLAUDE_SKILL_DIR}/references/structured-data-formats.md] Format benchmarks (KV vs
table vs YAML vs JSON), TOON verdict, output format restrictions, CFPO, format selection rules
- Context engineering ā [
${CLAUDE_SKILL_DIR}/references/context-engineering.md] The discipline beyond prompts:
context types, quality principles, retrieval strategies, management patterns, layered architecture
Read the relevant reference before proceeding.
Core Techniques
Start with the simplest technique that fits the problem. Most issues are solved by the first three.
Be Clear and Direct
The golden rule: show your prompt to a colleague with minimal context. If they're confused, Claude will be too.
Provide Context
Tell Claude:
- What the task results will be used for
- Who the audience is
- What success looks like
Be Specific
- "Summarize this" ā "Summarize in 3 bullets, each under 20 words"
- "Make it better" ā "Fix grammar errors, reduce word count by 30%"
- "Analyze the data" ā "Calculate YoY growth, identify top 3 trends"
Output Format
Always specify format explicitly. Show an example if structure matters:
Extract the following as JSON:
- Product name
- Price (number only)
- In stock (boolean)
Example output:
{"name": "Widget Pro", "price": 29.99, "in_stock": true}
Use Examples (Few-Shot)
3-5 examples typically sufficient. Cover edge cases. Format consistency and input distribution matter more than perfect
label accuracy. Performance plateaus after 8-16 examples.
Example selection rules:
- Cover diversity ā represent different categories, edge cases, styles
- Order simple to complex ā build understanding progressively
- Balance output classes ā equal representation across categories
- Put representative examples last ā recency bias makes later examples more influential
- Prioritize format consistency over perfect labeling
- Wrap in
<examples> tags for clear separation
- In system context, examples at the start outperform those placed later (primacy bias)
Choosing the right paradigm:
- Simple, well-known task ā zero-shot (just ask)
- Need specific output format ā one-shot (1 example)
- Complex classification / nuanced judgment ā few-shot (3-5 examples)
- Domain-specific task ā few-shot with domain examples
- Highly nuanced + complex reasoning ā few-shot + CoT
Extended paradigm details and ICL theory: see [${CLAUDE_SKILL_DIR}/references/learning-paradigms.md].
Use XML Tags
Separate components for clarity and parseability:
<instructions>
Analyze the contract for risks.
</instructions>
<contract>
{{CONTRACT_TEXT}}
</contract>
<output_format>
List risks in <risks> tags, recommendations in <recommendations>.
</output_format>
- Use consistent tag names throughout the prompt
- Reference tags in instructions: "Using the contract in
<contract>..."
- Nest for hierarchy:
<outer><inner>...</inner></outer>
- Critical for multi-component prompts ā significantly improves instruction following
Reasoning
For complex reasoning, ask Claude to show its work:
Think through this in <thinking> tags.
Then provide your answer in <answer> tags.
Critical: Claude must output its thinking. Without outputting the thought process, no thinking actually occurs.
Reasoning models (Claude adaptive thinking, OpenAI o-series):
- These models reason internally ā do NOT add "think step by step" (it's redundant and may degrade quality)
- Prefer general instructions ("think thoroughly") over prescriptive step-by-step plans
- Use
<thinking> tags in few-shot examples to demonstrate desired reasoning style
- Ask for self-verification: "Before finishing, verify your answer against [criteria]"
- Use the
effort parameter to control reasoning depth, not prompt-level CoT
Standard models (no native reasoning):
- Use explicit CoT when the problem requires multi-step reasoning
- Use extended thinking when the problem requires exploring multiple approaches
- Use neither for simple factual tasks
CoT trade-off: helpful for structural formatting and complex logic; harmful for tasks with many mechanical
constraints (word limits, format rules).
Detailed techniques, ToT, self-consistency: see [${CLAUDE_SKILL_DIR}/references/reasoning-techniques.md].
Use Sequential Steps
For multi-step tasks, number the steps:
1. Replace customer names with "CUSTOMER_[ID]"
2. Replace emails with "EMAIL_[ID]@example.com"
3. Redact phone numbers as "PHONE_[ID]"
4. Leave product names intact
5. Output only processed messages, separated by "---"
Cap at ~10-15 steps per sequence; beyond that, decompose into sub-procedures (Hierarchical Task Networks).
Structured Data in Prompts
Format choice measurably affects LLM accuracy ā up to 16pp between best and worst formats on identical content.
- Key-value lists for lookup/routing data where entries are independent ā +8.8pp accuracy over tables
- Markdown tables only for genuinely 2D comparisons where cross-criteria scanning IS the point
- YAML for deeply nested data (configs, hierarchies) ā best accuracy for nested structures
- Avoid CSV, JSONL, XML for input data ā consistently underperform alternatives
Test: if removing a column would lose comparative meaning ā table. Otherwise ā KV list.
Output format restrictions degrade reasoning. Use structured output only when downstream consumers require it;
prefer post-processing free-form output for reasoning-heavy tasks.
Full benchmarks and selection rules: see [${CLAUDE_SKILL_DIR}/references/structured-data-formats.md].
Choosing a Technique
- Simple task, clear format ā zero-shot with clear instructions
- Consistent output format ā few-shot (3-5 examples)
- Complex reasoning ā CoT (standard models) or extended thinking (reasoning models)
- Very complex / exploratory ā extended thinking with high effort
- Multi-step workflow ā prompt chaining. See [
${CLAUDE_SKILL_DIR}/references/workflow-patterns.md]
- External information needed ā ReAct. See [
${CLAUDE_SKILL_DIR}/references/agent-patterns.md]
- Precise calculation ā PAL (generate code). See [
${CLAUDE_SKILL_DIR}/references/agent-patterns.md]
- Multi-attempt allowed ā Reflexion. See [
${CLAUDE_SKILL_DIR}/references/agent-patterns.md]
Worked Example: Diagnosing and Fixing a Prompt
**Original prompt:**
```
You are a helpful assistant. Analyze this code and give me feedback.
Make sure to be thorough. Also format it nicely.
```
Diagnosis:
- Wrong format ā no explicit format specified
- Missing information ā "feedback" and "thorough" are vague
- Ignores instructions ā "format it nicely" is ambiguous
Fixed prompt:
<instructions>
Review the provided code for three categories of issues:
1. Bugs ā logic errors, off-by-one, null handling
2. Security ā injection, auth bypass, data exposure
3. Performance ā unnecessary allocations, O(n^2) loops
</instructions>
<output_format>
For each issue found, return:
- **Location:** file:line
- **Category:** Bug | Security | Performance
- **Severity:** Critical | Major | Minor
- **Fix:** concrete code change (not just description)
If no issues found in a category, state "None found."
</output_format>
<code>
{{CODE}}
</code>
What changed: vague task ā specific categories. No format ā explicit structure. Persona removed (adds no value).
Single paragraph ā XML-separated components.
Prompting in Persistent Context
Techniques behave differently in persistent context (skills, system prompts, CLAUDE.md) vs. one-shot user messages.
- Place critical rules at top and end of context. The U-shaped attention curve makes the middle the worst location.
- Prefer declarative bullets over numbered procedures ā except when ordering matters; cap sequences at ~10-15 steps.
- Prime the domain, don't assign a persona. "This is a security review task" beats "You are an expert auditor."
- Format swings compliance up to 40%. XML tags and Markdown headers beat prose; JSON/YAML are for data, not
instructions.
- Every instruction must earn its place. Apply the deletion test: if removing it doesn't change output, remove it.
Full research synthesis: see [${CLAUDE_SKILL_DIR}/references/persistent-context.md].
Claude-Specific Rules
Adaptive Thinking and Effort
Claude 4.6 models use adaptive thinking ā Claude dynamically determines when and how deeply to reason:
{ "thinking": { "type": "adaptive" }, "effort": "high" }
- effort levels:
max (deepest, Opus/Sonnet 4.6 only), high (default), medium, low
- Effort affects all tokens: text, tool calls, and thinking
- At
high/max, Claude almost always thinks; at low, it may skip thinking for simple queries
budget_tokens is deprecated on 4.6 models ā use effort + adaptive thinking instead
Prefilling
Start Claude's response to control format by including a partial assistant message:
- Force JSON: prefill with
{
- Skip preamble: prefill with the opening sentence
- Force XML wrapper: prefill with
<result>
- Deprecated on 4.6 models but still functional on older models
Prompt Caching
Cache stable context to cut latency and cost. Two modes: automatic (top-level cache_control) and explicit (block-level
breakpoints). Key rules:
- Up to 4 breakpoints per request; 5-minute TTL (default) or 1-hour TTL
- Cache read: 0.1x input price. Cache write: 1.25x (5-min) or 2x (1-hour)
- Place breakpoint on the last block that stays identical across requests
- Cache invalidation hierarchy: tools ā system ā messages
Structured Outputs
Constrained decoding guaranteeing schema-compliant JSON. Use output_config.format for response format or
strict: true on tool definitions. Incompatible with citations and prefilling. Grammar applies only to final text
output ā thinking is unconstrained.
Full API details and technique combinations: see [${CLAUDE_SKILL_DIR}/references/claude-specific.md].
Context Engineering
Context engineering is the 2026 evolution beyond prompt engineering ā designing dynamic systems that provide the right
information and tools, in the right format, at the right time.
Key distinction: prompt engineering crafts a single text string; context engineering manages all inputs to the model
ā system prompts, conversation history, retrieved documents, tool results, memory.
Core principles:
- Most agent failures are context failures, not model failures
- Find the smallest set of high-signal tokens that maximizes the desired outcome
- Treat context as a finite resource with diminishing marginal returns
- Organize context into explicit labeled sections for model parseability
Management patterns:
- Compaction ā summarize nearing-limit context; preserve decisions and open questions, discard raw tool outputs
- Structured note-taking ā agent writes selective notes to persistent storage for state continuity
- Multi-agent isolation ā sub-agents handle deep dives in clean contexts; return condensed summaries
- Just-in-time retrieval ā load identifiers upfront, fetch full content on demand via tools
Full depth: see [${CLAUDE_SKILL_DIR}/references/context-engineering.md].
Long Context Rules
When working with 20K+ token documents:
- Documents at the top, query at the bottom ā exploits the U-shaped attention curve
- Wrap each document in XML tags with identifying metadata (source, type, date)
- Ground responses in quotes ā ask Claude to quote relevant passages before answering
- Remove noise before including documents ā strip boilerplate, headers, navigation
- Place instructions at the end after all documents
Document organization, chunking strategies: see [${CLAUDE_SKILL_DIR}/references/long-context.md].
Prompt Chaining Rules
When a single prompt produces error propagation, decompose into a chain of simpler prompts:
- Single responsibility ā each prompt does one thing well
- Clear interfaces ā define what each step receives and produces
- Validation points ā check output before passing to next step
- Chain when there's a natural validation boundary ā avoid over-chaining
Chain topologies, meta prompting, APE: see [${CLAUDE_SKILL_DIR}/references/workflow-patterns.md].
When Prompting Isn't Enough
Start with prompt engineering. If quality plateaus, consider:
- RAG ā need current/accurate external data the model doesn't have
- DSPy ā metric-driven automatic prompt optimization for complex pipelines with labeled eval data
- Fine-tuning ā need deep domain expertise prompting can't achieve
These compose ā combine as needed. Treat prompts as software: version them, test them, monitor them in production.
Strategy comparison, DSPy details, production quality gates: see
[${CLAUDE_SKILL_DIR}/references/optimization-strategies.md].
Security Rules
When a prompt handles untrusted input (user-provided content, web scraping, external documents):
- Mark trust boundaries ā separate trusted instructions from untrusted data with delimiters
- Harden the system prompt ā explicit boundaries, sandwich defense (repeat critical instructions after user content)
- Validate input ā flag instruction-override patterns, unusual length, encoding attempts
- Filter output ā block responses containing sensitive data patterns
- Apply least privilege ā give the LLM access only to data and tools it needs
- Require human approval for sensitive or destructive actions
Prompt injection cannot be fully prevented ā defense is about reducing attack surface, limiting blast radius, and
detecting incidents.
OWASP Top 10, attack taxonomy, defense patterns: see [${CLAUDE_SKILL_DIR}/references/security.md].
Writing Prompts as an Agent
STOP ā load [${CLAUDE_SKILL_DIR}/references/agent-authored-prompts.md] before drafting. The summary below is
orientation only; the reference holds the decomposition workflow, failure-mode taxonomy, and pipeline rules required to
write a non-broken agent prompt.
When you (the AI) are authoring a prompt for another model to execute ā skills, system prompts, subagent instructions:
- Treat prompts as programs ā define signature (inputs, outputs, success criteria) before writing text
- Decompose into components, scaffold with XML, then draft
- Every generated prompt must be self-contained ā the receiving agent has zero knowledge of your context
- Include explicit output format with a concrete example, not just a description
- Embed validation criteria the receiving agent can self-check against
- Sanitize all user-supplied content before incorporating into generated prompts
Key failure modes: blob-prompts (unstructured paragraphs), context leakage (embedding orchestrator state), ambiguous
output contracts, instruction drift across iterative rewrites.
Quality Checklist
Before finalizing a prompt:
For persistent context (skills, system prompts, CLAUDE.md):
When you authored the prompt as an agent (skill, subagent, system prompt, output style):
Related Skills
skill-engineering ā applies prompt techniques to SKILL.md design, description formulas, and content architecture
subagent-engineering ā applies prompt techniques to subagent system prompts, tool scoping, and delegation triggers
output-style-engineering ā applies prompt techniques to persona definition, tone examples, and behavioral rules
claude-code-sdk ā reference for Claude Code extensibility APIs when building any AI artifact