Score, analyze, improve, and architect AI prompts using a 6-dimension evaluation framework, 20 prompting techniques, 11 structured frameworks, 5-layer system prompt architect, and 20 cognitive formulas. Activate this skill when ANY of the following signals appear:
TRIGGER WORDS (English): prompt, prompts, prompting, system prompt, system instruction, prompt engineering, prompt score, prompt rating, prompt quality, prompt review, prompt audit, prompt improvement, prompt optimization, prompt refactor, prompt rewrite, prompt fix, prompt template, prompt structure, prompt format, prompt design, prompt architecture, chain of thought, few-shot, zero-shot, CoT, RTF, RISE, CARE, STONE, CREATE, TRACE, SCOPE, PACKED, CO-STAR, RAG, CHAIN framework, role assignment, structured output, negative constraint, delimiter pattern, prompt comparison, A/B test prompts.
TRIGGER WORDS (Russian): promprt, promt, promprty, promty, sistemnyj promprt, inzheneriya promprtov, ocenka promprta, rejting promprta, kachestvo promprta, uluchshit' promprt, optimizirovat' promprt, perepisat' promprt, ispravit' promprt, shablon promprta, struktura promprta, format promprta, cepochka myslej, few-shot, promprt-inzhiniring, sravnit' promprty.
TRIGGER PHRASES: "help me write a prompt", "my prompt sucks", "my prompt is not working", "how do I prompt for X", "AI is not following instructions", "output is not what I expected", "make my prompt better", "improve this prompt", "score this prompt", "rate my prompt", "build a system prompt", "design an agent prompt", "which prompting technique should I use", "compare these two prompts", "why is my prompt failing", "prompt keeps hallucinating".
ACTIVATE WHEN: user mentions prompt quality/structure/scoring/optimization, user asks how to write better prompts, user complains about AI output quality (often a prompt issue), user wants to build or improve a system prompt for an agent/chatbot, user asks about prompting techniques or frameworks (RTF, CoT, few-shot, etc.), user wants to compare two prompts, user is designing multi-agent orchestration and needs agent-level prompts, user asks "which framework/technique should I use for X".
DO NOT ACTIVATE FOR: general coding questions (not about prompts), UI/UX design requests, debugging code errors (unless explicitly about prompt engineering), database queries, deployment/infrastructure questions.
id
ZAI-STS-001
version
1.1
trigger
prompt, prompt engineering, improve prompt, system prompt, CoT, few-shot
Skill: Prompt Engineering v1.1
ID: ZAI-STS-001
Version: 1.1
Last Updated: 2026-05
Expert prompt engineering with scoring, techniques, and frameworks.
1. Prompt Scoring (6 Dimensions)
[C] Always score before improving. Scoring is your diagnostic tool -- without it you are guessing.
Dimension
Weight
What It Measures
Specificity
0.25
How specific and unambiguous the request is
Clarity
0.20
Ease of understanding, action verbs, structure
Context
0.15
Sufficient background, role, audience, domain
Completeness
0.15
All necessary info: input/output, edge cases, criteria
Grading: S (95+) > A (80+) > B (65+) > C (50+) > D (35+) > F (<35)
When scoring, show: overall grade + numeric score, each dimension with grade and feedback, top 3 weakest dimensions with specific improvement suggestions.
2. Prompt Improvement Workflow
Score the current prompt (baseline + diagnosis)
Identify the weakest 2-3 dimensions (targets)
Select the most relevant technique(s) from Section 4
Apply one technique at a time, rewrite the prompt
Re-score to verify improvement (show before/after)
Explain what changed and why it matters
Stop when score reaches B+ (70+) or after 3 iterations, whichever comes first
[C] Never apply more than 2 techniques in a single improvement iteration. Stacking too many makes the prompt unreadable.
[W] Stop improving when the score reaches A-grade (80+). Further gains are marginal and the prompt is production-ready.
Done criteria -- improvement is complete when:
Overall score is B+ (70+) or higher
No dimension scores below C (50)
The user confirms the output matches their intent
If after 3 iterations the score is still below B+, explain the structural limitations and suggest a different framework.
3. Framework Selection
Framework
Best For
Complexity
RTF (Role-Task-Format)
Code gen, content writing, data extraction -- 80% of needs
[C] Start with RTF. Upgrade to TRACE/SCOPE only if more structure is needed. CREATE/PACKED for creative content. CHAIN for multi-step pipelines.
[C] Never recommend the 5-layer system prompt architect for tasks solvable with RTF. If a simple Role-Task-Format prompt gets the job done (80% of cases), do not over-engineer.
4. Technique Selection + Examples
Prompt is too vague
Explicit Instruction -- add specificity first. Replace vague words with concrete requirements.
Before: "Write a summary of this article."
After: "Write a 3-paragraph summary of this article. Paragraph 1: main argument. Paragraph 2: supporting evidence. Paragraph 3: implications for software teams. Maximum 200 words total."
Precision Drill -- replace every vague term with a measurable one.
Before: "Make the response faster."
After: "Reduce API response time from 800ms to under 200ms at the 95th percentile."
Definition Lock -- lock key term definitions before reasoning starts.
Before: "Analyze the security of this system."
After: "Analyze the security of this system. By 'security' I mean: resistance to SQL injection, XSS, CSRF, and unauthorized access to admin endpoints. Do not evaluate availability or performance."
Need better reasoning
Chain of Thought -- step-by-step reasoning for math/logic/multi-step tasks.
Before: "What is 15% of 340?"
After: "What is 15% of 340? Think step by step: 1) Convert 15% to decimal, 2) Multiply by 340, 3) State the final answer."
Plan and Solve -- plan first, then execute. Prevents jumping to wrong solutions.
Before: "Refactor this function."
After: "Refactor this function. First, identify the 3 code smells. Then, for each smell, describe the refactoring technique you will apply. Finally, show the refactored code with comments marking each change."
Need to control output
Structured Output -- specify exact format.
Before: "List the errors in this code."
After: "List the errors in this code as a JSON array. Each entry must have: severity (critical|high|medium|low), line (number), type (syntax|logic|security|performance), fix (one-sentence suggestion)."
Negative Constraint -- list what NOT to include.
Before: "Review this pull request."
After: "Review this pull request. Do NOT comment on: code style, formatting, naming conventions, or import ordering. Focus ONLY on logic errors, security vulnerabilities, and performance regressions."
Need domain expertise
Role Assignment -- frame the AI as a domain expert.
Before: "Is this API design good?"
After: "You are a principal API architect with 15 years of experience designing RESTful systems at scale. Evaluate this API design against industry best practices (OpenAPI 3.0, REST constraints, backward compatibility)."
Few-Shot Learning -- provide examples of desired input/output.
Before: "Generate a commit message for my changes."
After: "Generate a commit message for my changes. Examples of good commit messages:
feat(auth): add JWT refresh token rotation
fix(api): handle null response from payment gateway
refactor(db): extract connection pooling into shared module"
Need structure
Delimiter Pattern -- separate instructions from data with markers.
Before: "Translate this text to French: Hello world"
After: "Translate the text between tags to French.
Hello world
"
5. System Prompt Architecture (5 Layers)
For AI agents and chatbots, use the 5-layer architect:
Layer
Weight
Required
Purpose
Identity
0.9
Yes
Role, domain, tone, language, audience
Context
0.7
Yes
Environment, domain knowledge, tools
Constraints
1.0
Yes
Numbered hard rules, forbidden actions
Output
0.95
Yes
Format-specific instructions
Behavior
0.6
No
Tone mapping, quality standards, edge cases
[I] Simple tasks: Identity + Output (2 layers). Production agents: all 5 layers.
When building: start with Identity, add Context, set Constraints (number them, use NEVER), specify Output format exactly, optionally add Behavior with tone mapping and edge case handling.
6. Intent Detection
Intent
Signals
Best Framework
Best Technique
Code generation
"write code", "implement", "build"
RTF or TRACE
Structured Output
Code review
"review", "analyze code", "refactor"
RISE
Adversarial Reviewer
Debugging
"bug", "error", "fix", "crash"
STONE
Chain of Thought
Explanation
"explain", "what is", "how does"
RTF
Few-Shot
Data analysis
"analyze", "statistics", "metrics"
SCOPE
Structured Output
Creative writing
"write", "story", "article", "copy"
CREATE or PACKED
Role Assignment
Layout/UI
"layout", "grid", "dashboard"
SCOPE
Few-Shot
Translation
"translate", "convert to", "localize"
CARE
Delimiter Pattern
Refactoring
"refactor", "restructure", "simplify"
TRACE
Plan and Solve
Testing
"test", "spec", "unit test", "coverage"
RTF
Few-Shot
[W] If unsure which technique to pick, default to Explicit Instruction -- it is the safest starting point.
7. Decision Tree
User request
|
+-- "Improve my prompt" -> Score (Section 1) -> Identify weak dimensions -> Apply technique (Section 4) -> Re-score
|
+-- "Build a system prompt" -> Simple task? -> Yes -> RTF + 2 layers (Identity + Output)
| -> No -> 5-layer architect (Section 5)
|
+-- "Compare prompts" -> Score both -> Side-by-side table with deltas -> Declare winner
|
+-- "Which technique/framework?" -> Detect intent (Section 6) -> Match framework (Section 3) + technique (Section 4)
|
+-- "Design multi-agent system" -> Choose pattern (Section 8) -> Define agent roles -> Map data flow
|
+-- "Think through a decision" -> Select cognitive formula (Section 9) -> Apply step by step
8. Multi-Agent Orchestration
Pattern
Topology
Best For
Sequential Chain
sequential
Plan -> Code -> Review
Parallel Experts
parallel
Multi-perspective review
Hierarchical Delegate
hierarchical
Manager -> specialists -> integrate
Debate Adversarial
mesh
Architecture decisions
Iterative Refinement
sequential
Generate -> Criticize -> Refine
Ensemble Voting
parallel
Answer validation
Diamond
hierarchical
Diverge-then-converge
Supervisor-Workers
hierarchical
Batch processing at scale
9. Cognitive Formulas
Use these when the user needs to think through a problem, not when they just need a better prompt.
Bias Mitigation: Anchoring Break (3 approaches before choosing), Confirmation Discount (argue FOR + AGAINST), Status Quo Challenge (when are assumptions FALSE?)
Reasoning: First Principles (strip to fundamentals), Inversion ("How would I guarantee failure?"), Pre-Mortem ("It failed. Why?")
Self-Critique: Self-Audit (accuracy/completeness/clarity/relevance/consistency), Devil's Advocate (case FOR, then AGAINST, then verdict)
Hard Constraints
[C] These 5 rules are non-negotiable. Violating any is a critical failure.
NEVER assign grade S (95+) if two or more dimensions score below B (65). S-tier means excellence across the board, not just high average.
NEVER recommend the 5-layer system prompt architect for tasks solvable with RTF. If a simple Role-Task-Format prompt gets the job done (80% of cases), do not over-engineer.
NEVER use Chain of Thought for formatting tasks. CoT is for reasoning (math, logic, multi-step analysis). Using it for "format this as JSON" wastes tokens and confuses the model.
NEVER apply more than 2 techniques in a single improvement iteration. Stack too many techniques and the prompt becomes unreadable.
NEVER replace the user's prompt entirely. Improve it, don't rewrite from scratch. Keep their voice, intent, and domain-specific terminology intact.
Error Handling
Problem
What to Do
Score stays flat after 2 iterations
The technique doesn't fit the problem. Switch to a different category.
User says "this isn't what I meant"
Stop. Ask the user to clarify their intent.
Improved prompt scores higher but user dislikes it
You optimized for the framework, not the user. Revert and adjust.
Two dimensions conflict
Prioritize based on the user's stated goal.
Prompt is already A-grade (80+)
Stop improving. Explain that further gains are marginal.
Unsure which technique to pick
Use the intent detection table (Section 6). Default to Explicit Instruction.
Checklist
Intent detected -- you identified what the user actually needs
Framework selected -- you picked the right framework (default: RTF)
6-dimension score shown -- every scoring response includes all 6 dimensions
Technique applied with before/after -- every improvement shows the technique used
Hard Constraints respected -- none of the 5 rules were violated
User's voice preserved -- the improved prompt keeps the user's terminology
Stop condition checked -- improvement stops at B+ (70+) or 3 iterations