تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

context-engineering

Name: Context Engineering
Author: liza-mas

// Analyze Liza `.liza/agent-prompts/` and `.liza/agent-outputs/` from a context-engineering perspective: prompt payload shape, context budget use, cacheability, duplicated or missing context, instruction hierarchy, tool-output pressure, role-specific context fit, and prompt-output feedback loops. Use when diagnosing agent context bloat, prompt drift, poor agent handoffs, repeated misunderstandings, excessive tool output, or whether Liza agents received the right information at the right time.

تشغيل في Manus

$ git log --oneline --stat

stars:٢٣٥

forks:٣٧

updated:٢٧ مايو ٢٠٢٦ في ١٥:٤٤

مستكشف الملفات

2 ملفات

SKILL.md

readonly

related-skills.json

نفس المستودع

code-review.md

from "liza-mas/liza"

Code Review Protocol

2026-05-30235

adversarial-pairing.md

from "liza-mas/liza"

Coordinate Pairing-mode doer/reviewer sessions through a Markdown blackboard. Use when the user invokes /adversarial-pairing with role and blackboard-path arguments or asks multiple pairing agents to coordinate plan review, implementation, staged code review, and follow-up review rounds without Liza multi-agent mode.

2026-05-30235

liza-logs.md

from "liza-mas/liza"

Analyze Liza agents logs

2026-05-27235

epic-writing.md

from "liza-mas/liza"

Transform vision documents into structured epics that bound story-writing

2026-05-26235

user-story-writing.md

from "liza-mas/liza"

Transform requirements into user stories for coding tasks

2026-05-26235

generic-subagent.md

from "liza-mas/liza"

Context-efficient delegation to subagents (read-only default, READ-WRITE opt-in)

2026-05-12235

package.json

"author": "liza-mas"

"repository": "liza-mas/liza"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

محللو أنظمة الحاسوبمهن الحاسوب والرياضيات15-1211L4

name

context-engineering

description

Analyze Liza `.liza/agent-prompts/` and `.liza/agent-outputs/` from a context-engineering perspective: prompt payload shape, context budget use, cacheability, duplicated or missing context, instruction hierarchy, tool-output pressure, role-specific context fit, and prompt-output feedback loops. Use when diagnosing agent context bloat, prompt drift, poor agent handoffs, repeated misunderstandings, excessive tool output, or whether Liza agents received the right information at the right time.

Context Engineering

Scope

Analyze only .liza/agent-prompts/ and .liza/agent-outputs/ unless the user names other artifacts.

This skill is complementary to liza-logs: liza-logs finds operational failures and token/tool patterns; this skill explains whether the prompt and context design caused or amplified those patterns.

Protocol

1. Inventory Before Reading

Run the corpus indexer first:

python3 skills/context-engineering/scripts/context-corpus-index.py .liza

Use the generated index as the primary source for mechanical discovery: inventory, prompt/output pairing, size and pressure signals, outcome signals, common tools, MCP usage, and sample selection. The index is not evidence of causality by itself.

The indexer supports both Claude rich stream-json logs and Codex sparse item.completed logs. Check the reported format counts before assuming which fields are available.

Use indexer options deliberately:

python3 skills/context-engineering/scripts/context-corpus-index.py .liza --json
python3 skills/context-engineering/scripts/context-corpus-index.py .liza --max-pair-minutes 30
python3 skills/context-engineering/scripts/context-corpus-index.py .liza --sample-limit 25

Use --json when exact pair metadata, token fields, or full metrics are needed.
Use --max-pair-minutes to control how strict same-role timestamp pairing should be.
Use --sample-limit to expand or shrink top lists and the sampling plan.

If a liza-logs report or analyzer output is available, use it as the first sampling guide. Prioritize roles, runs, or timestamps with repeated tool failures, broad tool-result volume, duplicated task-local material, growing prompts, low cache reuse for expected-stable prefixes, or blocked/rejected task outcomes.

If liza-logs and context-engineering evidence disagree, report the disagreement explicitly and keep the narrower claim supported by direct prompt/output evidence. Example: liza-logs may correctly flag token pressure while prompt shape is not the cause.

Before opening raw prompt/output files, use the index's Sampling Plan plus the sections relevant to the question: Largest Prompts, Largest Outputs, High Tool-Output Pressure, Outcome Signal Mentions, Role Distribution, Prompt Size Trends, Common Tools, MCP Usage, and Pairing.

Treat indexer outcome signals as text mentions until confirmed by structured state, blackboard, or source context. They are good sampling signals, not proof that a verdict or status transition occurred.

Use the script's pair confidence labels to guide evidence tier and confidence: exact-stem, within-5m, within-30m, within-2h, low-confidence, and no-pair. Report the pairing confidence or matching window for prompt-causality claims.

Classify evidence tier before making claims:

Output-only: behavior, tool-pressure, token-pressure, struggle, rejection, or blocked-outcome finding
Prompt + output pair: prompt-causality finding
Prompt + output + source template/config: fix-localization finding
State/blackboard/spec support: higher-confidence handoff, continuity, and missing-context finding

Fallback indexing when the script is unavailable:

Count prompt files, output files, suffixes, and total bytes.
Group files by role and timestamp parsed from filenames.
Pair outputs to same-role nearest prior prompts and report the matching window.
Use wc -c to identify largest prompts and outputs.
Search output text for tool-output dominance, errors, rejected/blocked mentions, and broad exploratory reads.
Limit claims to output-only and prompt + output pair findings unless source templates/configs/state files are also inspected.
Do not make role-distribution, prompt-trend, aggregate pressure, or sampling-plan claims unless you computed the equivalent fallback data.

2. Prompt Size and Cacheability Audit

Use the index's largest-prompt, largest-output, token, cache, and pressure signals to choose prompts and outputs for deeper reading.

Use the index's Role Distribution table for per-role prompt/output averages, output-to-prompt ratios, and prompt size trend classification (stable, growing, shrinking). Use Prompt Size Trends for chronological size progressions on roles with ≥10 prompts or non-stable trends.

Separate expected fixed cost from avoidable bloat before raising findings:

Contract, guardrail, and tool instructions are load-bearing stable context. Do not flag their volume solely because they are large. Raise an issue only when they are duplicated unnecessarily, rendered in a way that defeats provider prefix/block caching, or bury the current task badly enough to explain observed behavior.
Role/task prompts are parametric by design. Do not flag low cacheability solely because task IDs, state, paths, timestamps, or acceptance criteria vary. Raise an issue only when avoidable dynamic bytes repeat across runs, prompt growth accumulates retry history without compression, or measured cache metrics show expected-stable prefixes are not being reused.
Provider/runtime transcript volume is a logging/storage signal, not automatically an agent-quality or prompt-design problem. Treat rich JSON output size as evidence only after attributing the volume to avoidable agent/tool behavior, such as broad reads, huge command output, repeated diffs, or noisy failing tests.

Treat these as structural pressure signals:

Prompt leaves insufficient room for expected tool output, code reads, and reasoning given that role's observed output-to-prompt ratio
One role's rendered prompt is much larger than sibling roles because of avoidable dynamic or duplicated task-local material
Output-to-prompt ratio >20x suggests tool-heavy exploration only when top items show avoidable broad reads, large diffs, repeated command output, or missing targeted file refs
Output-to-prompt ratio <3x on a non-trivial task suggests the prompt may have crowded out work only after excluding stable load-bearing contract/context
Prompt size trend classified as "growing" indicates accumulating state across iterations
Repeated runs vary early in the prompt before expected-stable blocks, reducing prefix or block cache reuse

Classify prompt segments as:

Segment	Cacheability question
Stable prefix	Is this identical across runs so provider prompt caching can reuse it?
Semi-stable role context	Does it change only when role, contract, skill, or pipeline config changes?
Task-local context	Is it necessary for this run, or should it be referenced/deferred?
Volatile context	Does timestamped state, logs, or generated output appear earlier than needed?

Flag high-leverage cacheability issues:

Volatile state inserted before stable contract/role context
Repeated boilerplate rendered differently across agents or restarts
Large stable context repeated after task-local or timestamped material
Prompt prefixes that differ only because of ordering, whitespace churn, or non-semantic metadata

Check salience order separately from cacheability. A stable prefix helps provider caching, but the agent must still find decision-relevant context quickly. Prefer rendered prompts where the usable task surface is easy to locate:

Task and exact next action
Acceptance criteria and validation plan
Current state, prior failures, and pending decisions
Constraints, guardrails, and role responsibilities
Broader references and load-on-demand source pointers

Flag packing failures where stable but low-salience material buries the task, where broad references appear before the current decision surface, or where large artifacts are embedded when a precise source pointer would let the agent load them on demand.

Actively raise these prompt-shape issues when supported by evidence:

duplicated task-local material within a prompt, across retries, or across handoffs where a compact summary would preserve the decision surface
huge embedded artifacts where a file path, line range, artifact pointer, or bounded excerpt would let the agent load only what it needs
poor salience ordering where the task, next action, acceptance criteria, prior failure, or validation plan is buried behind broad context
prompt growth across retries or restarts without compression of prior attempts, decisions, rejected paths, and current hypothesis

3. Prompt Context Audit

For each sampled prompt, classify context into:

Class	Question
Contract	Did the prompt include required mode, tool, and guardrail context?
Task	Is the concrete task unambiguous, bounded, and falsifiable?
Domain	Does the agent receive the specs, docs, skills, or state it needs?
Operational	Are commands, paths, worktree rules, validation requirements, and approval rules clear?
Noise	What content is duplicated, stale, irrelevant, or too broad for the role?

Look for context-engineering failures:

Underspecific prompts that force avoidable exploration instead of directing the agent to the right source refs, files, acceptance criteria, boundaries, or next action
Missing source-of-truth refs, causing rediscovery or invention
Overbroad contract/spec dumps that crowd out task-local details
Role prompts that do not differ meaningfully across roles
Conflicting instructions without priority or resolution
Hidden assumptions embedded as facts
Validation criteria that cannot falsify the intended outcome
Tool instructions that are present but not operationally actionable

4. Output Behavior Audit

For each sampled output, trace behavior back to context:

Did the agent follow the highest-priority relevant instruction?
Did it ask for missing context, infer silently, or fabricate?
Did it spend context on repeated initialization reads?
Did it perform broad exploratory searches because the prompt lacked concrete refs, acceptance criteria, or boundaries?
Did tool output dominate the transcript?
Did the agent loop because the prompt lacked a decision boundary?
Did review agents catch issues the original role could have prevented with better context?
Do repeated reviewer rejections or blocked outcomes trace to missing, ambiguous, stale, or buried upstream context?

When outputs show failures, distinguish:

Prompt defect: the needed instruction/context was absent, ambiguous, or buried.
Execution defect: the prompt was adequate, but the agent ignored it.
System defect: Liza generated or routed the wrong context.
Evidence gap: logs are insufficient to decide.

Use these heuristics:

If the instruction was present, unambiguous, and salient, but the agent ignored it or contradicted it, lean Execution defect.
If behavior is consistent with a plausible misread of buried, conflicting, stale, or underspecific context, lean Prompt defect.
If the prompt should have contained a state/spec/task field but the rendered prompt omits or misroutes it, lean System defect.
If the output alone shows bad behavior but no paired prompt was read, keep it Evidence gap or Output-only.
If repeated agents fail the same way from similar prompts, prefer a prompt/system hypothesis over individual execution error.

5. Cross-Agent Context Flow

Compare prompt-output chains across roles:

Architect -> reviewer: does the reviewer receive the architect's actual decision surface?
Planner -> coder: are constraints converted into executable acceptance criteria?
Coder -> reviewer: does the reviewer receive enough diff/test context to review behavior?
Orchestrator -> role agents: are task boundaries, prior failures, and superseded work visible?
Restart/session continuity: do prior failures, rejected paths, current hypothesis, pending validation, superseded work, and exact next action survive across handoff or agent restart?

Flag handoff compression failures where important nuance disappears, and handoff bloat where downstream agents receive full upstream artifacts when a structured digest would suffice.

Adversarial pair overlap is not duplication. Liza's doer/reviewer pairs require both agents to receive the same context so the reviewer can independently verify the doer's work. High content overlap between paired roles (e.g., analyst + reviewer, coder + code-reviewer, writer + us-reviewer) is by design. Do not flag it as redundancy. The relevant questions for paired prompts are whether the shared context is too large, poorly ordered, or volatile — not whether it is duplicated.

A good compressed handoff preserves:

Decisions made
Rejected paths and why they were rejected
Current hypothesis or implementation direction
Validation evidence already gathered
Exact next action
Key files, specs, tasks, or state fields
Confidence and source refs for non-obvious claims

6. Fix Localization

For each proposed fix, identify the smallest source artifact likely to implement it:

Prompt template or block under internal/prompts/templates/
Role or pipeline configuration
Contract, guardrail, skill, spec, or docs source
Blackboard or state field consumed by prompt generation
Operational process outside prompt generation

Existing-context check: before recommending a change, verify whether the intended instruction or context already exists upstream but is not rendered, is rendered too late, or is buried by higher-volume context. This prevents recommending duplicate contract/spec/template text when the actual issue is rendering, ordering, routing, or salience.

Each finding's Fix must name:

Source artifact to change
Expected rendered prompt or output difference
Validation method in a future .liza/agent-prompts/ or .liza/agent-outputs/ sample

Do not present a fix-localization finding until the relevant prompt/output pair and source template, config, contract, guardrail, skill, spec, state field, or operational process has been inspected.

7. Synthesis

Produce findings in this format:

# Context Engineering Report

## Executive Summary

- Prompt corpus: [N prompts, size range, roles]
- Output corpus: [N outputs, size range, roles]
- Primary bottleneck: [one sentence]

When referring to a specific session in the Executive Summary or Findings, name
the concrete `.liza/agent-outputs/` log filename. If the claim depends on a
paired prompt, also name the concrete `.liza/agent-prompts/` filename.

## Findings

### P1/P2/P3: [Finding Title]

- **Evidence tier:** [output-only | prompt+output | prompt+output+source | state-supported]
- **Evidence:** [specific prompt/output files and concise observed fact]
- **Confidence:** [high/medium/low, based on prompt-output pairing quality and evidence completeness]
- **Context mechanism:** [why this context shape leads to the behavior]
- **Impact:** [cost, failure rate, review churn, blocked tasks, context pressure]
- **Fix:** [source artifact to change, expected rendered prompt/output difference]
- **Validation:** [future `.liza/agent-prompts/` or `.liza/agent-outputs/` sample that would prove the fix worked]

## Context Budget Opportunities

| Opportunity | Expected effect | Risk |
|-------------|-----------------|------|
| [dedupe/summarize/defer/load-on-demand] | [token or behavior impact] | [what could regress] |

## Non-Findings

Items checked that are acceptable. Include these to avoid repeated rediscovery.

Prioritize findings by observed effect, not theoretical prompt neatness.

Work-Splitting Note

When subagent delegation is supported, consider delegating separable evidence-gathering slices:

Per-role prompt-output audits
Large-file sampling and evidence extraction
Independent handoff-chain checks
Validation of whether a proposed context fix would address the observed behavior

Recommend delegation only when slices are separable, evidence can be passed as raw artifacts, and the main agent can integrate findings without duplicating all reads.

Guardrails

Do not expose secrets. If prompt/output content appears sensitive, quote only redacted snippets.
Do not claim a prompt caused an output unless the prompt-output pair was read.
Do not rely on exact prompt/output filename stems for pairing; match by same role and nearest timestamp, then report the matching window.
Do not claim fix localization unless the source artifact that generates or routes the relevant context was inspected.
Do not recommend deleting contract or guardrail context solely because it is large; first decide whether it is cacheable, load-bearing, or duplicated.
Do not raise prompt cacheability as a finding solely because task-local prompts are parametric. Identify avoidable dynamic content, bad stable/dynamic ordering, or cache metrics inconsistent with the intended stable prefix/block.
Do not raise runtime output volume as a context-engineering finding solely because rich JSON transcripts are large. Attribute the volume to avoidable agent/tool behavior before calling it a prompt or context problem.
Do not optimize only for token count. Context engineering optimizes task success per token, not minimum tokens.
Prefer structured handoff summaries over broad truncation when downstream judgment depends on upstream decisions.

context-engineering

المزيد من هذا المستودع

المزيد من هذا المستودع

Context Engineering

Scope

Protocol

1. Inventory Before Reading

2. Prompt Size and Cacheability Audit

3. Prompt Context Audit

4. Output Behavior Audit

5. Cross-Agent Context Flow

6. Fix Localization

7. Synthesis

Work-Splitting Note

Guardrails

Context Engineering

Scope

Protocol

1. Inventory Before Reading

2. Prompt Size and Cacheability Audit

3. Prompt Context Audit

4. Output Behavior Audit

5. Cross-Agent Context Flow

6. Fix Localization

7. Synthesis

Work-Splitting Note

Guardrails