agent-safety-architect

Name: Agent Safety Architect
Author: saifyxpro

Design safety architectures for AI agents: autonomy tiers, permission zones, command approval gates, secret handling, escalation paths, and observability. Use when building agents that execute code, modify files, access networks, handle credentials, or make consequential decisions. Covers three autonomy tiers (full-auto, supervised, human-led), container security models, tool safety classifications, and audit logging. Based on patterns from Kimi's 4-layer container model, Claude Code's approval workflows, Devin's data security, and Windsurf's safety protocols.

updated: February 18, 2026, 16:18


Agent Safety Architect

Design safety boundaries, permission models, and observability for AI agents.

Workflow

Safety Design Workflow

1. Classify agent actions by risk level (safe, moderate, dangerous)
2. Assign an autonomy tier to each action class
3. Define permission zones for file system and network access
4. Implement approval gates for high-risk operations
5. Add audit logging and observability

Safety Audit Workflow

1. Map all agent capabilities to risk levels
2. Check for missing approval gates on dangerous operations
3. Verify secret handling (no credentials in prompts or logs)
4. Test escalation paths end-to-end
5. Score against the safety checklist
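The second audit step, checking for missing approval gates, can be sketched as a small function. This is only an illustration: the `capabilities` dictionary layout and the `missing_gates` name are assumptions made here, not a format defined by this skill.

```python
def missing_gates(capabilities: dict) -> list:
    """Return dangerous capabilities whose approval setting is not 'always'.

    `capabilities` is a hypothetical mapping of
    capability name -> {"risk": ..., "requires_approval": ...}.
    """
    return sorted(
        name
        for name, spec in capabilities.items()
        if spec.get("risk") == "dangerous"
        and spec.get("requires_approval") != "always"
    )


# Example capability map (hypothetical names and values).
capabilities = {
    "read_file": {"risk": "safe", "requires_approval": "never"},
    "delete_file": {"risk": "dangerous", "requires_approval": "always"},
    "drop_table": {"risk": "dangerous", "requires_approval": "first_time"},
}

print(missing_gates(capabilities))
```

An empty result means every capability marked dangerous is gated on every use.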

Autonomy Tiers

Three levels of agent autonomy, selected by risk. See the referenced files for templates.

| Tier | Risk Level | Agent Role | Reference |
|------|------------|------------|-----------|
| Full Auto | Low | Execute without approval | `references/01-full-auto.md` |
| Supervised | Medium | Execute after approval | `references/02-supervised.md` |
| Human-Led | High | Recommend only, human executes | `references/03-human-led.md` |
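One way to encode the tier table in code is sketched below. The `AutonomyTier` enum, the `tier_for_risk` helper, and the fallback to the most restrictive tier for unknown risk levels are all assumptions for illustration; the referenced templates may structure this differently.

```python
from enum import Enum


class AutonomyTier(Enum):
    """Hypothetical encoding of the three tiers in the table above."""
    FULL_AUTO = "full_auto"    # low risk: execute without approval
    SUPERVISED = "supervised"  # medium risk: execute after approval
    HUMAN_LED = "human_led"    # high risk: recommend only, human executes


# Mapping from risk level to tier, taken row by row from the table.
TIER_BY_RISK = {
    "low": AutonomyTier.FULL_AUTO,
    "medium": AutonomyTier.SUPERVISED,
    "high": AutonomyTier.HUMAN_LED,
}


def tier_for_risk(risk_level: str) -> AutonomyTier:
    """Return the tier for a risk level; unknown levels fall back to HUMAN_LED."""
    return TIER_BY_RISK.get(risk_level.lower(), AutonomyTier.HUMAN_LED)


def agent_may_execute(tier: AutonomyTier, approved: bool) -> bool:
    """Decide whether the agent itself may run the action."""
    if tier is AutonomyTier.FULL_AUTO:
        return True
    if tier is AutonomyTier.SUPERVISED:
        return approved
    return False  # HUMAN_LED: the agent only recommends
```

Falling back to `HUMAN_LED` for unrecognized input is a fail-closed choice: a typo in a risk label then restricts the agent rather than freeing it.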

Permission Zones

| Zone | Access | Reference |
|------|--------|-----------|
| File System | Read/write/execute boundaries | `references/04-permission-zones.md` |
| Network | Allowed endpoints and protocols | `references/04-permission-zones.md` |
| Secrets | Environment variables, credential vaults | `references/05-secret-handling.md` |
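The file system and network zones could be enforced with checks along these lines. This is a minimal sketch: the workspace root, the allowlisted endpoint, and both function names are hypothetical placeholders, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical zone configuration; real values would come from the agent's config.
WORKSPACE_ROOT = Path("/workspace/project")
ALLOWED_ENDPOINTS = {("https", "api.example.com")}


def path_in_workspace(candidate: str, root: Path = WORKSPACE_ROOT) -> bool:
    """True if candidate resolves to a location inside root.

    Resolving first collapses ".." segments and symlinks, so a path such as
    "../../etc/passwd" is rejected rather than trusted by prefix.
    """
    resolved = (root / candidate).resolve()
    return resolved.is_relative_to(root.resolve())


def endpoint_allowed(url: str) -> bool:
    """True if the URL's scheme and host are both on the allowlist."""
    parsed = urlparse(url)
    return (parsed.scheme, parsed.hostname) in ALLOWED_ENDPOINTS
```

Checking scheme and host together means an allowlisted host reached over plain `http` is still refused.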

Action Risk Classification

<risk_classification>
  <safe auto_approve="true">
    - Read files within workspace
    - Search codebase
    - View file outlines
    - List directories
    - Run read-only database queries
  </safe>

  <moderate requires_approval="first_time">
    - Write/modify files within workspace
    - Run shell commands (non-destructive)
    - Install dependencies
    - Create branches
    - Make API calls to allowed endpoints
  </moderate>

  <dangerous requires_approval="always">
    - Delete files or directories
    - Run commands with sudo/root
    - Push to main/production branches
    - Drop database tables
    - Modify environment variables
    - Access external APIs not in allowlist
    - Execute arbitrary network requests
  </dangerous>
</risk_classification>
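The classification above can be turned into an approval decision roughly as follows. The action identifiers are hypothetical stand-ins for the bullet items, and reading `first_time` as "ask once, then remember the approval" is an interpretation, not something the skill spells out.

```python
from dataclasses import dataclass, field

# Hypothetical action identifiers standing in for the bullet items above.
RISK_BY_ACTION = {
    "read_file": "safe",
    "search_codebase": "safe",
    "list_directory": "safe",
    "write_file": "moderate",
    "run_shell": "moderate",
    "install_dependency": "moderate",
    "delete_file": "dangerous",
    "run_sudo": "dangerous",
    "push_main": "dangerous",
    "drop_table": "dangerous",
}


@dataclass
class ApprovalPolicy:
    """Tracks which moderate actions the user has already approved once."""
    approved_once: set = field(default_factory=set)

    def needs_approval(self, action: str) -> bool:
        # Unlisted actions are treated as dangerous (an assumed fail-closed default).
        risk = RISK_BY_ACTION.get(action, "dangerous")
        if risk == "safe":
            return False
        if risk == "moderate":
            return action not in self.approved_once
        return True  # dangerous: always ask

    def record_approval(self, action: str) -> None:
        # Only moderate actions are remembered; dangerous ones are re-asked every time.
        if RISK_BY_ACTION.get(action) == "moderate":
            self.approved_once.add(action)
```

A short usage example: after `policy.record_approval("write_file")`, later calls to `policy.needs_approval("write_file")` return `False`, while `delete_file` keeps returning `True` no matter how often it has been approved.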

Approval Gate Template

<approval_gate>
  <trigger>[Action that requires approval]</trigger>
  <display>
    - Exact command/action to be performed
    - One-sentence purpose explanation
    - Risk assessment (what could go wrong)
  </display>
  <options>
    <approve>Execute the action</approve>
    <modify>Suggest alternative</modify>
    <reject>Cancel the action</reject>
  </options>
  <timeout>[Auto-reject after N minutes of no response]</timeout>
</approval_gate>
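A gate following this template might look like the sketch below. It assumes user responses arrive on a standard-library `queue.Queue`; the `ApprovalRequest`, `format_prompt`, and `await_decision` names are hypothetical, and the timeout is given in seconds here rather than minutes to keep the example quick to run.

```python
import queue
from dataclasses import dataclass

VALID_DECISIONS = {"approve", "modify", "reject"}


@dataclass
class ApprovalRequest:
    command: str  # exact command/action to be performed
    purpose: str  # one-sentence purpose explanation
    risk: str     # what could go wrong


def format_prompt(request: ApprovalRequest) -> str:
    """Render the three display fields plus the available options."""
    return (
        f"Command: {request.command}\n"
        f"Purpose: {request.purpose}\n"
        f"Risk: {request.risk}\n"
        "Options: approve / modify / reject"
    )


def await_decision(responses: queue.Queue, timeout_seconds: float) -> str:
    """Wait for a decision; auto-reject on timeout or on an unrecognized reply."""
    try:
        decision = responses.get(timeout=timeout_seconds)
    except queue.Empty:
        return "reject"
    return decision if decision in VALID_DECISIONS else "reject"
```

Treating both silence and unrecognized input as a rejection keeps the gate fail-closed, which matches the auto-reject timeout in the template.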

Audit Logging Requirements

Every consequential agent action MUST log:

- Timestamp
- Action taken (tool name + parameters)
- Approval status (auto-approved, user-approved, system-approved)
- Outcome (success, failure, partial)
- State changes caused (files modified, commands run)
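As a rough illustration, the five required fields can be captured as one JSON line per action. The `audit_record` function and its field names are assumptions made for this sketch; only the list of fields and their allowed values comes from the requirements above.

```python
import json
from datetime import datetime, timezone

APPROVAL_STATUSES = {"auto-approved", "user-approved", "system-approved"}
OUTCOMES = {"success", "failure", "partial"}


def audit_record(tool, parameters, approval_status, outcome, state_changes):
    """Build one JSON line containing the five required audit fields."""
    if approval_status not in APPROVAL_STATUSES:
        raise ValueError(f"unknown approval status: {approval_status!r}")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": {"tool": tool, "parameters": parameters},
        "approval_status": approval_status,
        "outcome": outcome,
        "state_changes": state_changes,
    }
    return json.dumps(record, sort_keys=True)


line = audit_record(
    tool="write_file",
    parameters={"path": "src/app.py"},
    approval_status="user-approved",
    outcome="success",
    state_changes=["modified src/app.py"],
)
print(line)
```

Rejecting unknown status or outcome values at write time keeps the log queryable later. Parameters should be redacted before logging if they can contain credentials.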

Anti-Patterns

- Blanket Trust: auto-approving all actions regardless of risk
- Security Theater: approval gates on safe actions, none on dangerous ones
- Credential Leaking: API keys in prompts, logs, or generated code
- Silent Failure: agent fails destructively with no audit trail
- Privilege Creep: agent gradually escalates permissions without review
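One guard against the Credential Leaking anti-pattern is to redact secret-looking values before text reaches a prompt or log. The sketch below uses a single assumed regular expression for `key=value` style assignments; it is illustrative only and would miss many real secret formats (for example, prefixed names such as `MY_API_KEY`).

```python
import re

# Assumed pattern for illustration: a secret-like name, then "=" or ":", then a value.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)"),
]


def redact(text: str) -> str:
    """Replace the value part of secret-looking assignments with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text
        )
    return text


print(redact("export API_KEY=abc123 && ./deploy.sh"))
```

Pattern-based redaction is a backstop, not a substitute for keeping credentials out of prompts and generated code in the first place.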

Validation Scripts

Validate safety architecture with automated scoring (0-10):

python3 scripts/validate_safety.py <config_file> [--strict]

The script checks autonomy tier definitions and five safety mechanisms (secret handling, permission zones, audit logging, escalation, input validation), detects hardcoded credentials, and flags unsafe patterns (bypass instructions, elevated defaults).

related-skills.json

Same repository

agent-finops.md

from "saifyxpro/Agent-Architect"

Design cost-efficient AI agent architectures. Use when optimizing token usage, selecting model tiers, budgeting compute costs, implementing caching strategies, or designing plan-and-execute patterns for cost reduction. Covers model tiering (frontier for planning, cheap for execution), token budgeting, response caching, the plan-and-execute cost reduction pattern (up to 90% savings), and cost monitoring. Based on emerging FinOps-for-AI trends, heterogeneous model architectures, and production cost optimization practices.

2026-02-18

agent-orchestrator.md

from "saifyxpro/Agent-Architect"

Design and implement multi-agent systems with proven coordination patterns. Use when building agent teams, delegation architectures, inter-agent communication, lead-agent orchestration, or agent swarm coordination. Covers 5 orchestration topologies (hub-and-spoke, pipeline, broadcast, hierarchical, mesh), delegation protocols, state sharing across agent boundaries, conflict resolution, and the plan-and-execute pattern. Based on patterns from Kimi Agent Swarm, Devin, BabyAGI, MetaGPT, Google A2A, and DyLAN architectures.

2026-02-18

context-engineer.md

from "saifyxpro/Agent-Architect"

Design agent memory architectures and context window optimization strategies. Use when building persistent memory systems, context budgeting, dynamic context loading, knowledge retrieval, or managing token limits. Covers three-tier memory (episodic, semantic, procedural), context priority frameworks, just-in-time loading patterns, cache invalidation, and provider-agnostic context layers. Based on patterns from Kimi's skill injection, Cursor's scratchpad, BabyAGI's graph memory, and emerging context engineering practices.

2026-02-18

tool-sdk-designer.md

from "saifyxpro/Agent-Architect"

Design production-grade tool specifications for AI agents. Use when defining tool interfaces, parameter schemas, safety flags, error handling, MCP compatibility, or tool composition rules. Covers three specification formats (XML, JSON Schema, markdown), 6 quality indicators, safety classification, error recovery patterns, and MCP interoperability. Based on analysis of 40+ tool specs from Cursor, Replit, Devin, Kimi, Windsurf, and the Model Context Protocol standard.

2026-02-18

prompt-engineer-pro.md

from "saifyxpro/Agent-Architect"

Generate, audit, and optimize system prompts for AI agents using 8 proven architectural patterns extracted from 16+ production systems (Kimi, Cursor, Devin, Kiro, Claude Code, v0, Windsurf, Lovable, Replit, Traycer, Manus). Use when creating new agent system prompts, auditing existing prompts for quality and completeness, optimizing prompt architecture for specific use cases, or designing multi-agent workflows. Covers skill injection, persona replacement, state machine planning, structured scratchpad, todo tracking, XML response protocols, design system enforcement, and prompt structure blueprints.

2026-02-18

package.json

"author": "saifyxpro"

"repository": "saifyxpro/Agent-Architect"
