agent-finops

Name: Agent Finops
Author: saifyxpro

Design cost-efficient AI agent architectures. Use when optimizing token usage, selecting model tiers, budgeting compute costs, implementing caching strategies, or designing plan-and-execute patterns for cost reduction. Covers model tiering (frontier for planning, cheap for execution), token budgeting, response caching, the plan-and-execute cost reduction pattern (up to 90% savings), and cost monitoring. Based on emerging FinOps-for-AI trends, heterogeneous model architectures, and production cost optimization practices.

updated: February 18, 2026, 16:18

SKILL.md

Agent FinOps

Design cost-efficient AI agent architectures with model tiering, token budgeting, and caching.

Workflow

Cost Optimization Workflow

1. Audit current token usage per agent component
2. Classify tasks by complexity (planning vs execution)
3. Assign model tiers to each task class
4. Implement caching for repeated queries
5. Set up cost monitoring and alerts

Cost Audit Workflow

1. Measure tokens consumed per agent interaction
2. Identify the most expensive operations
3. Check for cacheable or downgradable operations
4. Calculate potential savings from model tiering
5. Generate cost reduction recommendations
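
The audit steps above can be sketched as follows. This is a minimal illustration, not the skill's own tooling: the `PRICE_PER_MTOK` rates, the call-log fields, and the function names are all hypothetical placeholders, and the savings estimate assumes a downgraded model would use the same number of tokens.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; replace with your provider's current rates.
PRICE_PER_MTOK = {
    "frontier": {"input": 10.00, "output": 30.00},
    "economy": {"input": 0.50, "output": 1.50},
}

def call_cost(tier, input_tokens, output_tokens):
    """Cost in USD of a single call at the given tier."""
    price = PRICE_PER_MTOK[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

def audit(call_log):
    """Total cost per operation, sorted with the most expensive operation first."""
    totals = defaultdict(float)
    for call in call_log:
        totals[call["operation"]] += call_cost(
            call["tier"], call["input_tokens"], call["output_tokens"]
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def savings_from_downgrade(call_log, operation, new_tier):
    """Estimated USD saved if every call of `operation` ran on `new_tier` instead."""
    saved = 0.0
    for call in call_log:
        if call["operation"] == operation:
            current = call_cost(call["tier"], call["input_tokens"], call["output_tokens"])
            proposed = call_cost(new_tier, call["input_tokens"], call["output_tokens"])
            saved += current - proposed
    return saved

# Illustrative log in which every operation currently runs on the frontier tier.
call_log = [
    {"operation": "plan", "tier": "frontier", "input_tokens": 2_000, "output_tokens": 500},
    {"operation": "format_output", "tier": "frontier", "input_tokens": 1_000, "output_tokens": 1_000},
    {"operation": "format_output", "tier": "frontier", "input_tokens": 1_000, "output_tokens": 1_000},
]

ranked = audit(call_log)
for operation, cost in ranked:
    print(f"{operation}: ${cost:.4f}")

saved = savings_from_downgrade(call_log, "format_output", "economy")
print(f"downgrading format_output would save about ${saved:.4f}")
```

With this made-up log, the simple but frequent `format_output` operation costs more in total than the single planning call, which is the kind of finding the audit is meant to surface.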

Model Tiering

Match each task to one of three model tiers according to its complexity. See `references/01-model-tiering.md` for templates.

Tier	Model Class	Use For	Reference
Frontier	GPT-4o, Claude Opus, Gemini Ultra	Complex reasoning, planning, orchestration	`references/01-model-tiering.md`
Mid-Tier	GPT-4o-mini, Claude Sonnet, Gemini Pro	Standard tasks, code generation	`references/01-model-tiering.md`
Economy	GPT-3.5, Claude Haiku, Gemini Flash	High-frequency, simple execution	`references/01-model-tiering.md`
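
One way to apply the table is a small lookup that routes each task class to a tier. The sketch below is an assumption-laden illustration: the task class names, the `select_tier` function, and the choice to fall back to the mid tier for unknown classes are hypothetical and do not come from the reference file.

```python
from enum import Enum

class Tier(Enum):
    FRONTIER = "frontier"
    MID = "mid-tier"
    ECONOMY = "economy"

# Hypothetical task classes mapped onto the three tiers from the table above.
TIER_BY_TASK_CLASS = {
    "planning": Tier.FRONTIER,
    "orchestration": Tier.FRONTIER,
    "code_generation": Tier.MID,
    "standard": Tier.MID,
    "simple_execution": Tier.ECONOMY,
}

def select_tier(task_class: str) -> Tier:
    """Return the tier for a task class; unknown classes fall back to mid-tier."""
    return TIER_BY_TASK_CLASS.get(task_class, Tier.MID)

for task_class in ("planning", "simple_execution", "unlabelled"):
    print(task_class, "->", select_tier(task_class).value)
```

Defaulting to the mid tier is a design choice for the sketch: it avoids silently sending unclassified work to the most expensive model while also avoiding the quality risk of the cheapest one.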

Plan-and-Execute Pattern

Read the reference for the cost reduction architecture.

Component	Description	Reference
Planner	Frontier model creates strategy (high cost, low frequency)	`references/02-plan-and-execute.md`
Executor	Economy model follows plan (low cost, high frequency)	`references/02-plan-and-execute.md`
Verifier	Mid-tier model checks results (medium cost, as needed)	`references/02-plan-and-execute.md`
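
The cost argument behind this pattern is simple arithmetic, sketched below. The per-call costs are invented round numbers rather than real provider prices, and the function names are hypothetical; actual savings depend on the price gap between tiers, the number of steps, and how often verification runs.

```python
# Hypothetical average cost per call in USD for each tier; not real provider prices.
COST_PER_CALL = {"frontier": 0.04, "mid": 0.01, "economy": 0.002}

def all_frontier_cost(steps: int) -> float:
    """Baseline: a frontier model handles every step itself."""
    return steps * COST_PER_CALL["frontier"]

def plan_and_execute_cost(steps: int, plan_calls: int = 1, verify_calls: int = 0) -> float:
    """Planner calls on frontier, one economy call per step, optional mid-tier checks."""
    return (
        plan_calls * COST_PER_CALL["frontier"]
        + steps * COST_PER_CALL["economy"]
        + verify_calls * COST_PER_CALL["mid"]
    )

baseline = all_frontier_cost(steps=20)
tiered = plan_and_execute_cost(steps=20, plan_calls=1, verify_calls=2)
savings = 1 - tiered / baseline
print(f"baseline ${baseline:.2f}, tiered ${tiered:.2f}, savings {savings:.1%}")
```

Under these assumed numbers a 20-step task costs about $0.10 instead of $0.80, a saving of roughly 87.5%. Fewer steps per plan or more frequent verification shrink that figure, so the saving should be measured rather than assumed.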

Token Optimization

Read the reference for token reduction strategies.

Strategy	Savings	Reference
Response Caching	40-80% for repeated queries	`references/03-token-optimization.md`
Structured Outputs	20-40% vs free-form text	`references/03-token-optimization.md`
Context Compression	30-50% on conversation history	`references/03-token-optimization.md`
Batch Processing	10-30% on similar requests	`references/03-token-optimization.md`
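
As an illustration of the first strategy, here is a minimal exact-match response cache. It is a sketch under stated assumptions, not the approach described in the reference file: the `ResponseCache` class, its method names, and the `fake_model` stub are hypothetical, and a production cache would also need expiry and invalidation.

```python
import hashlib
import json

class ResponseCache:
    """In-memory cache keyed by a hash of the model name and the prompt text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Strip surrounding whitespace so trivially different prompts share a key.
        payload = json.dumps({"model": model, "prompt": prompt.strip()}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call_model):
        """Return the cached response, or call the model once and store the result."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_model(model, prompt)
        self._store[key] = response
        return response

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

calls_made = []

def fake_model(model, prompt):
    """Stand-in for a real, billable model call."""
    calls_made.append(prompt)
    return f"answer to: {prompt.strip()}"

cache = ResponseCache()
first = cache.get_or_call("economy", "What is FinOps?", fake_model)
second = cache.get_or_call("economy", "What is FinOps?  ", fake_model)
print(len(calls_made), cache.hit_rate)
```

Exact-match caching only pays off when identical requests recur and a reused answer is acceptable; requests that depend on fresh data or sampling variety should bypass it.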

Cost Monitoring

Read the reference for monitoring and alerting setup.

Metric	Description	Reference
Cost per Interaction	Average spend per agent interaction (total cost / number of interactions)	`references/04-cost-monitoring.md`
Token Efficiency	Useful output tokens / total tokens	`references/04-cost-monitoring.md`
Cache Hit Rate	Percentage of requests served from cache	`references/04-cost-monitoring.md`
Model Tier Distribution	Percentage of requests per tier	`references/04-cost-monitoring.md`
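
The four metrics can be computed from per-interaction records, as in the sketch below. The record fields (`cost_usd`, `total_tokens`, `useful_output_tokens`, `cache_hit`, `tier`) and the `summarize` function are hypothetical; in particular, how "useful" output tokens are counted is an assumption that each team has to define for itself.

```python
from collections import Counter

def summarize(records):
    """Compute the monitoring metrics from a list of per-interaction records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    total_tokens = sum(r["total_tokens"] for r in records)
    useful_tokens = sum(r["useful_output_tokens"] for r in records)
    tier_counts = Counter(r["tier"] for r in records)
    return {
        "cost_per_interaction": sum(r["cost_usd"] for r in records) / n,
        "token_efficiency": useful_tokens / total_tokens if total_tokens else 0.0,
        "cache_hit_rate": sum(1 for r in records if r["cache_hit"]) / n,
        "tier_distribution": {tier: count / n for tier, count in tier_counts.items()},
    }

# Illustrative records; a cache hit is logged with zero cost and zero tokens.
records = [
    {"tier": "frontier", "cost_usd": 0.04, "total_tokens": 3000, "useful_output_tokens": 600, "cache_hit": False},
    {"tier": "economy", "cost_usd": 0.002, "total_tokens": 1000, "useful_output_tokens": 400, "cache_hit": False},
    {"tier": "economy", "cost_usd": 0.0, "total_tokens": 0, "useful_output_tokens": 0, "cache_hit": True},
    {"tier": "economy", "cost_usd": 0.002, "total_tokens": 1000, "useful_output_tokens": 500, "cache_hit": False},
]

metrics = summarize(records)
for name, value in metrics.items():
    print(name, value)
```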

Anti-Patterns

Frontier Everything — using the most expensive model for all tasks
No Caching — regenerating identical responses repeatedly
Token Bloat — verbose system prompts consuming budget on every call
Invisible Costs — no monitoring, no budget alerts, surprise bills
Premature Optimization — optimizing cost before validating quality
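
To guard against the "Invisible Costs" anti-pattern, spend can be checked against a budget as it accrues. The following is a minimal sketch; the `BudgetTracker` class, the 80% alert threshold, and the status strings are hypothetical choices, and a real setup would persist spend and notify someone rather than return a string.

```python
class BudgetTracker:
    """Accumulates spend and reports when it crosses an alert or budget threshold."""

    def __init__(self, monthly_budget_usd: float, alert_fraction: float = 0.8):
        self.monthly_budget_usd = monthly_budget_usd
        self.alert_fraction = alert_fraction
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> str:
        """Add one call's cost and return 'ok', 'alert', or 'over_budget'."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.monthly_budget_usd:
            return "over_budget"
        if self.spent_usd >= self.monthly_budget_usd * self.alert_fraction:
            return "alert"
        return "ok"

tracker = BudgetTracker(monthly_budget_usd=100.0)
statuses = [tracker.record(50.0), tracker.record(35.0), tracker.record(25.0)]
print(statuses)
```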

Validation Scripts

Estimate agent operational costs with automated scoring (0-10):

python3 scripts/estimate_cost.py <prompt_file> [--strict]

Detects model references across 12 LLMs, calculates per-call and monthly costs (1K/10K calls), checks for tiering/caching/budget strategies, and flags cost anti-patterns (premium models for all requests, full history inclusion, disabled caching).
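
The per-call and monthly figures mentioned above come down to arithmetic of the following kind. This sketch is not the code of `scripts/estimate_cost.py`, whose internals are not shown here; the prices, token counts, and function names are hypothetical, and it assumes the 1K and 10K volumes are calls per month.

```python
# Hypothetical price in USD per 1M tokens; not taken from the script or any provider.
INPUT_PRICE_PER_MTOK = 10.00
OUTPUT_PRICE_PER_MTOK = 30.00

def per_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call with the given token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000

def monthly_cost(cost_per_call: float, calls_per_month: int) -> float:
    """Projected monthly spend at a fixed call volume."""
    return cost_per_call * calls_per_month

cost = per_call_cost(input_tokens=1_500, output_tokens=500)
projection = {volume: monthly_cost(cost, volume) for volume in (1_000, 10_000)}
for volume, total in projection.items():
    print(f"{volume} calls/month: ${total:.2f}")
```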

related-skills.json

Same repository

agent-orchestrator.md

from "saifyxpro/Agent-Architect"

Design and implement multi-agent systems with proven coordination patterns. Use when building agent teams, delegation architectures, inter-agent communication, lead-agent orchestration, or agent swarm coordination. Covers 5 orchestration topologies (hub-and-spoke, pipeline, broadcast, hierarchical, mesh), delegation protocols, state sharing across agent boundaries, conflict resolution, and the plan-and-execute pattern. Based on patterns from Kimi Agent Swarm, Devin, BabyAGI, MetaGPT, Google A2A, and DyLAN architectures.

2026-02-18

agent-safety-architect.md

from "saifyxpro/Agent-Architect"

Design safety architectures for AI agents — autonomy tiers, permission zones, command approval gates, secret handling, escalation paths, and observability. Use when building agents that execute code, modify files, access networks, handle credentials, or make consequential decisions. Covers three autonomy tiers (full-auto, supervised, human-led), container security models, tool safety classifications, and audit logging. Based on patterns from Kimi's 4-layer container model, Claude Code's approval workflows, Devin's data security, and Windsurf's safety protocols.

2026-02-18

context-engineer.md

from "saifyxpro/Agent-Architect"

Design agent memory architectures and context window optimization strategies. Use when building persistent memory systems, context budgeting, dynamic context loading, knowledge retrieval, or managing token limits. Covers three-tier memory (episodic, semantic, procedural), context priority frameworks, just-in-time loading patterns, cache invalidation, and provider-agnostic context layers. Based on patterns from Kimi's skill injection, Cursor's scratchpad, BabyAGI's graph memory, and emerging context engineering practices.

2026-02-18

tool-sdk-designer.md

from "saifyxpro/Agent-Architect"

Design production-grade tool specifications for AI agents. Use when defining tool interfaces, parameter schemas, safety flags, error handling, MCP compatibility, or tool composition rules. Covers three specification formats (XML, JSON Schema, markdown), 6 quality indicators, safety classification, error recovery patterns, and MCP interoperability. Based on analysis of 40+ tool specs from Cursor, Replit, Devin, Kimi, Windsurf, and the Model Context Protocol standard.

2026-02-18

prompt-engineer-pro.md

from "saifyxpro/Agent-Architect"

Generate, audit, and optimize system prompts for AI agents using 8 proven architectural patterns extracted from 16+ production systems (Kimi, Cursor, Devin, Kiro, Claude Code, v0, Windsurf, Lovable, Replit, Traycer, Manus). Use when creating new agent system prompts, auditing existing prompts for quality and completeness, optimizing prompt architecture for specific use cases, or designing multi-agent workflows. Covers skill injection, persona replacement, state machine planning, structured scratchpad, todo tracking, XML response protocols, design system enforcement, and prompt structure blueprints.

2026-02-18

package.json

"author": "saifyxpro"

"repository": "saifyxpro/Agent-Architect"


