ai-gateway-guardrails

Name: Ai Gateway Guardrails
Author: aws-samples

// Enforce Input/Output Guardrails at the LLM Gateway layer — PII redaction, Prompt Injection defense, Jailbreak detection, Toxicity filter, and Tool Allow-list. Integrates Bedrock Guardrails, NeMo Guardrails, Llama Guard 3, and regex/regex-ML policies on Bifrost/LiteLLM with Langfuse audit trail.

stars:11

forks:3

updated:May 2, 2026 at 07:11

SKILL.md

name	ai-gateway-guardrails
description	Enforce Input/Output Guardrails at the LLM Gateway layer — PII redaction, Prompt Injection defense, Jailbreak detection, Toxicity filter, and Tool Allow-list. Integrates Bedrock Guardrails, NeMo Guardrails, Llama Guard 3, and regex/regex-ML policies on Bifrost/LiteLLM with Langfuse audit trail.
argument-hint	[compliance scope — ISMS-P, finance, healthcare]
user-invocable	true
model	claude-sonnet-4-6
allowed-tools	Read,Write,Edit,Bash,Grep,Glob,mcp__eks,mcp__aws-documentation,mcp__well-architected-security

When to Use

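When deploying an LLM service into a regulated environment such as Korean finance (Regulation on Supervision of Electronic Financial Transactions, ISMS-P), healthcare, or the public sector
When you need to defend against Prompt Injection, Jailbreak, PII leakage, or Tool Poisoning threats
When you need to choose among, or combine, Bedrock Guardrails, NeMo Guardrails, and Llama Guard 3
When an Agent calls external Tools and needs an Allow-list based policy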

When NOT to Use

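Internal PoC that needs no threat model: Guardrails would only add overhead
Only Bedrock managed models are called, with Bedrock Guardrails already enabled as the baseline: no additional configuration needed (logging is still mandatory)
Simple Q&A with no internal RAG: a regex-level Input Guard may be all that is needed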

Preconditions

The Inference Gateway (Bifrost/LiteLLM) is already deployed (inference-gateway-routing completed)
Langfuse is able to receive audit logs (langfuse-observability completed)
Documents defining the PII policy, blocked categories, and Tool Allow-list are on hand

Procedure

Step 1. Define the threat model (based on the OWASP LLM Top 10)

LLM01 Prompt Injection (Direct/Indirect)
LLM02 Sensitive Information Disclosure (PII, trade secrets)
LLM06 Excessive Agency (Tool misuse)
LLM08 Vector & Embedding Weaknesses (RAG poisoning)

Step 2. Layered defense (Defense in Depth)

User → Input Guard → Gateway Policy → Tool Allow-list → LLM → Output Guard → Response
                                                                     ↓
                                                                 Audit Log (Langfuse)

Input Guard: PII redaction, Injection pattern, Jailbreak classifier
Gateway Policy: AuthN/Z, Rate Limit, Tenant Isolation
Tool Allow-list: MCP Server Registry, Scoped tokens
Output Guard: PII scrub, Toxicity, Fact check
Audit Log: every stage is recorded in Langfuse + CloudTrail
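The layering above can be sketched as a chain of stages that each return a verdict, with every verdict appended to an audit trail. This is only an illustrative sketch: `Verdict`, `input_guard`, `output_guard`, and `run_pipeline` are hypothetical names, the injection pattern and the masking regex are toy stand-ins for real classifiers, and the audit list stands in for Langfuse. Gateway Policy and Tool Allow-list stages are omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    allowed: bool                 # False stops the pipeline
    text: str                     # possibly rewritten (e.g. masked) text
    reason: Optional[str] = None  # why the stage blocked, if it did


Stage = Callable[[str], Verdict]
AuditTrail = List[Tuple[str, bool, Optional[str]]]

BLOCKED_REQUEST = "Request blocked by policy."
BLOCKED_RESPONSE = "Response blocked by policy."


def input_guard(text: str) -> Verdict:
    # Toy injection pattern; a real deployment would add a jailbreak classifier.
    if re.search(r"ignore (all )?previous instructions", text, re.IGNORECASE):
        return Verdict(False, text, "prompt_injection")
    return Verdict(True, text)


def output_guard(text: str) -> Verdict:
    # Toy PII scrub: masks a 6-digit, hyphen, 7-digit identifier.
    masked = re.sub(r"[0-9]{6}-[0-9]{7}", "[REDACTED]", text)
    return Verdict(True, masked)


def run_pipeline(prompt: str, llm: Callable[[str], str],
                 input_stages: List[Stage], output_stages: List[Stage],
                 audit: AuditTrail) -> str:
    text = prompt
    for stage in input_stages:
        verdict = stage(text)
        audit.append((stage.__name__, verdict.allowed, verdict.reason))
        if not verdict.allowed:
            return BLOCKED_REQUEST  # the LLM is never called
        text = verdict.text
    answer = llm(text)
    for stage in output_stages:
        verdict = stage(answer)
        audit.append((stage.__name__, verdict.allowed, verdict.reason))
        if not verdict.allowed:
            return BLOCKED_RESPONSE
        answer = verdict.text
    return answer
```

Because each stage writes to the audit trail before the pipeline decides whether to continue, blocked requests leave the same evidence as allowed ones.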

Step 3. Bedrock Guardrails integration (managed)

# Bifrost configuration
providers:
  bedrock:
    region: ap-northeast-2
    guardrails:
      - id: arn:aws:bedrock:ap-northeast-2:ACCOUNT:guardrail/PII-BLOCK
        version: "1"
      - id: arn:aws:bedrock:ap-northeast-2:ACCOUNT:guardrail/TOXICITY
        version: "1"
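Independently of the gateway configuration, a guardrail can be exercised directly through the Bedrock Runtime `ApplyGuardrail` operation, which is handy for testing a policy before wiring it into the gateway. The sketch below assumes a boto3 `bedrock-runtime` client is passed in (for example `boto3.client("bedrock-runtime", region_name="ap-northeast-2")`); the helper name `check_with_guardrail` and the shape of its return value are hypothetical.

```python
from typing import Any, Dict


def check_with_guardrail(client: Any, guardrail_id: str, version: str,
                         text: str, source: str = "INPUT") -> Dict[str, Any]:
    """Run `text` through one Bedrock guardrail and summarize the result.

    `client` is expected to be a boto3 "bedrock-runtime" client.
    `source` is "INPUT" for user prompts and "OUTPUT" for model responses.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,
        content=[{"text": {"text": text}}],
    )
    intervened = response.get("action") == "GUARDRAIL_INTERVENED"
    outputs = response.get("outputs", [])
    # When the guardrail intervenes, `outputs` carries the masked or
    # blocked-message text; otherwise the original text passes through.
    final_text = outputs[0]["text"] if intervened and outputs else text
    return {"intervened": intervened, "text": final_text}
```

Passing the client in, rather than creating it inside the helper, keeps the function easy to test with a stub and lets the caller control region and credentials.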

Step 4. NeMo Guardrails (open-source Flows)

# config.yml
models:
  - type: main
    engine: openai
    model: gpt-4.1
rails:
  input:
    flows:
      - self check input
      - detect pii
  output:
    flows:
      - self check output
      - remove pii
      - fact checking

Step 5. Llama Guard 3 (Output Classifier)

Deploy the Meta Llama Guard 3 8B model on vLLM as a separate Pod
Call Llama Guard 3 from the Bifrost output hook; on an unsafe verdict, regenerate or block
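Llama Guard 3 answers with `safe`, or with `unsafe` followed by a line of comma-separated hazard category codes (for example `S1,S10`). A minimal sketch of turning that raw completion into the regenerate-or-block decision described above follows; `parse_llama_guard`, `decide`, and the retry budget are hypothetical, and how the completion is fetched from the vLLM Pod is left out.

```python
from typing import List, Tuple


def parse_llama_guard(raw: str) -> Tuple[bool, List[str]]:
    """Return (is_safe, violated_categories) from a Llama Guard completion."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        # Fail closed: an empty classifier answer is treated as unsafe.
        return False, []
    if lines[0].lower() == "safe":
        return True, []
    categories = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    return False, categories


def decide(raw: str, retries_left: int) -> str:
    """Map a classifier answer to 'pass', 'regenerate', or 'block'."""
    is_safe, _ = parse_llama_guard(raw)
    if is_safe:
        return "pass"
    # Retry generation while budget remains, then block.
    return "regenerate" if retries_left > 0 else "block"
```

Treating anything other than an explicit `safe` as unsafe means a malformed or truncated classifier answer cannot silently let a response through.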

Step 6. Tool Allow-list (MCP)

mcpAllowList:
  - name: aws-documentation
    scopes: ["read"]
  - name: eks
    scopes: ["read", "describe"]
  # deny all others
tokenPolicy:
  maxLifetimeSeconds: 900
  audience: ai-infra
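The deny-by-default semantics of this policy can be sketched in a few lines. The values mirror the configuration above; the function names and the token fields (`issued_at`, `expires_at`, `audience`) are assumptions made for illustration, not part of any gateway API.

```python
from typing import Dict, FrozenSet

# Mirrors mcpAllowList above: server name -> permitted scopes.
MCP_ALLOW_LIST: Dict[str, FrozenSet[str]] = {
    "aws-documentation": frozenset({"read"}),
    "eks": frozenset({"read", "describe"}),
}

# Mirrors tokenPolicy above.
MAX_LIFETIME_SECONDS = 900
EXPECTED_AUDIENCE = "ai-infra"


def is_call_allowed(server: str, scope: str) -> bool:
    # Unknown servers map to an empty scope set, so they are denied.
    return scope in MCP_ALLOW_LIST.get(server, frozenset())


def is_token_acceptable(issued_at: float, expires_at: float,
                        audience: str, now: float) -> bool:
    lifetime = expires_at - issued_at
    return (
        audience == EXPECTED_AUDIENCE
        and 0 < lifetime <= MAX_LIFETIME_SECONDS  # reject long-lived tokens
        and issued_at <= now < expires_at         # reject expired or not-yet-valid tokens
    )
```

Both checks fail closed: a server or scope that is not listed, or a token minted with a longer lifetime than the policy allows, is rejected without needing an explicit deny rule.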

Step 7. Audit & Alerting

Record every guard violation as Langfuse scores + tags
Prometheus metric: guardrail_violation_total{type="pii",decision="block"}
Integrate CloudWatch Logs with a SIEM (Security Lake)
Slack/PagerDuty alert threshold: guardrail_violation_rate > 5%/5m
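One way to read the alert threshold is "more than 5% of requests in the last five minutes triggered a guard violation". The sketch below implements that reading with a sliding window; the interpretation, the class name `ViolationRateWindow`, and the in-process bookkeeping are all assumptions. In practice the same rule would more likely live in a Prometheus alerting rule than in application code.

```python
from collections import deque
from typing import Deque, Tuple


class ViolationRateWindow:
    """Share of requests with a guard violation over a sliding time window."""

    def __init__(self, window_seconds: float = 300.0, threshold: float = 0.05):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp: float, violated: bool) -> None:
        # Timestamps are assumed to arrive in non-decreasing order.
        self._events.append((timestamp, violated))

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        violations = sum(1 for _, violated in self._events if violated)
        return violations / len(self._events)

    def should_alert(self, now: float) -> bool:
        return self.rate(now) > self.threshold
```

An empty window reports a rate of 0.0, so a quiet service never pages anyone.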

Good Examples

Finance under ISMS-P: Bedrock Guardrails (managed PII + Block) + NeMo Guardrails (in-house Policy) + Llama Guard 3 (output)
Coding Agent: block shell_exec and network_request via the Tool Allow-list
RAG: Llama Guard 3 + fact-check Flow to defend against Indirect Injection

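Bad Examples (Prohibited)

Deploying a Tool-calling Agent to production without Guardrails → immediate LLM06 violation
Regex-only PII detection → misses variant forms of Korean resident registration numbers; pairing with an ML classifier is mandatory
No Audit log collection → no evidence available during a regulatory audit
allowed-tools: ["*"]: allowing everything means there is no policy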
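The regex-only pitfall is easy to reproduce. The sketch below uses made-up sample numbers in the resident registration number layout (six digits, a hyphen, seven digits). The strict pattern misses trivially reformatted variants; a looser pattern recovers some of them, but neither catches a number spelled out in Hangul, which is the kind of case that motivates an ML classifier. Both patterns and all function names are illustrative assumptions, not a vetted detector.

```python
import re

# Strict layout: YYMMDD-GNNNNNN, ASCII digits only.
STRICT_RRN = re.compile(r"(?<![0-9])[0-9]{6}-[0-9]{7}(?![0-9])")

# Looser layout: tolerate any run of common separators between the groups.
LOOSE_RRN = re.compile(r"(?<![0-9])[0-9]{6}[\s.\-_/]*[0-9]{7}(?![0-9])")


def redact_strict(text: str) -> str:
    return STRICT_RRN.sub("[RRN]", text)


def redact_loose(text: str) -> str:
    return LOOSE_RRN.sub("[RRN]", text)


# Made-up sample values, not real identifiers.
SAMPLES = [
    "900101-1234567",        # canonical form
    "900101 1234567",        # space instead of hyphen
    "900101 - 1234567",      # padded hyphen
    "구공공일공일-일이삼사오육칠",  # digits spelled out in Hangul
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", redact_strict(sample), "|", redact_loose(sample))
```

Loosening the pattern trades missed detections for false positives on any 13-digit run, which is another reason to combine pattern matching with a classifier rather than keep widening the regex.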

References

AI Gateway Guardrails (community resource)
Compliance Frameworks (community resource)
Bedrock Guardrails official documentation
NeMo Guardrails
Llama Guard 3 (Hugging Face)
OWASP LLM Top 10 2025
ISMS-P certification criteria, Korea Internet & Security Agency (KISA)

related-skills.json

same repository

audit-trail.md

from "aws-samples/sample-oh-my-aidlcops"

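Records every user utterance, agent action, phase transition, and gate decision in the audit log with an ISO 8601 timestamp. User input is preserved verbatim as a blockquote, without abbreviation or summarization, and a retention policy (30, 90, or 365 days) mapped to SOC2 and ISMS-P audit requirements is selected per project. Provides a common audit layer that every AIDLC skill can call.

2026-05-02 · 11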

component-design.md

from "aws-samples/sample-oh-my-aidlcops"

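Takes the Inception artifacts (requirements, user-stories, workflow-plan) as input, designs the component boundaries, interface contracts, and data model of an agentic system, and generates `.omao/plans/construction/design.md`. It draws clear boundaries between Agent, Tool, Memory, and Gateway, and serves as the single source of truth for the downstream code-generation and test-strategy skills.

2026-05-02 · 11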

quality-gates.md

from "aws-samples/sample-oh-my-aidlcops"

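Enforces a mandatory gate checklist at the end of each AIDLC phase. The Inception gate requires sign-off on the requirements, user stories, and workflow plan; the Construction gate requires design, code, and tests to all pass plus a risk-discovery PASS; the Operations gate requires continuous-eval to stay green for 24 hours and the cost-governance budget to be OK. On failure, a blocked state is written to `.omao/state/gates/<phase>.json` and entry to the next phase is blocked.

2026-05-02 · 11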

risk-discovery.md

from "aws-samples/sample-oh-my-aidlcops"

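Just before the Construction phase runs, cross-analyzes the Inception artifacts and design documents to detect risk checkpoints across 12 categories. Business continuity, security, external integration, data consistency, cost, performance, regulation, availability, failure blast radius, operational complexity, dependency vulnerabilities, and rollback capability are each rated PASS/WARN/BLOCK, and any BLOCK item prevents entry to the next phase.

2026-05-02 · 11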

structured-intake.md

from "aws-samples/sample-oh-my-aidlcops"

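At the start of AIDLC Inception, collects project information through structured templates instead of free-form utterances. It generates the project-info template and then the requirements template in sequence, providing a single source of truth for the downstream workspace-detection, requirements-analysis, and user-stories skills.

2026-05-02 · 11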

user-stories.md

from "aws-samples/sample-oh-my-aidlcops"

Conditionally generate user stories in As-a/I-want/So-that format only when changes are user-facing. Skips story generation for pure infrastructure or internal refactor work. Produces stories with acceptance criteria linked back to REQ-IDs.

2026-05-02 · 11

package.json

"author": "aws-samples"

"repository": "aws-samples/sample-oh-my-aidlcops"
