Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

ai-llm-backend

Build LLM features on the backend — deterministic agent loops (round-trip every tool call by id), RAG over a vector store, token/cost accounting, streaming, eval harness, and prompt-injection defense (treat all model context as untrusted). Use when adding an AI feature, building RAG, or wiring an agent loop. Not for the AI streaming UI on the frontend (use frontend-toolkit's AI integration) or general boundary input parsing (use data-validation).

Exécuter dans Manus

Étoiles0

Forks0

Mis à jour8 juin 2026 à 13:41

Source

JayKim88

JayKim88/claude-ai-engineering

Ouvrir le dépôt GitHub Voir les dépôts du créateur

Commande d'installation

Téléchargement

Exécuter dans Manus

SKILL.md

readonly

Plus depuis ce dépôt

même dépôt

api-contract

JayKim88/claude-ai-engineering

Define a schema-first API contract — standardized error envelope (RFC 9457), pagination, status codes, consistent JSON shapes. Use when establishing API conventions, before multiple teams consume an API, or when error responses are inconsistent. Not for choosing the protocol or modeling resources (use api-design) or for runtime input parsing at the boundary (use data-validation).

2026-06-080

api-design

JayKim88/claude-ai-engineering

Choose the API protocol (REST / GraphQL / gRPC) by traffic shape and design resources, versioning, and async patterns. Use when adding a new API surface, designing a service boundary, or when clients complain about over/under-fetching. Not for the schema/error envelope details (use api-contract) or per-resource access control (use authorization).

2026-06-080

architecture-improvement

JayKim88/claude-ai-engineering

Default to a modular monolith with enforced internal boundaries; treat microservices as a destination after boundaries prove stable, not a starting point. Use when structuring a backend, when tempted to split into services, or when module boundaries blur. Not for the actual schema-split / service-extraction migrations (use migration-strategy + schema-design).

2026-06-080

async-messaging

JayKim88/claude-ai-engineering

Build reliable event-driven flows with the Transactional Outbox pattern — write state and event in one transaction, relay asynchronously, achieve at-least-once delivery + consumer idempotency. Use when an action must reliably trigger downstream work, or when events are lost on crash (dual-write problem). Not for simple background work without state+event reliability (use background-jobs) or outbound HTTP webhook specifics (use webhook-design).

2026-06-080

authentication

JayKim88/claude-ai-engineering

Choose and implement auth correctly — JWT vs session vs OAuth decision, pin allowed algorithms server-side, rotate refresh tokens with reuse detection, avoid the classic JWT pitfalls. Use when adding login, integrating OAuth, or when token handling looks risky. Not for access control / permissions (use authorization) or a broader OWASP audit (use backend-security-audit).

2026-06-080

authorization

JayKim88/claude-ai-engineering

Design access control — RBAC for coarse function-level checks, Postgres Row Level Security (RLS) for row-level data isolation, ABAC pushed to the app/policy layer. Use when adding permissions, building multi-user data access, or when one user can see another's data. Not for establishing who the caller is (use authentication) or tenant isolation specifically (use multitenancy-audit).

2026-06-080

name	ai-llm-backend
description	Build LLM features on the backend — deterministic agent loops (round-trip every tool call by id), RAG over a vector store, token/cost accounting, streaming, eval harness, and prompt-injection defense (treat all model context as untrusted). Use when adding an AI feature, building RAG, or wiring an agent loop. Not for the AI streaming UI on the frontend (use frontend-toolkit's AI integration) or general boundary input parsing (use data-validation).
license	MIT

AI / LLM Backend

Purpose

Build production LLM features that are deterministic where they must be, cost-controlled, observable, and safe against prompt injection — rather than a fragile prompt glued to an API call.

Universal — agent-loop discipline, RAG architecture, token accounting, streaming, eval, and treating model context as untrusted are LLM-backend principles independent of the model vendor; pgvector/Postgres is the default vector store.

Procedure

Distinguish workflows from agents
- Workflow — predefined LLM call sequence (classify → extract → format); deterministic, cheaper, debuggable; prefer this
- Agent — model dynamically chooses tools/steps in a loop; powerful but less predictable; use only when the path genuinely can't be predefined
Make the agent loop deterministic and bounded
- Round-trip every tool call correctly: each tool-use block gets a matching tool-result carrying the SAME tool-call id — mismatches corrupt the conversation (Anthropic names this field tool_use_id; OpenAI calls it tool_call_id)
- Cap loop iterations (no infinite tool-calling); cap tool-result token size (compaction)
- Invest in the tool interface (clear schemas + descriptions) as much as the prompt
RAG: keep the vector store in the existing database
- One datastore avoids operating a second system; an embeddings column alongside your data is enough for most workloads
- Use an approximate-nearest-neighbour (ANN) index tuned for the recall/latency balance you need
- Chunk deliberately, store source metadata for citations, retrieve top-k then re-rank
- Pin the embedding model id with each stored vector — changing the embedding model invalidates every existing vector (different model = different vector space). Plan a reindex (or dual-write embeddings during a window) before swapping; this is a one-way migration and the #1 RAG operational landmine
Account for tokens and cost per call — and survive provider limits
- Log input/output tokens + model + cost for every LLM call
- Set per-user / per-session budgets; alert on spikes (a runaway agent loop or prompt-injection can explode cost)
- Handle 429/503 from the provider with exponential backoff + jitter (see resilience-patterns); cap parallel in-flight calls per key; for high-availability paths, define a model-fallback chain (primary → secondary → cached/degraded)
- For deterministic prompts (temperature = 0, same input) cache the response — see caching-strategy (this is what makes evals cheap to re-run)
Stream responses
- SSE / ReadableStream for token-by-token output (pairs with frontend-toolkit AI streaming)
- Handle mid-stream cancellation (client disconnect → stop generation → stop billing)
Treat ALL model-context content as untrusted (prompt injection is structural)
- User input, retrieved documents, tool results — all can carry injection; you can't fully "patch" it
- Channel separation is the structural defense: keep user-supplied content in the user role, never concatenated into the system prompt or a tool description. Same for retrieved docs — wrap each as a user message with a clear "untrusted retrieved content" boundary
- Defenses: never let the model's raw output trigger privileged actions without a gate; validate/parse tool arguments (see data-validation); a Human-in-the-loop gate for high-stakes actions; least-privilege tools
Build an eval harness
- A fixed test set of inputs + expected properties; score outputs (exact, rubric, LLM-judge)
- Run on prompt/model changes — regressions in LLM features are invisible without evals
Validate (validation loop)
- Run the eval set; if quality drops below threshold on a prompt/model change → revert or fix and re-run
- Inject a prompt-injection payload via a retrieved doc → verify it can't trigger a privileged action
- Force a tool error → verify the loop handles it (tool-call id still round-tripped, doesn't hang)

Anti-patterns

❌ Anti-pattern	✅ Correct
Agent loop with no iteration cap	Bounded loop + tool-result compaction
Mismatched/ignored tool-call id	Round-trip every tool call by id
Trusting retrieved docs / tool output as safe	Treat all context as untrusted; gate privileged actions
No token/cost logging	Per-call token + cost accounting + budgets
Shipping prompt changes with no eval	Eval harness gates prompt/model changes
Standing up a second vector DB when your DB can store vectors	Vector store in the existing database
Swapping embedding models without a reindex plan	Pin the embedding model id with each vector; reindex (or dual-write) before swap
User text concatenated into the system prompt	Channel separation: user content in the user role only
No backoff / fallback on provider 429 / 503	Exponential backoff + jitter + parallel-call cap; multi-model fallback for critical paths

Severity tiers

Tier	Examples	Action SLA
Critical	Prompt injection can trigger a privileged action (delete data, send money); unbounded agent loop / cost; raw model output executed; embedding model swapped with no reindex (RAG silently returns garbage)	Block release; fix immediately
Major	No token/cost accounting; no eval harness; tool-call-id mishandling causing failures; user content mixed into the system prompt (collapses channel-separation defense)	Fix this sprint
Minor	Suboptimal chunking; ANN index params untuned; missing stream cancellation; no multi-model fallback	Schedule within 2 sprints

Completion Criteria

Agent loops bounded + tool calls round-tripped by id
RAG uses a vector store with an ANN index + source metadata
Token/cost logged per call + budgets set
Responses stream with cancellation handling
All model context treated as untrusted; privileged actions gated
Eval harness gates prompt/model changes
Embedding model id pinned per vector; reindex plan in place before any swap
Provider rate-limit handling (429/503 backoff + parallel cap); multi-model fallback for critical paths

Output

AI feature code: agent loop / RAG pipeline / streaming endpoint
Eval harness: test set + scoring + CI integration
Cost dashboard: per-feature token/cost metrics
Commit format: feat(ai): RAG over <corpus> with pgvector / feat(ai): eval harness for <feature>

Implementation

TypeScript + Postgres(pgvector) + Anthropic SDK (default)

Agent loop: Anthropic SDK tool-use; match tool_use_id on every tool_result; cap iterations
RAG: pgvector extension, vector column, HNSW index (USING hnsw (embedding vector_cosine_ops)); embeddings via the model provider
Streaming: SSE from a NestJS endpoint or ReadableStream; pairs with frontend-toolkit ai-llm UI
Cost: log usage (input/output tokens) per call to observability
Eval: a test suite of prompts + assertions (run in CI)

Other stacks

Python / FastAPI: Anthropic/OpenAI SDK; pgvector via SQLAlchemy or pgvector-python; LangChain/LlamaIndex optional (prefer thin)
Go: provider SDKs; pgvector via pgx
Universal: agent-loop discipline, prompt-injection-is-untrusted, token accounting, and eval are vendor-agnostic; pgvector is Postgres (alternatives: Qdrant/Weaviate, but prefer one datastore)

Related skills

data-validation — tool inputs and model outputs are untrusted — parse them
observability-setup — token/cost/latency are first-class metrics for AI features
caching-strategy — cache embeddings and deterministic completions

Reference

Key insight encoded: Distinguish workflows (predefined, prefer) from agents (dynamic); treat ALL model-context content as untrusted (prompt injection is structural, not patchable) and defend with channel separation — user content stays in the user role, never concatenated into the system prompt; make the loop deterministic — round-trip every tool call by id (tool_use_id), cap tool-result tokens, stream with explicit cost accounting. Two operational landmines: changing the embedding model invalidates all stored vectors (plan the reindex), and provider rate-limit / outage handling needs explicit backoff + multi-model fallback for critical paths.