ai-engineer

Build production-ready LLM applications, RAG systems, and intelligent agents. Covers model selection, vector search, prompt engineering, agent orchestration with LangGraph, cost optimization, and AI safety. Use for any LLM feature, chatbot, AI agent, or AI-powered application.

Install command

npx skills add https://github.com/RaheesAhmed/SajiCode --skill ai-engineer

Copy and paste this command into Claude Code to install the skill

Source

RaheesAhmed/SajiCode

Stars: 66

Forks: 15

Updated: March 4, 2026 at 00:56

name	ai-engineer
description	Build production-ready LLM applications, RAG systems, and intelligent agents. Covers model selection, vector search, prompt engineering, agent orchestration with LangGraph, cost optimization, and AI safety. Use for any LLM feature, chatbot, AI agent, or AI-powered application.

AI Engineer

Model Selection Matrix

Model	Best For	Cost	Speed	Context
GPT-4o	General tasks, function calling	$$$	Fast	128K
Claude 3.5 Sonnet	Code generation, analysis	$$$	Fast	200K
GPT-4o-mini	Cost-sensitive tasks	$	Very fast	128K
Claude 3.5 Haiku	High-volume, simple tasks	$	Very fast	200K
Llama 3.1 70B	Self-hosted, privacy	Free*	Medium	128K
Mixtral 8x22B	Open-source, balanced	Free*	Medium	64K
Gemini 2.0 Flash	Multimodal, long context	$$	Fast	1M

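* No per-token fee when self-hosted; compute and hosting costs still apply.

Decision: Start with the cheapest model that meets your quality bar. Upgrade only when it fails that bar on your own evaluation cases.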
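The decision rule above can be sketched as a small selection function. This is only an illustration: the `Candidate` shape, the `pickModel` name, the prices, and the scores are all made up for the example, and the scores are assumed to come from running each model against your own evaluation set.

```typescript
// Hypothetical shape: one entry per candidate model, scored on your own eval set.
interface Candidate {
  name: string;
  costPerMTok: number; // placeholder price per million tokens, not a real quote
  evalScore: number; // fraction of eval cases passed, from 0 to 1
}

// Return the cheapest candidate whose score meets the bar, or undefined if none does.
function pickModel(candidates: Candidate[], qualityBar: number): Candidate | undefined {
  return candidates
    .filter((c) => c.evalScore >= qualityBar) // filter() copies, so the input is not reordered
    .sort((a, b) => a.costPerMTok - b.costPerMTok)[0];
}

// Illustrative numbers only.
const candidates: Candidate[] = [
  { name: "large", costPerMTok: 10, evalScore: 0.95 },
  { name: "small", costPerMTok: 1, evalScore: 0.88 },
  { name: "medium", costPerMTok: 4, evalScore: 0.92 },
];

const chosen = pickModel(candidates, 0.9);
console.log(chosen?.name); // "medium": the small model misses the 0.9 bar
```

An `undefined` result means no candidate meets the bar, which is a signal to revisit the prompt or the bar rather than to pick the most expensive model by default.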

RAG Pipeline Architecture

Ingestion Pipeline

Documents → Chunking → Embedding → Vector Store
                ↓
         Metadata extraction
         (title, source, date)

Retrieval Pipeline

Query → Query Understanding → Retrieval → Reranking → Generation
              ↓                    ↓           ↓
         Expansion/       Hybrid search    Cross-encoder
         Decomposition    (vector + BM25)   scoring
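The diagram does not say how the vector and BM25 result lists are merged in the hybrid search step. One common option is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as the method this skill prescribes. The function name and document IDs are illustrative; `k = 60` is the constant commonly used with RRF.

```typescript
// Reciprocal Rank Fusion: score(doc) = sum over rankings of 1 / (k + rank),
// where rank starts at 1 for the top hit of each ranking.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1;
      scores[docId] = (scores[docId] ?? 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical result lists from the two retrievers, best match first.
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const bm25Hits = ["doc-b", "doc-d", "doc-a"];

const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
console.log(fused); // ["doc-b", "doc-a", "doc-d", "doc-c"]
```

RRF only needs ranks, not raw scores, so it avoids having to normalize cosine similarities against BM25 scores. The fused list would then go to the reranking stage.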

Chunking Strategies

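import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Recursive text splitter: a sensible default.
// It tries each separator in order, so paragraphs stay intact where possible.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000, // measured in characters by default, not tokens
  chunkOverlap: 200,
  separators: ["\n\n", "\n", ". ", " ", ""],
});

// Semantic chunking: splits where embedding similarity between adjacent sentences drops.
// Assumption: SemanticChunker is modeled on the Python langchain_experimental class;
// confirm that your LangChain.js version provides an equivalent before relying on it.
// `embeddings` is an embeddings model instance defined elsewhere.
const semanticSplitter = new SemanticChunker(embeddings, {
  breakpointThresholdType: "percentile",
  breakpointThresholdAmount: 95,
});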
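To make the interaction between `chunkSize` and `chunkOverlap` concrete, here is a dependency-free sliding-window chunker. It is a deliberate simplification and not how `RecursiveCharacterTextSplitter` works internally, since it ignores separators entirely. The `chunkText` name is illustrative.

```typescript
// Fixed-size character windows. Each chunk starts (chunkSize - chunkOverlap)
// characters after the previous one, so neighbors share chunkOverlap characters.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}

const chunks = chunkText("abcdefghij", 4, 2);
console.log(chunks); // ["abcd", "cdef", "efgh", "ghij"]
```

A larger overlap lowers the chance that a sentence is cut off from its context, at the cost of more chunks to embed and store.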

Vector Database Selection

Database	Hosting	Best For
Pinecone	Managed	Production, scale
Qdrant	Self-hosted/Cloud	Hybrid search
Chroma	Embedded	Prototyping, local
pgvector	PostgreSQL ext	Existing Postgres
Weaviate	Self-hosted/Cloud	Multimodal

LangGraph Agent Patterns

ReAct Agent (tool-calling loop)

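import { StateGraph, MessagesAnnotation, START, END } from "@langchain/langgraph";
import { ToolNode } from "@langchain/langgraph/prebuilt";
import { AIMessage } from "@langchain/core/messages";

// Assumes `model` (a chat model) and `tools` (an array of tools) are defined elsewhere.
// The model must be bound to the tools, or it will never emit tool calls.
const modelWithTools = model.bindTools(tools);

// Agent node: call the model with the conversation so far
const agentNode = async (state: typeof MessagesAnnotation.State) => {
  const response = await modelWithTools.invoke(state.messages);
  return { messages: [response] };
};

// Route to the tool node while the model keeps requesting tool calls
const shouldContinue = (state: typeof MessagesAnnotation.State) => {
  const lastMessage = state.messages[state.messages.length - 1] as AIMessage;
  return lastMessage.tool_calls?.length ? "tools" : END;
};

const graph = new StateGraph(MessagesAnnotation)
  .addNode("agent", agentNode)
  .addNode("tools", new ToolNode(tools))
  .addEdge(START, "agent")
  .addConditionalEdges("agent", shouldContinue)
  .addEdge("tools", "agent")
  .compile();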
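The control flow that the graph encodes (agent, then tools, then back to the agent until no tool calls remain) can be seen more plainly without the framework. The sketch below is a synchronous, dependency-free simulation with a scripted stand-in for the model. All of the types and names here are hypothetical and are not LangGraph APIs, and the `maxSteps` guard is an added assumption to keep a misbehaving model from looping forever.

```typescript
interface ToolCall { name: string; args: Record<string, number>; }
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
}
type Model = (messages: Message[]) => Message;
type Tool = (args: Record<string, number>) => string;

function runAgentLoop(
  model: Model,
  tools: Record<string, Tool>,
  input: string,
  maxSteps = 5,
): Message[] {
  const messages: Message[] = [{ role: "user", content: input }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(messages); // "agent" node
    messages.push(reply);
    // Conditional edge: stop when the model requests no tools
    if (!reply.toolCalls || reply.toolCalls.length === 0) return messages;
    for (const call of reply.toolCalls) { // "tools" node
      const tool = tools[call.name];
      const result = tool ? tool(call.args) : `Unknown tool: ${call.name}`;
      messages.push({ role: "tool", content: result });
    }
  }
  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}

// Scripted stand-in for an LLM: request the "add" tool once, then answer.
const scriptedModel: Model = (messages) => {
  const last = messages[messages.length - 1];
  if (last.role === "tool") {
    return { role: "assistant", content: `The sum is ${last.content}.` };
  }
  return { role: "assistant", content: "", toolCalls: [{ name: "add", args: { a: 2, b: 3 } }] };
};

const demoTools: Record<string, Tool> = { add: (args) => String(args.a + args.b) };
const transcript = runAgentLoop(scriptedModel, demoTools, "What is 2 + 3?");
console.log(transcript[transcript.length - 1].content); // "The sum is 5."
```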

Multi-Agent Supervisor Pattern

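// Assumes `supervisorModel` and the `AgentState` type are defined elsewhere.
const supervisorNode = async (state: AgentState) => {
  const response = await supervisorModel.invoke([
    {
      role: "system",
      content: "Route to the right specialist agent. Reply with exactly one of: researcher, coder, reviewer.",
    },
    ...state.messages,
  ]);
  // response.content is free text, so normalize it before using it as a route name
  const next = String(response.content).trim().toLowerCase();
  return { next }; // expected: "researcher" | "coder" | "reviewer"
};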
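Because the supervisor's reply is free text, the routing value should be checked against the known agent names before the graph acts on it. A minimal sketch of that check follows. The `parseRoute` helper and the choice of "researcher" as the fallback are assumptions made for illustration.

```typescript
const AGENTS = ["researcher", "coder", "reviewer"] as const;
type AgentName = (typeof AGENTS)[number];

// Map raw supervisor text to a known agent name.
// Anything that is not an exact match after cleanup goes to the fallback.
function parseRoute(raw: string, fallback: AgentName = "researcher"): AgentName {
  const cleaned = raw.trim().toLowerCase().replace(/[^a-z]/g, "");
  const known = AGENTS as readonly string[];
  return known.indexOf(cleaned) !== -1 ? (cleaned as AgentName) : fallback;
}

console.log(parseRoute(" Coder.\n")); // "coder"
console.log(parseRoute("I think the reviewer should look")); // "researcher" (fallback)
```

A stricter alternative is to have the supervisor model return a schema-constrained value, so that an invalid route is rejected before it reaches this point.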

Prompt Engineering

Structured Output

import { z } from "zod";

// Schema the model's answer must satisfy
const schema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

// Assumes `model` is a chat model that supports structured output
const structuredLlm = model.withStructuredOutput(schema);
const result = await structuredLlm.invoke("Analyze: Great product!");
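When a provider or model offers no structured-output mode, the same contract can be enforced by hand on the raw JSON text. The sketch below is a dependency-free approximation of the checks the schema above expresses. It is not how `withStructuredOutput` works internally, and the `parseSentiment` name is illustrative.

```typescript
type Sentiment = "positive" | "negative" | "neutral";
interface SentimentResult {
  sentiment: Sentiment;
  confidence: number;
  reasoning: string;
}

const SENTIMENTS: Sentiment[] = ["positive", "negative", "neutral"];

// Parse the model's raw text and reject anything that violates the contract.
function parseSentiment(raw: string): SentimentResult {
  const data: unknown = JSON.parse(raw); // throws on malformed JSON
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const record = data as Record<string, unknown>;

  const sentiment = SENTIMENTS.filter((s) => s === record.sentiment)[0];
  if (sentiment === undefined) throw new Error("Invalid sentiment");

  const confidence = record.confidence;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number between 0 and 1");
  }

  const reasoning = record.reasoning;
  if (typeof reasoning !== "string") throw new Error("reasoning must be a string");

  return { sentiment, confidence, reasoning };
}

const parsed = parseSentiment(
  '{"sentiment": "positive", "confidence": 0.9, "reasoning": "Explicit praise"}',
);
console.log(parsed.sentiment); // "positive"
```

A thrown error here is a natural trigger for a single retry with the validation message appended to the prompt.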

Chain-of-Thought

You are an expert analyst. Think through this step by step:

1. First, identify the key entities in the text
2. Then, determine the relationships between them
3. Finally, synthesize your findings into a structured answer

Text: {input}

Few-Shot Pattern

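import { ChatPromptTemplate } from "@langchain/core/prompts";

// Literal braces in the example outputs are doubled ({{ }}) so the template
// parser does not treat the JSON as input variables; only {input} is a variable.
const fewShotPrompt = ChatPromptTemplate.fromMessages([
  ["system", "Extract structured data from text."],
  ["human", "John has worked at Google since 2020"],
  ["ai", '{{"name": "John", "company": "Google", "year": 2020}}'],
  ["human", "Sarah joined Meta in 2023"],
  ["ai", '{{"name": "Sarah", "company": "Meta", "year": 2023}}'],
  ["human", "{input}"],
]);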
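Prompt templates that use `{name}` placeholders read any literal brace as the start of a variable, which matters when few-shot examples contain JSON. A small helper, hypothetical and not part of any library, can double the braces in example outputs before they go into a template:

```typescript
// Double every literal brace so "{" and "}" survive f-string style template parsing.
function escapeBraces(literal: string): string {
  return literal.replace(/\{/g, "{{").replace(/\}/g, "}}");
}

const exampleOutput = JSON.stringify({ name: "John", company: "Google", year: 2020 });
const escapedOutput = escapeBraces(exampleOutput);
console.log(escapedOutput); // {{"name":"John","company":"Google","year":2020}}
```

Apply it only to literal example text. The real placeholder, such as `{input}`, must keep its single braces.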

Cost Optimization

Token Reduction Strategies

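Shorter prompts: Remove filler and use terse instructions
Caching: Use a semantic cache that returns a stored response when a new prompt is similar enough to an earlier one
Model routing: Send simple tasks to a cheap model and reserve the expensive model for complex ones
Streaming: Stream responses to reduce perceived latency (this does not lower token usage or cost)
Batching: Group requests that can wait and submit them together where the provider offers batch API pricing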
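For chat applications, one way to apply the "shorter prompts" strategy is to trim old conversation turns to a token budget. The sketch below assumes roughly 4 characters per token, which is a coarse heuristic for English text and not a real tokenizer. All names are illustrative.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep every system message plus the most recent turns that fit the budget.
function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];
  // Walk backwards so the newest turns are kept first
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(rest[i]);
    used += cost;
  }
  return system.concat(kept);
}

const history: ChatMessage[] = [
  { role: "system", content: "Be terse." },
  { role: "user", content: "first question here!" },
  { role: "assistant", content: "first answer here..." },
  { role: "user", content: "second question?" },
];

const trimmed = trimHistory(history, 12);
console.log(trimmed.map((m) => m.role)); // ["system", "assistant", "user"]
```

For accurate budgets, swap `estimateTokens` for the tokenizer that matches the target model.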

Semantic Caching

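// Assumption: treat SemanticCache as a hypothetical wrapper that exposes lookup() and
// store(); substitute your own implementation. `embeddings`, `vectorStore`, and
// `model` are defined elsewhere.
const cache = new SemanticCache({
  embeddings,
  vectorStore,
  similarityThreshold: 0.95, // higher means fewer hits and fewer wrong answers
});

async function cachedInvoke(prompt: string) {
  // Return the stored response when a sufficiently similar prompt was seen before
  const cached = await cache.lookup(prompt);
  if (cached) return cached;
  // Cache miss: call the model and remember the answer
  const result = await model.invoke(prompt);
  await cache.store(prompt, result);
  return result;
}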
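To show what the lookup and store steps involve, here is a self-contained in-memory sketch built on cosine similarity. Everything in it is an assumption made for illustration, including the class name and especially `letterCounts`, a toy stand-in for a real embedding model that only counts letters.

```typescript
type Embed = (text: string) => number[];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class InMemorySemanticCache {
  private entries: { vector: number[]; response: string }[] = [];
  private embed: Embed;
  private threshold: number;

  constructor(embed: Embed, threshold = 0.95) {
    this.embed = embed;
    this.threshold = threshold;
  }

  // Return the best stored response at or above the threshold, if any.
  lookup(prompt: string): string | undefined {
    const query = this.embed(prompt);
    let best: { score: number; response: string } | undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(query, entry.vector);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: entry.response };
      }
    }
    return best?.response;
  }

  store(prompt: string, response: string): void {
    this.entries.push({ vector: this.embed(prompt), response });
  }
}

// Toy embedding: a 26-dimensional letter-count vector. Not semantically meaningful.
const letterCounts: Embed = (text) => {
  const vector: number[] = [];
  for (let i = 0; i < 26; i++) vector.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const code = lower.charCodeAt(i) - 97;
    if (code >= 0 && code < 26) vector[code] += 1;
  }
  return vector;
};

const demoCache = new InMemorySemanticCache(letterCounts, 0.95);
demoCache.store("What is RAG?", "Retrieval-augmented generation.");
console.log(demoCache.lookup("what is rag")); // hit: same letters, similarity 1
console.log(demoCache.lookup("Explain Kubernetes")); // undefined: below threshold
```

The linear scan is fine for a demo. A production cache would delegate the nearest-neighbor search to the vector store.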

AI Safety Checklist

Validate all user inputs before sending them to the LLM
Strip PII from prompts when possible
Set max token limits on all LLM calls
Implement rate limiting per user or API key
Add content moderation on LLM outputs
Log all LLM interactions for debugging (redact PII)
Use temperature=0 for tasks that need repeatable output (this reduces variance but does not guarantee identical responses)
Implement timeouts and retries with exponential backoff
Never expose raw LLM errors to end users
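Two checklist items involve removing PII, from prompts and from logs. A minimal regex-based sketch follows. The patterns are deliberately coarse assumptions: they will miss many PII formats and will over-redact some digit sequences such as dates, so treat this as a starting point and not as a complete redaction solution.

```typescript
// Coarse illustrative patterns, not a complete PII detector.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

// Replace emails first, then phone-like digit runs, with fixed placeholders.
function redactPii(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL]").replace(PHONE_PATTERN, "[PHONE]");
}

const redacted = redactPii("Contact jane.doe@example.com or +1 (555) 010-9999 today");
console.log(redacted); // "Contact [EMAIL] or [PHONE] today"
```

Running the same function over both the outgoing prompt and the log line keeps the two in sync.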