| name | ai-ml |
| description | AI/ML integrations including multi-provider LLM routing, embeddings, RAG pipeline, autonomous agents, Langfuse prompt management, image generation, and observability. Use when working on LLM provider config, adding new AI models, modifying the RAG/embedding pipeline, building agents, changing prompt templates, or integrating image generation providers. Do NOT use for voice-specific features (use voice skill) or frontend UI (use ui-components skill). |
AI/ML & LLM System
Multi-provider AI system using Vercel AI SDK with specialized agents, RAG pipeline, local prompt management (with Langfuse observability), and tool-calling agentic loops.
Provider Architecture
Three LLM providers via Vercel AI SDK adapters (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google). Provider is auto-detected from model name in server/agents/runtime.ts:
if (modelName.startsWith('claude-')) provider = 'anthropic'
else if (modelName.startsWith('gemini-')) provider = 'google'
else provider = 'openai'
Each provider is instantiated with createOpenAI(), createAnthropic(), or createGoogleGenerativeAI() using env-based API keys. OpenAI supports optional OPENAI_API_BASE for proxy routing (localhost is resolved to 127.0.0.1 for IPv6 safety).
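The detection rule and the localhost rewrite described above can be sketched as two pure functions. This is an illustration only: detectProvider and normalizeBaseUrl are hypothetical names, not the identifiers used in runtime.ts, and the real code may handle more cases.

```typescript
type Provider = "openai" | "anthropic" | "google";

// Hypothetical helper mirroring the prefix rule:
// claude-* -> anthropic, gemini-* -> google, anything else -> openai.
function detectProvider(modelName: string): Provider {
  if (modelName.startsWith("claude-")) return "anthropic";
  if (modelName.startsWith("gemini-")) return "google";
  return "openai";
}

// Hypothetical helper: rewrite a "localhost" host to 127.0.0.1 so a proxy URL
// cannot resolve to the IPv6 loopback address.
function normalizeBaseUrl(baseUrl: string): string {
  return baseUrl.replace(
    /^(https?:\/\/)localhost(?=[:\/]|$)/,
    (_match, scheme: string) => `${scheme}127.0.0.1`,
  );
}
```

Because openai is the fallback branch, a mistyped model name is routed to OpenAI rather than rejected at this stage.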
Model Inventory
Models are declared in prompts/*.md YAML frontmatter. Current assignments:
| Model | Provider | Used By |
|---|---|---|
| claude-opus-4-6 | anthropic | spark-system (main chat), flow-context |
| gpt-5.2 | openai | system-prompt-component (system prompt generation) |
| gpt-5.1 | openai | trainer-system, guided-curator-system, idea-context |
| gpt-5-mini | openai | campaign-brief-generator |
| gpt-5-nano | openai | pattern-analyzer-system, content-curator-system |
| gpt-4.1-mini | openai | flow-moderator-system, user-output-evaluator, spark-profile-generator, profile-image-generator-system, tool-orchestrator, voice-transcription, document-analysis-system, flow-name-generator, quick-content-check-system, voice personality analysis (voice/analyze-personality.ts) |
| gpt-4o-mini | openai | Image analysis (image-analysis.ts), greetings, edit summaries, document processing, LLM-as-judge evaluation, spark description generation (data-collection/spark-generation/description.ts) |
| gpt-4o | openai | Document analysis in tools/file.ts |
| text-embedding-3-large | openai | All embeddings (1536 dimensions) |
Reasoning Model Handling
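server/utils/ai-helpers.ts provides isReasoningModel() and buildAIOptions(). Reasoning models (o1/o3/gpt-5) reject temperature, topP, frequencyPenalty, and presencePenalty, so build call options through buildAIOptions() or guard them with isReasoningModel().
runtime.ts additionally strips frequencyPenalty and presencePenalty for Google/Gemini models, which do not support them. topP and topK from config are preserved only for creative-type sparks; they are stripped for expert and user spark types.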
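A minimal sketch of the parameter-stripping idea follows. It is not the code in ai-helpers.ts: the function names, the SamplingOptions shape, and the prefix test are assumptions based on the o1/o3/gpt-5 families named in this document.

```typescript
// Hypothetical option shape; the real helper may accept more fields.
interface SamplingOptions {
  temperature?: number;
  topP?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  maxTokens?: number;
}

// Assumption: reasoning models are recognised by the o1/o3/gpt-5 name prefixes.
function isReasoningModelSketch(model: string): boolean {
  return /^(o1|o3|gpt-5)/.test(model);
}

// Returns a copy of the options with the sampling parameters removed
// when the target is a reasoning model.
function buildOptionsSketch(model: string, options: SamplingOptions): SamplingOptions {
  const cleaned: SamplingOptions = { ...options };
  if (isReasoningModelSketch(model)) {
    delete cleaned.temperature;
    delete cleaned.topP;
    delete cleaned.frequencyPenalty;
    delete cleaned.presencePenalty;
  }
  return cleaned;
}
```

Returning a copy rather than mutating the input keeps the parsed prompt config reusable across calls that target different models.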
Agent Runtime (server/agents/runtime.ts)
Central orchestrator for all agentic AI calls. Two main functions:
runAgentCompletion(opts) -- Streaming agentic loop
- Loads prompt config from local prompts/*.md via getPromptConfig(promptKey)
- Normalizes snake_case config to camelCase (normalizeConfig)
- Resolves provider and model, creates provider client
- Estimates input tokens via estimateInputTokens() and warns at 80%/90% context utilization; dynamically reduces maxSteps when context is tight
- Calls streamText() with tools, system prompt, messages, and config loaded from the local prompt file
- Processes fullStream events: text-delta, tool-call, tool-result
- Buffers text chunks (50ms / 10-char threshold) for efficient SSE delivery
- Handles empty-response recovery: if AI used preparatory tools (RAG, web search, image gen) but returned no text, it triggers a follow-up streamText() call to force a conversational response
- Programmatic tool chaining: after CREATE_FLOW_IDEA, auto-calls GENERATE_IMAGE to visualize new ideas
- Uses withRetry() from agent-errors.ts for transient error recovery with exponential backoff
- Per-tool timeouts via getToolTimeout() (e.g., GENERATE_IMAGE: 120s, DOCUMENT_PROCESSING: 90s)
- Records generation observation to Langfuse with token counts, cost, and inputTokenEstimate
- Records error events via recordErrorEvent() for stream errors, tool timeouts, empty responses
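The text-chunk buffering step in the list above can be illustrated as follows. This sketch assumes the buffer flushes when either threshold is reached (10 characters pending or 50 ms since the last flush); DeltaBuffer is a hypothetical class, and the runtime may use a timer instead of checking on each push.

```typescript
// Hypothetical buffer that batches small text deltas before they are sent over SSE.
class DeltaBuffer {
  private pending = "";
  private lastFlush: number;
  private readonly emit: (chunk: string) => void;
  private readonly now: () => number;
  private readonly maxDelayMs: number;
  private readonly minChars: number;

  constructor(
    emit: (chunk: string) => void,
    now: () => number = () => Date.now(), // injectable clock for testing
    maxDelayMs = 50,
    minChars = 10,
  ) {
    this.emit = emit;
    this.now = now;
    this.maxDelayMs = maxDelayMs;
    this.minChars = minChars;
    this.lastFlush = now();
  }

  // Append a delta and flush if either threshold has been reached.
  push(delta: string): void {
    this.pending += delta;
    const waited = this.now() - this.lastFlush;
    if (this.pending.length >= this.minChars || waited >= this.maxDelayMs) {
      this.flush();
    }
  }

  // Emit whatever is pending; call once more when the stream ends.
  flush(): void {
    if (this.pending.length > 0) this.emit(this.pending);
    this.pending = "";
    this.lastFlush = this.now();
  }
}
```

Since this version only checks the thresholds inside push(), a caller would need a final flush() at end of stream so the last few characters are not held back.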
runModeratorDecision(opts) -- Structured output
Uses generateObject() with ModeratorDecisionSchema (Zod) to produce { decision: 'CONTINUE'|'STOP', nextSparkId, reasoning }.
Streaming Patterns
All chat completions flow through runAgentCompletion which uses streamText() internally. The onTextDelta callback SSE-streams chunks to the client. Key files:
server/agents/runtime.ts -- core streaming via streamText()
server/utils/voice/orchestrator.ts -- voice mode: STT -> streamText() -> TTS
server/api/spark/[id]/greeting.get.ts -- standalone streamText() for spark greetings
server/api/flows/[id]/ideas/[ideaId]/messages/index.post.ts -- idea-specific streaming
Structured Output (generateObject)
Used for AI calls requiring typed JSON responses:
| Location | Schema | Purpose |
|---|---|---|
| agents/pattern-analyzer.ts | { patterns: [{ aspect, subAspect, spark }] } | Classify content into aspect/sub-aspect patterns |
| agents/content-filter.ts | { isRelevant, score, reason, extractedContent } | Filter content relevance with LLM |
| agents/runtime.ts | ModeratorDecisionSchema | Flow moderator turn decisions |
| utils/observability/evaluate.ts | EvaluationSchema | LLM-as-judge scoring (relevance, helpfulness, persona, clarity) |
| utils/spark-profile.ts | SparkProfileSchema / KeywordsOnlySchema | Generate spark profile (name, discipline, keywords) or keywords-only for metadata mode. SparkTypeSchema exists but is deprecated (always returns 'expert'). |
| utils/voice/analyze-personality.ts | voicePersonalitySchema | Determine voice archetype, gender, speed, and cloneability for a spark persona |
| api/flows/[id]/generate-campaign-brief.post.ts | Campaign brief schema | Generate campaign briefs |
All use Zod schemas. Pattern: const { object } = await generateObject({ model, schema, system, prompt }).
Note: AI SDK v5+ marks generateObject as deprecated in favor of generateText with Output.object(). The codebase still uses generateObject -- migration is recommended for new code.
Tool System
Tools are defined in server/utils/tools/ using the AI SDK tool() function with Zod input schemas:
| Tool | File | Description |
|---|---|---|
| GET_SPARK_RAG | tools/rag.ts | Semantic search over spark's knowledge base (pgvector) |
| ADD_SPARK_TRAINING_CONTENT | tools/rag.ts | Add content to spark's knowledge base with embedding |
| WEB_SEARCH | tools/web.ts | Web search via Tavily API with link analysis |
| ANALYZE_LINK | tools/web.ts | Fetch and analyze a specific URL |
| CREATE_FLOW_IDEA | tools/idea.ts | Create ideas in flow boards |
| GENERATE_IMAGE | tools/image.ts | AI image generation (Replicate) |
| DISPLAY_IMAGE | tools/image.ts | Display images from portfolio |
| DATABASE_QUERY | tools/db.ts | Query spark's database records |
| QUERY_CHAT_SESSIONS | tools/chat.ts | Search past chat history |
| LIST_PORTFOLIO_ITEMS | tools/portfolio.ts | List portfolio items |
| GET_PORTFOLIO_ITEM_DETAILS | tools/portfolio.ts | Get portfolio item details |
| FINALIZE_FILE_UPLOAD | tools/portfolio.ts | Complete file upload process |
| DOCUMENT_PROCESSING | tools/file.ts | Process/analyze uploaded documents |
| GET_IMAGE_DETAILS | tools/image.ts | Get details of a generated image |
| EDIT_IDEA | tools/idea.ts | Edit an existing idea |
| GET_IDEA_COMMENTS | tools/idea.ts | Get comments and edit summaries for an idea |
| ADD_IDEA_IMAGES | tools/idea.ts | Add images to an idea (array handling, deduplication) |
| REMOVE_IDEA_IMAGES | tools/idea.ts | Remove images from an idea (exact URL match, cover fallback) |
| SET_IDEA_COVER | tools/idea.ts | Set idea cover image (cosmetic, no edit history) |
| ENDORSE_IDEA | tools/idea.ts | Endorse an idea (returns endorsedBy with name mappings) |
| ADD_PORTFOLIO_ITEM | tools/portfolio.ts | Add portfolio item (accepts sourceUrl, broad criteria) |
Tools are assembled per-endpoint and passed to runAgentCompletion({ tools }). The runtime handles tool-call and tool-result stream events with per-tool configurable timeouts (default 30s, up to 120s for image generation).
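One way to picture the per-tool timeout handling is a lookup table plus a wrapper that rejects when a tool call exceeds its budget. The three durations below are the ones stated in this document; toolTimeoutMs and withToolTimeout are hypothetical names, and the actual runtime may enforce timeouts differently.

```typescript
// Timeouts in milliseconds, using the values quoted in this document.
const TOOL_TIMEOUTS_MS: Record<string, number> = {
  GENERATE_IMAGE: 120_000,
  DOCUMENT_PROCESSING: 90_000,
};
const DEFAULT_TOOL_TIMEOUT_MS = 30_000;

// Hypothetical lookup: fall back to the default for tools without an entry.
function toolTimeoutMs(toolName: string): number {
  return TOOL_TIMEOUTS_MS[toolName] ?? DEFAULT_TOOL_TIMEOUT_MS;
}

// Hypothetical wrapper: settles with the tool result, or rejects once the budget elapses.
function withToolTimeout<T>(toolName: string, run: () => Promise<T>): Promise<T> {
  const ms = toolTimeoutMs(toolName);
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${toolName} timed out after ${ms}ms`)),
      ms,
    );
    run().then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Note that a wrapper like this only stops waiting; the underlying tool call keeps running unless it is separately cancelled.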
Prompt Management
Source of truth: Local prompts/*.md files with YAML frontmatter. Langfuse is synced for observability only.
Prompt file structure (prompts/*.md)
---
name: spark-system
config:
  model: claude-sonnet-4-20250514
  provider: anthropic
  temperature: 0.85
  max_tokens: 3000
  maxSteps: 20
labels:
  - production
  - development
---
You are {{spark_name}}.
{{
Template syntax (compilePromptTemplate in langfuse-prompts.ts)
{{variable}} -- simple substitution
{{#section}}...{{/section}} -- conditional (truthy) / array iteration
{{^section}}...{{/section}} -- inverse conditional (falsy/empty)
{{.}} -- current item in array loop
{{var|default(fallback)}} -- default values
- Jinja-style {% if %}, {% for %} -- legacy support for moderator prompts
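To make the Mustache-style rules above concrete, here is a small sketch that handles simple substitution, defaults, truthy and inverse sections, and {{.}} inside array loops. It is not compilePromptTemplate: renderTemplate is a hypothetical function, it ignores the Jinja-style legacy syntax, and it does not support nested sections of the same name.

```typescript
type TemplateVars = Record<string, unknown>;

// Values treated as "falsy/empty" for section and default handling (an assumption).
function isEmptyValue(value: unknown): boolean {
  return value === undefined || value === null || value === false || value === "" ||
    (Array.isArray(value) && value.length === 0);
}

function renderTemplate(template: string, vars: TemplateVars): string {
  // Pass 1: sections {{#name}}...{{/name}} and inverse sections {{^name}}...{{/name}}.
  const withSections = template.replace(
    /\{\{([#^])(\w+)\}\}([\s\S]*?)\{\{\/\2\}\}/g,
    (_match, kind: string, name: string, body: string) => {
      const value = vars[name];
      if (kind === "^") return isEmptyValue(value) ? body : "";
      if (isEmptyValue(value)) return "";
      if (Array.isArray(value)) {
        // Repeat the body per item; {{.}} stands for the current item.
        return value
          .map((item) => body.replace(/\{\{\.\}\}/g, () => String(item)))
          .join("");
      }
      return body;
    },
  );
  // Pass 2: {{name}} and {{name|default(fallback)}}.
  return withSections.replace(
    /\{\{(\w+)(?:\|default\(([^)]*)\))?\}\}/g,
    (_match, name: string, fallback: string | undefined) => {
      const value = vars[name];
      return isEmptyValue(value) ? (fallback ?? "") : String(value);
    },
  );
}
```

For example, renderTemplate("You are {{spark_name}}.", { spark_name: "Ada" }) yields "You are Ada.", and a missing variable with no default renders as an empty string in this sketch.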
Caching
- 5-minute TTL cache (MAX_CACHE_SIZE=100) in production
- Cache disabled in development (NODE_ENV=development)
- LRU eviction when cache is full
- Cache key: prompt name + hash of stable variables (isGreetingMode, sparkType, mode, etc.)
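The combination of a TTL and LRU eviction can be sketched with a Map, which iterates in insertion order. PromptCacheSketch is a hypothetical class written for illustration; only the 5-minute TTL and the size limit of 100 come from this document.

```typescript
// Hypothetical TTL + LRU cache; the real cache in langfuse-prompts.ts may differ.
class PromptCacheSketch<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly maxSize: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60 * 1000, maxSize = 100, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.maxSize = maxSize;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used one.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

In this sketch a read refreshes recency but not the expiry time, so a hot entry is still reloaded once its TTL has passed.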
Exported prompt functions
Each prompt type has a dedicated getter in langfuse-prompts.ts:
getSparkPromptWithMeta(vars) -- spark-system
getFlowPromptWithMeta(vars) -- flow-context
getFlowModeratorPromptWithMeta(vars) -- flow-moderator-system
getTrainerSystemPromptFromLangfuse(vars) -- trainer-system
getIdeaPromptWithMeta(vars) -- idea-context
getGuidedCuratorPromptWithMeta(vars) -- guided-curator-system
getSparkProfilePromptWithMeta(vars) -- spark-profile-generator
getProfileImageGeneratorPromptWithMeta(vars) -- profile-image-generator-system
getEvaluatorPromptFromLangfuse(vars) -- user-output-evaluator
getPromptConfig(promptName) -- generic: returns raw prompt + config + compile function
Agent System
All agents in server/agents/:
| Agent | Prompt | Function | Description |
|---|---|---|---|
| spark.ts | spark-system | getSparkSystemPrompt() | Compiles the full system prompt for a spark. Injects temporal context (date, timezone, knowledge cutoff), spark type booleans, and user language. Records prompt observation to Langfuse. |
| flow.ts | flow-context, flow-moderator-system | getFlowContextPrompt(), getFlowModeratorPrompt() | Generates flow-scoped context: parses guided context (description/task/method), builds context document summaries, formats existing idea summaries. Moderator prompt takes candidate speaker IDs. |
| trainer.ts | trainer-system | Uses generateText with tools | Training agent with tool access (DB query, chat sessions, web search, portfolio, RAG, training content). Uses gpt-tokenizer for token counting. |
| idea.ts | idea-context | getIdeaContextPrompt() | Generates idea refinement context: idea metadata, comment history, original creator, room task. Includes inline fallback if prompt file missing. |
| evaluator.ts | user-output-evaluator | evaluateReliability() | Scores response reliability 0-100 against spark persona. Reads model from prompt config. Returns 70 default for short system prompts, 0 for empty responses. |
| pattern-analyzer.ts | pattern-analyzer-system | analyzeContentForPatterns() | Multi-provider generateObject with Zod schema. Classifies content into aspect/sub-aspect patterns. Handles long content via intelligent chunking (1500 chars, 300 overlap). Validates patterns against predefined framework lists. |
| guidedCurator.ts | guided-curator-system | generateSparkSuggestions() | Generates spark team suggestions for guided flows. Uses runAgentCompletion(). Parses JSON response, assigns pre-generated portrait images by type/gender/age with uniqueness tracking. |
| content-filter.ts | content-curator-system, quick-content-check-system | filterContentWithLLM(), quickRelevanceCheck() | LLM-based content relevance filtering. Multi-provider support. Returns relevance score (0-100) with extracted content. Batch filtering in groups of 5 with 1s delay. |
| runtime.ts | Any prompt via promptKey | runAgentCompletion(), runModeratorDecision() | Central runtime. See "Agent Runtime" section above. |
Embedding System
Model: text-embedding-3-large at 1536 dimensions (OpenAI native client, not AI SDK).
Chunking: 1000 chars with 200-char overlap. Word-boundary-aware splitting (breaks at last space if >80% through chunk). Chunks <10 chars are filtered out.
Deduplication: SHA-256 content hash per chunk. Skips insertion if (sparkId, contentHash) already exists.
Storage: spark_embeddings table in PostgreSQL with pgvector extension. Raw SQL insert with ::vector cast.
Two creation paths:
server/utils/embeddings.ts -- createDirectEmbeddingsFromText() for direct/manual embedding
server/utils/data-collection/embeddings/generator.ts -- createCollectorEmbeddingsFromText() for data collection pipeline (adds ON CONFLICT DO NOTHING)
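The chunking rule described above (1000 characters, 200 overlap, break at the last space when it falls past 80% of the window, drop chunks under 10 characters) might look roughly like this. It is a sketch, not the private chunkText() helper in embeddings.ts; chunkTextSketch is a hypothetical name, and the exact boundary handling is an assumption.

```typescript
// Hypothetical re-implementation of word-boundary-aware chunking with overlap.
function chunkTextSketch(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      // Prefer breaking at the last space, but only if it lies in the final 20% of the window.
      const lastSpace = text.lastIndexOf(" ", end);
      if (lastSpace > start + chunkSize * 0.8) end = lastSpace;
    }
    const chunk = text.slice(start, end).trim();
    if (chunk.length >= 10) chunks.push(chunk); // filter out tiny fragments
    if (end >= text.length) break;
    // Step back by the overlap, while always moving forward by at least one character.
    start = Math.max(end - overlap, start + 1);
  }
  return chunks;
}
```

Each resulting chunk would then be hashed (SHA-256) and skipped if the same (sparkId, contentHash) pair is already stored, as described above.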
RAG Pipeline
server/utils/tools/rag.ts provides the complete RAG flow:
- Query embedding: User query is embedded with text-embedding-3-large (1536d)
- Vector search: Raw SQL cosine similarity query on spark_embeddings via pgvector: 1 - (embedding <=> query_vector::vector) ordered by similarity DESC
- Context formatting: Chunks formatted as [#1] content\n\n[#2] content...
- Citation injection: Tool response includes citation format instructions: [[RAG:START]]fact[[RAG:END]]
- Access control: Checks spark ownership, flow membership, or public status
The GET_SPARK_RAG tool is available during chat. The addSparkTrainingContentTool allows adding new knowledge during training conversations.
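In pgvector, <=> is the cosine distance operator, so 1 - (a <=> b) is cosine similarity. The sketch below reproduces that ranking and the [#n] context format in memory, purely to illustrate the steps above; in the codebase the search runs in SQL, and all function names here are hypothetical.

```typescript
interface EmbeddedChunk {
  content: string;
  embedding: number[];
}

// Cosine similarity, equivalent to 1 - cosine distance.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Rank chunks by similarity to the query, highest first, and keep the top `limit`.
function rankChunks(query: number[], chunks: EmbeddedChunk[], limit: number): EmbeddedChunk[] {
  return [...chunks]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, limit);
}

// Format retrieved chunks as "[#1] content\n\n[#2] content...".
function formatRagContext(contents: string[]): string {
  return contents.map((content, index) => `[#${index + 1}] ${content}`).join("\n\n");
}
```

The dimension check matters in practice: a query embedded at a different size than the stored 1536-dimension vectors cannot be compared against them.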
Data Collection Pipeline
server/utils/data-collection/orchestrator.ts runs a multi-phase pipeline:
Phase 1 (Demo/Quick): ~0-95% progress
- Web search via Tavily (up to 300 sources/entity, 5 queries, 10 results each)
- YouTube transcripts via Apify (primary) / Supadata (fallback), 3 per entity. URL discovery via DuckDuckGo (primary) with Tavily fallback.
- Content processing with pattern analysis (LLM-based in full mode, heuristic in demo mode)
- System prompt generation (generateSystemPromptWithComponents)
- Spark description generation (runs in parallel with prompt gen)
Phase 2 (Full Analysis): 96-100%
- Collects additional sources at full limits
- Regenerates system prompt and description with enriched context
Config (data-collection/config.ts): COLLECTION_TIMEOUT_MS: 1800000 (30min), MAX_ENTITIES: 15, CHUNK_SIZE: 1000, CHUNK_OVERLAP: 200, QUALITY_THRESHOLD: 0.7.
Sub-modules: search/ (Tavily/YouTube), filtering/ (LLM content filter), content/ (extraction), patterns/ (analysis), embeddings/ (generator), spark-generation/ (system prompt, description, profile image).
Image Generation
server/utils/image-generation.ts uses Replicate API (not BFL directly):
- With face/image references -> google/nano-banana-pro (supports up to 14 reference images, face identity preservation with enhanced prompting)
- Text-only -> black-forest-labs/flux-kontext-pro (fast, high quality)
- Fallback: If face refs fail, retries Nano Banana without references
Flow: Create prediction -> Poll every 2s (120s timeout) -> Return image URL. Includes retry logic that removes problematic image URLs on 403/timeout errors.
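The model choice and the polling loop above can be sketched without committing to any Replicate client API. In this illustration the status fetcher and sleep function are injected by the caller, and selectImageModel and pollUntilDone are hypothetical names; only the model identifiers, the 14-image cap, and the 2s/120s timings come from this document.

```typescript
const MAX_REFERENCE_IMAGES = 14;
const POLL_INTERVAL_MS = 2_000;
const POLL_TIMEOUT_MS = 120_000;

// Pick the model based on whether reference images are supplied.
function selectImageModel(referenceUrls: string[]): { model: string; references: string[] } {
  if (referenceUrls.length > 0) {
    return {
      model: "google/nano-banana-pro",
      references: referenceUrls.slice(0, MAX_REFERENCE_IMAGES),
    };
  }
  return { model: "black-forest-labs/flux-kontext-pro", references: [] };
}

// Generic poller: checks every 2s and gives up after roughly 120s (60 attempts).
async function pollUntilDone<T>(
  fetchStatus: () => Promise<{ done: boolean; result?: T }>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  const maxPolls = POLL_TIMEOUT_MS / POLL_INTERVAL_MS;
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const status = await fetchStatus();
    if (status.done && status.result !== undefined) return status.result;
    await sleep(POLL_INTERVAL_MS);
  }
  throw new Error("image prediction timed out");
}
```

Injecting sleep keeps the loop testable without real delays; the retry that drops problematic image URLs would wrap this call and is not shown.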
Image Analysis
server/utils/image-analysis.ts uses gpt-4o-mini via AI SDK generateText() with vision:
messages: [{ role: 'user', content: [
{ type: 'text', text: promptText },
{ type: 'image', image: imageUrl }
]}]
Used for portfolio image analysis during data collection. Max 500 tokens.
Observability (Langfuse)
Initialization (server/utils/observability/langfuse.ts)
- OpenTelemetry SDK with LangfuseSpanProcessor
- Environment detection: staging/production/development from SITE_URL
- LangfuseClient for scores and API operations
Recording (server/utils/observability/record.ts)
recordGenerationObservation(): Records model, I/O, token counts (via gpt-tokenizer), and cost calculation
recordToolObservation(): Records tool calls as child spans
recordErrorEvent(): Structured error-level events for runtime pipeline issues (stream_error, tool_timeout, empty_response, retry_attempt, recovery_failed)
recordDemoEvent(): Standalone traces for demo funnel tracking (demo.spark.created, demo.conversation.start/turn, demo.signup.clicked, demo.share.clicked, demo.bounce)
- Cost calculation uses MODEL_PRICING lookup table (per 1M tokens) with prefix matching. Includes Feb 2026 pricing for claude-sonnet-4, claude-3-5-sonnet, o3-mini, o4-mini
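A per-1M-token lookup with prefix matching could work along these lines. The table below uses placeholder model names and placeholder prices that are not real pricing, the longest-prefix tie-break is an assumption, and none of these identifiers are taken from record.ts.

```typescript
interface ModelPrice {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// PLACEHOLDER entries for illustration only; these are not real models or prices.
const PRICING_SKETCH: Record<string, ModelPrice> = {
  "example-model": { inputPerMillion: 1, outputPerMillion: 2 },
  "example-model-mini": { inputPerMillion: 0.5, outputPerMillion: 1 },
};

// Assumption: when several prefixes match, the longest one wins.
function findPrice(model: string): ModelPrice | undefined {
  const prefix = Object.keys(PRICING_SKETCH)
    .filter((candidate) => model.startsWith(candidate))
    .sort((x, y) => y.length - x.length)[0];
  return prefix === undefined ? undefined : PRICING_SKETCH[prefix];
}

// Returns 0 for unknown models rather than throwing, so tracing never blocks a response.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = findPrice(model);
  if (!price) return 0;
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}
```

Prefix matching lets dated model identifiers share one pricing entry, which is why the order of comparison matters when one key is a prefix of another.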
Evaluation (server/utils/observability/evaluate.ts)
evaluateTrace(): LLM-as-judge using gpt-4o-mini with EvaluationSchema (relevance, helpfulness, persona consistency, clarity, overall)
queueEvaluation(): Fire-and-forget background evaluation via setImmediate
Adding tracing to new endpoints
import { startActiveObservation } from '@langfuse/tracing'
await startActiveObservation('prompt:my-prompt', async (obs) => {
  obs.update({ input: vars, output: text, metadata: { promptName, promptLabel } })
})
Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API (LLM + embeddings) |
| OPENAI_ORG_ID | No | OpenAI organization ID |
| OPENAI_API_BASE | No | Custom OpenAI base URL (proxy) |
| ANTHROPIC_API_KEY | No | Anthropic Claude models |
| GOOGLE_API_KEY | No | Google Gemini models |
| REPLICATE_API_TOKEN | No | Image generation (Replicate) |
| LANGFUSE_PUBLIC_KEY | No | Langfuse observability |
| LANGFUSE_SECRET_KEY | No | Langfuse observability |
| LANGFUSE_HOST | No | Langfuse host (default: cloud.langfuse.com) |
| TAVILY_API_KEY | No | Web search in data collection |
| SERPER_API_KEY | No | Google Image search |
| SUPADATA_API_KEY | No | YouTube transcripts |
| APIFY_API_TOKEN | No | Social media profile scraping |
| BFL_API_KEY | No | BFL API (legacy, now uses Replicate) |
| DEBUG_AGENT_RUNTIME | No | Enable verbose runtime logging |
| DISABLE_PROMPT_CACHE | No | Force-disable prompt caching |
| DEBUG_LANGFUSE | No | Log suppressed Langfuse errors |
Safety & Error Handling
Input Sanitization (server/utils/safety-filters.ts)
sanitizeInput(text): Detects prompt injection patterns, returns SanitizeResult with risk level
validateOutput(text): Redacts tool names and detects leakage patterns in AI output
Error Classification (server/utils/agent-errors.ts)
classifyError(error): Categorizes errors into transient (retryable), permanent, or user_facing
withRetry(fn, opts): Exponential backoff retry wrapper for transient errors
getToolTimeout(toolName): Per-tool timeout configuration (e.g., GENERATE_IMAGE: 120s, DOCUMENT_PROCESSING: 90s, default: 30s)
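The retry behaviour can be pictured as a loop that re-runs only transient failures and doubles the wait between attempts. This is a sketch under assumptions: the base delay, cap, and attempt count are invented for illustration, the transient check is passed in rather than guessed, and retryTransient is not the signature of withRetry() in agent-errors.ts.

```typescript
// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
// The 500ms base and 8s cap are hypothetical values.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retries fn while the supplied classifier reports the error as transient.
async function retryTransient<T>(
  fn: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      // Permanent and user-facing errors are rethrown immediately.
      if (isLastAttempt || !isTransient(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Keeping classification separate from the retry loop mirrors the split between classifyError() and withRetry() described above.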
Context Management
server/utils/context-compression.ts: Intelligent conversation history compression
scoreMessage(): Priority-based importance scoring for messages
compressConversationHistory(): Compression with message promotion and heuristic summaries
compressArtifactList(): Truncates artifact descriptions for prompt context
buildOptimizedMessages(): Injects compressed summaries into message array
Conversation Memory (server/utils/conversation-memory.ts)
shouldRegenerateSummary(): Determines when new summaries are needed (15+ messages, then every 10)
generateConversationSummary(): LLM-based rolling summary with heuristic fallback (uses conversation-summarizer.md prompt)
persistConversationSummary(): Stores summaries as system FlowMessages with metadata
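The "15+ messages, then every 10" rule can be read as the following check. This is one plausible interpretation, not the body of shouldRegenerateSummary(): the function name and the idea of tracking the message count at the last summary are assumptions.

```typescript
const FIRST_SUMMARY_AT = 15;  // first summary once the conversation reaches 15 messages
const RESUMMARIZE_EVERY = 10; // afterwards, refresh every 10 new messages

// lastSummarizedCount is the message count when the previous summary was made,
// or null if no summary exists yet (hypothetical bookkeeping).
function shouldRegenerateSketch(messageCount: number, lastSummarizedCount: number | null): boolean {
  if (lastSummarizedCount === null) return messageCount >= FIRST_SUMMARY_AT;
  return messageCount - lastSummarizedCount >= RESUMMARIZE_EVERY;
}
```

Under this reading, summaries would be produced at 15, 25, 35 messages and so on, provided each summary is generated as soon as the check first passes.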
Common Pitfalls
- Reasoning model parameter errors: Never pass temperature, frequencyPenalty, presencePenalty, or topP to o1/o3/gpt-5 models. Always use isReasoningModel() check or buildAIOptions().
- Embedding dimension mismatch: Always use dimensions: 1536 with text-embedding-3-large. The pgvector column and index are configured for 1536.
- Provider parameter mismatches: Gemini does not support frequencyPenalty/presencePenalty -- runtime.ts strips these automatically. Do not add them back.
- Prompt config missing model: If getPromptConfig() returns no model, runtime.ts throws. Always ensure config.model is set in the prompt YAML frontmatter.
- Creative-only params: topP and topK are stripped for non-creative sparks in runtime.ts. If you need them for a new agent type, update the heuristic.
- Empty response recovery: runtime.ts handles empty AI responses with follow-up prompts. If adding new preparatory tools, add the tool name to the preparatoryTools array.
- generateObject deprecation: AI SDK v5+ recommends generateText with Output.object(). Existing code uses generateObject -- prefer Output.object() for new code.
Common Tasks
Add a new AI-powered endpoint
- Create route in server/api/
- Create prompt file in prompts/my-prompt.md with YAML frontmatter (model, provider, temperature)
- Load config: const cfg = await getPromptConfig('my-prompt')
- Use runAgentCompletion() for tool-enabled streaming, or generateText()/generateObject() for simple calls
- Add tracing with startActiveObservation()
Add a new agent
- Create server/agents/my-agent.ts
- Create prompts/my-agent-system.md with config
- Add variable interface in langfuse-prompts.ts
- Add getter function (e.g., getMyAgentPromptWithMeta())
- Use runAgentCompletion() from runtime.ts for execution
- Record observations with startActiveObservation()
Change a model for an existing agent
- Edit the config.model field in the corresponding prompts/*.md file
- Set config.provider if changing providers (openai/anthropic/google)
- Verify reasoning model compatibility (no temperature params for o1/o3/gpt-5)
Related Files
server/utils/ai-helpers.ts -- isReasoningModel(), buildAIOptions()
server/utils/embeddings.ts -- createDirectEmbeddingsFromText() (also has private chunkText() helper)
server/utils/langfuse-prompts.ts -- All prompt loading, caching, template compilation
server/agents/runtime.ts -- runAgentCompletion(), runModeratorDecision()
server/agents/spark.ts -- getSparkSystemPrompt()
server/agents/flow.ts -- getFlowContextPrompt(), getFlowModeratorPrompt()
server/agents/trainer.ts -- Training agent with tool access
server/agents/idea.ts -- getIdeaContextPrompt()
server/agents/evaluator.ts -- evaluateReliability()
server/agents/pattern-analyzer.ts -- analyzeContentForPatterns()
server/agents/guidedCurator.ts -- generateSparkSuggestions()
server/agents/content-filter.ts -- filterContentWithLLM(), quickRelevanceCheck()
server/utils/tools/ -- All tool definitions (rag, web, image, file, portfolio, db, chat, spark, user)
server/utils/data-collection/ -- Full data collection pipeline
server/utils/image-generation.ts -- Replicate image generation (Nano Banana Pro, FLUX Kontext Pro)
server/utils/image-analysis.ts -- Vision model image analysis (gpt-4o-mini)
server/utils/spark-profile.ts -- Spark profile generation with structured output
server/utils/observability/langfuse.ts -- OpenTelemetry + Langfuse initialization
server/utils/observability/record.ts -- Generation/tool observation recording, cost calculation
server/utils/observability/evaluate.ts -- LLM-as-judge evaluation
server/utils/voice/orchestrator.ts -- Voice mode: STT -> streamText -> TTS pipeline
server/utils/flow-conversation.ts -- Flow conversation types and text processing helpers
server/utils/safety-filters.ts -- sanitizeInput(), validateOutput() for prompt injection and leakage detection
server/utils/agent-errors.ts -- classifyError(), withRetry(), getToolTimeout() error handling
server/utils/context-compression.ts -- Conversation history compression with priority scoring
server/utils/conversation-memory.ts -- LLM-based rolling summaries with DB persistence
prompts/*.md -- All prompt templates with YAML frontmatter configs (includes conversation-summarizer.md for rolling summaries)