
llm-cost-optimizer

Name: Llm Cost Optimizer
Author: vellum-ai

Analyze and reduce LLM spend by mapping call-site overrides to managed profiles (Balanced / Quality / Speed). Covers spend analysis, profile assignment, and config correctness.


Stars: 535
Forks: 76
Updated: May 28, 2026 at 21:37


Repository: vellum-ai/vellum-assistant


# Weekly totals
assistant usage totals --range week

# Break down by call site (most useful: shows what's expensive)
assistant usage breakdown --group-by call_site --range week

# Break down by model
assistant usage breakdown --group-by model --range week

# Break down by profile
assistant usage breakdown --group-by inference_profile --range week
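The Step 1 queries above can be bundled into one call. This is a minimal sketch and not part of the skill. `spend_report` and the `ASSISTANT_BIN` override are hypothetical names; the override exists only so the commands can be dry-run with `echo`. `week` is the only range value this page actually shows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs the Step 1 spend queries for one time range.
# ASSISTANT_BIN is an assumed override; it defaults to the real `assistant` CLI.
spend_report() {
  local range="${1:-week}"
  local bin="${ASSISTANT_BIN:-assistant}"
  local dim

  echo "== totals ($range) =="
  "$bin" usage totals --range "$range"

  # call_site first: it is the view that shows what is expensive
  for dim in call_site model inference_profile; do
    echo "== by $dim ($range) =="
    "$bin" usage breakdown --group-by "$dim" --range "$range"
  done
}
```

`ASSISTANT_BIN=echo spend_report week` prints the four commands without running them. Drop the override to query real usage.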

| Profile | Call Sites |
| --- | --- |
| balanced (Sonnet) | mainAgent, subagentSpawn, compactionAgent, analyzeConversation, patternScan, narrativeRefinement, memoryConsolidation, memoryV2Consolidation, recall, callAgent, emptyStateGreeting, conversationStarters, identityIntro, proactiveArtifactBuild |
| cost-optimized (Haiku) | Everything else: memoryRouter (with 1M context override), memory extraction/retrieval, UI copy, classifiers, summarization, background tasks |
| quality-optimized (Opus) | Do not pin. Reserved for on-demand user escalation via /model |
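To check where a given call site should land, the table can be encoded as a lookup. This is a rough sketch that uses hypothetical names (`recommended_profile`, `BALANCED_CALL_SITES`). The balanced list follows the table and the turnkey blob. Anything unlisted defaults to `cost-optimized`, which mirrors the "Everything else" row. The function never returns `quality-optimized`, because Opus is not pinned.

```bash
#!/usr/bin/env bash
# Hypothetical encoding of the profile table: call site -> recommended profile.
BALANCED_CALL_SITES=(
  mainAgent subagentSpawn compactionAgent analyzeConversation patternScan
  narrativeRefinement memoryConsolidation memoryV2Consolidation recall callAgent
  emptyStateGreeting conversationStarters identityIntro proactiveArtifactBuild
)

recommended_profile() {
  local site="$1" s
  if [[ -z "$site" ]]; then
    echo "usage: recommended_profile <callSite>" >&2
    return 2
  fi
  for s in "${BALANCED_CALL_SITES[@]}"; do
    if [[ "$s" == "$site" ]]; then
      echo "balanced"
      return 0
    fi
  done
  # Everything else goes to Haiku; quality-optimized (Opus) is never pinned
  echo "cost-optimized"
}
```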

assistant config set llm.callSites '{
  "mainAgent": {"profile":"balanced"},
  "subagentSpawn": {"profile":"balanced"},
  "compactionAgent": {"profile":"balanced"},
  "analyzeConversation": {"profile":"balanced"},
  "patternScan": {"profile":"balanced"},
  "narrativeRefinement": {"profile":"balanced"},
  "memoryRouter": {"profile":"cost-optimized","contextWindow":{"maxInputTokens":1000000}},
  "heartbeatAgent": {"profile":"cost-optimized","maxTokens":2048,"effort":"low","temperature":0,"thinking":{"enabled":false,"streamThinking":false},"contextWindow":{"maxInputTokens":16000}},
  "filingAgent": {"profile":"cost-optimized"},
  "callAgent": {"profile":"balanced"},
  "proactiveArtifactDecision": {"profile":"cost-optimized"},
  "proactiveArtifactBuild": {"profile":"balanced"},
  "memoryExtraction": {"profile":"cost-optimized"},
  "memoryConsolidation": {"profile":"balanced"},
  "memoryRetrieval": {"profile":"cost-optimized"},
  "memoryRetrospective": {"profile":"cost-optimized"},
  "recall": {"profile":"balanced","maxTokens":4096,"effort":"low","thinking":{"enabled":false,"streamThinking":false},"temperature":0},
  "memoryV2Migration": {"profile":"cost-optimized"},
  "memoryV2Sweep": {"profile":"cost-optimized"},
  "memoryV2Consolidation": {"profile":"balanced"},
  "conversationSummarization": {"profile":"cost-optimized"},
  "commitMessage": {"profile":"cost-optimized","maxTokens":120,"temperature":0.2,"effort":"low","thinking":{"enabled":false}},
  "conversationStarters": {"profile":"balanced","effort":"low","thinking":{"enabled":false}},
  "replySuggestion": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "conversationTitle": {"profile":"cost-optimized"},
  "identityIntro": {"profile":"balanced"},
  "emptyStateGreeting": {"profile":"balanced"},
  "guardianQuestionCopy": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "approvalCopy": {"profile":"cost-optimized"},
  "approvalConversation": {"profile":"cost-optimized"},
  "trustRuleSuggestion": {"profile":"cost-optimized"},
  "notificationDecision": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "preferenceExtraction": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "interactionClassifier": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "styleAnalyzer": {"profile":"cost-optimized"},
  "inviteInstructionGenerator": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "skillCategoryInference": {"profile":"cost-optimized","effort":"low","thinking":{"enabled":false}},
  "meetConsentMonitor": {"profile":"cost-optimized"},
  "meetChatOpportunity": {"profile":"cost-optimized"},
  "inference": {"profile":"cost-optimized"}
}'

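Step 4 warns that setting `llm.callSites` replaces the whole block and that overrides must reference profiles rather than a direct `model`. For that reason it can be worth linting the blob before applying it. This is a rough sketch. `check_call_sites` is a hypothetical helper. It assumes `python3` is available for JSON validation, and it uses a plain text match for the `model` key, so treat it as a heuristic rather than a schema check.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for an llm.callSites blob.
# Usage: check_call_sites "$blob"   (returns non-zero on the first problem found)
check_call_sites() {
  local blob="$1"

  # 1. Must be valid JSON (python3's stdlib json.tool exits non-zero otherwise)
  if ! python3 -m json.tool <<<"$blob" >/dev/null 2>&1; then
    echo "error: llm.callSites is not valid JSON" >&2
    return 1
  fi

  # 2. No direct model pins: overrides should reference a profile instead
  if grep -q '"model"[[:space:]]*:' <<<"$blob"; then
    echo "error: direct \"model\" key found; use \"profile\" instead" >&2
    return 1
  fi

  # 3. Opus is reserved for /model escalation, so it should not be pinned
  if grep -q '"profile"[[:space:]]*:[[:space:]]*"quality-optimized"' <<<"$blob"; then
    echo "error: quality-optimized is pinned; reserve it for /model" >&2
    return 1
  fi
  return 0
}
```

Suppose the JSON above is stored in a hypothetical `CALL_SITES_JSON` variable. You could then run `check_call_sites "$CALL_SITES_JSON" && assistant config set llm.callSites "$CALL_SITES_JSON"`. Because the value replaces the whole block, the blob you check must still list every call site you want overridden.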
# Collect the key securely; never paste it in chat
credential_store prompt --service anthropic --field api_key \
  --label "Anthropic API Key" --placeholder "sk-ant-..."

assistant inference providers connections create my-anthropic-key \
  --provider anthropic \
  --auth api_key \
  --credential credential/anthropic/api_key

assistant config set llm.profiles.opus-personal '{"provider":"anthropic","model":"claude-opus-4-8","label":"Opus (Personal)","provider_connection":"my-anthropic-key"}'
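If more than one personal profile is needed, the three steps above can be wrapped in one function. This is a hedged sketch. `setup_anthropic_profile`, `ASSISTANT_BIN`, and `CREDENTIAL_STORE_BIN` are hypothetical names, and the flags are copied from the snippet above. The JSON is built with `printf`, so a label that contains double quotes would need escaping first.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the three Anthropic API-key steps shown above.
# ASSISTANT_BIN / CREDENTIAL_STORE_BIN are assumed overrides for dry runs.
setup_anthropic_profile() {
  local conn="$1" profile="$2" model="$3" label="$4"
  local bin="${ASSISTANT_BIN:-assistant}"
  local cred_bin="${CREDENTIAL_STORE_BIN:-credential_store}"
  local json

  # 1. Prompt for the key out of band; the key itself is never an argument here
  "$cred_bin" prompt --service anthropic --field api_key \
    --label "Anthropic API Key" --placeholder "sk-ant-..." || return 1

  # 2. Register a provider connection that points at the stored credential
  "$bin" inference providers connections create "$conn" \
    --provider anthropic \
    --auth api_key \
    --credential credential/anthropic/api_key || return 1

  # 3. Define a named profile bound to that connection (label must not contain ")
  json=$(printf '{"provider":"anthropic","model":"%s","label":"%s","provider_connection":"%s"}' \
    "$model" "$label" "$conn")
  "$bin" config set "llm.profiles.$profile" "$json"
}
```

`setup_anthropic_profile my-anthropic-key opus-personal claude-opus-4-8 "Opus (Personal)"` reproduces the commands above. As the profile table advises, keep such an Opus profile off the call-site map and use it only for on-demand escalation.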

assistant inference providers connections list
assistant inference providers connections get <name>
assistant inference providers connections create <name> --provider <p> --auth api_key --credential <vault-key>
assistant inference providers connections update <name> --auth platform
assistant inference providers connections delete <name>
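The reference commands can be combined into an idempotent "create if missing" step. This sketch rests on one assumption that this page does not confirm: that `connections get <name>` exits non-zero when the connection does not exist. `ensure_connection` and `ASSISTANT_BIN` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical idempotent helper built only from the subcommands listed above.
# ASSUMPTION: `connections get <name>` exits non-zero when <name> does not exist.
ensure_connection() {
  local name="$1" provider="$2" vault_key="$3"
  local bin="${ASSISTANT_BIN:-assistant}"

  if "$bin" inference providers connections get "$name" >/dev/null 2>&1; then
    echo "connection '$name' already exists; leaving it unchanged"
    return 0
  fi
  "$bin" inference providers connections create "$name" \
    --provider "$provider" --auth api_key --credential "$vault_key"
}
```

Checking first and creating only when needed lets the same setup script be re-run safely. It never touches a connection that was already switched back with `update <name> --auth platform`.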


Overview

🚨 Critical: unoverridden call sites fall back to `llm.default`

Step 1 — Understand current spend

Step 2 — Read current overrides

Step 3 — Recommended profile assignment

Step 4 — Config gotchas

⚠️ JSON object value replaces the entire block

⚠️ Always use profile references — never direct model

Profile + tuning fields can coexist

Step 5 — Apply the complete turnkey blob

Step 6 — Escalation path (on-demand Opus)

Step 7 — Verify and monitor

Reference: provider connections

Reference: usage breakdown group-by values

Reference: usage time ranges
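The Critical note in the outline above means that any call site missing from `llm.callSites` runs on whatever `llm.default` points to. A quick audit can be sketched as follows. `unoverridden_call_sites` is a hypothetical helper, and you supply the call-site names yourself (for example, copied from `assistant usage breakdown --group-by call_site`). That command's output format is not shown on this page, so the sketch does not parse it.

```bash
#!/usr/bin/env bash
# Hypothetical audit: reads call-site names (one per line) on stdin and prints
# each one that has no entry in the given llm.callSites blob, i.e. every call
# site that would fall back to llm.default.
unoverridden_call_sites() {
  local blob="$1" site
  while IFS= read -r site; do
    [[ -z "$site" ]] && continue
    # Plain fixed-string match on the quoted key; good enough for a quick audit
    if ! grep -qF "\"$site\"" <<<"$blob"; then
      echo "$site"
    fi
  done
}
```

For example, `printf 'mainAgent\nheartbeatAgent\n' | unoverridden_call_sites "$blob"` prints `heartbeatAgent` when the blob only overrides `mainAgent`.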


name	llm-cost-optimizer
description	Analyze and reduce LLM spend by mapping call-site overrides to managed profiles (Balanced / Quality / Speed). Covers spend analysis, profile assignment, and config correctness.
metadata	{"emoji":"💸","vellum":{"display-name":"LLM Cost Optimizer"}}
