inbox-cleanup
Run a high-recall, multi-pass email inbox cleanup. Pattern-based subject queries catch 25x more archivable email than sender scans alone. Includes urgency triage, classification signals, and post-cleanup filter setup.
| name | inbox-cleanup |
| description | Run a high-recall, multi-pass email inbox cleanup. Pattern-based subject queries catch 25x more archivable email than sender scans alone. Includes urgency triage, classification signals, and post-cleanup filter setup. |
| compatibility | Designed for Vellum personal assistants |
| metadata | {"icon":"assets/icon.svg","emoji":"📭","vellum":{"category":"email","display-name":"Inbox Cleanup","includes":["gmail"],"activation-hints":["When the user asks to clean up, organize, or triage their email inbox","When the user wants to archive old or unwanted emails in bulk","When the user asks to set up email filters to prevent inbox clutter"],"avoid-when":["When the user wants to read, send, or draft a specific email","When the user is setting up email OAuth or connecting a new provider"]}} |
A playbook for large-scale email inbox cleanup. The core insight: sender-based scans are low-recall. Subject/body pattern queries catch 25x more archivable email. This skill is a multi-pass pipeline built around that insight.
Works with any connected email provider. Adapt query syntax to whatever the provider supports — the strategy (what to search for, how to decide what to archive) is universal.
Gmail is a required integration. It's declared via `includes: ["gmail"]` in the frontmatter so it loads synchronously on activation, not lazily after the preferences form. Load and confirm the Gmail integration the moment this skill activates, before Phase 1, so a missing or unauthorized connection surfaces up front rather than mid-cleanup.
Do this before touching anything. Ask the user:
1. Aggressiveness level
2. Age threshold. Archive everything older than X days? Common choices: 30 / 60 / 90 days, or no age filter.
First-run scope: On first invocation, scope to the last 30 days or the top 3 noise patterns, whichever surfaces results faster. Show the result and offer to expand. Prove the approach on a fast, visible slice before draining the whole backlog.
3. VIP senders to protect. Ask: "Are there any senders that might look like cold outreach but you actually care about? Think: specific individuals at investors, advisors, your lawyer, accountant, recruiters you're actively working with."
Build an explicit keep list. Never archive anything matching it, regardless of aggressiveness.
4. Categories to confirm before archiving. These need a sample + explicit approval before bulk action:
Scan the inbox first for high-stakes items that should be surfaced, not archived. Look for:
| Signal | Why it matters |
|---|---|
| "past due", "overdue", "final notice", "balance due" | Outstanding invoice — financial consequence |
| "will be suspended", "account suspension", "service interruption" | Service shutoff — operational consequence |
| "collections", "case #", "recovery" in sender domain | Collections agency — credit/legal consequence |
| "signature required", "agreement", "DocuSign pending" | Legal action needed |
| Government TLDs (.gov), "IRS", "state of", "department of" | Regulatory — can't be skipped |
Surface these to the user before running the cleanup. They're easy to miss buried in a big inbox.
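The signal table above can be turned into a simple pre-pass that separates messages to surface from messages that may proceed to cleanup. The following is a minimal sketch, not the skill's actual implementation: the `InboxMessage` shape, the category names, and the function names are hypothetical, and it assumes "collections" and "recovery" are checked against the sender domain while "case #" is checked against the subject.

```typescript
// Hypothetical message shape; real provider payloads will differ.
interface InboxMessage {
  subject: string;
  senderDomain: string;
}

// Phrases copied from the signal table. Category names are illustrative.
const SUBJECT_SIGNALS: Record<string, string[]> = {
  invoice: ["past due", "overdue", "final notice", "balance due"],
  shutoff: ["will be suspended", "account suspension", "service interruption"],
  collections: ["case #"],
  legal: ["signature required", "agreement", "docusign pending"],
  regulatory: ["irs", "state of", "department of"],
};
// Assumption: these two terms are matched against the sender domain.
const DOMAIN_SIGNALS = ["collections", "recovery"];

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive phrase match that respects word edges,
// so "irs" does not fire on "first".
function containsPhrase(text: string, phrase: string): boolean {
  const head = /^[a-z0-9]/i.test(phrase) ? "(?<![a-z0-9])" : "";
  const tail = /[a-z0-9]$/i.test(phrase) ? "(?![a-z0-9])" : "";
  return new RegExp(head + escapeRegExp(phrase) + tail, "i").test(text);
}

// Returns the urgency categories a message triggers (empty if none).
function urgencyCategories(msg: InboxMessage): string[] {
  const hits: string[] = [];
  for (const [category, phrases] of Object.entries(SUBJECT_SIGNALS)) {
    if (phrases.some((p) => containsPhrase(msg.subject, p))) hits.push(category);
  }
  const domain = msg.senderDomain.toLowerCase();
  if (DOMAIN_SIGNALS.some((d) => domain.includes(d))) hits.push("collections");
  if (domain.endsWith(".gov")) hits.push("regulatory");
  return Array.from(new Set(hits));
}

// Splits a batch into items to surface and items that may proceed to cleanup.
function splitForTriage<T extends InboxMessage>(messages: T[]): { surface: T[]; rest: T[] } {
  const surface: T[] = [];
  const rest: T[] = [];
  for (const msg of messages) {
    (urgencyCategories(msg).length > 0 ? surface : rest).push(msg);
  }
  return { surface, rest };
}
```

Broad phrases such as "agreement" will over-match. For a triage step that errs toward surfacing, that is the safer direction.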
Run these passes in order. Each pass should paginate to exhaustion (keep fetching while more results exist). After each pass, show the user a count + 5 sample subjects before archiving anything.
Search for all inbox messages older than the user's age threshold (e.g. 30 days). Typically 50–80% of the archivable backlog. Always show a sample before bulk archiving.
Note on result caps: Some providers cap query results (e.g. ~5,000). If a query returns exactly at the cap, archive that batch and re-run the same query — the next batch will surface. Repeat until it returns fewer than the cap.
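The re-run-until-under-the-cap loop described in this note can be sketched as follows. The `MailProvider` interface and `drainQuery` function are hypothetical stand-ins, not an actual provider API, and the sketch assumes the user has already approved the pass from a preview. With Gmail, the age query might look like `in:inbox older_than:30d`.

```typescript
// Hypothetical provider interface: search returns at most `cap` ids per call,
// and archive removes those ids from the inbox so they stop matching.
interface MailProvider {
  search(query: string, cap: number): Promise<string[]>;
  archive(ids: string[]): Promise<void>;
}

// Archives every match for `query`, batch by batch, and returns the total.
async function drainQuery(
  provider: MailProvider,
  query: string,
  cap: number,
  maxRounds = 100, // safety stop in case archived mail keeps matching
): Promise<number> {
  let total = 0;
  for (let round = 0; round < maxRounds; round++) {
    const ids = await provider.search(query, cap);
    if (ids.length === 0) break;
    await provider.archive(ids);
    total += ids.length;
    if (ids.length < cap) break; // under the cap: nothing was left behind
  }
  return total;
}
```

The `maxRounds` guard is an added precaution: if archiving does not remove messages from the query's result set, an unguarded loop would never end.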
Ask the user for their first name and company name, then search for subject lines containing patterns like:
- `[FirstName] -`, `[FirstName],`, `for [FirstName]`, `hi [FirstName]`, `hey [FirstName]`, `[FirstName] |`
- `[CompanyName] -`, `[CompanyName]?`, `for [CompanyName]`, `re: [CompanyName]`, `[CompanyName] AI`

These are the highest-recall patterns for cold outreach and partnership spam. A startup founder's inbox will see the biggest wins here.
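To make the substitution concrete, here is a minimal sketch that expands the name and company templates into literal subject patterns and checks a subject against them. The function names are hypothetical, and plain case-insensitive substring matching is an assumption; a provider's own search syntax may treat punctuation differently.

```typescript
// Expands the templates above for one user. Order follows the lists above.
function subjectPatterns(firstName: string, companyName: string): string[] {
  const namePatterns = [
    `${firstName} -`,
    `${firstName},`,
    `for ${firstName}`,
    `hi ${firstName}`,
    `hey ${firstName}`,
    `${firstName} |`,
  ];
  const companyPatterns = [
    `${companyName} -`,
    `${companyName}?`,
    `for ${companyName}`,
    `re: ${companyName}`,
    `${companyName} AI`,
  ];
  return [...namePatterns, ...companyPatterns];
}

// Case-insensitive substring check against every pattern.
function matchesOutreachPattern(subject: string, patterns: string[]): boolean {
  const lowered = subject.toLowerCase();
  return patterns.some((p) => lowered.includes(p.toLowerCase()));
}

const patterns = subjectPatterns("Dana", "Acme");
console.log(matchesOutreachPattern("Quick idea for Acme", patterns)); // true
console.log(matchesOutreachPattern("Lunch on Friday?", patterns)); // false
```

Substring matching is deliberately loose (for example, `for Dan` would also match "for Dana"), which is why matches still go through the keep list and a sample review before archiving.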
Search for subject lines containing:
Search for:
Search for subject lines containing:
Calendar response confirmations are pure noise. Safe to bulk archive without review.
Search for subject lines containing:
Cross-check against urgency triage first — filter out any "past due" or "final notice" items before archiving this batch.
Search for messages from sender domains ending in .shop, .biz, .xyz, .info, .club, .online.
Disproportionately spam. Safe to bulk archive.
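Where a provider cannot query by TLD directly, the same check can be done client-side on sender addresses. This is a minimal sketch with hypothetical helper names; it compares only the final domain label, so `shop.example.com` is not flagged.

```typescript
// TLD list taken from the pass description above.
const SKETCHY_TLDS = ["shop", "biz", "xyz", "info", "club", "online"];

// Extracts the domain from "Name <user@host>" or a bare "user@host".
function senderDomain(address: string): string | null {
  const match = address.match(/@([^\s<>@]+?)>?\s*$/);
  return match?.[1]?.toLowerCase() ?? null;
}

// True when the sender's domain ends in one of the listed TLDs.
function hasSketchyTld(address: string): boolean {
  const domain = senderDomain(address);
  if (domain === null) return false;
  const tld = domain.slice(domain.lastIndexOf(".") + 1);
  return SKETCHY_TLDS.includes(tld);
}

console.log(hasSketchyTld("Deals <promo@mega-sale.shop>")); // true
console.log(hasSketchyTld("news@shop.example.com")); // false
```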
After the above passes, run a sender frequency count on what remains. Any sender with 3+ emails not on the keep list is a candidate for bulk archive. Show grouped list to user for approval.
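The frequency pass amounts to a grouped count with the keep list applied first. A minimal sketch, assuming one sender address per remaining message and exact (case-insensitive) keep-list matching; the function name is hypothetical:

```typescript
// Counts remaining messages per sender, skipping keep-list senders,
// and returns senders at or above the threshold, most frequent first.
function bulkArchiveCandidates(
  senders: string[], // one entry per remaining inbox message
  keepList: string[],
  minCount = 3,
): Array<{ sender: string; count: number }> {
  const keep = new Set(keepList.map((s) => s.toLowerCase()));
  const counts = new Map<string, number>();
  for (const raw of senders) {
    const sender = raw.toLowerCase();
    if (keep.has(sender)) continue; // keep list always wins
    counts.set(sender, (counts.get(sender) ?? 0) + 1);
  }
  return Array.from(counts, ([sender, count]) => ({ sender, count }))
    .filter((row) => row.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

The returned rows are what would be shown to the user as the grouped list for approval; nothing in this sketch archives anything.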
For emails not caught by pattern queries, use LLM-based classification in Standard/Aggressive mode. Flag as cold outreach if 3+ signals are present:
- apollo.io, outreach.io, lemlist.com, instantly.ai, salesloft.com
- `Re:` prefix, no quoted text from user in body

Every bulk archive previews before it executes, regardless of batch size or trust stage. Run the pipeline with `--dry-run` on all archive calls, then render a `ui_show` table preview the user commits or refines from. Never archive in bulk straight from a query.
The preview table must show:
Surface "things worth flagging before you confirm" inside the preview, not after. If the dry-run catches claim documents, failed-payment notices, or any urgency-triage signal (Phase 2), call them out in the preview so the user sees them while deciding — never let a flag-worthy item get archived first and surfaced afterward.
After rendering the preview:
- `bun run scripts/gmail-commit.ts commit --run-id "<run-id>"`
- `bun run scripts/gmail-commit.ts cancel --run-id "<run-id>"`

Larger batches (e.g. >1,000 operations) and lower trust stages (stage 0 flag-only) warrant extra scrutiny in the preview, but the preview itself is always required before any bulk archive, including small batches and high trust stages. Direct archives are still logged for audit/reversal.
Archive operations are logged to an operation log for resumability. If a pass fails mid-run (rate limit, daily quota, OAuth expiry, crash):
- `bun run scripts/gmail-runs.ts list`. If a recent run shows `status: "interrupted"`, offer to resume it.
- `bun run scripts/gmail-archive.ts archive --resume "<run-id>"`. This skips already-committed chunks and retries pending ones.
- `interrupted` log entry with a resume hint. Do not retry until after midnight PT; offer to resume the run later.

All archive outputs now include a `run_id`. Pass `--run-id` to group multiple passes under one run, and `--phase` to label the pipeline phase (e.g. `--phase "noise_archive"`).
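The resume behavior described above (skip committed chunks, retry pending ones, mark the run interrupted on failure) can be illustrated with a small sketch. The `RunLog` and `Chunk` shapes and the `resumeRun` function are hypothetical and are not the format the scripts actually write; only the `interrupted` and completed states come from the text.

```typescript
// Hypothetical operation-log shapes for illustration only.
type ChunkStatus = "committed" | "pending";
interface Chunk {
  ids: string[];
  status: ChunkStatus;
}
interface RunLog {
  runId: string;
  status: "running" | "interrupted" | "completed";
  chunks: Chunk[];
}

// Re-processes a run: committed chunks are skipped, pending chunks are retried.
// On the first failure the run is marked interrupted and later chunks stay pending.
async function resumeRun(
  log: RunLog,
  archive: (ids: string[]) => Promise<void>,
): Promise<RunLog> {
  log.status = "running";
  for (const chunk of log.chunks) {
    if (chunk.status === "committed") continue; // already archived
    try {
      await archive(chunk.ids);
      chunk.status = "committed";
    } catch {
      log.status = "interrupted"; // safe to resume again later
      return log;
    }
  }
  log.status = "completed";
  return log;
}
```

Because chunk status is recorded only after a successful archive call, running the function repeatedly never re-archives a committed chunk.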
- `run_id` for each pass
- `bun run scripts/gmail-reverse.ts --run-id <id> --thread <message-id>`

After cleanup, propose Gmail filters so the same categories don't re-accumulate. This bridges cleanup (drain backlog once) and inbox-management (keep inbox clean on schedule).
Note: Filter creation capabilities vary by provider. The `gmail-auto-filters.ts` script handles Gmail. If the provider doesn't support programmatic filter creation, give the user manual instructions instead.
Filters are permanent behavior changes. Unlike a one-time archive, a filter silently skips the inbox for every future matching email. A wrong filter means the user misses emails they were expecting — with no indication anything happened. Always confirm with the user before creating filters.
One-time bulk archiving and permanent auto-archiving are different risk levels. The auto-filter script only derives candidates from patterns marked "Yes" below:
| Pattern | Safe as permanent filter? | Notes |
|---|---|---|
| noreply / no-reply / donotreply senders | Yes | Automated senders, never personal |
| Calendar responses (accepted/declined in subject) | Yes | Pure noise |
| Specific spam domains identified during cleanup | Yes | Domain-level, not pattern-level |
| Sketchy TLDs (.shop, .biz, .xyz, .info) | Yes | High spam signal, low false positive risk |
| Known newsletter senders confirmed during cleanup | Yes | User just explicitly confirmed unwanted |
| Generic phrases ("quick question", "checking in") | Risky | Real colleagues use these — don't filter |
| Name/company subject patterns ("for [Name]", "[Company] -") | No | Too broad — will catch real emails |
| Age-based | No | Not generally supported as a filter condition |
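One way to read the table is as a gate: only rows marked "Yes" may ever become permanent filters. The sketch below encodes the table as data and applies that gate. It is an illustration of the rule under that reading, not the logic of `gmail-auto-filters.ts`; the type and function names are hypothetical.

```typescript
type Safety = "yes" | "risky" | "no";
interface FilterCandidate {
  pattern: string;
  safety: Safety;
}

// Rows transcribed from the table above.
const FILTER_SAFETY: FilterCandidate[] = [
  { pattern: "noreply / no-reply / donotreply senders", safety: "yes" },
  { pattern: "Calendar responses", safety: "yes" },
  { pattern: "Specific spam domains identified during cleanup", safety: "yes" },
  { pattern: "Sketchy TLDs", safety: "yes" },
  { pattern: "Known newsletter senders confirmed during cleanup", safety: "yes" },
  { pattern: "Generic phrases", safety: "risky" },
  { pattern: "Name/company subject patterns", safety: "no" },
  { pattern: "Age-based", safety: "no" },
];

// Only "yes" rows are eligible; "risky" and "no" never become filters.
function permanentFilterCandidates(rows: FilterCandidate[]): string[] {
  return rows.filter((row) => row.safety === "yes").map((row) => row.pattern);
}
```

Eligibility is not approval: every eligible candidate still needs explicit user confirmation before a filter is created.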
After the cleanup pipeline completes (Phase 5 post-cleanup report), invoke:
# Preview: show what filters would be created (no confirmation prompt)
bun run scripts/gmail-auto-filters.ts preview --run-id "<cleanup-run-id>"
# Generate: show plan, confirm with user, then create
bun run scripts/gmail-auto-filters.ts generate --run-id "<cleanup-run-id>"
If --run-id is omitted, the script finds the most recent completed cleanup run automatically.
The script:
- `auto/*` label (e.g. `auto/no-reply`, `auto/calendar`, `auto/newsletter`, `auto/sketchy-tld`)

Every auto-filter applies an `auto/*` label instead of silently archiving. This gives the user an audit trail: search `label:auto/calendar` to see what was caught. Labels are created automatically if they don't exist.
Tell the user:
- `label:auto/no-reply`)
- `bun run scripts/gmail-manage.ts filters --action delete --filter-id "<id>"`

From a single cleanup session on a startup founder's inbox (April 2026):
| Pass | Approx. catch |
|---|---|
| Older than 30 days | ~7,200 |
| Name-personalized subject patterns | ~35,000 |
| Company-name subject patterns | ~50,000 |
| Sketchy TLDs (.shop/.biz/.xyz) | ~3,741 |
| Newsletters/digests | ~1,014 |
| Calendar responses | ~142 |
| Generic cold outreach phrases | ~23 |
| Completed DocuSigns | ~34 |
Total: ~90,000+ emails in one session. The name/company pattern passes alone accounted for ~85k. This is why patterns dominate sender scans.
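As a quick check on these figures, the per-pass counts in the table can be summed directly. The sketch assumes the passes do not overlap, which the source does not state, and all counts are the approximate values from the table.

```typescript
// Approximate per-pass counts copied from the table above.
const CATCH_BY_PASS: Record<string, number> = {
  "Older than 30 days": 7200,
  "Name-personalized subject patterns": 35000,
  "Company-name subject patterns": 50000,
  "Sketchy TLDs": 3741,
  "Newsletters/digests": 1014,
  "Calendar responses": 142,
  "Generic cold outreach phrases": 23,
  "Completed DocuSigns": 34,
};

const totalCaught = Object.values(CATCH_BY_PASS).reduce((sum, n) => sum + n, 0);
const patternCaught =
  CATCH_BY_PASS["Name-personalized subject patterns"] +
  CATCH_BY_PASS["Company-name subject patterns"];
const patternSharePct = Math.round((patternCaught / totalCaught) * 100);

console.log(totalCaught); // 97154
console.log(patternCaught); // 85000
console.log(patternSharePct); // 87
```

Under that assumption, the two pattern passes account for roughly 87% of everything caught in the session.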