writing-cli-e2e-tests
| name | writing-cli-e2e-tests |
| description | Use when writing, adding, or modifying e2e tests for CLI commands in packages/@sanity/cli-e2e/. Triggers on e2e test creation, new command test coverage, or changes to CLI test infrastructure. |
E2e tests run real CLI commands against real infrastructure with real side effects. They validate that commands work end-to-end for both humans (interactive) and agents/CI (non-interactive). This skill defines the philosophy and patterns for writing effective e2e tests in packages/@sanity/cli-e2e/.
For local setup, CI behavior, and how to add new env vars, see packages/@sanity/cli-e2e/README.md.
Before writing any tests, learn how the command behaves and present a plan for user approval.
Step 1: Understand the command. Run the command interactively and with --help to observe its behavior. Identify:
init has studio, app, and Next.js paths)
Discover behavior by running the CLI, not by reading source code. The test should verify what the user experiences, and the actual prompt flow may differ from what the source suggests.
Step 2: Plan the test structure. Present to the user:
__tests__/<command>/...)
Step 3: Get user approval. Wait for the user to review and revise the plan before writing any test code.
Example plan output (describe behaviors, not final test() names — those get refined during implementation):
__tests__/init/
  init.test.ts
    - smoke: rejects a known-bad flag (no auth)
    - --bare mode prints project info without creating any files (non-interactive)
  init.studio.test.ts
    describe.each([{-y}, {no -y}]): non-interactive, both unattended modes
    - default flow creates a TypeScript studio with the expected config files
    - --no-typescript produces JavaScript files
  init.studio-interactive.test.ts
    - Ctrl+C at the first prompt aborts with exit code 130
    - walks through all prompts (template, TypeScript, package manager) and produces a working studio
  init.app.test.ts
    describe.each([{-y}, {no -y}]): non-interactive
    - app-quickstart template produces the expected app scaffold
    - interactive: project/dataset prompts appear and are selectable
Run commands to their final outcome — files created, data written, success message printed. A half-finished flow that kills the session partway gives false confidence. Since each full run is expensive, assert everything meaningful from that single invocation: exit code, stdout messages, generated files, config content.
Every command should have both interactive and non-interactive tests. Non-interactive tests verify agents and CI automation can use the command fully. Interactive tests verify humans get proper prompts, navigation, and feedback. Both modes must work completely — they serve different audiences.
Some tests cannot run to completion by design. These are the only acceptable exceptions to the "complete flows" principle:
sendControl('c') at the earliest prompt and asserts expect(exitCode).toBe(130) (SIGINT). One abort test is sufficient: the abort mechanism is the same regardless of which prompt stage it fires at, so testing Ctrl+C at multiple stages adds cost without value.
Keep these minimal: one of each per command, at the top level (no describe wrapper). Every other test should run to completion.
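The 130 in the abort assertion matches the common shell convention of reporting a signal-terminated process as 128 plus the signal number, and SIGINT is signal 2. A minimal sketch of that arithmetic follows; `exitCodeForSignal` is a hypothetical helper for illustration, not part of the e2e harness.

```typescript
import {constants} from 'node:os'

// Hypothetical helper: shells conventionally report a process that was
// terminated by a signal as 128 + the signal number.
function exitCodeForSignal(signal: 'SIGINT' | 'SIGTERM'): number {
  return 128 + constants.signals[signal]
}

const sigintExitCode = exitCodeForSignal('SIGINT')
console.log(sigintExitCode) // 130, since SIGINT is signal 2
```

Asserting the exact value, rather than "non-zero", distinguishes a clean Ctrl+C abort from an unrelated crash.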
Is the behavior driven by user prompts/navigation?
Non-interactive (spawn): Command fully driven by flags/args. Fast, reliable, easy stdout/stderr assertions.
Interactive (PTY): Tests the prompt experience — selection navigation, text input, abort handling.
Both: Most commands should have both. Non-interactive proves the command works. Interactive proves the UX works. Apply this per-flow, not per-command — if a command has studio, app, and Next.js flows, each flow needs both modes.
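The difference between the two modes comes down to whether the child process sees a terminal. The sketch below uses only Node's `child_process`, independent of the `runCli` helper, to show that a child spawned with piped stdio reports no TTY. That is the condition a CLI typically checks before deciding to prompt.

```typescript
import {spawnSync} from 'node:child_process'

// Spawn a Node child with the default piped stdio, as a plain spawn-based
// runner would, and ask it whether it sees a terminal.
const probe = spawnSync(
  process.execPath,
  ['-e', 'console.log(Boolean(process.stdin.isTTY), Boolean(process.stdout.isTTY))'],
  {encoding: 'utf8'},
)

const childTtyReport = probe.stdout.trim()
console.log(childTtyReport) // "false false" when stdio is piped
```

A PTY-backed session is what makes the same check report a terminal, which is why prompt behavior can only be exercised in interactive mode.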
All commands get a folder. Split non-interactive and interactive tests into separate files (e.g., init.studio.test.ts and init.studio-interactive.test.ts) — this improves vitest sharding and keeps files focused.
For commands with multiple distinct flows, split files by user flow or functionality:
__tests__/
  init/
    init.studio.test.ts
    init.nextjs.test.ts
    init.bare.test.ts
    init.errors.test.ts
  deploy/
    deploy.app.test.ts
    deploy.studio.test.ts
For simpler commands, a single test file is enough:
__tests__/
  datasets/
    datasets.test.ts
  help/
    help.test.ts
Don't create single-test files. If a flow only has 1-2 tests (e.g., a smoke test or a simple mode like --bare), merge it into the command's base <command>.test.ts file rather than giving it a dedicated file.
Test naming: Descriptive names that accurately describe the behavior being verified. No numeric IDs. The name must match what the test actually asserts — a test for a deprecated flag should say "deprecated", not "invalid input".
// GOOD
test('creates TypeScript studio with correct config files', ...)
test('rejects deprecated --reconfigure flag', ...)
// BAD
test('2.4 default init creates correct TypeScript project', ...)
test('rejects invalid input with helpful error', ...) // if it's testing a deprecated flag, not invalid input
Flat over nested. Don't wrap single tests in describe blocks. Only use describe when tests share setup/teardown or parameterization (describe.each). Don't use describe just to label categories like "abort handling" or "complete flows" — a flat list under one top-level describe is clearer.
beforeEach/afterEach for cleanup. Use lifecycle hooks for temp directory creation and cleanup instead of try/finally in every test:
describe('sanity init', {timeout: 120_000}, () => {
  let tmp: Awaited<ReturnType<typeof createTmpDir>>
  beforeEach(async () => {
    tmp = await createTmpDir({useSystemTmp: true})
  })
  afterEach(async () => {
    await tmp.cleanup()
  })
  test('creates studio', async () => {
    // use tmp.path directly; no try/finally needed
  })
})
test.each for variants. When testing the same flow with different inputs (e.g., templates, package managers), use test.each instead of duplicating tests:
test.each(['clean', 'blog'])('creates studio with %s template', async (template) => {
  // ...
})
Consolidate interactive prompt tests. When a command has multiple sequential prompts (template, TypeScript, package manager), write one test that walks through all of them rather than separate tests that each omit one flag. Each CLI invocation is expensive (~12s); one test that exercises three prompts is better than three tests that each exercise one.
// BAD: three separate tests, three CLI invocations
test('shows template selection', ...) // omits --template
test('shows TypeScript prompt', ...) // omits --typescript
test('shows package manager prompt', ...) // omits --package-manager
// GOOD: one test, one invocation, all prompts exercised
test('walks through template, TypeScript, and package manager prompts', async () => {
  const session = await runCli({
    args: ['init', '--project', projectId, '--dataset', 'production',
      '--output-path', tmp.path, '--no-mcp', '--no-git'],
    interactive: true,
  })
  await session.waitForText(/Select project template/i)
  session.sendKey('Enter')
  await session.waitForText(/Do you want to use TypeScript/i)
  session.sendKey('Enter')
  await session.waitForText(/package manager/i)
  session.sendKey('Enter')
  const exitCode = await session.waitForExit(90_000)
  expect(exitCode).toBe(0)
  expect(existsSync(`${tmp.path}/sanity.config.ts`)).toBe(true)
})
Minimal flags. For non-interactive tests, only include flags the test is specifically testing. For interactive tests, leave prompts unpinned so selectOption exercises them — only pin infrastructure values like --organization or --output-path that aren't UX under test. Use --no-git and --no-mcp to skip side effects irrelevant to the flow being tested.
Precise assertions. Assert on actual content, not proxies:
// BAD: vague, tells you nothing on failure
expect(stderr.length).toBeGreaterThan(0)
expect(stdout).toMatch(/import/i)
expect(exitCode).not.toBe(0)
// GOOD: specific, failure message is immediately useful
expect(stderr).toContain('--reconfigure is deprecated')
expect(stdout).toMatch(/Done! Imported \d+ documents/)
expect(exitCode).toBe(1)
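To see why the vague patterns give false confidence, consider a failed run whose error output happens to mention the word "import". This is a minimal sketch with made-up strings; both sample messages are hypothetical, not real CLI output.

```typescript
// Hypothetical outputs: one from a successful run, one from a failed run.
const successOutput = 'Done! Imported 42 documents'
const failureOutput = 'Error: could not import dataset'

const vague = /import/i
const precise = /Done! Imported \d+ documents/

// The vague pattern cannot tell success from failure...
const vagueMatchesFailure = vague.test(failureOutput)
// ...while the precise pattern only accepts the success message.
const preciseMatchesFailure = precise.test(failureOutput)
const preciseMatchesSuccess = precise.test(successOutput)

console.log({vagueMatchesFailure, preciseMatchesFailure, preciseMatchesSuccess})
```

A test built on the vague pattern would pass on the failed run, so it proves nothing about the import having happened.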
Inline args, don't abstract. Inline CLI args directly in each test rather than building helper functions that produce them — abstraction obscures what each test actually runs. The full runCli() API and interactive session methods are documented in packages/@sanity/cli-e2e/README.md; shared testFixture() / createTmpDir() helpers live in @sanity/cli-test.
import {existsSync, readFileSync} from 'node:fs'
import {createTmpDir} from '@sanity/cli-test'
import {afterEach, beforeEach, describe, expect, test} from 'vitest'
// runCli and projectId come from the e2e package's own helpers and env setup (see the README)
describe('sanity init - studio', {timeout: 120_000}, () => {
  let tmp: Awaited<ReturnType<typeof createTmpDir>>
  beforeEach(async () => {
    tmp = await createTmpDir({useSystemTmp: true})
  })
  afterEach(async () => {
    await tmp.cleanup()
  })
  test('creates studio with TypeScript and correct config', async () => {
    const {error, stdout} = await runCli({
      args: ['init', '-y', '--project', projectId, '--dataset', 'production',
        '--output-path', tmp.path, '--typescript'],
    })
    if (error) throw error
    // Generated files exist
    expect(existsSync(`${tmp.path}/sanity.config.ts`)).toBe(true)
    expect(existsSync(`${tmp.path}/package.json`)).toBe(true)
    // CLI config points at the requested project
    const config = readFileSync(`${tmp.path}/sanity.cli.ts`, 'utf8')
    expect(config).toContain(projectId)
    // Success output includes the next-step hints
    expect(stdout).toMatch(/sanity docs|sanity help/i)
  })
})
Use selectOption(pattern) to navigate select prompts by text instead of counting ArrowDown presses. It scrolls through the list, handles off-screen options, and throws if zero or multiple options match. Pass a string for a literal substring match or a regex for pattern matching (e.g., matching a project ID inside "Project Name (id)"). Strings are escaped before matching, so they're treated as literal text — but they still match anywhere in the option line, so prefer regex with anchors when collisions are likely (e.g., 'dev' would also match 'development').
test('complete interactive flow selects project and dataset', async () => {
  const session = await runCli({
    args: ['init', '--template', 'app-quickstart', '--organization', orgId,
      '--output-path', tmp.path, '--no-git', '--no-mcp'],
    interactive: true,
  })
  await session.waitForText(/Configure a project for this app/i)
  await session.selectOption(new RegExp(`\\(${projectId}\\)`))
  await session.waitForText(/Select dataset to use/i)
  await session.selectOption('production')
  await session.waitForText(/Package manager to use/i)
  await session.selectOption('pnpm')
  const exitCode = await session.waitForExit(90_000)
  expect(exitCode).toBe(0)
  expect(existsSync(`${tmp.path}/src/App.tsx`)).toBe(true)
})
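The matching rules for `selectOption` (literal strings, match anywhere in the line, throw on zero or multiple matches) can be modeled as a small pure function. This is an illustrative sketch of the documented behavior, not the actual implementation: `toOptionPattern` and `findUniqueOption` are hypothetical names, and the real method also scrolls the list in the PTY.

```typescript
// Escape regex metacharacters so a string pattern is matched literally.
function toOptionPattern(pattern: string | RegExp): RegExp {
  if (pattern instanceof RegExp) return pattern
  return new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
}

// Return the index of the single matching option; throw on zero or many.
function findUniqueOption(options: string[], pattern: string | RegExp): number {
  const re = toOptionPattern(pattern)
  const matches = options.flatMap((option, index) => (re.test(option) ? [index] : []))
  const [only] = matches
  if (matches.length !== 1 || only === undefined) {
    throw new Error(`Expected exactly one option matching ${re}, found ${matches.length}`)
  }
  return only
}

const datasets = ['dev', 'development', 'production']
console.log(findUniqueOption(datasets, 'production')) // 2
console.log(findUniqueOption(datasets, /^dev$/)) // 0

let collision = ''
try {
  findUniqueOption(datasets, 'dev') // literal substring also matches 'development'
} catch (err) {
  collision = (err as Error).message
}
console.log(collision)
```

The last call shows the collision case: the string 'dev' matches two options, so the lookup throws instead of silently picking one. The anchored regex resolves it.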
Auth token is always available. To test unauthenticated behavior, override the env:
test('no token triggers login prompt', async () => {
  const session = await runCli({
    args: ['init'],
    env: {SANITY_AUTH_TOKEN: ''},
    interactive: true,
  })
  await session.waitForText(/log in|create.*account/i)
  // ... complete the flow
})
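A per-invocation `env` override only affects the spawned process, so the test runner's own environment stays untouched. The sketch below uses Node's `child_process` rather than `runCli`. It assumes, without confirmation from the source, that the override is merged over the inherited environment, and it uses a placeholder token value.

```typescript
import {spawnSync} from 'node:child_process'

// Assumption for illustration: the parent has a token set, as CI would.
const parentEnv = {...process.env, SANITY_AUTH_TOKEN: 'placeholder-token'}

// Merge the per-test override over the inherited environment.
const childEnv = {...parentEnv, SANITY_AUTH_TOKEN: ''}

// The child reports the token value it actually sees.
const child = spawnSync(
  process.execPath,
  ['-e', 'process.stdout.write(JSON.stringify(process.env.SANITY_AUTH_TOKEN ?? null))'],
  {env: childEnv, encoding: 'utf8'},
)

const tokenSeenByChild: string | null = JSON.parse(child.stdout)
console.log(tokenSeenByChild === '') // the child sees an empty token
console.log(parentEnv.SANITY_AUTH_TOKEN) // the parent's copy is unchanged
```

Because nothing is mutated globally, other tests in the same worker keep their authenticated environment.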
Set timeouts at the describe block level. The vitest config provides defaults (testTimeout: 30_000). If a group of tests needs more time, set it once:
describe('studio creation flows', {timeout: 120_000}, () => {
  test('creates studio with TypeScript', async () => {
    // inherits describe timeout
  })
  test('creates studio with JavaScript', async () => {
    // inherits describe timeout
  })
})
Interactive tests must not depend on list order or item positions. Use selectOption to find items by text — it handles scrolling and position automatically.
// BAD: counts ArrowDown presses, breaks if list order changes
session.sendKey('ArrowDown')
session.sendKey('ArrowDown')
session.sendKey('Enter')
// GOOD: finds the option by text regardless of position
await session.selectOption('production')
await session.selectOption(new RegExp(`\\(${projectId}\\)`))
For assertions on output, match structural prompt text, not dynamic API content:
// BAD: asserts on specific dynamic content
expect(output).toContain('My Specific Project Name')
// GOOD: asserts on prompt structure or known output strings
await session.waitForText(/Select project|Configure a project/i)
expect(output).toContain('Your custom app has been scaffolded')
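The difference between structural and dynamic assertions shows up as soon as the same test runs against a different account. In the sketch below, `renderProjectPrompt` and both project lists are hypothetical stand-ins for the real prompt output.

```typescript
// Hypothetical renderer: project names come from the API and differ per account.
function renderProjectPrompt(projectNames: string[]): string {
  return ['? Select project', ...projectNames.map((name) => `  ${name}`)].join('\n')
}

const myAccount = renderProjectPrompt(['My Specific Project Name', 'Docs Site'])
const ciAccount = renderProjectPrompt(['e2e-fixture-project'])
const screens = [myAccount, ciAccount]

const structural = /Select project|Configure a project/i

// The structural pattern holds for any account...
const structuralHolds = screens.every((screen) => structural.test(screen))
// ...while the dynamic project name only exists in one of them.
const dynamicHolds = screens.every((screen) => screen.includes('My Specific Project Name'))

console.log({structuralHolds, dynamicHolds})
```

An assertion on the project name would pass locally and fail in CI, even though the command behaves identically in both places.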
Rules:
- Use selectOption(pattern) for all select prompts; never count ArrowDown presses
- Use waitForText(regex) to detect prompt appearance before interacting
- With selectOption, pass a string for a literal substring match ('production') and a regex when you need anchoring or pattern matching (e.g., new RegExp(`\\(${projectId}\\)`) to find an ID inside parentheses). Note that selectOption will throw if multiple options match the substring, so collisions surface loudly.

Non-interactive runs have no TTY, so the CLI treats them as unattended and skips prompts (see the README's gotcha for the mechanics). The implication for tests: any behavior gated behind a prompt uses its fallback default, which often differs from the prompt's default selection. For example, a prompt that defaults to "Yes" interactively may fall back to undefined (falsy) when skipped, so the spawn-mode result differs from the interactive default.
Before omitting a flag from a non-interactive test, run the command without it and verify the unattended default matches what your test expects. If it doesn't, pass the flag explicitly.
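The gap between a prompt's default and its unattended fallback can be modeled in a few lines. Everything here is a hypothetical sketch: `isUnattended` borrows the name used in this document, but its signature, the `resolveTypescript` function, and the specific defaults are assumptions for illustration, not the CLI's actual code.

```typescript
interface RunContext {
  yes: boolean // -y was passed
  stdinIsTTY: boolean // false for spawn-mode tests
}

// Hypothetical model: either -y or a missing TTY makes the run unattended.
function isUnattended({yes, stdinIsTTY}: RunContext): boolean {
  return yes || !stdinIsTTY
}

// Hypothetical flag resolution: an explicit flag wins. Otherwise an
// interactive run would prompt (default "Yes"), while an unattended run
// skips the prompt and falls back to undefined.
function resolveTypescript(flag: boolean | undefined, ctx: RunContext): boolean | undefined {
  if (flag !== undefined) return flag
  if (isUnattended(ctx)) return undefined
  return true // stands in for the user accepting the prompt's default
}

const interactive = resolveTypescript(undefined, {yes: false, stdinIsTTY: true})
const spawned = resolveTypescript(undefined, {yes: false, stdinIsTTY: false})
const pinned = resolveTypescript(true, {yes: false, stdinIsTTY: false})

console.log({interactive, spawned, pinned})
```

Under this model the same omitted flag yields `true` interactively and `undefined` when spawned, which is why a non-interactive test that relies on the behavior should pin the flag.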
Test both unattended modes, with and without -y. Both -y and a non-interactive terminal trigger isUnattended() === true, but they enter through different code paths. Use describe.each to parameterize all non-interactive tests across both modes. This also gives vitest distinct test entries for better sharding.
describe.each([
  {label: 'with -y flag', yFlag: ['-y']},
  {label: 'unattended (no -y)', yFlag: [] as string[]},
])('sanity init - studio ($label)', {timeout: 120_000}, ({yFlag}) => {
  // Same temp dir lifecycle as the other suites
  let tmp: Awaited<ReturnType<typeof createTmpDir>>
  beforeEach(async () => {
    tmp = await createTmpDir({useSystemTmp: true})
  })
  afterEach(async () => {
    await tmp.cleanup()
  })
  test('creates studio with default settings', async () => {
    const {error} = await runCli({
      args: ['init', ...yFlag, '--project', projectId, '--dataset', 'production',
        '--output-path', tmp.path, '--typescript'],
    })
    if (error) throw error
    // assertions...
  })
})
If a test failure reveals a product bug rather than a test bug, file an issue in Linear, skip the affected test with a link to the issue, and move on. Don't paper over product bugs with workarounds in test assertions.
// Skipped: --bare flag doesn't create package.json. See https://linear.app/sanity/issue/SDK-XXXX
test.skip('bare init creates minimal project', () => {
  // ...
})
E2e tests validate real commands against real infrastructure with real side effects. They are expensive to run and should focus on proving the full flow works.
Unit tests (packages/@sanity/cli/src/commands/__tests__/) exercise command handlers with mocked HTTP or client methods. They are fast, isolated, and appropriate for testing the full matrix of input validation, flag parsing, error messages, and edge cases.
| Concern | Where to test |
|---|---|
| Flag validation, argument parsing | Unit tests |
| Error messages and exit codes for bad input | Unit tests |
| Config file parsing, input sanitization | Unit tests |
| Complete command flows with real side effects | E2e tests |
| Files generated, APIs called, prompts displayed | E2e tests |
For error handling in e2e, keep a single smoke test per command that confirms the binary rejects bad input. Test the full matrix of validation rules as unit tests.
// ONE e2e smoke test for error rejection
test('rejects deprecated --reconfigure flag', async () => {
  const {exitCode, stderr} = await runCli({
    args: ['init', '--reconfigure'],
    env: {SANITY_AUTH_TOKEN: ''},
  })
  expect(exitCode).toBe(2)
  expect(stderr).toContain('--reconfigure is deprecated')
})
// Individual validation rules → unit tests in
// packages/@sanity/cli/src/commands/__tests__/
| Mistake | Fix |
|---|---|
| Fragment tests that kill session halfway | Run to completion, assert on outcome |
| One assertion per CLI invocation | Batch related assertions in one test |
| Hardcoded list positions via arrow keys | Use selectOption(pattern) to find by text |
| Per-test timeouts | Set timeout on describe block |
| try/finally cleanup in every test | Use beforeEach/afterEach for temp dir lifecycle |
| Wrapping a single test in a describe | Only use describe for 2+ related tests |
| Vague assertions (stderr.length > 0) | Assert on actual content (expect(stderr).toContain(...)) |
| expect(exitCode).not.toBe(0) | Use toBe(1) when you know the expected code |
| Test name doesn't match behavior | Name must describe what the test actually asserts |
| Duplicating tests with different inputs | Use test.each for variants |
| Pinning prompts with flags in interactive tests | Leave prompts unpinned and use selectOption to exercise them |
| Papering over product bugs in assertions | File issue, skip test with link, move on |
| Separate tests for each interactive prompt | One test that walks through all prompts sequentially |
| Only testing non-interactive with -y | Use describe.each to test both -y and non--y unattended |
| Multiple abort tests at different prompt stages | One abort test at the earliest prompt is sufficient |
| describe blocks that just label categories | Only use describe for shared setup/teardown or parameterization |
| skipIf(!hasToken) for unauthed tests | Override with env: {SANITY_AUTH_TOKEN: ''} |
| Asserting on dynamic API content | Use structural regex patterns |
| Mutating process.env directly | Use vi.stubEnv() for env overrides |
| Mocking functions, APIs, or services | Never mock in e2e tests — test real infrastructure |
| Testing flag validation/deprecation in e2e | One smoke test per command; validation matrix belongs in unit tests |
| Reading source to predict prompt order | Run the test and observe the actual CLI output to understand the flow |
| Only running lint/types to validate | Always run the actual e2e tests before reporting done |