web-asset-capture

Updated: June 12, 2026, 18:10

Explore a website (public OR behind a login) and capture high-quality assets, retina stills and crisp motion video (.mp4; .webm opt-in), into the asset library by driving a REAL Chrome attached over CDP. Use when: (1) an agent needs UI footage/stills from a page, especially an authenticated dashboard/app/console, (2) someone says "capture my app / grab the dashboard / record this flow", (3) you need UI-as-texture material (FloatingScreen / CaptureSurface / screen-rec) for a video. The agent EXPLORES first (it does not assume what's on the site), reports an inventory, then captures what the designer chooses. Never one-shot.

Source: blueif16/vlog_test

Web Asset Capture (agentic)

Two layers, one clean split. The agent owns judgment; scripts own mechanics.

Explore + decide = Playwright MCP — the model's native browser tools (browser_navigate, browser_snapshot, browser_click, browser_take_screenshot, …). The agent wanders an unknown app, reads accessibility snapshots, finds what exists. ALL navigation judgment lives here.
Capture = scripts/capture.mjs — deterministic. Retina stills + crisp screencast → x264 CRF18 motion, written to the asset library + a single-writer catalog. MECHANICS only — zero judgment.

Both drive the same real Chrome — a shared, logged-in profile attached over CDP — so the agent sees exactly what you see, login included.

The browser: one shared logged-in profile, attached over CDP

Profile: /Users/tk/.capture-chrome — repo-agnostic; log ALL your accounts into it once. Not /tmp (survives reboot).
It runs as real Google Chrome with --remote-debugging-port=9222 (CDP endpoint http://127.0.0.1:9222). It's a NON-default profile, so Chrome v136+ honors the debug flag (it silently ignores it on the Default profile).
Playwright MCP is wired to it (--cdp-endpoint http://127.0.0.1:9222, project-local scope). capture.mjs attaches to the same endpoint. They share the session.
Bring it up: node scripts/capture.mjs ensure → {"endpoint":"…","running":true}. Idempotent — launches the shared Chrome only if the port is down.
Re-auth (the one manual touch): sessions eventually expire. When a probe lands on a login wall, run node scripts/capture.mjs ensure --reauth --url <loginUrl> — it opens a HEADED window; log in once (real Chrome, so Google SSO works); the session persists on disk. NEVER capture a login wall as if it were content.
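
A minimal sketch of the launch arguments described above. The helper names `chromeLaunchArgs` and `cdpEndpoint` are hypothetical, and how capture.mjs actually starts Chrome is not shown in this document. `--remote-debugging-port` and `--user-data-dir` are standard Chrome switches; the defaults mirror the profile path and port named in this section.

```javascript
// Hypothetical helper: illustrates the flags only, not capture.mjs internals.
function chromeLaunchArgs({ profileDir = "/Users/tk/.capture-chrome", port = 9222 } = {}) {
  return [
    `--remote-debugging-port=${port}`, // exposes the CDP endpoint
    `--user-data-dir=${profileDir}`,   // non-default profile, so the debug flag is honored
  ];
}

// The endpoint both Playwright MCP and capture.mjs attach to.
function cdpEndpoint(port = 9222) {
  return `http://127.0.0.1:${port}`;
}

console.log(chromeLaunchArgs().join(" "), cdpEndpoint());
```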

Why CDP-attach and not a fresh headless launch: on macOS a freshly-launched Playwright Chrome can't get Keychain access to decrypt Chrome's cookies → it loads logged-OUT, even on a profile that's logged in. The real, user-launched Chrome holds the live decrypted session; we attach to it (connectOverCDP(..., {noDefaults:true}) — Chrome 148 rejects the default Browser.setDownloadBehavior attach call, so noDefaults is required). Full rationale + the rejected alternatives: shared/research/playwright-logged-in-session-reuse-2026-06-01.md.

What to capture & how much (the judgment layer)

This is the part the agent OWNS — read it, interpret it, decide. It is guidance, not a schema; nothing here is hard-coded in capture.mjs. The DOM tells you what's available; the video's intent tells you what's needed. Capture is demand-driven, not supply-driven — every asset must earn its place by expressing a beat. If nothing in the edit will use it, don't capture it.

Two inputs to the decision:

Supply (DOM, via MCP snapshot) — what this specific site actually offers: a hero, a card grid, a diagram, a chart, console views.
Demand (the video's intent) — what the piece is trying to say (its claims / narrative spine / audience). For a real video, ask for this. With no script yet, propose a DEFAULT expressive manifest from the DOM + the roles below, then prune against intent. DOM alone gets you a candidate; intent prunes it.

Asset roles — what each is good for (soft taxonomy; map the page's surfaces onto these):

Hero / establishing — "this is the product." ~1.
Atomic proof unit — variety + specificity (a single real card / row). Capture a representative handful (6–12) of the recognizable ones, never the full set.
Scale shot — magnitude ("86 models") — the wide grid that reads as "a lot." ~1. A plain still --selector of a tall virtualized grid UNDER-MOUNTS: it captures the element's full bounds but only the rows that mounted during settle (the tokenrouter models wall left ~2/3 whitespace). For a scale shot or the full set use cards (scroll-mounts each step) or still --scroll-y (mount below-fold rows first).
Diagram / concept — the mental model. 1–2.
Data / evidence — one capture per distinct quantitative claim (chart, stat, number).
Flow / motion — how it works / what it feels like (a real action / scroll / zoom). One per process beat.
State / detail — an emphasis close-up (hover, selected, single stat) — only when a beat lands on it.

How much: derived from the beats, not from the page. For repeating sets the rule is representative, not exhaustive — enough to land the claim + a small margin.

Extract vs skip: skip site chrome that won't express anything — top nav bands (and their PII), cookie bars, footers, breadcrumbs, pagination chrome, empty states. A surface is worth capturing only if you can name the beat it serves.

Capture sequencing: issue the captures sequentially — compose a sequence of deterministic capture.mjs calls from the manifest (one still per element, one cards per grid, one record per motion beat). No parallel multi-tab: cards already amortizes the attach over a whole grid, stills are cheap, and motion must be sequential (screencast is per-page and x264 is CPU-bound — parallel records contend and corrupt timing). Different views = re-navigate the same tab or a fresh call.
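
One way to picture "compose a sequence of deterministic capture.mjs calls from the manifest" is the sketch below. The manifest shape (`kind`, `url`, `selector`, `id`, `seconds`, `motion`) and the function name `planCaptures` are assumptions made for illustration; only the verbs and flags come from this document. The result is an ordered list of argv arrays, meant to be run one after another, never in parallel.

```javascript
// Hypothetical manifest-to-commands planner; the manifest shape is an assumption.
function planCaptures(source, manifest) {
  const base = ["node", "scripts/capture.mjs"];
  return manifest.map((item) => {
    switch (item.kind) {
      case "still": // one still per element
        return [...base, "still", "--url", item.url, "--selector", item.selector,
                "--source", source, "--id", item.id];
      case "cards": // one cards call per repeating grid
        return [...base, "cards", "--url", item.url, "--selector", item.selector,
                "--source", source, "--id-prefix", item.id];
      case "record": // one record per motion beat
        return [...base, "record", "--url", item.url, "--seconds", String(item.seconds),
                "--motion", item.motion, "--source", source, "--id", item.id];
      default:
        throw new Error(`unknown capture kind: ${item.kind}`);
    }
  });
}

// Illustrative values only: the URL, selectors, and ids are made up.
const plan = planCaptures("demo", [
  { kind: "still", url: "https://example.com/", selector: "[data-hero]", id: "hero" },
  { kind: "cards", url: "https://example.com/models", selector: ".card", id: "model" },
  { kind: "record", url: "https://example.com/", seconds: 6, motion: "scroll", id: "home-scroll" },
]);
for (const argv of plan) console.log(argv.join(" "));
```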

Downstream cropping is deferred (open): deliver clean atomic assets and let reframing / zoom / masking happen at compose time in Remotion, per scene, where the scene knows how it wants to use the asset. Whether any capture-side secondary crop is needed gets decided only after a full end-to-end loop has run.

The loop: explore → anchor → capture-by-selector → ONE contact-sheet QC

Anchor-first. Coordinates come from the DOM, never from eyeballing pixels. Resolve each asset to a stable selector, then let locator.screenshot() auto-scroll to it and crop AT its rendered bounds: pixel-exact, device-scaled, one shot per element. Vision is QC-only: a SINGLE contact sheet, read once. This dissolves the old full-capture → magick-regrid → Read → re-grid → crop-by-guessed-pixels loop (that loop stalled the agent at the 600s watchdog and produced DPR/pitch artifacts). Brief: shared/research/efficient-element-asset-capture-2026-06-01.md.

Ensure the browser: node scripts/capture.mjs ensure.
Explore + anchor with Playwright MCP — do NOT assume what's on the site:
- browser_navigate to a page; let it settle.
- browser_snapshot → the accessibility tree with stable [ref=eN] ids (10–100× cheaper than screenshots). Read it to see what exists; act by ref, never from pixels.
- Resolve each target to a stable selector: prefer a data-* / role / aria hook; fall back to the repeating grid-child class (e.g. .semi-card.cursor-pointer on the tokenrouter wall). For a repeating grid, infer ONE selector from a couple of sample cards, then replicate to all (CherryPick pattern). Avoid fragile positional selectors (.nth(8)); durable anchors survive re-render.
- When several siblings share a class (e.g. one .semi-card of five), anchor by a unique descendant via :has() — never by text or :nth. The dashboard chart card resolves cleanly as .semi-card:has(.semi-card-header-wrapper-title > div.lg\:justify-between). Cheap-revalidate the disambiguating selector with capture.mjs check (expect count:1) before trusting it.
- browser_click / scroll / type by ref to discover states (modals, tabs, empty vs populated). Re-snapshot after every navigation — refs go stale on re-render.
- Enumerate nav/link roles to build a site map of reachable views. Report an inventory: "here's what exists + the selector for each." Nothing is captured one-shot.
Reuse first: node scripts/capture.mjs assets [--source <name>] [--query <substr>] reads the catalog so you don't re-shoot what you already have.
Capture by anchor (deterministic):
- Single element, retina still: node scripts/capture.mjs still --url <url> --selector <css> --source <name> --id <assetId> (element-exact locator.screenshot; animations frozen; ~retina. --clip x,y,w,h / --full / viewport remain as fallbacks for canvas / non-anchorable UI only.)
- A whole repeating grid in ONE attach: node scripts/capture.mjs cards --url <url> --selector <css> [--scroll-selector <css>] [--scroll-step <px>] [--max <n>] [--no-contact] --source <name> --id-prefix <id-prefix> captures EVERY match of --selector, amortizing one CDP attach over all N cards. For virtualized grids it steps --scroll-selector (the inner overflow container, not the window) in --scroll-step increments and collects newly-mounted matches each step, deduped by innerText hash. Each card → <id-prefix>-NN.png cropped at its measured boundingBox() (recorded as the catalog bbox: evidence, not a guess) and all crops are montaged into <id-prefix>-contact.png.
- Crisp motion: node scripts/capture.mjs record --url <url> --seconds <n> --motion <scroll|into-view|hold|zoom> [--target <css>] [--logged-out] [--webm] --source <name> --id <assetId> (emits <id>.mp4, retina, CRF18 two-pass. --logged-out records via a FRESH headless browser, a clean public-page session with NO logged-in chrome, for a PUBLIC marketing shot, still retina (logged-in is the default; see Auth/data sensitivity). --webm ALSO emits a VP9 <id>.webm; default OFF: no visible gain, larger + ~2x encode; Remotion decodes the mp4 fine.)
- All upsert shared/asset-library/<source>/catalog.json (single writer → never drifts; preserves any curated views[]).
QC via ONE contact sheet (project habit: never "watch the video", and never N per-card Reads): cards hands you a single montaged <id-prefix>-contact.png. Read it ONCE and reject bad cells. For a still/record, read the one captured still or the first/mid/last frames. Right view? Crisp text (not VP8 mush)? No PII band? A clean exit that landed on a login wall or empty state is still a FAIL.
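
The scroll-and-collect behaviour of cards on a virtualized grid can be sketched as below. This is an illustration under stated assumptions, not the code in capture.mjs: the hash function (FNV-1a here) and the names `hashText` and `collectUnique` are stand-ins, and each scroll step is modelled as a plain array of the innerText values mounted at that position.

```javascript
// FNV-1a 32-bit hash; a stand-in, since the hash capture.mjs uses is not specified.
function hashText(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// steps: one array per scroll position, holding the innerText of each mounted card.
// Cards that stay mounted across steps are kept once; max caps the total collected.
function collectUnique(steps, max = Infinity) {
  const seen = new Set();
  const cards = [];
  for (const mounted of steps) {
    for (const raw of mounted) {
      const text = raw.trim();
      const key = hashText(text);
      if (seen.has(key) || cards.length >= max) continue;
      seen.add(key);
      cards.push(text);
    }
  }
  return cards;
}

console.log(collectUnique([["A", "B"], ["B", "C"], ["C", "D"]]));
```

Deduplicating by text hash rather than by position is what lets overlapping scroll steps stay cheap: a step can safely re-see rows from the previous one.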

Site memory (`site-map.json`)

A per-site learned anchor cache the agent reads/writes — shared/asset-library/<source>/site-map.json (data, not code). Read it FIRST, before opening Playwright MCP. It records, per view: the URL, each anchor's selector + role + count/bbox + confidence + lastVerified, plus nav (label→url) and piiZones (the top nav/header band that holds account email + balance — anchor BELOW it).

Reuse high-confidence anchors WITHOUT re-snapshotting. Only explore views/anchors that are unknown or stale.
Cheap-revalidate before trusting: node scripts/capture.mjs check --url <url> --selector <css> (token-free probe → {count, firstBbox, visible, ok}). Confirm ok:true (and count matches) before you capture against a cached selector.
Snapshot once PER PAGE and extract ALL that page's anchors in one pass — not one snapshot per asset.
Write new/updated anchors back, evidence-stamped: selector, role, count/bbox, lastVerified. On drift (a check returns ok:false or a changed count), re-explore that ONE anchor and update its lastVerified + confidence. Never fake a lastVerified.
Goal: confirm less each run. The site-map is a cache, not a source of truth.
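
The revalidate-or-re-explore decision above can be sketched as a small pure function. The anchor record below is hypothetical: its field names follow the list in this section, but the exact JSON shape of site-map.json, the role label, and all values are assumptions. `probe` stands for the `{count, firstBbox, visible, ok}` object that capture.mjs check reports.

```javascript
// Hypothetical cached anchor; values are illustrative, not from a real site-map.json.
const cachedAnchor = {
  selector: ".semi-card.cursor-pointer",
  role: "atomic-proof",
  count: 12,
  confidence: "high",
  lastVerified: "2026-06-01",
};

// Decide whether a cached anchor can be reused after a check probe.
function revalidate(anchor, probe, today) {
  const fresh = probe.ok === true && probe.count === anchor.count;
  if (fresh) {
    // Evidence-stamped: lastVerified moves only because a real probe passed.
    return { ...anchor, lastVerified: today, action: "reuse" };
  }
  // Drift: leave lastVerified untouched (never fake it) and re-explore this ONE anchor.
  return { ...anchor, confidence: "low", action: "re-explore" };
}

console.log(revalidate(cachedAnchor, { count: 12, visible: true, ok: true }, "2026-06-12").action);
```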

Storage layout — one folder per product (reproducible contract)

Every product gets ONE home: shared/asset-library/<product>/, and capture.mjs writes there by role automatically — stills/, cards/, motion/, plus _contact/ (QC sheets) — with catalog.json (single-writer machine inventory) and site-map.json (anchor cache) at the root. This layout IS the contract: re-running the node on the same site reproduces the same structure and the same kinds of assets — the role folders and catalog shape don't change run to run.

The product folder is the home for all of that product's surfaces, not just captures. Generated / downstream surfaces live here too as sibling folders — e.g. frames/ for Remotion-rendered frames, processed/ for any compose-time crops/masks. Same folder, different surfaces. Archived pre-anchor experiments go under _legacy/ (moved, never deleted; never cataloged).
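
As a rough illustration of the layout contract, the sketch below maps a capture verb to its role folder. The verb-to-folder mapping, the two-digit zero padding for NN, and the helper names are assumptions; the document only states the folder names and the `<id-prefix>-NN.png` / `<id-prefix>-contact.png` patterns.

```javascript
// Assumed mapping from capture verb to role folder under the product home.
const ROLE_DIRS = { still: "stills", cards: "cards", record: "motion", contact: "_contact" };

function assetPath(product, verb, fileName) {
  const dir = ROLE_DIRS[verb];
  if (!dir) throw new Error(`no role folder for verb: ${verb}`);
  return `shared/asset-library/${product}/${dir}/${fileName}`;
}

// Assumes NN means a zero-padded two-digit index.
function cardFileName(idPrefix, index) {
  return `${idPrefix}-${String(index).padStart(2, "0")}.png`;
}

console.log(assetPath("demo", "cards", cardFileName("model", 3)));
console.log(assetPath("demo", "contact", "model-contact.png"));
```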

Efficient crop (anchor-first) — why this, not the regrid loop

Per the brief (shared/research/efficient-element-asset-capture-2026-06-01.md): stop asking vision for coordinates. locator.screenshot() auto-scrolls to the element and clips at its true rendered bounds — pixel-exact, cropped at capture, no grid overlay, no eyeballing. Measuring each card's live boundingBox() and clipping per-card (with a pinned deviceScaleFactor) kills the DPR / fractional-layout artifacts (the 1190≠1082 pitch bug, the banner-shift) at the source. Batch all matches through ONE attach (cards) so N cards share one connect — the whole point of the verb. Then ONE vision pass over the contact sheet replaces N regrids + N Reads. Set-of-mark / "ask the VLM for boxes" is the documented failure mode (54% hallucinated boxes on dense pages) — don't reintroduce it.
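
To see why measuring each card beats assuming a fixed pitch, consider the arithmetic of turning a CSS-pixel bounding box into a device-pixel clip. The sketch below is an illustration only: `toDeviceClip` is a hypothetical helper, and locator.screenshot() does its own clipping internally. It rounds outward so fractional layout positions never shave a pixel row off the element.

```javascript
// bbox is in CSS pixels (as a boundingBox() measurement reports); dsf is the pinned deviceScaleFactor.
function toDeviceClip(bbox, dsf = 2) {
  const left = Math.floor(bbox.x * dsf);
  const top = Math.floor(bbox.y * dsf);
  const right = Math.ceil((bbox.x + bbox.width) * dsf);
  const bottom = Math.ceil((bbox.y + bbox.height) * dsf);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// A card at a fractional offset: the clip is derived from this card's own bounds,
// not from index * assumedPitch, so rounding error cannot accumulate down the grid.
console.log(toDeviceClip({ x: 10.25, y: 20, width: 300.5, height: 100 }));
```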

Motion: when it beats a still, and which preset

Most "motion" is the camera, not the site — and that's honest, we supply it:
- scroll — slow eased pan down a page (or to --target).
- into-view — bring --target to center, then dwell.
- zoom — gentle push toward --target.
hold captures the page's OWN in-place animation. Use it only when the element actually animates — verify first (a 1–2s hold test, then look). If the page is static, do NOT fabricate motion: use a camera move or a still, and flag it.
Prefer a still when the value is a single settled state (a stat, a config panel, a list). A few decisive frames beat a long noisy scroll.
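
A "slow eased pan" can be sketched as a scroll offset computed from elapsed time. The cubic ease-in-out curve and the function names are assumptions; the document does not say which easing capture.mjs uses for the scroll preset.

```javascript
// Standard cubic ease-in-out on t in [0, 1]; the actual curve used is an assumption.
function easeInOutCubic(t) {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Scroll offset in px at elapsedMs into a pan of durationMs from startY to endY.
function scrollYAt(elapsedMs, durationMs, startY, endY) {
  const t = Math.min(1, Math.max(0, elapsedMs / durationMs));
  return startY + (endY - startY) * easeInOutCubic(t);
}

// Sample a 6-second pan over 1200px once per second.
for (let ms = 0; ms <= 6000; ms += 1000) {
  console.log(ms, Math.round(scrollYAt(ms, 6000, 0, 1200)));
}
```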

Best-practice recipes (2026 research — baked into the tools; mirror them in exploration)

Perception: snapshot the a11y tree (interactive-only, compact), act by ref; annotated screenshot only for icon-only / canvas / visual-state cases.
Stabilize before capture: domcontentloaded + a bounded ~4s networkidle wait (console SPAs stream telemetry and never go fully idle — a hard networkidle wait burns ~45s/capture for nothing), then document.fonts.ready + in-viewport image decode (no mid-reflow text, no blank placeholders), a known-heading wait, dismiss cookie/login overlays, freeze animations for stills. capture.mjs does all this in navigateAndSettle.
Retina everywhere: deviceScaleFactor 2 — pinned via CDP Emulation on BOTH the CDP-attached and the --logged-out launched path; stills + motion come back ~3840×2160.
Recording is two-pass: live x264 -preset ultrafast -crf 18 (must beat realtime so CDP never backpressures) → deliverable mp4 (x264 CRF18). A VP9 webm is opt-in via --webm (default OFF: no visible gain over the mp4, inconsistent size, ~2x encode time; Remotion OffthreadVideo decodes the mp4 fine). NEVER the built-in VP8 recordVideo (mosquito-noise text). yuv420p + color_range tv so Remotion decodes at true levels.
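
The encoder settings named above translate into ffmpeg arguments roughly as follows. This is a sketch, not the argument list capture.mjs actually passes: the helper name and file names are hypothetical, and the input side of the live pass is not described in this document. The flags themselves (`-c:v libx264`, `-preset`, `-crf`, `-pix_fmt`, `-color_range`) are standard ffmpeg options.

```javascript
// Build x264 arguments for either pass; preset is only set when given (e.g. "ultrafast" for the live pass).
function x264Args({ input, output, preset }) {
  const args = ["-y", "-i", input, "-c:v", "libx264"];
  if (preset) args.push("-preset", preset);
  args.push(
    "-crf", "18",
    "-pix_fmt", "yuv420p",  // widely decodable chroma layout
    "-color_range", "tv",   // limited range, so levels are read correctly downstream
    output
  );
  return args;
}

// Hypothetical file names for illustration.
console.log(["ffmpeg", ...x264Args({ input: "live.mp4", output: "hero.mp4" })].join(" "));
```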

Auth / data sensitivity / safety

Capture the REAL, logged-in UI by default — show the true form of the data. This is the operator's OWN product and OWN account; they see this data anyway, and empty logged-out states make worse assets (no real rows, no real numbers). Do NOT reflexively log out, blank panels, or anchor around the account chrome just to dodge ordinary account info.

Fine to show (do NOT scrub): account email, name, avatar, balances, spend/usage history, dashboard numbers, model lists. This ordinary account data is exactly what makes a demo read as real.
NEVER leak — mask or skip ONLY these live credentials: full API-key / secret-token values, credit-card / payment numbers, passwords. If one is shown unmasked, frame to exclude it, mask it, or skip that shot and flag it — never publish a working secret. (Most consoles already mask keys, e.g. sk-…**** — just verify before shooting.)
--logged-out is an OPTIONAL tool, not a sensitivity mandate — use it only when you specifically want a clean PUBLIC marketing capture (the logged-out landing with Login / Sign-Up). For anything where the real data IS the point, capture logged-in.
Never fabricate footage — real site only. If a view is gated and the session is dead, ensure --reauth. If a view can't be reached, FLAG it as a gap; don't fake it.
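
The "verify before shooting" step for API keys could be approximated with a text scan like the one below. It is a heuristic sketch only: the `sk-` prefix comes from the example above, while the 20-character threshold and the function name are assumptions, and it covers neither card numbers nor passwords. It returns a count rather than the matches so the secret itself is never echoed into logs.

```javascript
// Heuristic: an sk- prefix followed by a long unbroken run reads as an unmasked key.
// Masked forms such as "sk-…****" do not match because the run is interrupted.
const UNMASKED_KEY = /\bsk-[A-Za-z0-9_-]{20,}\b/g;

function countUnmaskedKeys(visibleText) {
  const matches = visibleText.match(UNMASKED_KEY);
  return matches ? matches.length : 0;
}

// Made-up strings for illustration; neither is a real credential.
console.log(countUnmaskedKeys("API key: sk-ab12****"));
console.log(countUnmaskedKeys("API key: sk-A1b2C3d4E5f6G7h8I9j0K1"));
```

A non-zero count is a reason to reframe, mask, or skip the shot and flag it; a zero count is not proof of safety, so the contact-sheet read still applies.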

Tooling map

scripts/capture.mjs — ensure | still | record | cards | check | assets (this skill's mechanics). cards = batch anchor-first element capture (whole repeating grid, one attach, one contact sheet). check = token-free selector probe (re-validate a cached site-map anchor: {count, firstBbox, visible, ok}) — a PROBE, not a capture.
Playwright MCP (playwright server, --cdp-endpoint http://127.0.0.1:9222) — the exploration/action tools the agent calls natively.
scripts/cdp-probe.mjs — minimal "attach + screenshot a URL" probe (handy sanity check).
scripts/capture-site.mjs — LEGACY public-page tool that launches its OWN browser (probe/sections/full + storageState). For logged-in + motion, use the CDP-attach path above; keep capture-site.mjs only for quick public-page stills.
references/video-quality-upgrade.md — the screencast→ffmpeg rationale.
references/library-and-manifest.md — asset-library layout + catalog conventions.
