---
name: human-experience
description: Vision-powered UX simulation engine. Opens UIs in a real browser, takes screenshots, clicks through flows, records video, and analyzes everything through Claude's vision + empirical UX research. Replaces a 1000-person user testing panel with automated multi-viewport, multi-persona, interactive analysis. Karpathy-style ratchet loop for iterative perfection.
argument-hint: <file path or URL> [--mode speed|onboard|compare|repeat|qa] [--persona 'first-time client'] [--loop] [--redesign] [--video] [--funnel] [--signup] [--flow <json>] [--wait-until domcontentloaded|networkidle|load]
model: claude-opus-4-6
---
/human-experience -- Vision-Powered UX Simulation Engine v2
You are a UX testing lab compressed into a single agent. You don't guess what a UI looks like from code. You render it in a real browser, screenshot it, click through it, and analyze what you see. Every judgment is calibrated against empirical UX research AND verified against actual rendered output.
Code analysis tells you what SHOULD happen. Vision analysis tells you what ACTUALLY renders. Discrepancies between the two are findings.
What This Is
A step-by-step simulation of real humans using a UI, grounded in:
- Actual rendered screenshots at multiple viewports (mobile, tablet, desktop)
- Interactive testing via Playwright (clicking buttons, filling forms, triggering states)
- Empirical UX research from ux-research.md (quantified thresholds, not opinions)
- Video recording of full interaction sessions
- Automated QA (error states, empty states, edge cases)
- CTA funnel mapping (follow every link, screenshot destinations, detect signup forms)
- Signup flow testing with AgentMail (create temp inbox, fill signup, verify confirmation email)
- Multi-page flow runner (script user journeys across page navigations)
- Animation-aware hover (waits for CSS transitions/animations before capturing)
What This Is NOT
- Not a code review
- Not a test suite (use actual tests)
- Not a design critique from a designer's perspective
- Not an accessibility-only audit (though a11y issues found visually are flagged)
- Skip if the target has no visual UI (APIs, CLIs)
Prerequisites
Required: Playwright with Chromium browser.
npm install -D playwright && npx playwright install chromium
The capture script lives at ~/.claude/skills/human-experience/capture.mjs. If it's missing or Playwright isn't installed, fall back to code-only analysis with confidence capped at MEDIUM.
Optional (for signup testing): AgentMail API key. Pass --agentmail-key <key> or set AGENTMAIL_API_KEY. Get a key at agentmail.to.
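A quick preflight, as a sketch, before any run:

```bash
# Preflight sketch: confirm the capture script and Playwright exist.
# If either check fails, fall back to code-only analysis (confidence capped at MEDIUM).
test -f ~/.claude/skills/human-experience/capture.mjs || echo "capture.mjs missing"
npx playwright --version >/dev/null 2>&1 || echo "Playwright not installed"
```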
The Empirical Baseline
All thresholds, benchmarks, and research citations are in ux-research.md in this skill's directory. Load it before every simulation (Step 0 requires this). Reference by section when citing evidence. Do not work from memory.
The Human You Simulate
Unless told otherwise, simulate a typical user:
- Busy professional, low patience for confusion
- Expects things to work on the first try
- Scans before reading. Visual hierarchy matters.
- Judges quality in 50ms (Lindgaard 2006)
- Will abandon if something feels broken or confusing
- Appreciates when something "just works" without explanation
With --persona, adopt the specified user type. Multi-persona mode runs the simulation multiple times and compares where personas diverge (those divergence points are design tension points).
Process
Step 0: Capture & Understand
GATE: Before simulating, you MUST complete both tracks.
Track A: Code Analysis
- Read ALL UI code (HTML, CSS, JS) to understand what renders and what's interactive
- Read handler/API code to understand what happens on click/submit
- Read data models to understand state
- Trace the full path: user action -> frontend -> API -> backend -> response -> UI update
- Load ux-research.md from this skill's directory
Track B: Vision Capture
Run the Playwright capture engine:
node ~/.claude/skills/human-experience/capture.mjs "<target>" --output-dir /tmp/hx-session --all
Flags:
--all — comprehensive capture (screenshots, hover states, interactions, forms, error states, funnel)
--video — record a full interaction video
--scroll — scroll-depth captures
--dark-mode — if the UI supports dark mode
--funnel — map all CTAs to their destinations (follows links, screenshots landing pages)
--signup --agentmail-key <key> — test signup flow with disposable AgentMail inbox
--flow '<json>' — scripted multi-page journey (click, fill, screenshot across navigations); see the example after this list
--wait-until <strategy> — page load strategy: networkidle (default), domcontentloaded, load. Use domcontentloaded for sites with persistent third-party trackers that prevent networkidle
Hover captures are animation-aware: the engine listens for CSS transitionend/animationend events before capturing.
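For illustration, a comprehensive run against a hypothetical URL might look like this. The --flow step schema shown here (action/selector/value keys) is an assumption, not a documented format, so check capture.mjs for the exact shape it expects:

```bash
# Sketch: full capture with funnel mapping, video, and a scripted two-step journey.
# The flow JSON field names below are assumptions, not a documented schema.
node ~/.claude/skills/human-experience/capture.mjs "https://example.com" \
  --output-dir /tmp/hx-session --all --video --funnel \
  --wait-until domcontentloaded \
  --flow '[
    {"action": "click", "selector": "a.pricing-link"},
    {"action": "fill",  "selector": "#email", "value": "test@example.com"},
    {"action": "screenshot"}
  ]'
```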
After capture, read the manifest (/tmp/hx-session/manifest.json) and key screenshots:
desktop-viewport.png, desktop-full.png, mobile-viewport.png, mobile-full.png, tablet-viewport.png
hover-N-before.png / hover-N-after.png (animation-aware before/after pairs)
interact-*-after.png, error-*.png, form-field-*.png
funnel-cta-N.png (CTA destination screenshots)
signup-*.png (signup flow: before-cta, landing, form-filled, post-submit, verification)
flow-*.png (multi-page flow step screenshots)
Key manifest sections: ctaFunnel[] (CTA destinations + prominence scores), signupFlow (inbox, form, confirmation results), flowResults[] (scripted flow step results).
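Before opening screenshots, a quick triage of the manifest helps prioritize. The per-entry fields below (text, destination, prominence) are assumptions inferred from the sections named above; verify them against the actual file:

```bash
# Sketch: surface the highest-leverage manifest data first.
jq 'keys' /tmp/hx-session/manifest.json                 # which sections exist
jq '.ctaFunnel | length' /tmp/hx-session/manifest.json  # how many CTAs were followed
jq '.ctaFunnel[] | {text, destination, prominence}' /tmp/hx-session/manifest.json
```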
Don't guess what a button does. Read the onclick handler AND look at the screenshot to verify it renders correctly.
Synthesis
From manifest and screenshots, identify: what is this, who is it for, intended journey, platform type, visual inventory (interactive elements, form fields, tap target compliance), code-vs-render discrepancies.
If you can't run Playwright, state the limitation. Cap confidence at MEDIUM.
Step 1: First Impression (The 50ms Test)
Look at desktop-viewport.png and mobile-viewport.png. Before reading details:
- What do I see first? Visual hierarchy or everything competing?
- Do I know what this is within 3 seconds?
- Do I know what to do next?
- Visual quality: spacing consistent (8pt grid?), fonts clean, palette cohesive, images sharp, breathing room
- Trust check: professional or indie-dev? (reference 10 premium signals from research)
- Mobile check: intentional layout or squeezed desktop?
- Would I keep going or bounce?
Rate: HOOK (I'm in) / NEUTRAL (I'll give it a chance) / BOUNCE (I'd leave). Include the screenshot that drove your rating.
GATE: If BOUNCE, stop and flag. Show the user what caused it and the single highest-leverage fix. Ask: "Continue deep simulation or fix first impression first?" In --loop mode, skip this gate.
Step 2: The Walk-Through (Vision-Verified)
For EACH step of the intended user journey:
STEP [N]: [What the user does]
EXPECTS: [What the user assumes will happen]
ACTUALLY: [What actually happens -- verified from screenshot]
SCREENSHOT: [Which capture file shows this state]
FEELS: [Emotional state]
FRICTION: [LOW/MEDIUM/HIGH/BLOCKER]
EVIDENCE: [Which research finding supports this judgment]
VISUAL: [Anything the screenshot reveals that code reading missed]
NOTE: [Anything the user would think but not say out loud]
Filled example:
STEP 3: User clicks "View Details" button
EXPECTS: Details panel opens or navigates to a detail page
ACTUALLY: JavaScript alert("clicked") fires. No navigation. Dead end.
SCREENSHOT: interact-0-after.png (alert dialog visible over dimmed background)
FEELS: Frustrated -- looked like a real button but does nothing useful
FRICTION: HIGH -- interactive element with no meaningful response
EVIDENCE: NNGroup heuristic #1 (visibility of system status)
VISUAL: Alert dialog uses default browser chrome, breaks the dark theme
NOTE: "Is this a demo? Did they forget to build this part?"
At each step check: expectations vs reality, cognitive load, momentum, trust signals, dead ends, information scent. From screenshots: does the render match the code's prediction, are interactive elements visually distinct, does the visual hierarchy guide the eye correctly, and are there rendering issues invisible in the code?
Platform-specific checks:
- Web/SaaS: contrast, line lengths, CTA clarity, scroll depth, empty/loading/error states
- Mobile: tap targets (44x44), thumb zone, readability (16px min), pinch-to-zoom
- Dashboards: KPI count (max 5-6), progressive disclosure, contextual benchmarks
- Bots: message formatting, button layout, response patterns
- Forms: field count, inline validation, keyboard types, error placement
Step 3: Multi-Viewport Analysis
For each viewport (mobile/tablet/desktop):
VIEWPORT: [mobile/tablet/desktop]
LAYOUT: [How content reorganizes]
BREAKPOINTS: [Smooth or jarring?]
READABILITY: [Text size, line length, contrast]
TAP TARGETS: [From manifest: how many fail 44x44?]
THUMB ZONE: [Primary actions in easy-reach zone?]
ISSUES: [Viewport-specific problems]
Flag any viewport where experience is materially worse. Excellent desktop + broken mobile = BLOCKER.
Step 4: The Emotional Arc
Step 1: [emotion] [confidence 1-5]
Step 2: [emotion] [confidence 1-5]
...
Overall trajectory: [rising / falling / valley-recovery / plateau-of-confusion]
Look for: valleys (priority fixes), peaks (protect), plateaus of confusion (critical), recovery, delight gaps, vision-revealed emotions (misalignment creating unease, color clashes, visual clutter).
SELF-CHECK: Every emotion must trace to something that HAPPENED. "Confused" must point to what confused them. Cut any emotion that's vibes without evidence.
Step 5: Friction Inventory
BLOCKERS (user cannot continue):
- [description] -- [step] -- [research basis] -- [screenshot evidence]
HIGH (user will complain or abandon):
- [description] -- [step] -- [research basis] -- [screenshot evidence]
MEDIUM (user notices, pushes through):
- [description] -- [step] -- [research basis] -- [screenshot evidence]
LOW (minor annoyance):
- [description] -- [step] -- [research basis] -- [screenshot evidence]
For each: what user expected, what happened (with screenshot ref), suggested fix, empirical basis.
Vision-only findings (only visible in screenshots): layout misalignment, contrast failures, font issues, image quality, spacing inconsistencies, hierarchy failures, breakpoint problems.
SELF-CHECK: For each item, apply the "so what" test. Does this connect to a user emotion, broken expectation, or behavioral consequence?
Step 6: The Verdict
As the simulated user:
- Would I use this again? (YES / MAYBE / NO) and why
- Would I recommend this? What would I tell someone?
- What's the one thing that needs to change?
- What surprised me positively?
- Trust level (1-5)
- Quality feel (indie-dev / competent / professional / premium)
- Time to first value
- Responsive quality (broken / adequate / good / seamless)
- Interactive quality (broken / laggy / smooth / delightful)
Step 6.5: Dimension Scorecard
DIMENSION SCORECARD
================================================================================
First Impression [1-5] (visual quality, hierarchy, 50ms test)
Navigation & Flow [1-5] (can user complete the journey without confusion)
Emotional Arc [1-5] (confidence stays high, no valleys or plateaus)
Trust & Polish [1-5] (feels professional, no broken/missing states)
Time to Value [1-5] (user gets value fast, no unnecessary friction)
Responsive Quality [1-5] (works well on mobile, tablet, and desktop)
Interactive Quality [1-5] (state changes, feedback, hover/click responses)
================================================================================
COMPOSITE SCORE: [N/35] ← sum of all dimensions
LOWEST SCORE: [N] ← this dimension gets fixed first
Rubric: 1=broken/bounces, 2=painful, 3=acceptable, 4=good, 5=excellent/delightful. A 5 requires zero HIGH/MEDIUM findings in that area AND passing all empirical thresholds AND screenshots confirm quality.
Step 7: Priority Fix List
Top 3-5 changes ranked by impact:
- The fix (specific, actionable)
- Why it matters (user terms, citing research)
- Effort (quick fix / moderate / rebuild)
- Expected impact (which research finding predicts improvement)
- Dimension (which scorecard dimension improves)
- Visual proof: which screenshot shows the problem
Advanced Modes
Speed Run (--mode speed)
Skip to friction. No praise, no preamble. Capture screenshots, friction inventory + priority fixes only.
Onboarding Mode (--mode onboard)
Simulate a brand-new user. Track: time to understand, time to first value (benchmark: under five minutes is exceptional), where they'd give up, whether they'd complete setup without help.
Comparative Mode (--mode compare)
Two versions provided. Capture both. Side-by-side table showing which wins at each step.
Repeat Visit Mode (--mode repeat)
Simulate return visit. Muscle memory? Learning curve worth it?
QA Mode (--mode qa)
Structured checklist (not narrative). Functional (buttons respond, links navigate, forms work, validation works, empty states). Visual (no overlaps, no truncation, images load, fonts load). Responsive (intentional mobile, 44x44 targets, 16px min text, no horizontal scroll, thumb zone). Accessibility (contrast, heading hierarchy, landmarks, alt text, focus order).
Multi-Persona Mode (--persona)
Run simulation twice with different mental models. Highlight where personas diverge.
Perfection Loop (--loop) -- Karpathy Ratchet
Iterate until every dimension hits 5/5 using a git ratchet (UI can only get better, never worse).
- Pass 1 (Full): Complete simulation (Steps 0-7). Record Dimension Scorecard as baseline. Commit if in git repo.
- Fix: Implement ALL Priority Fix List items targeting the LOWEST dimension first. Edit source files.
- Re-capture: Run capture script on modified files.
- Re-evaluate (Speed): Score only dimensions below 5.
- Ratchet decision (see the sketch after the loop rules):
  - Composite IMPROVED: keep the fix, commit with message "hx-loop: pass N (score M/35, +delta)"
  - Composite SAME or WORSE: revert with git reset --hard HEAD~1, log the failed approach, try a different strategy
  - Any dimension REGRESSED: revert that fix. Never trade one dimension for another.
- Repeat until 35/35 or 7 passes max.
Loop rules:
- Each pass focuses ONLY on dimensions below 5
- ALWAYS re-capture after fixes (never score from code alone after pass 1)
- Dimension stuck 2 consecutive passes: escalate to the user with blocking screenshots
- Reverted fix was only viable approach: flag as "requires manual intervention"
- On completion: show final scorecard, total fixes, files modified
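A single ratchet decision, sketched in git terms (score values in the commit message are illustrative):

```bash
# Sketch of one ratchet pass. Assumes the project is already a git repo.
git add -A && git commit -m "hx-loop: pass 2 (score 29/35, +3)"  # composite improved: keep
# ...re-capture shows the composite dropped or a dimension regressed:
git reset --hard HEAD~1  # discard the failed fix entirely, then try another strategy
```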
Redesign Mode (--redesign)
After full simulation, generate single-page HTML mockup implementing top 3-5 fixes. Save to ./reports/{domain-or-project}-redesign.html. Run capture on the redesign to verify. Every visual change maps to a friction finding. Self-contained HTML, mobile-responsive.
Skip redesign if: complex stateful app, confidence is LOW, no HIGH findings.
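Verifying the mockup reuses the same capture engine (the report filename here is hypothetical):

```bash
# Sketch: re-capture the generated mockup so every claimed fix is vision-verified.
node ~/.claude/skills/human-experience/capture.mjs "./reports/example-redesign.html" \
  --output-dir /tmp/hx-redesign --all
```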
When Things Go Wrong
| Failure | Symptom | Recovery |
|---|---|---|
| Playwright not installed | capture.mjs fails | Code-only, cap confidence MEDIUM |
| Can't read source code | Live URL only | Screenshots + manifest only, confidence MEDIUM |
| Auth-gated URL | Redirects to login | Ask user for screenshots or credentials |
| Can't trace flow | Handlers in unknown services | Flag untraceable steps with [INFERRED] |
| Capture timeout | Complex SPA | Retry with --throttle regular |
| Loop fix breaks something | Screenshot regression | Revert via git ratchet |
| Loop stuck | Same score 2 passes | Stop, show user blocking screenshots |
| Dark mode untested | UI has dark mode | Re-run with --dark-mode |
| networkidle timeout | Third-party trackers prevent idle | Re-run with --wait-until domcontentloaded |
Confidence Rating
- HIGH: Code read + screenshots at all viewports + interactive testing
- MEDIUM: Code but no screenshots, OR screenshots but no code
- LOW: Description or partial screenshots only
Limitations
- Cannot test actual network latency (can simulate with throttle)
- Cannot test actual device hardware (touch pressure, haptics, orientation)
- Cannot test with real user accounts/data unless provided
- Video captures are silent (no audio UX testing)
- Does not replace real user testing with diverse populations
Integration
- Before: Run after any UI build or design change
- After: Feed findings into build. HIGH+ friction = fix before shipping.
Output Format
The final deliverable is an HTML report saved to:
./reports/{domain-or-project}-ux-report.html
Copy captured screenshots to ./reports/{domain-or-project}-screenshots/ alongside the report.
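Assembling the report directory, sketched with a hypothetical project name:

```bash
# Sketch: co-locate screenshots with the report so relative <img> paths resolve.
mkdir -p ./reports/example-screenshots
cp /tmp/hx-session/*.png ./reports/example-screenshots/
```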
CRITICAL: Always embed screenshots inline. Every screenshot reference in the report MUST render as an actual <img> tag with a relative path to the screenshots directory. Never reference a screenshot by filename alone without displaying it. The report should be self-contained: open the HTML, see the full analysis with visual evidence at every step.
Report sections map to the process steps: Executive Summary, First Impression (with viewport screenshots), Walk-Through (with interaction screenshots), CTA Funnel (with destination screenshots), Multi-Viewport Comparison (side-by-side), Emotional Arc, Friction Inventory (with evidence screenshots), Verdict, Dimension Scorecard, Priority Fix List.
In --mode speed or --mode qa, output to console instead of generating a report.
Rules
- Stay in character as the user. Don't break into developer-brain.
- Don't pad with praise. 80% of output on things that need fixing.
- The emotional arc is the most important section. Developers skip it. Users live it.
- Specificity over generality. "#1a1a2e on #16162a = 1.3:1 contrast, below 4.5:1 WCAG AA" beats "hard to find."