playtest-design

Stars: 5 · Forks: 3 · Updated: April 4, 2026, 17:18

Question generation for playtests, what to observe vs. ask, metrics to track, and how to interpret playtest data without confirmation bias. Use when planning a playtest session, designing a feedback survey, setting up analytics, or when you have playtest data and need to make decisions from it.

Source

rbergman

rbergman/dark-matter-marketplace

Related occupation (SOC classification): Market Research Analysts and Marketing Specialists, Business and Financial Operations Occupations, SOC 13-1161

SKILL.md

Playtest Design

Purpose: Get useful signal from playtests. Most playtest sessions are wasted — observers confirm what they already believe, ask leading questions, and draw conclusions from noise. This skill provides structured methods to avoid those traps.

Influences: Frameworks here draw on cognitive UX research methodology, metrics-driven iterative design practice, and experience engineering theory (emergent behavior observation, planning under uncertainty).

When to Activate

Use this skill when:

Planning a playtest session (what to test, who to recruit, what to measure)
Designing post-playtest surveys or interview questions
Setting up analytics/metrics for ongoing data collection
Interpreting playtest results and deciding what to change
Resolving team disagreements about what the data shows

Core Principle: Observe, Then Ask

Players are reliable reporters of their experience (what they felt) but unreliable reporters of causes (why they felt it). Design your process accordingly.

Most Reliable ←———————————————→ Least Reliable
  What they did    What they felt    Why they think
  (behavior)       (experience)      they felt it
                                     (attribution)

Hierarchy of evidence:

Behavioral data — what players actually did (metrics, video, observation)
Experience reports — what players say they felt ("I was frustrated," "that was exciting")
Causal attribution — what players think caused their experience ("the controls are bad")

Players attributing frustration to "bad controls" might actually be experiencing a perception failure (they couldn't see the indicator) or a pacing problem (too many new concepts at once). Use behavior to diagnose; use self-report to locate.

Question Generation Framework

The Three-Pillar Method

Generate questions along the perception → attention → memory pipeline:

Perception Questions (Did they see it?)

Did the player notice [critical UI element / feedback / environmental cue]?
How long before they noticed?
Did they look at it before acting or after?
Did they confuse it with something else?

Attention Questions (Did they focus on the right thing?)

Where was the player looking during [critical moment]?
Did they engage with [intended system] or get distracted by [ancillary system]?
Did they understand what was important vs. optional?
Was there a moment where they seemed overwhelmed?

Memory Questions (Will they retain it?)

After a break, can they recall how to use [key mechanic]?
Did they apply a lesson learned earlier to a later challenge?
Did they remember the goal after a distraction?
Can they explain the core rules to someone else?
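One way to keep the three pillars in front of you is to treat the questions as templates and fill them in for each feature under test. A minimal sketch, assuming you describe the element and the moment as plain strings; `PILLAR_TEMPLATES` and `generate_questions` are hypothetical names, and the templates paraphrase a subset of the questions above.

```python
from string import Template
from typing import Dict, List

# Hypothetical templates paraphrasing the pillar questions above.
# $element is the UI element or mechanic; $moment is the critical moment.
PILLAR_TEMPLATES = {
    "perception": [
        "Did the player notice $element?",
        "How long before they noticed $element?",
        "Did they look at $element before acting or after?",
    ],
    "attention": [
        "Where was the player looking during $moment?",
        "Was there a point during $moment where they seemed overwhelmed?",
    ],
    "memory": [
        "After a break, can they recall how to use $element?",
        "Can they explain the rules around $element to someone else?",
    ],
}


def generate_questions(element: str, moment: str) -> Dict[str, List[str]]:
    """Fill every pillar's templates for one feature under test."""
    return {
        pillar: [Template(t).substitute(element=element, moment=moment) for t in templates]
        for pillar, templates in PILLAR_TEMPLATES.items()
    }


if __name__ == "__main__":
    for pillar, questions in generate_questions("the stamina bar", "the first boss fight").items():
        print(pillar.upper())
        for question in questions:
            print("  -", question)
```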

Stage-Specific Questions

Dev Stage	Focus	Key Questions
Prototype	Core loop viability	Is the core action inherently interesting? Do they want to do it again?
Alpha	System comprehension	Do they understand the rules? Can they make intentional decisions?
Beta	Pacing and polish	Does the session arc feel right? Where do they get bored or frustrated?
Pre-launch	Edge cases and balance	What breaks? What's exploitable? What did we miss?

Observation Protocol

What to Watch (Not Ask)

Observable	What It Tells You
First action	What the UI communicates as "start here"
Hesitation points	Where clarity fails or cognitive load spikes
Repeated failures	Where difficulty exceeds skill (or UI is misleading)
Where they look	What's grabbing attention (intended or not)
Body language	Leaning in = engaged; leaning back = disengaged; fidgeting = frustrated
Utterances	Unprompted comments ("what?", "oh!", "come on") are gold
Where they quit	The most valuable data point you'll collect
What they skip	Content they ignore reveals priority mismatches

The Silent Observer Protocol

Say nothing unless they're about to break the test setup
Don't explain — if they're confused, that's data
Don't reassure — "you're doing great" biases the session
Note timestamps — when you feel the urge to help, write down the time and why
Record everything — your memory of the session will be biased toward your expectations
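The timestamp and record-everything rules are easier to follow with a log that keeps observation and interpretation in separate fields from the start. This is a minimal sketch; `Note`, `ObservationLog`, and the `urge_to_help` flag are illustrative names, not part of any existing tool.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    t: float  # seconds since the session started
    observation: str  # what happened, in neutral terms
    interpretation: str = ""  # what you think it means, kept in its own field
    urge_to_help: bool = False  # True when you felt the urge to step in


@dataclass
class ObservationLog:
    start: float = field(default_factory=time.monotonic)
    notes: List[Note] = field(default_factory=list)

    def note(self, observation, interpretation="", urge_to_help=False, t: Optional[float] = None):
        """Timestamp a note; pass t explicitly when transcribing from a recording."""
        elapsed = t if t is not None else time.monotonic() - self.start
        entry = Note(round(elapsed, 1), observation, interpretation, urge_to_help)
        self.notes.append(entry)
        return entry

    def help_urges(self):
        """Every moment you wanted to step in, for review after the session."""
        return [n for n in self.notes if n.urge_to_help]
```

Writing the "why" into `interpretation` rather than `observation` keeps the raw record usable later, when the post-test protocol asks you to separate the two.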

Metrics to Track

Core Metrics (Track Always)

Metric	What It Measures	Warning Signal
Session length	Engagement	Bimodal distribution (some quit fast, some stay long)
Quit points	Pain points	Cluster of quits at same location/moment
Completion rate	Difficulty/clarity	< 70% on intended critical-path content
Time per section	Pacing	Sections taking 2x+ longer than designed
Death/failure rate	Difficulty curve	Spike = wall; zero = too easy
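The warning signals in this table can be checked mechanically once each session is logged. The sketch below assumes a hypothetical per-session record (quit section, seconds per section, deaths per section). The 70% and 2x thresholds come from the table; the quit-cluster share and the "more than 2x the median section" spike rule are placeholder choices you would tune.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import median
from typing import Optional


@dataclass
class Session:
    quit_section: Optional[str]  # None means the player finished the critical path
    section_seconds: dict = field(default_factory=dict)  # section -> seconds spent
    deaths: dict = field(default_factory=dict)  # section -> deaths/failures


def core_warnings(sessions, designed_seconds, min_quit_share=0.3):
    """Flag the core-metric warning signals.

    designed_seconds maps each section to its intended duration in seconds.
    min_quit_share is a hypothetical cutoff for calling quits a "cluster".
    """
    warnings = []
    n = len(sessions)

    # Completion rate on critical-path content: warn below 70%.
    completed = sum(1 for s in sessions if s.quit_section is None)
    if completed / n < 0.70:
        warnings.append(f"completion {completed}/{n} is below 70%")

    # Quit points: a section holding a large share of all quits is a pain point.
    quits = Counter(s.quit_section for s in sessions if s.quit_section is not None)
    total_quits = sum(quits.values())
    for section, count in quits.items():
        if count >= 2 and count / total_quits >= min_quit_share:
            warnings.append(f"quit cluster at {section}: {count} of {total_quits} quits")

    # Time per section: warn when the median run takes 2x+ the designed time.
    for section, target in designed_seconds.items():
        times = [s.section_seconds[section] for s in sessions if section in s.section_seconds]
        if times and median(times) >= 2 * target:
            warnings.append(f"{section} median {median(times):.0f}s vs {target}s designed")

    # Failures per player who reached each section: zero suggests "too easy",
    # more than 2x the median section (hypothetical rule) suggests a wall.
    rates = {}
    for section in designed_seconds:
        reached = [s for s in sessions if section in s.section_seconds]
        if reached:
            rates[section] = sum(s.deaths.get(section, 0) for s in reached) / len(reached)
    typical = median(rates.values()) if rates else 0
    for section, rate in rates.items():
        if rate == 0:
            warnings.append(f"no failures in {section} (too easy?)")
        elif rate > 2 * typical:
            warnings.append(f"failure spike at {section}: {rate:.1f} per player")
    return warnings
```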

Balance Metrics (When Tuning Systems)

Metric	What It Measures	Warning Signal
Pick rate by option	Strategy diversity	One option > 50% pick rate
Win rate by strategy	Balance	Any strategy > 55% win rate at comparable skill
Average game/match length	Pacing	Games consistently shorter or longer than intended
Resource accumulation rate	Economy health	Exponential growth = inflation incoming
Strategy churn	Meta health	If dominant strategy shifts too fast, balance is noisy
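Pick and win rates are simple ratios, but the 55% line only means something when the sample can separate it from noise. This sketch assumes match results arrive as hypothetical `(wins, games)` pairs per strategy and attaches a Wilson score interval (a standard confidence interval for a proportion, chosen here as one reasonable option) to each flagged win rate.

```python
from collections import Counter
from math import sqrt


def pick_rates(picks):
    """Share of selections per option, e.g. one entry per loadout choice."""
    counts = Counter(picks)
    return {option: c / len(picks) for option, c in counts.items()}


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    if games == 0:
        return (0.0, 1.0)
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return (centre - half, centre + half)


def balance_warnings(picks, results, pick_cap=0.50, win_cap=0.55):
    """results maps strategy -> (wins, games) from matches at comparable skill."""
    warnings = []
    for option, rate in pick_rates(picks).items():
        if rate > pick_cap:
            warnings.append(f"{option} picked {rate:.0%} of the time")
    for strategy, (wins, games) in results.items():
        if games and wins / games > win_cap:
            low, high = wilson_interval(wins, games)
            verdict = "clearly above cap" if low > win_cap else "could still be noise"
            warnings.append(
                f"{strategy} wins {wins}/{games} ({verdict}; 95% CI {low:.0%} to {high:.0%})"
            )
    return warnings
```

With 20 wins in 30 games the observed 67% is above the 55% line, but the interval runs from roughly 49% to 81%, which is why 30 matches is a floor rather than a comfortable sample.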

UX Metrics (When Testing Comprehension)

Metric	What It Measures	Warning Signal
Time to first meaningful action	Onboarding quality	> 60 seconds before the player does something
Tutorial completion rate	Tutorial design	< 90% = tutorial is the problem, not the player
Hint/help usage	Clarity	High usage = UI isn't communicating; zero usage = help system is invisible
Error rate on intended actions	Usability	Player tries to do the right thing but fails due to UI
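The same approach works for comprehension metrics. In this sketch the 60-second and 90% thresholds come from the table; `TesterRun`, `high_hint_mean`, and `max_ui_error_rate` are hypothetical, since the table names the hint and error signals but gives no number for them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TesterRun:
    seconds_to_first_action: float  # time until the first meaningful action
    finished_tutorial: bool
    hints_opened: int  # times the player opened hints/help
    intended_attempts: int  # attempts at the action the design intends
    ui_failures: int  # of those, attempts that failed because of the UI


def ux_warnings(runs, high_hint_mean=3.0, max_ui_error_rate=0.2):
    """high_hint_mean and max_ui_error_rate are placeholder thresholds."""
    warnings = []
    slow = [r for r in runs if r.seconds_to_first_action > 60]
    if slow:
        warnings.append(f"{len(slow)} of {len(runs)} took >60 s to act")
    done = sum(r.finished_tutorial for r in runs) / len(runs)
    if done < 0.90:
        warnings.append(f"tutorial completion {done:.0%} (<90%): fix the tutorial")
    hints = mean(r.hints_opened for r in runs)
    if hints == 0:
        warnings.append("nobody opened help: is it visible?")
    elif hints > high_hint_mean:
        warnings.append(f"{hints:.1f} hints per player: UI is not communicating")
    attempts = sum(r.intended_attempts for r in runs)
    if attempts and sum(r.ui_failures for r in runs) / attempts > max_ui_error_rate:
        warnings.append("players try the right action but the UI defeats them")
    return warnings
```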

Avoiding Confirmation Bias

The biggest threat to useful playtest data is your own expectations.

Pre-Test Protocol

Before the session:

Write down your predictions — what do you expect to happen?
Define "surprising" outcomes — what would change your mind?
Assign a skeptic — one team member whose job is to challenge interpretations
Pre-commit to sample size — decide how many sessions before drawing conclusions (minimum 5 for qualitative, 30+ for quantitative)
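A pre-test plan can be made hard to bend after the fact. The sketch below assumes a hypothetical `PlaytestPlan` object that locks predictions once the first session is recorded and reports whether the pre-committed sample size has been reached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    expected: str  # what you expect to happen
    would_change_mind: str  # the "surprising" outcome, defined before testing


@dataclass
class PlaytestPlan:
    skeptic: str  # team member whose job is to challenge interpretations
    min_sessions: int  # pre-committed: at least 5 qualitative, 30+ quantitative
    predictions: List[Prediction] = field(default_factory=list)
    sessions_run: int = 0

    def add_prediction(self, prediction):
        # A "prediction" written after testing starts is hindsight.
        if self.sessions_run > 0:
            raise RuntimeError("predictions are locked once the first session runs")
        self.predictions.append(prediction)

    def record_session(self):
        self.sessions_run += 1

    def ready_to_conclude(self):
        return self.sessions_run >= self.min_sessions
```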

Post-Test Protocol

After the session:

Review predictions vs. reality — where were you wrong? Those are the insights.
Separate observation from interpretation — "Player hesitated for 8 seconds at the door" (observation) vs. "Player didn't understand the door mechanic" (interpretation)
Look for disconfirming evidence — actively search for data that contradicts your preferred narrative
Quantify before concluding — "it felt like everyone struggled" vs. "3 of 7 players failed this section"
Delay solutions — understand the problem fully before proposing fixes
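Quantifying before concluding and comparing predictions with reality both reduce to counting. This sketch assumes outcomes are logged as one boolean per player per section; `tally`, `surprises`, and the 0.2 tolerance for calling a result surprising are arbitrary illustrative choices.

```python
def tally(outcomes):
    """outcomes maps section -> one bool per player (True = failed).
    Returns '3 of 7 failed' style counts instead of impressions."""
    return {s: f"{sum(v)} of {len(v)} failed" for s, v in outcomes.items()}


def surprises(predicted_fail_share, outcomes, tolerance=0.2):
    """Sections where the observed failure share misses the written prediction
    by more than `tolerance`; these are the places you were wrong."""
    found = {}
    for section, expected in predicted_fail_share.items():
        results = outcomes.get(section, [])
        if not results:
            continue  # no data for this prediction yet
        observed = sum(results) / len(results)
        if abs(observed - expected) > tolerance:
            found[section] = (expected, round(observed, 2))
    return found


if __name__ == "__main__":
    outcomes = {"door": [True, True, True, False, False, False, False]}
    print(tally(outcomes))  # {'door': '3 of 7 failed'}
    print(surprises({"door": 0.1}, outcomes))  # {'door': (0.1, 0.43)}
```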

Common Bias Traps

Trap	Mechanism	Counter
Anchoring	First session dominates your impression	Review all sessions before concluding
Availability	Dramatic moments overshadow quiet ones	Use metrics, not memory
Projection	Attributing your own experience to players	Watch what they do, not what you'd do
Sunk cost	Defending features you spent time on	Ask "would we add this today?" not "should we cut this?"
Survivorship	Only hearing from players who stayed	Track quit points with equal priority

Survey Design

Good Questions (Experience-Focused)

"How would you describe the experience in one word?"
"What moment stands out most?" (Then probe: "What made it stand out?")
"Was there a point where you wanted to stop? What was happening?"
"What would you do differently on a second playthrough?"
"Rate how [specific emotion] you felt during [specific moment]" (1-5 scale)

Bad Questions (Leading or Attributive)

"Did you find the controls intuitive?" (Leading — assumes controls are the issue)
"What would you change?" (Too broad — gets surface-level answers)
"Did you like it?" (Binary, social pressure toward "yes")
"Was it too hard?" (Leading — frames difficulty as the variable)
"What features would you add?" (Players aren't designers; this generates noise)

The One-Question Shortcut

If you can only ask one question: "Tell me about a moment that stood out — good or bad."

Then follow up with: "What were you trying to do?" and "What happened next?"
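For the 1-5 rating question in the good-questions list, a summary that keeps the whole distribution stops an average from hiding a split audience. This sketch follows the common convention of treating Likert-style ratings as ordinal, so it reports the median and per-point counts; `summarize_likert` is a hypothetical helper.

```python
from collections import Counter
from statistics import median


def summarize_likert(responses, scale=range(1, 6)):
    """Summarize 1-5 ratings for one question, e.g. tension during the boss fight."""
    bad = [r for r in responses if r not in scale]
    if bad:
        raise ValueError(f"out-of-scale responses: {bad}")
    counts = Counter(responses)
    return {
        "n": len(responses),
        "median": median(responses),
        "distribution": {point: counts.get(point, 0) for point in scale},
    }


if __name__ == "__main__":
    print(summarize_likert([4, 5, 4, 2, 5, 3, 4]))
    # {'n': 7, 'median': 4, 'distribution': {1: 0, 2: 1, 3: 1, 4: 3, 5: 2}}
```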

Interpreting Data

Decision Framework

Signal	Confidence	Action
Metrics + observation + self-report all agree	High	Act on it
Metrics show it, observation confirms, self-report disagrees	Moderate-High	Trust behavior over self-report
Self-report says it, but metrics/observation don't show it	Low	Investigate further — the report may point to a different real problem
Single session shows it, others don't	Very Low	Note it but don't act — one data point isn't a pattern
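The table can be encoded as a small lookup so every finding gets the same treatment. A sketch, assuming each evidence source is reduced to a yes/no "shows the problem" flag; `evidence_confidence` is a hypothetical name, and combinations the table does not cover come back as unclassified rather than guessed.

```python
def evidence_confidence(metrics, observation, self_report, sessions_showing):
    """Map the decision-framework rows to (confidence, action).

    Each bool means "this source shows the problem"; sessions_showing is how
    many sessions exhibit it.
    """
    if sessions_showing <= 1:
        return ("Very Low", "Note it but don't act")
    behavior = metrics and observation
    if behavior and self_report:
        return ("High", "Act on it")
    if behavior:
        return ("Moderate-High", "Trust behavior over self-report")
    if self_report and not metrics and not observation:
        return ("Low", "Investigate further")
    return ("Unclassified", "Not covered by the table; gather more data")
```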

Sample Size Guidance

5-8 sessions — finds ~85% of major usability problems
15-20 — identifies behavioral patterns
30+ — minimum for quantitative conclusions (win rates, balance)
A/B tests — require statistical power calculation; varies by effect size
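Two pieces of arithmetic sit behind this guidance. The ~85% figure is commonly traced to Nielsen and Landauer's problem-discovery model, share found = 1 - (1 - λ)^n, with an average per-session discovery rate λ of about 0.31; this sketch assumes that model. For A/B tests it uses the standard normal-approximation sample size for comparing two proportions; the function names and the 70% to 80% tutorial-completion example are illustrative.

```python
from math import ceil
from statistics import NormalDist


def share_of_problems_found(n_sessions, per_session_rate=0.31):
    """Expected share of problems seen at least once in n sessions, assuming each
    session independently surfaces a given problem with probability per_session_rate."""
    return 1 - (1 - per_session_rate) ** n_sessions


def ab_sample_size(p_control, p_variant, alpha=0.05, power=0.80):
    """Players needed per arm to detect a change in a rate with a two-sided
    two-proportion z-test (normal approximation, unpooled variance)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2)
```

Under these assumptions, five sessions surface about 84% of problems, and detecting a tutorial-completion change from 70% to 80% at α = 0.05 with 80% power needs roughly 290 players per arm. Halving the effect size roughly quadruples that number, which is what "varies by effect size" means in practice.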

Solo Developer Validation

When you're building alone, you often can't run traditional playtests at every stage of development. These techniques bridge the gap:

Self-Testing Techniques

Technique	How	What It Catches
The 2-week break	Play your own game after not touching it for 2 weeks	UX failures, forgotten controls, unclear objectives
The mute test	Play with sound off	Audio-dependent information, missing visual feedback
The squint test	Squint at the screen or reduce resolution	Visual clarity, contrast, UI readability
The record-and-review	Record gameplay, watch it the next day	Pacing problems, dead time, repetitive patterns
The explain test	Explain what you're doing out loud while playing	Logic gaps, unjustified assumptions, unclear goals
The wrong-hand test	Play with your non-dominant hand	Input complexity, timing windows, control accessibility

Recruiting First Testers

When you're ready for external eyes (earlier than you think):

Friends/family who DON'T play games — best for UX/clarity testing
Friends who play games in your genre — best for feel/depth testing
Online communities (itch.io, indie forums, Discord) — best for unbiased feedback
Start with 3 testers — even 3 external players reveal more than 100 hours of self-play

Solo Metrics

If you're a solo developer shipping updates:

Track your own play session length (are YOU getting bored?)
Count deaths/failures per section (is difficulty spiking where you don't intend?)
Time each section (is pacing matching your design?)
Screenshot every moment of confusion or frustration — these are your UX bugs
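If your build can call into a small logging helper, the section times and death counts above can be captured without relying on memory. This sketch is in Python for consistency with the other examples and assumes a hypothetical `SoloPlayLog` class; in a real game you would port the idea to your engine's language. Sections are not meant to be nested, since entering one resets the death counter.

```python
import csv
import time
from contextlib import contextmanager


class SoloPlayLog:
    """Hypothetical helper: records (section, seconds, deaths) per section so
    runs can be compared across builds."""

    def __init__(self):
        self.rows = []  # (section, seconds, deaths)
        self._deaths = 0

    def death(self):
        self._deaths += 1

    @contextmanager
    def section(self, name):
        self._deaths = 0  # each section counts its own deaths
        start = time.perf_counter()
        try:
            yield self
        finally:
            elapsed = round(time.perf_counter() - start, 2)
            self.rows.append((name, elapsed, self._deaths))

    def write_csv(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["section", "seconds", "deaths"])
            writer.writerows(self.rows)
```

Usage: wrap each level in `with log.section("caves"):`, call `log.death()` whenever you die, and call `log.write_csv(...)` at the end of the run.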

Cross-References

game-design — Playtest scenarios from the 5-Component Framework (new player, stress, skill, abuse, readability tests)
systems-design — System health metrics (behavioral diversity, archetype formation) measured through playtesting
player-ux — The cognitive pillars (perception/attention/memory) drive the question generation framework
game-balance — Metrics-driven iteration for detecting and resolving balance problems
economy-design — Economy health monitoring metrics to track during playtests
experience-design — Testing whether the intended experience matches actual player experience
motivation-design — Testing retention and motivation through session length and return rate
encounter-design — Testing spatial readability and encounter fairness
narrative-design — Testing narrative comprehension and engagement
game-feel — "Does this feel good?" requires observation, not surveys