// Framework for clustering users into behavioral segments and personas. Use when analyzing user cohorts, identifying patterns, and personalizing product features.
| name | user-segmentation |
| description | Framework for clustering users into behavioral segments and personas. Use when analyzing user cohorts, identifying patterns, and personalizing product features. |
Cluster users into actionable segments based on behavioral patterns and usage data.
Use this skill when you need to:
Effective user segmentation analyzes behavior across multiple dimensions:
1. Engagement Level (Frequency & Recency)
2. Feature Usage (Breadth)
3. Value Signals (Quality)
Combine these dimensions to create personas - users who share similar behavioral patterns across all three.
Purpose: Gather raw data for each user across all dimensions
-- User Behavioral Metrics
-- Calculates engagement, feature usage, and value signals for each user
WITH user_activity AS (
SELECT
p.id as person_id,
p.name,
-- ENGAGEMENT METRICS
COUNT(DISTINCT m.id) as total_messages_sent,
COUNT(DISTINCT CASE WHEN m.provider_timestamp >= NOW() - INTERVAL '30 days' THEN m.id END) as messages_last_30d,
COUNT(DISTINCT CASE WHEN m.provider_timestamp >= NOW() - INTERVAL '7 days' THEN m.id END) as messages_last_7d,
NOW()::date - MAX(m.provider_timestamp)::date as days_since_last_message,
-- FEATURE USAGE METRICS
COUNT(DISTINCT CASE WHEN c.type = 'ONE_ON_ONE' THEN m.id END) as messages_1on1,
COUNT(DISTINCT CASE WHEN c.type = 'GROUP' THEN m.id END) as messages_group,
-- VALUE SIGNAL METRICS
COUNT(DISTINCT CASE
WHEN m.content LIKE 'Loved %' OR m.content LIKE 'Liked %' OR m.content LIKE 'Disliked %'
THEN m.id
END) as total_reactions,
-- TIME PATTERNS (optional)
AVG(EXTRACT(HOUR FROM m.provider_timestamp))::int as avg_hour_of_day,
MAX(m.provider_timestamp)::date as last_message_date
FROM persons p
LEFT JOIN person_contacts pc ON pc.person_id = p.id
LEFT JOIN message m ON m.sender_person_contact_id = pc.id
LEFT JOIN conversation c ON m.conversation_id = c.id
WHERE p.name NOT LIKE '%Coach%'
AND p.name NOT LIKE '%Codel%'
AND p.name NOT LIKE '%Wren%'
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT m.id) > 10 -- Filter out users with insufficient data
)
SELECT
person_id,
name,
total_messages_sent,
messages_last_30d,
messages_last_7d,
total_reactions,
messages_1on1,
messages_group,
last_message_date,
days_since_last_message,
-- CALCULATED SEGMENTS
-- Engagement Level
CASE
WHEN messages_last_7d >= 50 THEN 'Very Active'
WHEN messages_last_7d >= 20 THEN 'Active'
WHEN messages_last_7d >= 5 THEN 'Moderate'
WHEN messages_last_30d >= 5 THEN 'Low'
WHEN days_since_last_message <= 60 THEN 'Inactive'
ELSE 'Churned'
END as engagement_level,
-- Reaction Behavior (Value Signal)
CASE
WHEN total_reactions >= 10 THEN 'Power Reactor'
WHEN total_reactions >= 3 THEN 'Regular Reactor'
WHEN total_reactions >= 1 THEN 'Occasional Reactor'
ELSE 'Non-Reactor'
END as reaction_behavior,
-- Conversation Preference (Feature Usage)
CASE
WHEN messages_1on1 > messages_group * 2 THEN '1:1 Focused'
WHEN messages_group > messages_1on1 * 2 THEN 'Group Focused'
WHEN messages_1on1 > 0 AND messages_group > 0 THEN 'Balanced'
WHEN messages_1on1 > 0 THEN '1:1 Only'
ELSE 'Group Only'
END as conversation_preference
FROM user_activity
ORDER BY total_messages_sent DESC
LIMIT 100;
How to Use:
arsenal/dot-claude/skills/sql-reader/connect.sh "
[Paste SQL above]
"
Expected Output: Table with each user's raw metrics and their segment assignments
Purpose: Understand how users are distributed across segments
-- Segment Distribution Summary
-- Shows percentage of users in each segment
WITH user_activity AS (
SELECT
p.id as person_id,
COUNT(DISTINCT m.id) as total_messages_sent,
COUNT(DISTINCT CASE WHEN m.provider_timestamp >= NOW() - INTERVAL '7 days' THEN m.id END) as messages_last_7d,
COUNT(DISTINCT CASE WHEN m.provider_timestamp >= NOW() - INTERVAL '30 days' THEN m.id END) as messages_last_30d,
COUNT(DISTINCT CASE
WHEN m.content LIKE 'Loved %' OR m.content LIKE 'Liked %' OR m.content LIKE 'Disliked %'
THEN m.id
END) as total_reactions,
COUNT(DISTINCT CASE WHEN c.type = 'ONE_ON_ONE' THEN m.id END) as messages_1on1,
COUNT(DISTINCT CASE WHEN c.type = 'GROUP' THEN m.id END) as messages_group,
NOW()::date - MAX(m.provider_timestamp)::date as days_since_last_message
FROM persons p
LEFT JOIN person_contacts pc ON pc.person_id = p.id
LEFT JOIN message m ON m.sender_person_contact_id = pc.id
LEFT JOIN conversation c ON m.conversation_id = c.id
WHERE p.name NOT LIKE '%Coach%'
AND p.name NOT LIKE '%Codel%'
AND p.name NOT LIKE '%Wren%'
GROUP BY p.id
HAVING COUNT(DISTINCT m.id) > 10
),
user_segments AS (
SELECT
CASE
WHEN messages_last_7d >= 50 THEN 'Very Active'
WHEN messages_last_7d >= 20 THEN 'Active'
WHEN messages_last_7d >= 5 THEN 'Moderate'
WHEN messages_last_30d >= 5 THEN 'Low'
WHEN days_since_last_message <= 60 THEN 'Inactive'
ELSE 'Churned'
END as engagement_level,
CASE
WHEN total_reactions >= 10 THEN 'Power Reactor'
WHEN total_reactions >= 3 THEN 'Regular Reactor'
WHEN total_reactions >= 1 THEN 'Occasional Reactor'
ELSE 'Non-Reactor'
END as reaction_behavior,
CASE
WHEN messages_1on1 > messages_group * 2 THEN '1:1 Focused'
WHEN messages_group > messages_1on1 * 2 THEN 'Group Focused'
WHEN messages_1on1 > 0 AND messages_group > 0 THEN 'Balanced'
WHEN messages_1on1 > 0 THEN '1:1 Only'
ELSE 'Group Only'
END as conversation_preference
FROM user_activity
)
SELECT
'=== ENGAGEMENT LEVEL DISTRIBUTION ===' as metric, '' as segment, 0 as user_count, 0.0 as percentage
UNION ALL
SELECT
'Engagement' as metric,
engagement_level as segment,
COUNT(*) as user_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) as percentage
FROM user_segments
GROUP BY engagement_level
UNION ALL
SELECT '=== REACTION BEHAVIOR DISTRIBUTION ===' as metric, '' as segment, 0 as user_count, 0.0 as percentage
UNION ALL
SELECT
'Reaction' as metric,
reaction_behavior as segment,
COUNT(*) as user_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) as percentage
FROM user_segments
GROUP BY reaction_behavior
UNION ALL
SELECT '=== CONVERSATION PREFERENCE DISTRIBUTION ===' as metric, '' as segment, 0 as user_count, 0.0 as percentage
UNION ALL
SELECT
'Conversation' as metric,
conversation_preference as segment,
COUNT(*) as user_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) as percentage
FROM user_segments
GROUP BY conversation_preference
ORDER BY metric, user_count DESC;
Expected Output:
=== ENGAGEMENT LEVEL DISTRIBUTION ===
Engagement | Churned | 37 | 39.4%
Engagement | Inactive | 17 | 18.1%
Engagement | Very Active | 15 | 16.0%
...
=== REACTION BEHAVIOR DISTRIBUTION ===
Reaction | Non-Reactor | 42 | 44.7%
Reaction | Power Reactor | 34 | 36.2%
...
=== CONVERSATION PREFERENCE DISTRIBUTION ===
Conversation | Group Focused | 72 | 76.6%
Conversation | 1:1 Focused | 14 | 14.9%
...
Purpose: Create actionable user personas by combining all dimensions
-- User Personas: Combined Multi-Dimensional Segments
-- Creates archetypal users by combining engagement, feature usage, and value signals
WITH user_activity AS (
SELECT
p.id as person_id,
p.name,
COUNT(DISTINCT m.id) as total_messages_sent,
COUNT(DISTINCT CASE WHEN m.provider_timestamp >= NOW() - INTERVAL '7 days' THEN m.id END) as messages_last_7d,
COUNT(DISTINCT CASE WHEN m.provider_timestamp >= NOW() - INTERVAL '30 days' THEN m.id END) as messages_last_30d,
COUNT(DISTINCT CASE
WHEN m.content LIKE 'Loved %' OR m.content LIKE 'Liked %' OR m.content LIKE 'Disliked %'
THEN m.id
END) as total_reactions,
COUNT(DISTINCT CASE WHEN c.type = 'ONE_ON_ONE' THEN m.id END) as messages_1on1,
COUNT(DISTINCT CASE WHEN c.type = 'GROUP' THEN m.id END) as messages_group,
NOW()::date - MAX(m.provider_timestamp)::date as days_since_last_message,
AVG(EXTRACT(HOUR FROM m.provider_timestamp))::int as avg_hour_of_day
FROM persons p
LEFT JOIN person_contacts pc ON pc.person_id = p.id
LEFT JOIN message m ON m.sender_person_contact_id = pc.id
LEFT JOIN conversation c ON m.conversation_id = c.id
WHERE p.name NOT LIKE '%Coach%'
AND p.name NOT LIKE '%Codel%'
AND p.name NOT LIKE '%Wren%'
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT m.id) > 10
),
user_personas AS (
SELECT
person_id,
name,
total_messages_sent,
total_reactions,
-- PERSONA ASSIGNMENT LOGIC
-- Priority matters - first match wins
CASE
-- Power Users: Very active + Power reactors
WHEN messages_last_7d >= 50 AND total_reactions >= 10
THEN 'Power User'
-- Engaged Couples: Very active in group chats
WHEN messages_last_7d >= 20 AND messages_group > messages_1on1 * 2
THEN 'Engaged Couple'
-- 1:1 Seekers: Prefer coaching over couple chat
WHEN messages_1on1 > messages_group * 2
THEN '1:1 Coaching Seeker'
-- Feedback Givers: High reaction ratio to message ratio
WHEN total_reactions::float / NULLIF(total_messages_sent, 0) > 0.1
THEN 'Feedback Giver'
-- Lurkers: Active but rarely react
WHEN messages_last_30d >= 20 AND total_reactions < 3
THEN 'Active Lurker'
-- At Risk: Was active but declining
WHEN days_since_last_message BETWEEN 14 AND 60 AND total_messages_sent > 100
THEN 'At Risk Churn'
-- Churned: Haven't used in 60+ days
WHEN days_since_last_message > 60
THEN 'Churned'
-- Casual Users: Low volume, occasional use
WHEN messages_last_30d < 20 AND total_messages_sent > 50
THEN 'Casual User'
-- New Users: Recently joined, still exploring
WHEN total_messages_sent < 100 AND days_since_last_message <= 30
THEN 'New User'
ELSE 'Other'
END as persona,
messages_last_7d,
messages_1on1,
messages_group,
days_since_last_message,
avg_hour_of_day
FROM user_activity
)
SELECT
persona,
COUNT(*) as user_count,
ROUND(AVG(total_messages_sent)::numeric) as avg_total_messages,
ROUND(AVG(total_reactions)::numeric) as avg_reactions,
ROUND(AVG(messages_last_7d)::numeric) as avg_weekly_messages,
ROUND((100.0 * AVG(messages_1on1::float / NULLIF(total_messages_sent, 0)))::numeric, 1) as pct_1on1,
ROUND(AVG(avg_hour_of_day)::numeric) as avg_active_hour,
STRING_AGG(name, ', ' ORDER BY total_messages_sent DESC) FILTER (WHERE total_messages_sent > 500) as top_users
FROM user_personas
GROUP BY persona
ORDER BY user_count DESC;
Expected Output:
persona | user_count | avg_total_messages | avg_reactions | avg_weekly_messages | pct_1on1 | avg_active_hour | top_users
--------------------+------------+--------------------+---------------+---------------------+----------+-----------------+-----------------------
Churned | 27 | 180 | 5 | 0 | 10.1 | 15 | lynn, anna, yoni
1:1 Coaching Seeker | 14 | 385 | 0 | 17 | 83.5 | 13 | Matt
Power User | 11 | 2832 | 234 | 166 | 3.2 | 13 | craig, amy, mark, vai
Feedback Giver | 11 | 275 | 38 | 2 | 3.8 | 14 | karen, ryan
Engaged Couple | 7 | 1046 | 53 | 82 | 5.8 | 14 | tracy, greg, arthur
At Risk Churn | 6 | 821 | 33 | 0 | 8.9 | 14 | abby, daniel
Other | 6 | 700 | 24 | 11 | 24.2 | 13 | samuel, Mallory
Casual User | 5 | 103 | 0 | 1 | 32.8 | 12 |
Active Lurker | 4 | 179 | 0 | 16 | 23.6 | 14 |
New User | 3 | 29 | 0 | 3 | 16.3 | 13 |
Criteria: messages_last_7d >= 50 AND total_reactions >= 10 Characteristics:
Criteria: messages_last_7d >= 20 AND messages_group > messages_1on1 * 2 Characteristics:
Criteria: messages_1on1 > messages_group * 2 Characteristics:
Criteria: total_reactions / total_messages > 0.1 Characteristics:
Criteria: messages_last_30d >= 20 AND total_reactions < 3 Characteristics:
Criteria: days_since_last_message BETWEEN 14 AND 60 AND total_messages_sent > 100 Characteristics:
Criteria: days_since_last_message > 60 Characteristics:
Criteria: messages_last_30d < 20 AND total_messages_sent > 50 Characteristics:
Criteria: total_messages_sent < 100 AND days_since_last_message <= 30 Characteristics:
# 1. Get raw metrics for each user
arsenal/dot-claude/skills/sql-reader/connect.sh "[Query 1]" > user_metrics.csv
# 2. Get segment distributions
arsenal/dot-claude/skills/sql-reader/connect.sh "[Query 2]" > segment_distribution.txt
# 3. Get user personas
arsenal/dot-claude/skills/sql-reader/connect.sh "[Query 3]" > user_personas.txt
Look at segment distribution to understand:
Based on personas, identify:
For each persona:
Question: Should we enable suggested responses by default?
Analysis:
Insight: Make SR opt-in based on reaction behavior segment
Action:
Question: Why are users churning?
Analysis:
Insight: At-risk users stopped after conflict without resolution
Action:
Question: Should we build group video calls or 1:1 video coaching?
Analysis:
Insight: More users prefer 1:1 coaching, but Engaged Couples are using product as intended
Decision:
To add a new segment dimension:
Step 1: Define the metric
-- Example: Add "Time of Day Preference"
AVG(EXTRACT(HOUR FROM m.provider_timestamp))::int as avg_hour_of_day
Step 2: Create classification logic
CASE
WHEN avg_hour_of_day BETWEEN 6 AND 12 THEN 'Morning User'
WHEN avg_hour_of_day BETWEEN 12 AND 17 THEN 'Afternoon User'
WHEN avg_hour_of_day BETWEEN 17 AND 22 THEN 'Evening User'
ELSE 'Night Owl'
END as time_preference
Step 3: Add to persona logic
-- Combine with other dimensions
WHEN messages_last_7d >= 50 AND avg_hour_of_day BETWEEN 22 AND 6
THEN 'Night Owl Power User'
Change the thresholds based on your product:
High-frequency products (daily use):
Low-frequency products (weekly use):
Different features:
Use with:
Feeds into:
Wrong: "Users aged 25-35" Right: "Power Users who engage daily" Why: Behavior predicts future behavior; demographics don't
Wrong: "Users who like the color blue" Right: "Users who prefer 1:1 coaching over group chat" Why: You can't build features for color preference
Wrong: Segment once, use forever Right: Re-segment monthly, track migration Why: Users move between segments (activate, churn, etc.)
Wrong: Build features for every segment equally Right: Prioritize high-value segments Why: Limited resources, 80/20 rule
| Metric | Engagement | Feature Usage | Value Signal |
|---|---|---|---|
| DAU/MAU ratio | ✅ | ||
| Messages per week | ✅ | ||
| Days since last use | ✅ | ||
| Feature adoption rate | ✅ | ||
| Features per user | ✅ | ||
| Reaction rate | ✅ | ||
| NPS | ✅ | ||
| Referrals | ✅ |
| Product Type | Very Active | Active | At Risk Days |
|---|---|---|---|
| Daily (social) | 50+/week | 20+/week | 7 days |
| Weekly (productivity) | 20+/month | 8+/month | 14 days |
| Monthly (coaching) | 10+/month | 4+/month | 30 days |
Remember: The goal isn't perfect segmentation. It's actionable segmentation that drives product decisions.