| name | talk-aggregation |
| description | Analyzes multiple talks to extract themes, expertise areas, and CNCF project focus |
| version | 1.0.0 |
Talk Aggregation Skill
Purpose
Analyze all of a presenter's talks to identify expertise patterns, recurring themes, CNCF project focus, and speaking statistics. This skill synthesizes data across the talks into a single view of the presenter's technical focus and community contributions.
Input Format
{
"talks": [
{
"video_id": "abc123",
"title": "Kubernetes Community Management Best Practices",
"date": "2025-10-15",
"duration": 2700,
"transcript": "Full transcript text...",
"description": "Jeffrey discusses community management...",
"event": "KubeCon North America 2025"
},
{
"video_id": "def456",
"title": "GitOps with Argo CD",
"date": "2024-05-20",
"duration": 1320,
"transcript": "Full transcript text...",
"description": "Introduction to GitOps practices...",
"event": "CloudNativeCon Europe 2024"
}
]
}
Field Descriptions:
- talks (array): List of all talks by the presenter
  - video_id (string): YouTube video identifier
  - title (string): Talk title
  - date (string): Presentation date (YYYY-MM-DD format)
  - duration (number): Talk length in seconds
  - transcript (string): Full corrected transcript
  - description (string): YouTube video description
  - event (string, optional): Conference or event name
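The field list above can be checked mechanically before any analysis starts. The following is a minimal Python sketch under stated assumptions: the function name `validate_talk`, the `REQUIRED_FIELDS` table, and the wording of the problem messages are illustrative and not part of the skill itself.

```python
import re
from datetime import date

# Required fields and their expected Python types, taken from the field list above.
REQUIRED_FIELDS = {
    "video_id": str,
    "title": str,
    "date": str,
    "duration": (int, float),
    "transcript": str,
    "description": str,
}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_talk(talk: dict) -> list[str]:
    """Return a list of problems found in one talk record (empty if valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in talk:
            problems.append(f"missing field: {field}")
        elif isinstance(talk[field], bool) or not isinstance(talk[field], expected):
            # bool is rejected explicitly because it is a subclass of int.
            problems.append(f"wrong type for {field}")
    # "event" is optional, but must be a string when present.
    if "event" in talk and not isinstance(talk["event"], str):
        problems.append("wrong type for event")
    value = talk.get("date")
    if isinstance(value, str):
        try:
            if not DATE_PATTERN.fullmatch(value):
                raise ValueError(value)
            date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
        except ValueError:
            problems.append("date is not a valid YYYY-MM-DD date")
    return problems
```

Running `validate_talk` over every element of `talks` and stopping on the first non-empty result keeps malformed records out of the later steps.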
Output Format
{
"expertise_areas": [
{
"area": "Kubernetes",
"context": "Deep community involvement, contributor experience, scalability discussions",
"talk_count": 5,
"evidence": ["Container orchestration patterns", "Kubernetes governance", "SIG management"]
},
{
"area": "Community Management",
"context": "Best practices for open source communities, contributor engagement, sustainability",
"talk_count": 3,
"evidence": ["Building inclusive communities", "Maintainer burnout", "Governance models"]
}
],
"cncf_projects": [
{
"name": "Kubernetes",
"talk_count": 5,
"usage_context": "Container orchestration, community governance, contributor experience",
"first_mention": "2022-05",
"latest_mention": "2025-10"
},
{
"name": "Argo CD",
"talk_count": 3,
"usage_context": "GitOps continuous delivery, declarative deployments",
"first_mention": "2023-03",
"latest_mention": "2025-10"
}
],
"recurring_themes": [
"Open source community sustainability",
"Scalable infrastructure patterns",
"GitOps workflows and best practices",
"Developer experience improvements"
],
"talk_summaries": [
{
"video_id": "abc123",
"summary": "Explores best practices for managing large open source communities, focusing on contributor onboarding, maintainer support, and sustainable governance models. Draws from Kubernetes community experience to provide actionable insights.",
"key_points": [
"Contributor onboarding reduces time-to-first-PR",
"Maintainer rotation prevents burnout",
"Clear governance enables scaling"
],
"topics": ["Kubernetes", "Community Management", "Governance"]
}
],
"stats": {
"total_talks": 8,
"years_active": {
"first": 2022,
"latest": 2025,
"span": 3
},
"total_speaking_minutes": 272,
"most_active_year": 2025,
"average_talk_length_minutes": 34
}
}
Field Descriptions:
- expertise_areas (array): Technical or domain expertise identified
  - area (string): Expertise domain name
  - context (string): Description of how the expertise manifests
  - talk_count (number): Number of talks covering this area
  - evidence (array of strings): Specific topics demonstrating expertise
- cncf_projects (array): CNCF projects discussed across talks
  - name (string): Official CNCF project name
  - talk_count (number): Number of talks mentioning the project
  - usage_context (string): How the project is discussed or used
  - first_mention (string): Earliest talk date (YYYY-MM format)
  - latest_mention (string): Most recent talk date (YYYY-MM format)
- recurring_themes (array of strings): Cross-talk themes and patterns
- talk_summaries (array): Concise summary for each talk
  - video_id (string): Video identifier
  - summary (string): 50-150 word talk summary
  - key_points (array of strings): 3-5 main takeaways
  - topics (array of strings): 3-5 primary topics covered
- stats (object): Speaking activity statistics
  - total_talks (number): Total presentation count
  - years_active (object): Speaking timeframe, with first (earliest year), latest (most recent year), and span (latest - first)
  - total_speaking_minutes (number): Sum of all talk durations, in minutes
  - most_active_year (number): Year with most presentations
  - average_talk_length_minutes (number): Mean talk duration, in minutes
Execution Instructions
Step 1: Analyze Individual Talks
For each talk, extract:
Topics Covered:
- Main technical subjects
- CNCF projects mentioned
- Problem domains addressed
- Solutions discussed
Key Points:
- Main arguments or lessons
- Novel insights
- Practical recommendations
- Case studies or examples
CNCF Projects:
- Project names (exact capitalization)
- How projects are used/discussed
- Context: implementation, comparison, case study
- Relationship between projects
Technical Depth:
- Implementation details
- Architecture patterns
- Specific features discussed
- Production experiences
Step 2: Identify Expertise Areas
Look across all talks to identify recurring technical domains:
Criteria for Expertise Area:
- Appears in 2+ talks, OR
- Is central to 1 major talk (>30 min) with deep technical content
- In either case, shows demonstrable knowledge depth (not just mentions)
Common Expertise Areas:
- Specific technologies (Kubernetes, Prometheus, Istio)
- Technical domains (Observability, Security, Networking)
- Practices (GitOps, SRE, Platform Engineering)
- Organizational topics (Community Management, Developer Experience)
For each area, determine:
- Context: How expertise manifests (architecture, operations, community, etc.)
- Talk count: Number of talks covering this area
- Evidence: Specific subtopics that demonstrate expertise
Example:
{
"area": "Observability",
"context": "Distributed tracing, metrics collection, monitoring best practices for microservices",
"talk_count": 4,
"evidence": [
"Prometheus query optimization",
"Jaeger deployment patterns",
"OpenTelemetry instrumentation",
"SLO/SLI definition"
]
}
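The qualification criteria in this step can be sketched as a small filter. This is a minimal Python sketch, not a definitive implementation: the per-talk record shape (`duration`, `areas`) and the boolean "central and deep" flag are hypothetical, and judging whether a talk shows real depth still happens while reading the transcript.

```python
from collections import defaultdict

MAJOR_TALK_SECONDS = 30 * 60  # "major talk" threshold from the criteria (>30 min)


def qualifying_areas(talk_areas: list[dict]) -> dict[str, int]:
    """Map each qualifying expertise area to its talk count.

    Each entry of talk_areas is a hypothetical per-talk record:
    {"duration": seconds, "areas": {area_name: is_central_and_deep}}
    """
    counts = defaultdict(int)
    deep_major = set()
    for talk in talk_areas:
        for area, central in talk["areas"].items():
            counts[area] += 1
            # A single talk only qualifies an area if it is long and the area is central.
            if central and talk["duration"] > MAJOR_TALK_SECONDS:
                deep_major.add(area)
    # Keep areas seen in 2+ talks, or central to at least one major talk.
    return {a: n for a, n in counts.items() if n >= 2 or a in deep_major}
```

The returned counts map directly onto the `talk_count` field; `context` and `evidence` still have to be written from the talk content.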
Step 3: Track CNCF Projects
Common CNCF Projects:
- Kubernetes, Prometheus, Envoy, CoreDNS, containerd
- Fluentd, Jaeger, Helm, Argo, Flux, Vitess
- Cilium, Linkerd, Istio, etcd, Harbor, Falco
- Dragonfly, Rook, TiKV, gRPC, CNI, Knative
- OpenTelemetry, SPIFFE, SPIRE, Cortex, Thanos
For each CNCF project mentioned:
- Count talks mentioning project
- Extract usage context: how is it used or discussed?
  - Implementation ("deployed Istio for service mesh")
  - Comparison ("evaluated Linkerd vs Istio")
  - Case study ("Prometheus handles 10M metrics/sec")
  - Tutorial ("how to configure Helm charts")
- Track timeline
  - First mention date (YYYY-MM)
  - Latest mention date (YYYY-MM)
Prioritization:
- Order by talk_count (descending)
- Include all projects mentioned in 2+ talks
- Include projects central to 1 major talk
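The counting, timeline tracking, and ordering described in this step could look roughly like the following. It is a sketch under assumptions: the `mentions` input shape is hypothetical (project detection itself is not shown), `usage_context` is omitted because it must be written from the transcript, and ties in `talk_count` keep first-seen order, which the skill does not specify.

```python
def aggregate_projects(mentions: list[tuple[str, list[str]]]) -> list[dict]:
    """Aggregate per-talk project mentions into cncf_projects-style entries.

    mentions: (talk date as YYYY-MM-DD, project names mentioned in that talk).
    """
    projects: dict[str, dict] = {}
    for talk_date, names in mentions:
        month = talk_date[:7]  # YYYY-MM-DD -> YYYY-MM
        # dict.fromkeys de-duplicates while keeping order: one count per talk.
        for name in dict.fromkeys(names):
            entry = projects.setdefault(
                name,
                {"name": name, "talk_count": 0,
                 "first_mention": month, "latest_mention": month},
            )
            entry["talk_count"] += 1
            # YYYY-MM strings sort chronologically, so min/max work directly.
            entry["first_mention"] = min(entry["first_mention"], month)
            entry["latest_mention"] = max(entry["latest_mention"], month)
    # sorted() is stable, so projects with equal counts keep first-seen order.
    return sorted(projects.values(), key=lambda p: -p["talk_count"])
```

Filtering to projects with 2+ talks, or projects central to one major talk, can be applied to the returned list afterwards.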
Step 4: Identify Recurring Themes
Look for conceptual patterns across talks:
Theme Types:
Technical Patterns:
- "Scalable architecture design"
- "Progressive delivery strategies"
- "Multi-cluster management"
Organizational:
- "Developer experience optimization"
- "Platform engineering approaches"
- "Inner source adoption"
Community/Cultural:
- "Open source sustainability"
- "Building inclusive communities"
- "Contributor engagement"
Operational:
- "Production reliability practices"
- "Cost optimization strategies"
- "Security-first architecture"
Criteria for Theme:
- Appears in 3+ talks, OR
- Is the central message of 2+ talks
- In either case, represents a higher-level concept (not a specific tool)
Format: Short phrase capturing the theme (5-8 words)
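The two counting thresholds for themes can be expressed as a short filter. This is an illustrative Python sketch only: the input shape (one dict per talk mapping a theme phrase to an "is central message" flag) is an assumption, and grouping differently worded phrases into one theme is left to the analysis itself.

```python
from collections import Counter


def recurring_themes(talk_themes: list[dict[str, bool]]) -> list[str]:
    """Return themes that appear in 3+ talks or are central to 2+ talks.

    talk_themes: one dict per talk mapping theme phrase -> is_central_message.
    """
    appearances = Counter()
    central = Counter()
    for themes in talk_themes:
        for theme, is_central in themes.items():
            appearances[theme] += 1
            if is_central:
                central[theme] += 1
    # Counter keeps insertion order, so themes come back in first-seen order.
    return [t for t in appearances if appearances[t] >= 3 or central[t] >= 2]
```

The check that a theme is a higher-level concept rather than a tool name is not something this filter can decide.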
Step 5: Generate Talk Summaries
For each talk, write a 50-150 word summary:
Structure:
- Opening sentence: Main topic/focus (what is this talk about?)
- Body (2-3 sentences): Key insights, approaches, or findings
- Closing: Outcome, recommendation, or takeaway
Extract 3-5 key points:
- Concrete takeaways
- Actionable recommendations
- Notable insights or findings
- Measurable results if applicable
Identify 3-5 primary topics:
- Specific technologies
- Domain areas
- Problem/solution categories
Example:
{
"video_id": "xyz789",
"summary": "Explores the challenges of scaling Kubernetes clusters beyond 1000 nodes, focusing on etcd performance, scheduler optimization, and network plugin selection. Presents real-world case studies from managing a 3000-node cluster, including lessons learned about control plane architecture and monitoring strategies. Provides actionable recommendations for organizations planning large-scale Kubernetes deployments.",
"key_points": [
"etcd performance becomes critical above 1000 nodes",
"Custom scheduler plugins reduce pod scheduling latency",
"CNI choice significantly impacts network performance",
"Control plane HA requires 5+ etcd members",
"Monitoring overhead grows non-linearly with scale"
],
"topics": ["Kubernetes", "Scalability", "etcd", "Networking", "Performance"]
}
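The length and count limits for a talk summary are easy to verify once the entry is drafted. The sketch below assumes whitespace-separated word counting, which the skill does not define, and the function name `check_summary` is hypothetical.

```python
def check_summary(entry: dict) -> list[str]:
    """Check one talk_summaries entry against the Step 5 limits."""
    problems = []
    # Words are counted by splitting on whitespace (an assumption).
    words = len(entry["summary"].split())
    if not 50 <= words <= 150:
        problems.append(f"summary has {words} words (expected 50-150)")
    if not 3 <= len(entry["key_points"]) <= 5:
        problems.append("key_points should have 3-5 items")
    if not 3 <= len(entry["topics"]) <= 5:
        problems.append("topics should have 3-5 items")
    return problems
```

An empty list means the entry is within all three limits; it says nothing about whether the summary is factual or specific.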
Step 6: Calculate Statistics
Total Talks:
- Count of talks in input array
Years Active:
- Extract year from each talk date
- first: Earliest year
- latest: Most recent year
- span: latest - first (difference in years)
Total Speaking Minutes:
- Sum all talk durations (in seconds)
- Convert to minutes: sum(durations) / 60
- Round to nearest integer
Most Active Year:
- Count talks per year
- Return year with highest count
- If tie, use most recent year
Average Talk Length:
- Compute total_speaking_minutes / total_talks
- Round to nearest integer
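The arithmetic in this step can be sketched in a few lines of Python. Two points are assumptions rather than part of the skill: exact halves round up (Python's built-in `round` rounds halves to the nearest even integer, so a helper is used instead), and the input is a non-empty list of talks with YYYY-MM-DD dates.

```python
from collections import Counter


def round_half_up(value: float) -> int:
    """Round a non-negative number to the nearest integer, halves up (assumed rule)."""
    return int(value + 0.5)


def compute_stats(talks: list[dict]) -> dict:
    """Compute the stats object for a non-empty list of talk records."""
    years = [int(t["date"][:4]) for t in talks]
    per_year = Counter(years)
    # Highest talk count wins; ties go to the most recent year.
    most_active = max(per_year, key=lambda y: (per_year[y], y))
    total_minutes = round_half_up(sum(t["duration"] for t in talks) / 60)
    return {
        "total_talks": len(talks),
        "years_active": {
            "first": min(years),
            "latest": max(years),
            "span": max(years) - min(years),
        },
        "total_speaking_minutes": total_minutes,
        "most_active_year": most_active,
        # The average is taken from the already-rounded total, as in the step above.
        "average_talk_length_minutes": round_half_up(total_minutes / len(talks)),
    }
```

For three talks of 2400, 1800, and 2100 seconds this gives 6300 / 60 = 105 total minutes and an average of 35.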
Examples
Example 1: Multi-Topic Speaker
Input:
{
"talks": [
{
"video_id": "k8s001",
"title": "Scaling Kubernetes to 5000 Nodes",
"date": "2025-06-15",
"duration": 2400,
"transcript": "...discusses etcd performance, scheduler optimization...",
"description": "Deep dive into large-scale Kubernetes"
},
{
"video_id": "gitops002",
"title": "GitOps with Flux and Argo CD",
"date": "2025-03-20",
"duration": 1800,
"transcript": "...compares Flux and Argo CD for continuous delivery...",
"description": "GitOps patterns and tooling comparison"
},
{
"video_id": "k8s003",
"title": "Kubernetes Networking with Cilium",
"date": "2024-11-10",
"duration": 2100,
"transcript": "...eBPF-based networking and observability...",
"description": "Advanced Kubernetes networking"
}
]
}
Output:
{
"expertise_areas": [
{
"area": "Kubernetes",
"context": "Large-scale cluster management, networking, and performance optimization",
"talk_count": 2,
"evidence": [
"Scaling beyond 1000 nodes",
"etcd performance tuning",
"CNI selection and optimization",
"eBPF-based networking"
]
},
{
"area": "GitOps",
"context": "Continuous delivery patterns, tooling evaluation, and best practices",
"talk_count": 1,
"evidence": [
"Flux vs Argo CD comparison",
"Declarative deployment workflows",
"Git-based infrastructure management"
]
}
],
"cncf_projects": [
{
"name": "Kubernetes",
"talk_count": 2,
"usage_context": "Container orchestration at scale, networking and performance optimization",
"first_mention": "2024-11",
"latest_mention": "2025-06"
},
{
"name": "Cilium",
"talk_count": 1,
"usage_context": "eBPF-based networking and observability for Kubernetes",
"first_mention": "2024-11",
"latest_mention": "2024-11"
},
{
"name": "Flux",
"talk_count": 1,
"usage_context": "GitOps continuous delivery tool evaluation",
"first_mention": "2025-03",
"latest_mention": "2025-03"
},
{
"name": "Argo CD",
"talk_count": 1,
"usage_context": "GitOps continuous delivery tool evaluation",
"first_mention": "2025-03",
"latest_mention": "2025-03"
}
],
"recurring_themes": [
"Scalable infrastructure architecture",
"Production Kubernetes operations",
"GitOps deployment patterns"
],
"talk_summaries": [
{
"video_id": "k8s001",
"summary": "Examines the technical challenges of operating Kubernetes clusters at extreme scale, specifically addressing etcd performance bottlenecks, scheduler optimization techniques, and control plane architecture. Shares production experiences from managing a 5000-node cluster, including monitoring strategies and capacity planning approaches.",
"key_points": [
"etcd performance critical beyond 1000 nodes",
"Custom scheduler configuration reduces latency",
"Dedicated control plane nodes required at scale",
"Monitoring overhead grows non-linearly",
"Capacity planning requires predictive models"
],
"topics": ["Kubernetes", "Scalability", "etcd", "Performance", "Architecture"]
},
{
"video_id": "gitops002",
"summary": "Compares Flux and Argo CD as GitOps continuous delivery tools, evaluating features, architecture, and production suitability. Discusses declarative deployment patterns, multi-cluster management, and integration with existing CI/CD pipelines. Provides decision framework for tool selection.",
"key_points": [
"Flux better for Helm-centric workflows",
"Argo CD offers superior UI and visualization",
"Both support multi-cluster deployments",
"GitOps enables audit trails and rollback",
"Tool selection depends on existing toolchain"
],
"topics": ["GitOps", "Flux", "Argo CD", "Continuous Delivery", "Kubernetes"]
},
{
"video_id": "k8s003",
"summary": "Explores Cilium as an eBPF-based networking solution for Kubernetes, covering performance benefits, observability capabilities, and network policy enforcement. Demonstrates how eBPF technology provides deep visibility into network traffic and enables efficient packet processing without traditional iptables overhead.",
"key_points": [
"eBPF eliminates iptables performance bottlenecks",
"Cilium provides network-layer observability",
"Identity-based security policies more scalable",
"Hubble UI visualizes service dependencies",
"Network policies enforce zero-trust architecture"
],
"topics": ["Cilium", "Kubernetes", "Networking", "eBPF", "Observability"]
}
],
"stats": {
"total_talks": 3,
"years_active": {
"first": 2024,
"latest": 2025,
"span": 1
},
"total_speaking_minutes": 105,
"most_active_year": 2025,
"average_talk_length_minutes": 35
}
}
Example 2: Specialized Speaker
Input:
{
"talks": [
{
"video_id": "obs001",
"title": "OpenTelemetry in Production",
"date": "2025-09-12",
"duration": 1920,
"transcript": "...implementing distributed tracing at scale...",
"description": "OpenTelemetry adoption journey"
},
{
"video_id": "obs002",
"title": "Prometheus Query Optimization",
"date": "2025-05-18",
"duration": 1680,
"transcript": "...optimizing PromQL queries for large datasets...",
"description": "Performance tuning for Prometheus"
},
{
"video_id": "obs003",
"title": "Building Observability Culture",
"date": "2024-08-22",
"duration": 2520,
"transcript": "...organizational practices for effective observability...",
"description": "Cultural and organizational aspects"
}
]
}
Output:
{
"expertise_areas": [
{
"area": "Observability",
"context": "Distributed tracing, metrics collection, monitoring best practices, and organizational culture",
"talk_count": 3,
"evidence": [
"OpenTelemetry production deployment",
"Prometheus query optimization",
"SLO/SLI definition",
"Observability-driven development",
"Cross-team instrumentation standards"
]
}
],
"cncf_projects": [
{
"name": "OpenTelemetry",
"talk_count": 1,
"usage_context": "Distributed tracing and telemetry collection at scale",
"first_mention": "2025-09",
"latest_mention": "2025-09"
},
{
"name": "Prometheus",
"talk_count": 1,
"usage_context": "Metrics collection and query performance optimization",
"first_mention": "2025-05",
"latest_mention": "2025-05"
}
],
"recurring_themes": [
"Observability-driven development",
"Production monitoring at scale",
"Building data-driven engineering culture"
],
"talk_summaries": [
{
"video_id": "obs001",
"summary": "Documents the journey of implementing OpenTelemetry for distributed tracing across a microservices architecture. Covers instrumentation strategies, data volume management, sampling techniques, and integration with existing monitoring tools. Shares practical lessons from migrating from proprietary tracing to OpenTelemetry.",
"key_points": [
"Automatic instrumentation reduces adoption friction",
"Tail-based sampling controls data volume",
"Context propagation requires cross-team coordination",
"OpenTelemetry Collector provides flexibility",
"Migration from existing tools requires phased approach"
],
"topics": ["OpenTelemetry", "Distributed Tracing", "Observability", "Microservices"]
},
{
"video_id": "obs002",
"summary": "Explores techniques for optimizing Prometheus query performance when dealing with large-scale time-series data. Covers recording rules, query patterns to avoid, storage considerations, and federation strategies. Provides actionable recommendations for reducing query latency and resource consumption.",
"key_points": [
"Recording rules pre-compute expensive queries",
"Avoid high-cardinality labels in metrics",
"Query splitting reduces memory pressure",
"Remote read enables federation at scale",
"Alert queries need separate optimization"
],
"topics": ["Prometheus", "Performance Optimization", "Metrics", "Observability"]
},
{
"video_id": "obs003",
"summary": "Discusses organizational and cultural practices for building effective observability capabilities. Focuses on team collaboration, instrumentation standards, on-call practices, and using observability data to drive technical decisions. Emphasizes the importance of cross-functional buy-in and continuous improvement.",
"key_points": [
"Observability requires cross-team standards",
"SLOs align engineering and business goals",
"Instrumentation should be default, not optional",
"Postmortems drive observability improvements",
"Developer experience impacts adoption"
],
"topics": ["Observability", "Culture", "SRE", "Team Practices", "SLO"]
}
],
"stats": {
"total_talks": 3,
"years_active": {
"first": 2024,
"latest": 2025,
"span": 1
},
"total_speaking_minutes": 102,
"most_active_year": 2025,
"average_talk_length_minutes": 34
}
}
Quality Guidelines
Expertise Area Identification
- Minimum 2 talks for expertise claim (or 1 deep technical talk)
- Evidence-based: List specific subtopics demonstrating depth
- Avoid over-generalization: "Kubernetes" not "Cloud Computing"
- Context matters: Explain how expertise manifests
CNCF Project Tracking
- Use official names: "Argo CD" not "ArgoCD"
- Accurate context: Describe actual usage, not generic descriptions
- Timeline precision: Use YYYY-MM format from talk dates
- Prioritize by frequency: Order by talk_count descending
Recurring Theme Detection
- Higher-level concepts: Not specific tools, but patterns/practices
- Consistent across talks: Appears in multiple presentations
- Concise phrasing: 5-8 words max
- Actionable or descriptive: Clear meaning
Talk Summary Quality
- Length: 50-150 words (strict)
- Factual: Based on transcript/description content
- Specific: Concrete topics, not vague overviews
- Key points: 3-5 actionable takeaways
- Topics: 3-5 specific subject areas
Statistics Accuracy
- Count carefully: Verify total_talks matches input length
- Date parsing: Input dates use YYYY-MM-DD; take the year for stats and YYYY-MM for project timelines
- Duration conversion: Seconds to minutes, rounded
- Most active year: Handle ties (use most recent)
Common Pitfalls to Avoid
❌ Claiming Expertise Without Evidence
Bad:
{
"area": "Cloud Native Architecture",
"context": "General cloud-native knowledge",
"talk_count": 1,
"evidence": []
}
Why: Too broad, insufficient evidence
Good:
{
"area": "Service Mesh Architecture",
"context": "Istio implementation, sidecar patterns, traffic management",
"talk_count": 2,
"evidence": ["Istio deployment strategies", "mTLS configuration", "Traffic routing patterns"]
}
❌ Incorrect Project Names
Bad: "ArgoCD", "K8s", "OTel"
Good: "Argo CD", "Kubernetes", "OpenTelemetry"
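One way to guard against this pitfall is a small alias table applied before counting. The sketch below is hypothetical: it contains only the three abbreviations listed above, and a real table would need to be filled in from the official CNCF project names.

```python
# Alias table built from the bad/good pairs above; keys are lowercase.
ALIASES = {
    "argocd": "Argo CD",
    "k8s": "Kubernetes",
    "otel": "OpenTelemetry",
}


def canonical_project_name(raw: str) -> str:
    """Return the official name for a known alias, else the trimmed input unchanged."""
    cleaned = raw.strip()
    return ALIASES.get(cleaned.lower(), cleaned)
```

Names that are not in the table pass through untouched, so an unknown project is never silently renamed.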
❌ Vague Usage Context
Bad:
{
"name": "Prometheus",
"usage_context": "Used for monitoring"
}
Good:
{
"name": "Prometheus",
"usage_context": "Metrics collection, query optimization, large-scale time-series data management"
}
❌ Too Many Themes
Bad: 15 themes for 5 talks (over-segmented)
Good: 4-6 major themes that genuinely recur
❌ Summary Too Long or Too Short
Bad (too short): "This talk is about Kubernetes networking."
Bad (too long): 300-word detailed description
Good: 75-100 word focused summary with key insights
❌ Incorrect Statistics
Bad: Claiming 10 talks when input has 8
Good: Count matches input array length exactly
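A cross-check between input and output catches this class of error before the result is returned. The following Python sketch is illustrative: `check_consistency` is a hypothetical helper, and it covers only three checks that follow from rules stated in this document.

```python
def check_consistency(input_data: dict, output: dict) -> list[str]:
    """Compare the output against the input and return any mismatches found."""
    problems = []
    talks = input_data["talks"]
    # total_talks must equal the length of the input array.
    if output["stats"]["total_talks"] != len(talks):
        problems.append("total_talks does not match input length")
    # Every input talk needs exactly one summary, and no extras.
    input_ids = sorted(t["video_id"] for t in talks)
    summary_ids = sorted(s["video_id"] for s in output["talk_summaries"])
    if input_ids != summary_ids:
        problems.append("talk_summaries do not cover exactly the input talks")
    # cncf_projects must be ordered by talk_count, descending.
    counts = [p["talk_count"] for p in output["cncf_projects"]]
    if counts != sorted(counts, reverse=True):
        problems.append("cncf_projects not ordered by talk_count descending")
    return problems
```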
❌ Missing Evidence for Expertise
Bad:
{
"area": "Security",
"evidence": []
}
Good:
{
"area": "Security",
"evidence": ["Zero-trust architecture", "mTLS implementation", "RBAC policies", "Secrets management"]
}
Important Notes
- This skill feeds into the presenter-profile-generation skill
- Statistics drive the stats table in presenter profiles
- Expertise areas and themes become profile narrative content
- Talk summaries populate the "Talk Highlights" section
- CNCF project data identifies presenter's technical focus
- Quality here determines profile depth and accuracy
Validation Checklist
Before returning output, verify: