| name | data-visualization |
| description | Create effective data visualizations with Python. Prefer Plotly (interactive charts), with seaborn and matplotlib as fallbacks (static charts). Use when building charts, choosing the right chart type for a dataset, creating publication-quality figures, or applying design principles like accessibility and color theory. |
Data Visualization Skill
Chart selection guidance, Python visualization code patterns, design principles, and accessibility considerations for creating effective data visualizations.
Chart Selection Guide
Choose by Data Relationship
| What You're Showing | Best Chart | Alternatives |
|---|---|---|
| Trend over time | Line chart | Area chart (if showing cumulative or composition) |
| Comparison across categories | Vertical bar chart | Horizontal bar (many categories), lollipop chart |
| Ranking | Horizontal bar chart | Dot plot, slope chart (comparing two periods) |
| Part-to-whole composition | Stacked bar chart | Treemap (hierarchical), waffle chart |
| Composition over time | Stacked area chart | 100% stacked bar (for proportion focus) |
| Distribution | Histogram | Box plot (comparing groups), violin plot, strip plot |
| Correlation (2 variables) | Scatter plot | Bubble chart (add 3rd variable as size) |
| Correlation (many variables) | Heatmap (correlation matrix) | Pair plot |
| Geographic patterns | Choropleth map | Bubble map, hex map |
| Flow / process | Sankey diagram | Funnel chart (sequential stages) |
| Relationship network | Network graph | Chord diagram |
| Performance vs. target | Bullet chart | Gauge (single KPI only) |
| Multiple KPIs at once | Small multiples | Dashboard with separate charts |
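The table above can also be kept as a small lookup so that scripts pick a default chart type consistently. This is only a sketch covering a few rows: the dictionary keys and the `suggest_chart` name are hypothetical, and only the chart names themselves come from the table.

```python
# Illustrative lookup built from a few rows of the chart selection table.
# Keys and function name are hypothetical; chart names come from the table.
CHART_GUIDE = {
    "trend": ("line chart", ["area chart"]),
    "comparison": ("vertical bar chart", ["horizontal bar chart", "lollipop chart"]),
    "ranking": ("horizontal bar chart", ["dot plot", "slope chart"]),
    "part_to_whole": ("stacked bar chart", ["treemap", "waffle chart"]),
    "distribution": ("histogram", ["box plot", "violin plot", "strip plot"]),
    "correlation": ("scatter plot", ["bubble chart"]),
}


def suggest_chart(relationship):
    """Return (best_chart, alternatives) for a data relationship key."""
    key = relationship.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in CHART_GUIDE:
        known = ", ".join(sorted(CHART_GUIDE))
        raise ValueError(f"Unknown relationship {relationship!r}; expected one of: {known}")
    return CHART_GUIDE[key]


best, alternatives = suggest_chart("Part-to-whole")
print(best)          # stacked bar chart
print(alternatives)  # ['treemap', 'waffle chart']
```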
When NOT to Use Certain Charts
- Pie charts: Avoid unless <6 categories and exact proportions matter less than rough comparison. Humans are bad at comparing angles. Use bar charts instead.
- 3D charts: Never. They distort perception and add no information.
- Dual-axis charts: Use cautiously. They can mislead by implying correlation. Clearly label both axes if used.
- Stacked bar (many categories): Hard to compare middle segments. Use small multiples or grouped bars instead.
- Donut charts: Slightly better than pie charts but same fundamental issues. Use for single KPI display at most.
Python Visualization Code Patterns
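📌 Library Priority
Prefer Plotly (recommended)
- Interactive charts with zoom, hover, and filtering
- Well suited to exploratory analysis and dashboards
- Outputs HTML files that are easy to share
Alternatives: Seaborn / Matplotlib
- Static charts, suited to reports and publications
- Finer-grained style control
- Lightweight, no JavaScript required
Interactive Charts with Plotly (recommended first choice)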
Setup
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
Line Chart (Time Series)
fig = px.line(df, x='date', y='value', color='category',
              title='Interactive Metric Trend',
              labels={'value': 'Metric Value', 'date': 'Date'},
              template='plotly_white')
fig.update_layout(
    hovermode='x unified',
    font=dict(size=12),
    title_font_size=16
)
fig.write_html('interactive_chart.html')
fig.show()
Bar Chart (Comparison)
df_sorted = df.sort_values('metric', ascending=True)
fig = px.bar(df_sorted, x='metric', y='category',
             orientation='h',
             title='Metric by Category (Ranked)',
             labels={'metric': 'Metric Value', 'category': 'Category'},
             template='plotly_white')
fig.update_traces(marker_color='#4C72B0', text=df_sorted['metric'])
fig.update_layout(showlegend=False)
fig.show()
Scatter Plot (Correlation)
fig = px.scatter(df, x='metric_a', y='metric_b',
                 color='category',
                 size='size_metric',
                 hover_data=['name', 'detail_field'],
                 title='Correlation Analysis',
                 template='plotly_white')
fig.update_traces(marker=dict(line=dict(width=0.5, color='DarkSlateGrey')))
fig.show()
Histogram (Distribution)
fig = px.histogram(df, x='value', nbins=30,
                   title='Distribution of Values',
                   labels={'value': 'Value', 'count': 'Frequency'},
                   template='plotly_white')
mean_val = df['value'].mean()
median_val = df['value'].median()
fig.add_vline(x=mean_val, line_dash="dash", line_color="red",
              annotation_text=f"Mean: {mean_val:.1f}")
fig.add_vline(x=median_val, line_dash="dash", line_color="green",
              annotation_text=f"Median: {median_val:.1f}")
fig.show()
Heatmap
pivot = df.pivot_table(index='row_dim', columns='col_dim',
                       values='metric', aggfunc='sum')
fig = px.imshow(pivot,
                text_auto=True,
                aspect='auto',
                color_continuous_scale='YlOrRd',
                title='Metric by Row and Column Dimension')
fig.update_xaxes(side="bottom")
fig.show()
Small Multiples (Facets)
fig = px.line(df, x='date', y='value',
              facet_col='category', facet_col_wrap=3,
              title='Trends by Category',
              template='plotly_white')
# Give each panel its own y-axis range instead of one shared range
fig.update_yaxes(matches=None)
fig.show()
Static Charts with Matplotlib/Seaborn (fallback option)
Setup and Style
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import pandas as pd
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'figure.dpi': 150,
    'font.size': 11,
    'axes.titlesize': 14,
    'axes.titleweight': 'bold',
    'axes.labelsize': 11,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'figure.titlesize': 16,
})
PALETTE_CATEGORICAL = ['#4C72B0', '#DD8452', '#55A868', '#C44E52', '#8172B3', '#937860']
PALETTE_SEQUENTIAL = 'YlOrRd'
PALETTE_DIVERGING = 'RdBu_r'
Line Chart (Time Series)
fig, ax = plt.subplots(figsize=(10, 6))
# One line per category
for label, group in df.groupby('category'):
    ax.plot(group['date'], group['value'], label=label, linewidth=2)
ax.set_title('Metric Trend by Category', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.legend(loc='upper left', frameon=True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.autofmt_xdate()
plt.tight_layout()
plt.savefig('trend_chart.png', dpi=150, bbox_inches='tight')
Bar Chart (Comparison)
fig, ax = plt.subplots(figsize=(10, 6))
df_sorted = df.sort_values('metric', ascending=True)
bars = ax.barh(df_sorted['category'], df_sorted['metric'], color=PALETTE_CATEGORICAL[0])
# Label each bar with its value, just past the bar's end
for bar in bars:
    width = bar.get_width()
    ax.text(width + 0.5, bar.get_y() + bar.get_height()/2,
            f'{width:,.0f}', ha='left', va='center', fontsize=10)
ax.set_title('Metric by Category (Ranked)', fontweight='bold')
ax.set_xlabel('Metric Value')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('bar_chart.png', dpi=150, bbox_inches='tight')
Histogram (Distribution)
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(df['value'], bins=30, color=PALETTE_CATEGORICAL[0], edgecolor='white', alpha=0.8)
mean_val = df['value'].mean()
median_val = df['value'].median()
ax.axvline(mean_val, color='red', linestyle='--', linewidth=1.5, label=f'Mean: {mean_val:,.1f}')
ax.axvline(median_val, color='green', linestyle='--', linewidth=1.5, label=f'Median: {median_val:,.1f}')
ax.set_title('Distribution of Values', fontweight='bold')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.legend()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('histogram.png', dpi=150, bbox_inches='tight')
Heatmap
fig, ax = plt.subplots(figsize=(10, 8))
pivot = df.pivot_table(index='row_dim', columns='col_dim', values='metric', aggfunc='sum')
sns.heatmap(pivot, annot=True, fmt=',.0f', cmap='YlOrRd',
            linewidths=0.5, ax=ax, cbar_kws={'label': 'Metric Value'})
ax.set_title('Metric by Row Dimension and Column Dimension', fontweight='bold')
ax.set_xlabel('Column Dimension')
ax.set_ylabel('Row Dimension')
plt.tight_layout()
plt.savefig('heatmap.png', dpi=150, bbox_inches='tight')
Small Multiples
categories = df['category'].unique()
n_cats = len(categories)
n_cols = min(3, n_cats)
n_rows = (n_cats + n_cols - 1) // n_cols
# squeeze=False always returns a 2D array of Axes, even for a single panel
fig, axes = plt.subplots(n_rows, n_cols, figsize=(5*n_cols, 4*n_rows),
                         sharex=True, sharey=True, squeeze=False)
axes = axes.flatten()
for i, cat in enumerate(categories):
    ax = axes[i]
    subset = df[df['category'] == cat]
    ax.plot(subset['date'], subset['value'], color=PALETTE_CATEGORICAL[i % len(PALETTE_CATEGORICAL)])
    ax.set_title(cat, fontsize=12)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
# Hide any unused panels in the last row
for j in range(n_cats, len(axes)):
    axes[j].set_visible(False)
fig.suptitle('Trends by Category', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('small_multiples.png', dpi=150, bbox_inches='tight')
Number Formatting Helpers
def format_number(val, format_type='number'):
    """Format numbers for chart labels."""
    if format_type == 'currency':
        if abs(val) >= 1e9:
            return f'${val/1e9:.1f}B'
        elif abs(val) >= 1e6:
            return f'${val/1e6:.1f}M'
        elif abs(val) >= 1e3:
            return f'${val/1e3:.1f}K'
        else:
            return f'${val:,.0f}'
    elif format_type == 'percent':
        return f'{val:.1f}%'
    elif format_type == 'number':
        if abs(val) >= 1e9:
            return f'{val/1e9:.1f}B'
        elif abs(val) >= 1e6:
            return f'{val/1e6:.1f}M'
        elif abs(val) >= 1e3:
            return f'{val/1e3:.1f}K'
        else:
            return f'{val:,.0f}'
    return str(val)

# Usage: apply to the y-axis of an existing Axes object `ax`
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: format_number(x, 'currency')))
Design Principles
Color
- Use color purposefully: Color should encode data, not decorate
- Highlight the story: Use a bright accent color for the key insight; grey everything else
- Sequential data: Use a single-hue gradient (light to dark) for ordered values
- Diverging data: Use a two-hue gradient with neutral midpoint for data with a meaningful center
- Categorical data: Use distinct hues, maximum 6-8 before it gets confusing
- Avoid red/green only: About 8% of men are red-green colorblind. Use blue/orange as the primary pair
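The "highlight the story" rule can be sketched as a helper that returns one color per category, with an accent for the category of interest and grey for everything else. The function name, the hex values, and the sample categories are illustrative assumptions, not part of the original patterns.

```python
ACCENT = "#DD8452"  # orange accent (assumed choice)
MUTED = "#B0B0B0"   # neutral grey for context categories (assumed choice)


def highlight_colors(categories, focus, accent=ACCENT, muted=MUTED):
    """Return one color per category: `accent` for `focus`, `muted` for the rest."""
    if focus not in categories:
        raise ValueError(f"{focus!r} is not one of the categories")
    return [accent if category == focus else muted for category in categories]


colors = highlight_colors(["North", "South", "East", "West"], focus="East")
print(colors)  # ['#B0B0B0', '#B0B0B0', '#DD8452', '#B0B0B0']
```

The resulting list can be passed as `color=colors` to `ax.barh(...)` in Matplotlib, or as `marker_color=colors` in a Plotly `update_traces` call.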
Typography
- Title states the insight: "Revenue grew 23% YoY" beats "Revenue by Month"
- Subtitle adds context: Date range, filters applied, data source
- Axis labels are readable: Never rotated 90 degrees if avoidable. Shorten or wrap instead
- Data labels add precision: Use on key points, not every single bar
- Annotation highlights: Call out specific points with text annotations
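One way to make titles state the insight is to derive them from the data instead of typing them by hand. The sketch below assumes two period totals are already available; `insight_title` and its 0.5% "flat" threshold are hypothetical choices.

```python
def insight_title(metric, previous, current, period="YoY"):
    """Build a title that states the change, e.g. 'Revenue grew 23% YoY'."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a percent change")
    change = (current - previous) / abs(previous) * 100
    if abs(change) < 0.5:  # assumed threshold for calling a change "flat"
        return f"{metric} was flat {period}"
    verb = "grew" if change > 0 else "fell"
    return f"{metric} {verb} {abs(change):.0f}% {period}"


print(insight_title("Revenue", 100, 123))  # Revenue grew 23% YoY
print(insight_title("Churn", 200, 150))    # Churn fell 25% YoY
```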
Layout
- Reduce chart junk: Remove gridlines, borders, backgrounds that don't carry information
- Sort meaningfully: Categories sorted by value (not alphabetically) unless there's a natural order (months, stages)
- Appropriate aspect ratio: Time series wider than tall (3:1 to 2:1); comparisons can be squarer
- White space is good: Don't cram charts together. Give each visualization room to breathe
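The sorting rule above (by value, unless the categories have a natural order) might be sketched as follows. The function name and the sample data are made up for illustration.

```python
def order_categories(values, natural_order=None):
    """Order categories for plotting.

    values: dict mapping category -> numeric value.
    natural_order: optional list giving an inherent order (months, funnel stages).
    """
    if natural_order is not None:
        # Keep the inherent order, skipping categories with no data
        return [category for category in natural_order if category in values]
    # Otherwise sort by value, largest first
    return sorted(values, key=values.get, reverse=True)


sales = {"Banana": 5, "Apple": 12, "Cherry": 8}
print(order_categories(sales))  # ['Apple', 'Cherry', 'Banana']

funnel = {"Purchase": 40, "Visit": 1000, "Signup": 200}
print(order_categories(funnel, natural_order=["Visit", "Signup", "Purchase"]))
# ['Visit', 'Signup', 'Purchase']
```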
Accuracy
- Bar charts start at zero: Always. A bar from 95 to 100 exaggerates a 5% difference
- Line charts can have non-zero baselines: When the range of variation is meaningful
- Consistent scales across panels: When comparing multiple charts, use the same axis range
- Show uncertainty: Error bars, confidence intervals, or ranges when data is uncertain
- Label your axes: Never make the reader guess what the numbers mean
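To see why a truncated bar axis misleads, compare the ratio of the drawn bar lengths with the ratio of the underlying values. The values 96 and 100 and the helper name are illustrative assumptions.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must be above the baseline")
    return (b - baseline) / (a - baseline)


true_ratio = apparent_ratio(96, 100)                    # axis starts at zero
truncated_ratio = apparent_ratio(96, 100, baseline=95)  # axis starts at 95

print(f"{true_ratio:.2f}x")       # 1.04x
print(f"{truncated_ratio:.2f}x")  # 5.00x
```

With the axis starting at 95, the second bar is drawn five times as long as the first even though the values differ by about 4%.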
Accessibility Considerations
Color Blindness
- Never rely on color alone to distinguish data series
- Add pattern fills, different line styles (solid, dashed, dotted), or direct labels
- Test with a colorblind simulator (e.g., Coblis, Sim Daltonism)
- Use the colorblind-friendly palette:
sns.color_palette("colorblind")
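To avoid relying on color alone, each series can also get its own line style and marker. The sketch below assigns a distinct combination per label; the helper name and the specific style lists are assumptions, though the strings themselves are standard Matplotlib `linestyle` and `marker` values.

```python
LINESTYLES = ["-", "--", ":", "-."]  # solid, dashed, dotted, dash-dot
MARKERS = ["o", "s", "^", "D"]       # circle, square, triangle, diamond


def series_styles(labels):
    """Map each label to a linestyle/marker pair (16 distinct combinations)."""
    styles = {}
    for i, label in enumerate(labels):
        styles[label] = {
            # Cycle through line styles first, then move to the next marker
            "linestyle": LINESTYLES[i % len(LINESTYLES)],
            "marker": MARKERS[(i // len(LINESTYLES)) % len(MARKERS)],
        }
    return styles


styles = series_styles(["A", "B", "C", "D", "E"])
print(styles["A"])  # {'linestyle': '-', 'marker': 'o'}
print(styles["E"])  # {'linestyle': '-', 'marker': 's'}
```

Each entry can be unpacked into a plot call, for example `ax.plot(x, y, label=label, **styles[label])`.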
Screen Readers
- Include alt text describing the chart's key finding
- Provide a data table alternative alongside the visualization
- Use semantic titles and labels
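Alt text that describes the key finding can be drafted from the same data that feeds the chart. This is a minimal sketch for a single time series; `describe_trend` and the sample numbers are hypothetical, and the wording should still be reviewed by a person.

```python
def describe_trend(metric, points):
    """Draft alt text for a line chart.

    points: list of (label, value) pairs in chronological order.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to describe a trend")
    first_label, first_value = points[0]
    last_label, last_value = points[-1]
    peak_label, peak_value = max(points, key=lambda p: p[1])
    if last_value > first_value:
        change = f"rose from {first_value:,} to {last_value:,}"
    elif last_value < first_value:
        change = f"fell from {first_value:,} to {last_value:,}"
    else:
        change = f"ended where it started, at {last_value:,}"
    return (
        f"Line chart of {metric} from {first_label} to {last_label}. "
        f"{metric} {change}, peaking at {peak_value:,} in {peak_label}."
    )


alt_text = describe_trend("Signups", [("Jan", 1200), ("Feb", 1500), ("Mar", 1400)])
print(alt_text)
# Line chart of Signups from Jan to Mar. Signups rose from 1,200 to 1,400, peaking at 1,500 in Feb.
```

A saved PNG does not carry this text itself, so it has to be attached where the image is embedded, for example in the `alt` attribute of an HTML `<img>` tag.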
General Accessibility
- Sufficient contrast between data elements and background
- Text size minimum 10pt for labels, 12pt for titles
- Avoid conveying information only through spatial position (add labels)
- Consider printing: does the chart work in black and white?
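"Sufficient contrast" can be checked numerically. The sketch below assumes the WCAG 2.x contrast-ratio definition as the yardstick, which the guidance above does not name explicitly; the function names are illustrative.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#RRGGBB' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio("#000000", "#FFFFFF"), 1))  # 21.0
print(round(contrast_ratio("#4C72B0", "#FFFFFF"), 2))  # palette blue on white
```

Under WCAG 2.1, normal-size text needs a ratio of at least 4.5:1, and graphical elements such as bars and lines need at least 3:1.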
Accessibility Checklist
Before sharing a visualization: