| name | python-plotting |
| description | Comprehensive plotting and visualization in Python - matplotlib (static publication-quality plots), seaborn (statistical visualization), and plotly (interactive plots); includes plot types, customization, best practices, and library selection guidance |
| allowed-tools | ["*"] |
Python Plotting & Visualization
Overview
Master data visualization in Python through three complementary libraries: matplotlib (foundational static plots), seaborn (statistical visualization), and plotly (interactive graphics). Each library has distinct strengths, and knowing when to use which, or how to combine them, is key to effective data communication.
Core value: Create publication-quality static plots, insightful statistical graphics, and interactive dashboards with the right tool for each visualization need.
Library Selection Guide
Matplotlib - The Foundation
Use when:
- Need fine-grained control over every plot element
- Creating publication-quality static figures
- Building custom visualizations not available elsewhere
- Working with low-level plotting requirements
- Need maximum compatibility (most widely supported)
Strengths:
- Complete control over every visual element
- Mature, stable, extensively documented
- Foundation for seaborn and other libraries
- Export to common raster and vector formats (PNG, PDF, SVG, EPS, etc.)
- Extensive customization options
Weaknesses:
- Verbose for common statistical plots
- Steeper learning curve for complex plots
- Primarily static output (interactivity is limited to pan/zoom in GUI and notebook backends)
- Default aesthetics less modern
Seaborn - Statistical Visualization
Use when:
- Creating statistical plots (distributions, correlations, regressions)
- Need attractive defaults with minimal code
- Working with pandas DataFrames
- Want automatic statistical estimation
- Need faceted/multi-panel plots
Strengths:
- Concise syntax for statistical plots
- Beautiful default themes
- Built-in statistical estimation
- Seamless pandas integration
- FacetGrid for complex multi-panel layouts
Weaknesses:
- Less control than matplotlib
- Limited to statistical plot types
- Static output only
- Requires understanding of underlying matplotlib for deep customization
Plotly - Interactive Visualization
Use when:
- Need interactive plots (zoom, hover, pan)
- Building dashboards or web apps
- Want 3D visualizations
- Need animations
- Sharing exploratory analysis
Strengths:
- Rich interactivity out-of-the-box
- Beautiful defaults
- 3D and geographic plotting
- Integration with Dash for web apps
- Export to HTML for sharing
Weaknesses:
- Larger file sizes
- Less control than matplotlib for publications
- Different API paradigm
- Not ideal for static publication figures
Quick Decision Tree
Need interactivity?
├─ Yes → Plotly
└─ No → Statistical plot?
├─ Yes → Seaborn (can customize with matplotlib)
└─ No → Complex customization needed?
├─ Yes → Matplotlib
└─ No → Seaborn or Matplotlib (preference)
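The decision tree above can be written as a small helper. This is only a sketch: the function name `choose_library` and its three boolean parameters are hypothetical, introduced here for illustration.

```python
def choose_library(interactive: bool, statistical: bool, complex_customization: bool) -> str:
    """Return a library suggestion following the quick decision tree."""
    if interactive:
        return "plotly"
    if statistical:
        # Seaborn plots can still be customized through matplotlib
        return "seaborn"
    if complex_customization:
        return "matplotlib"
    # Either library works here; pick by preference
    return "seaborn or matplotlib"


print(choose_library(interactive=False, statistical=True, complex_customization=False))
```

The checks run in the same order as the tree, so interactivity takes priority over every other consideration.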
Matplotlib - Foundational Plotting
Two APIs: pyplot vs Object-Oriented
pyplot (MATLAB-style, implicit state):
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [1, 4, 9])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Simple Plot')
plt.show()
Object-Oriented (explicit, recommended for complex plots):
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Simple Plot')
plt.show()
Recommendation: Use OO API for scripts, functions, and complex plots. Use pyplot for quick interactive exploration.
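One reason the OO API suits functions is that a plotting function can take the target Axes as an argument and be reused in any layout. A minimal sketch, assuming a hypothetical helper `plot_squares` and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen use
import matplotlib.pyplot as plt


def plot_squares(ax, n=5):
    """Draw y = x**2 for x in 0..n-1 on the given Axes and return the Axes."""
    xs = list(range(n))
    ax.plot(xs, [x**2 for x in xs], marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    return ax


# The same function can fill any panel of a larger figure
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
plot_squares(left, n=5)
plot_squares(right, n=10)
right.set_title("n = 10")
```

Because the function never touches pyplot's implicit "current axes", calling it twice cannot accidentally draw on the wrong panel.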
Figure Anatomy
import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
ax1 = axes[0, 0]
ax2 = axes[0, 1]
ax3 = axes[1, 0]
ax4 = axes[1, 1]
ax1.plot([1, 2, 3], [1, 4, 9])
ax2.scatter([1, 2, 3], [1, 4, 9])
ax3.bar([1, 2, 3], [1, 4, 9])
ax4.hist(np.random.randn(1000))
plt.tight_layout()
plt.show()
Common Plot Types
Line Plot:
fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
ax.plot(x, y1, label='sin(x)', linewidth=2, color='blue', linestyle='-')
ax.plot(x, y2, label='cos(x)', linewidth=2, color='red', linestyle='--')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Trigonometric Functions')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
Scatter Plot:
fig, ax = plt.subplots()
x = np.random.randn(100)
y = 2*x + np.random.randn(100)*0.5
colors = np.random.rand(100)
sizes = 100 * np.random.rand(100)
scatter = ax.scatter(x, y, c=colors, s=sizes, alpha=0.6, cmap='viridis')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Scatter Plot with Color and Size')
plt.colorbar(scatter, ax=ax, label='Color Value')
plt.show()
Bar Plot:
fig, ax = plt.subplots()
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]
bars = ax.bar(categories, values, color=['red', 'blue', 'green', 'orange', 'purple'])
# Label each bar with its value
for bar, value in zip(bars, values):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{value}',
            ha='center', va='bottom')
ax.set_ylabel('Values')
ax.set_title('Bar Chart')
plt.show()
Histogram:
fig, ax = plt.subplots()
data = np.random.randn(1000)
ax.hist(data, bins=30, alpha=0.7, color='skyblue', edgecolor='black')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.set_title('Histogram')
ax.axvline(data.mean(), color='red', linestyle='--', linewidth=2, label='Mean')
ax.legend()
plt.show()
Subplots with Shared Axes:
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
ax1.plot(x, y1)
ax1.set_ylabel('sin(x)')
ax1.grid(True)
ax2.plot(x, y2, color='red')
ax2.set_xlabel('x')
ax2.set_ylabel('cos(x)')
ax2.grid(True)
plt.tight_layout()
plt.show()
Customization
Styles:
import matplotlib.pyplot as plt
print(plt.style.available)
plt.style.use('seaborn-v0_8-darkgrid')
with plt.style.context('ggplot'):
    plt.plot([1, 2, 3], [1, 4, 9])
    plt.show()
Colors and Colormaps:
ax.plot(x, y, color='steelblue')
ax.plot(x, y, color='#FF6347')
ax.plot(x, y, color=(0.2, 0.4, 0.6))
from matplotlib import cm
colors = cm.viridis(np.linspace(0, 1, 10))
for i, color in enumerate(colors):
    ax.plot([i, i+1], [0, 1], color=color)
Markers and Line Styles:
ax.plot(x, y,
        marker='o',
        markersize=8,
        markerfacecolor='red',
        markeredgecolor='black',
        markeredgewidth=2,
        linestyle='--',
        linewidth=2,
        color='blue')
Legends:
ax.plot(x, y1, label='Data 1')
ax.plot(x, y2, label='Data 2')
ax.legend(
    loc='upper right',
    frameon=True,
    shadow=True,
    fancybox=True,
    fontsize=12
)
Annotations:
ax.plot(x, y)
i_max = np.argmax(y)  # index of the largest y value, so the label matches the data
ax.annotate(
    'Maximum',
    xy=(x[i_max], y[i_max]),
    xytext=(x[i_max]+1, y[i_max]+1),
    arrowprops=dict(arrowstyle='->', color='red', lw=2),
    fontsize=12,
    color='red'
)
Saving Figures:
fig.savefig('plot.png', dpi=300, bbox_inches='tight')
fig.savefig('plot.pdf', bbox_inches='tight')
fig.savefig('plot.svg', bbox_inches='tight')
fig.savefig('plot.png', transparent=True, bbox_inches='tight')
Seaborn - Statistical Visualization
Basic Setup
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
sns.set_theme(style='whitegrid')
tips = sns.load_dataset('tips')
Distribution Plots
Histogram with KDE:
fig, ax = plt.subplots()
sns.histplot(data=tips, x='total_bill', kde=True, ax=ax)
ax.set_title('Distribution of Total Bill')
plt.show()
KDE Plot:
fig, ax = plt.subplots()
sns.kdeplot(data=tips, x='total_bill', hue='time', fill=True, ax=ax)
ax.set_title('Total Bill Distribution by Time')
plt.show()
Distribution Plot (ECDF):
fig, ax = plt.subplots()
sns.ecdfplot(data=tips, x='total_bill', hue='sex', ax=ax)
ax.set_title('Cumulative Distribution of Total Bill')
plt.show()
Categorical Plots
Box Plot:
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex', ax=ax)
ax.set_title('Total Bill by Day and Sex')
plt.show()
Violin Plot:
fig, ax = plt.subplots(figsize=(10, 6))
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex',
               split=True, ax=ax)
ax.set_title('Total Bill Distribution by Day')
plt.show()
Strip Plot / Swarm Plot:
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex',
              dodge=True, alpha=0.6, ax=axes[0])
axes[0].set_title('Strip Plot')
sns.swarmplot(data=tips, x='day', y='total_bill', hue='sex',
              dodge=True, ax=axes[1])
axes[1].set_title('Swarm Plot')
plt.tight_layout()
plt.show()
Bar Plot with Error Bars:
fig, ax = plt.subplots()
sns.barplot(data=tips, x='day', y='total_bill', hue='sex',
            errorbar=("ci", 95),
            ax=ax)
ax.set_title('Average Total Bill by Day')
plt.show()
Relational Plots
Scatter Plot:
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=tips, x='total_bill', y='tip',
                hue='time', size='size', style='sex',
                sizes=(50, 200), alpha=0.6, ax=ax)
ax.set_title('Tip vs Total Bill')
plt.show()
Line Plot:
fmri = sns.load_dataset('fmri')
fig, ax = plt.subplots(figsize=(10, 6))
sns.lineplot(data=fmri, x='timepoint', y='signal',
             hue='event', style='region', ax=ax)
ax.set_title('fMRI Signal Over Time')
plt.show()
Regression Plots
Linear Regression:
fig, ax = plt.subplots()
sns.regplot(data=tips, x='total_bill', y='tip', ax=ax)
ax.set_title('Linear Regression: Tip vs Total Bill')
plt.show()
Residual Plot:
fig, ax = plt.subplots()
sns.residplot(data=tips, x='total_bill', y='tip', ax=ax)
ax.set_title('Residual Plot')
ax.axhline(0, color='red', linestyle='--')
plt.show()
Heatmaps and Matrices
Correlation Heatmap:
fig, ax = plt.subplots(figsize=(8, 6))
corr = tips[['total_bill', 'tip', 'size']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0,
            square=True, linewidths=1, ax=ax)
ax.set_title('Correlation Matrix')
plt.show()
Clustermap:
iris = sns.load_dataset('iris')
iris_data = iris.drop('species', axis=1)
g = sns.clustermap(iris_data.T, cmap='viridis',
                   standard_scale=1, figsize=(8, 8))
plt.show()
Multi-Panel Plots
FacetGrid:
g = sns.FacetGrid(tips, col='time', row='sex',
                  margin_titles=True, height=4)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.6)
g.set_axis_labels('Total Bill', 'Tip')
g.set_titles(col_template="{col_name}", row_template="{row_name}")
g.add_legend()
plt.show()
PairGrid:
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, hue='species', height=2.5)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot)
g.add_legend()
plt.show()
Pairplot (simpler):
sns.pairplot(iris, hue='species', diag_kind='kde', height=2.5)
plt.show()
Themes and Color Palettes
Setting Themes:
sns.set_theme(style='whitegrid')
sns.set_context('talk')
sns.set_theme(style='darkgrid', context='poster',
              palette='deep', font_scale=1.2)
Color Palettes:
sns.set_palette('deep')
sns.set_palette('Blues')
sns.set_palette('coolwarm')
custom_palette = ['#FF6347', '#4682B4', '#32CD32']
sns.set_palette(custom_palette)
# Preview a palette as a row of color swatches
sns.palplot(sns.color_palette('husl', 10))
plt.show()
Plotly - Interactive Visualization
Plotly Express - High-Level API
Scatter Plot:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x='sepal_width', y='sepal_length',
                 color='species', size='petal_length',
                 hover_data=['petal_width'],
                 title='Iris Dataset')
fig.show()
Line Plot:
df = px.data.gapminder()
fig = px.line(df[df['country'] == 'Canada'],
              x='year', y='gdpPercap',
              title='Canada GDP per Capita')
fig.show()
Bar Chart:
df = px.data.tips()
fig = px.bar(df, x='day', y='total_bill', color='sex',
             barmode='group',
             title='Total Bill by Day and Sex')
fig.show()
Histogram:
fig = px.histogram(df, x='total_bill', color='sex',
                   marginal='box',
                   title='Total Bill Distribution')
fig.show()
Box Plot:
fig = px.box(df, x='day', y='total_bill', color='sex',
             title='Total Bill by Day')
fig.show()
Heatmap:
import numpy as np
z = np.random.randn(20, 20)
fig = px.imshow(z, color_continuous_scale='RdBu_r',
                title='Heatmap')
fig.show()
3D Scatter:
df = px.data.iris()
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_length',
                    color='species', size='petal_width',
                    title='3D Iris Dataset')
fig.show()
Animated Plots:
df = px.data.gapminder()
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 animation_frame='year',
                 animation_group='country',
                 size='pop', color='continent',
                 hover_name='country',
                 log_x=True, size_max=60,
                 range_x=[100, 100000], range_y=[25, 90],
                 title='Gapminder Data Over Time')
fig.show()
Graph Objects - Low-Level API
Basic Scatter:
import plotly.graph_objects as go
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='markers+lines'))
fig.update_layout(
    title='Custom Scatter Plot',
    xaxis_title='X Axis',
    yaxis_title='Y Axis'
)
fig.show()
Multiple Traces:
fig = go.Figure()
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[1, 4, 9],
                         mode='lines', name='Series 1'))
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[2, 5, 10],
                         mode='lines+markers', name='Series 2'))
fig.update_layout(title='Multiple Series')
fig.show()
Subplots:
from plotly.subplots import make_subplots
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Plot 1', 'Plot 2', 'Plot 3', 'Plot 4')
)
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[1, 4, 9]), row=1, col=1)
fig.add_trace(go.Bar(x=[1, 2, 3], y=[2, 5, 8]), row=1, col=2)
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[3, 6, 9]), row=2, col=1)
fig.add_trace(go.Box(y=[1, 2, 3, 4, 5, 6, 7]), row=2, col=2)
fig.update_layout(height=600, showlegend=False, title_text='Subplots')
fig.show()
Interactive Features:
fig = go.Figure()
fig.add_trace(go.Scatter(x=[1, 2, 3, 4], y=[10, 11, 12, 13]))
# Dropdown that switches the y-axis between linear and log scale
fig.update_layout(
    updatemenus=[
        dict(
            buttons=[
                dict(label="Linear",
                     method="relayout",
                     args=[{"yaxis.type": "linear"}]),
                dict(label="Log",
                     method="relayout",
                     args=[{"yaxis.type": "log"}]),
            ],
            direction="down",
        )
    ]
)
fig.show()
Exporting:
fig.write_html('plot.html')  # standalone interactive HTML
fig.write_image('plot.png', width=1200, height=800)  # static export requires the kaleido package
fig.write_image('plot.pdf')
Integration Patterns
Seaborn with Matplotlib Customization
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')  # example dataset used below
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=tips, x='day', y='total_bill', ax=ax)
ax.set_title('Customized Seaborn Plot', fontsize=16, fontweight='bold')
ax.set_xlabel('Day of Week', fontsize=12)
ax.set_ylabel('Total Bill ($)', fontsize=12)
ax.grid(axis='y', alpha=0.3)
ax.text(0.5, 0.95, 'Custom annotation',
        transform=ax.transAxes, ha='center')
plt.tight_layout()
plt.show()
Plotly with Pandas
import pandas as pd
import plotly.express as px
df = pd.DataFrame({
    'x': range(10),
    'y': [i**2 for i in range(10)],
    'category': ['A', 'B'] * 5
})
fig = px.line(df, x='x', y='y', color='category')
fig.show()
Best Practices
1. Choose Appropriate Plot Type
Distributions:
- Histogram, KDE plot, box plot, violin plot
- Use: Understanding data spread and shape
Comparisons:
- Bar chart, box plot, strip plot
- Use: Comparing groups or categories
Relationships:
- Scatter plot, line plot, regression plot
- Use: Showing correlations or trends
Compositions:
- Stacked bar, pie chart (use sparingly), treemap
- Use: Part-to-whole relationships
Time Series:
- Line plot, area chart
- Use: Temporal patterns
2. Design Principles
Less is More:
# Cluttered: heavy lines, oversized markers, and competing colors
fig, ax = plt.subplots()
ax.plot(x, y, linewidth=5, color='red', linestyle='--',
        marker='o', markersize=15, markerfacecolor='yellow')
ax.grid(True, linewidth=2, color='blue')
ax.set_facecolor('lightgray')
# Clean: one restrained line and a faint grid
fig, ax = plt.subplots()
ax.plot(x, y, linewidth=2, color='steelblue')
ax.grid(True, alpha=0.3)
Effective Use of Color:
sns.set_palette('colorblind')
import plotly.express as px
fig = px.scatter(df, x='x', y='y', color='category',
                 color_discrete_sequence=px.colors.qualitative.Safe)
Readable Text:
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y)
ax.set_title('Clear Title', fontsize=16, pad=20)
ax.set_xlabel('X Axis Label', fontsize=12)
ax.set_ylabel('Y Axis Label', fontsize=12)
ax.tick_params(labelsize=10)
3. Consistent Styling
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid', context='talk')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100
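When settings like these should apply to only some figures rather than the whole session, matplotlib's `rc_context` can scope them. A minimal sketch; the specific values are arbitrary examples and the Agg backend is assumed so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen use
import matplotlib.pyplot as plt

dpi_before = plt.rcParams["figure.dpi"]

# Settings inside the block apply only to figures created within it
with plt.rc_context({"figure.figsize": (10, 6), "figure.dpi": 120}):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    dpi_inside = plt.rcParams["figure.dpi"]

# On exit the previous values are restored
dpi_after = plt.rcParams["figure.dpi"]
```

This keeps a project-wide default intact while letting a single figure deviate from it.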
4. Save High-Quality Figures
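# Vector format for publications (dpi only affects rasterized elements)
fig.savefig('figure.pdf', dpi=300, bbox_inches='tight')
# High-resolution raster for print
fig.savefig('figure.png', dpi=300, bbox_inches='tight')
# Medium resolution for slides
fig.savefig('figure.png', dpi=150, bbox_inches='tight')
# Small file for the web; Pillow options such as optimize go through pil_kwargs
fig.savefig('figure.png', dpi=96, bbox_inches='tight', pil_kwargs={'optimize': True})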
Common Gotchas
1. Figure Size and DPI
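# Problem: the default size (6.4 x 4.8 inches) is often too small for detailed plots
fig, ax = plt.subplots()
# Fix: set the figure size explicitly
fig, ax = plt.subplots(figsize=(10, 6))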
2. Overlapping Labels
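fig, ax = plt.subplots()
ax.bar(range(10), values)
ax.set_xticks(range(10))  # fix tick positions before assigning labels
# Problem: long horizontal labels overlap
ax.set_xticklabels(long_labels)
# Fix: rotate and right-align the labels, then let tight_layout make room
ax.set_xticklabels(long_labels, rotation=45, ha='right')
plt.tight_layout()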
3. Color Mapping Consistency
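# Problem: each call assigns colors independently, so a category can change color between plots
sns.scatterplot(data=df1, x='x', y='y', hue='category')
sns.scatterplot(data=df2, x='x', y='y', hue='category')
# Fix: pass one explicit category-to-color mapping to every plot
palette = {'A': 'red', 'B': 'blue', 'C': 'green'}
sns.scatterplot(data=df1, x='x', y='y', hue='category', palette=palette)
sns.scatterplot(data=df2, x='x', y='y', hue='category', palette=palette)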
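A shared mapping like `palette` can also be built from the data instead of typed by hand. The sketch below is one possible approach; `build_palette` is a hypothetical helper, and the category lists stand in for the `category` columns of the two DataFrames.

```python
def build_palette(category_lists, colors):
    """Map every category seen in any dataset to one fixed color."""
    # Sort the union so the assignment does not depend on row order
    categories = sorted(set().union(*category_lists))
    if len(categories) > len(colors):
        raise ValueError("not enough colors for the number of categories")
    return dict(zip(categories, colors))


# Hypothetical category columns from two datasets
cats_1 = ["A", "B", "A"]
cats_2 = ["C", "B"]
palette = build_palette([cats_1, cats_2], ["red", "blue", "green"])
print(palette)  # {'A': 'red', 'B': 'blue', 'C': 'green'}
```

The resulting dict can be passed as `palette=palette` to each seaborn call, so a category keeps its color even when it is missing from one of the datasets.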
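4. Seaborn Return Types: Axes vs. FacetGrid
# Axes-level functions (boxplot, scatterplot, ...) return the Axes they drew on
ax = sns.boxplot(data=df, x='x', y='y')
# Figure-level functions (catplot, relplot, displot) return a FacetGrid and take no ax argument
# Most explicit for axes-level functions: create the Axes yourself and pass it in
fig, ax = plt.subplots()
sns.boxplot(data=df, x='x', y='y', ax=ax)
ax.set_title('My Title')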
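What a seaborn call returns depends on the kind of function: axes-level functions such as `boxplot` return a matplotlib Axes, while figure-level functions such as `catplot` return a `FacetGrid`. A short sketch of the difference, using a small made-up DataFrame and assuming a recent seaborn release (the `g.figure` attribute is not available in older versions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen use
import matplotlib.axes
import pandas as pd
import seaborn as sns

# Small made-up dataset so the sketch does not need a download
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "value": [1.0, 2.0, 2.5, 3.5, 4.0, 5.0],
})

# Axes-level function: returns a matplotlib Axes
ax = sns.boxplot(data=df, x="group", y="value")
ax.set_title("Axes-level")

# Figure-level function: returns a FacetGrid that owns its own figure
g = sns.catplot(data=df, x="group", y="value", kind="box")
g.set_axis_labels("Group", "Value")
g.figure.suptitle("Figure-level")
```

Customization therefore goes through Axes methods in the first case and through FacetGrid methods (or `g.axes`) in the second.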
5. Plotly Memory with Large Datasets
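# Problem: every point is embedded in the figure, so very large inputs are slow and memory-hungry
fig = px.scatter(huge_df)
# Fix: plot a random subset (sample raises ValueError if the frame has fewer rows than requested)
fig = px.scatter(huge_df.sample(10000))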
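Calling `sample(10000)` directly fails on a DataFrame with fewer than 10,000 rows and gives a different subset on every run. A guarded, reproducible variant might look like the sketch below; `downsample` and its parameters are hypothetical names, and the DataFrames are made-up stand-ins.

```python
import pandas as pd


def downsample(df, max_rows=10_000, seed=0):
    """Return df unchanged if small, otherwise a reproducible random subset."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Made-up data standing in for a large and a small DataFrame
big = pd.DataFrame({"x": range(50_000), "y": range(50_000)})
small = pd.DataFrame({"x": range(100), "y": range(100)})

print(len(downsample(big)))    # 10000
print(len(downsample(small)))  # 100
```

The result can be passed to `px.scatter` in place of the raw DataFrame.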
6. Matplotlib State Machine Confusion
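# Confusing: mixes implicit pyplot state with explicit Axes calls
plt.figure()
ax = plt.gca()
ax.plot(x, y)
plt.xlabel('x')
ax.set_ylabel('y')
# Clearer: use the object-oriented API throughout
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('x')
ax.set_ylabel('y')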
Quick Reference
Matplotlib
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y)
ax.scatter(x, y)
ax.bar(x, y)
ax.hist(data, bins=30)
ax.boxplot([data1, data2])
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_title('Title')
ax.legend()
ax.grid(True)
fig.savefig('plot.png', dpi=300, bbox_inches='tight')
Seaborn
import seaborn as sns
sns.set_theme(style='whitegrid')
sns.histplot(data=df, x='col')
sns.scatterplot(data=df, x='x', y='y', hue='category')
sns.boxplot(data=df, x='category', y='value')
sns.heatmap(corr_matrix, annot=True)
sns.pairplot(df, hue='species')
Plotly Express
import plotly.express as px
fig = px.scatter(df, x='x', y='y', color='category')
fig = px.line(df, x='x', y='y')
fig = px.bar(df, x='x', y='y', color='category')
fig = px.histogram(df, x='value')
fig = px.box(df, x='category', y='value')
fig.show()
fig.write_html('plot.html')
Installation
pip install matplotlib
pip install seaborn
pip install plotly
pip install kaleido  # needed for plotly static image export (write_image)
# Or install everything at once
pip install matplotlib seaborn plotly kaleido
Additional Resources
Related Skills
pycse - Scientific computing with confidence intervals for plots
python-ase - Atomic structure visualization
python-best-practices - Code quality for visualization scripts