en un clic
pandas-data-analysis
Master data manipulation, analysis, and visualization with Pandas, NumPy, and Matplotlib
Menu
Master data manipulation, analysis, and visualization with Pandas, NumPy, and Matplotlib
Python debugging techniques, pdb, and IDE debugging tools
FastAPI web framework for building modern APIs with async support
Python machine learning with scikit-learn, PyTorch, and TensorFlow
Python security best practices, OWASP, and vulnerability prevention
Python type hints, type checking, and static analysis with mypy
Master asynchronous programming with asyncio, async/await, concurrent operations, and async frameworks
| name | Pandas Data Analysis |
| description | Master data manipulation, analysis, and visualization with Pandas, NumPy, and Matplotlib |
| version | 2.1.0 |
| sasmp_version | 1.3.0 |
| bonded_agent | 03-data-science |
| bond_type | PRIMARY_BOND |
| retry_strategy | exponential_backoff |
| observability | {"logging":true,"metrics":"data_processing_time"} |
Master data analysis with Pandas, the powerful Python library for data manipulation and analysis. Learn to clean, transform, analyze, and visualize data effectively.
Code Example:
import pandas as pd
import numpy as np
# Create DataFrame
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David'],
'age': [25, 30, 35, 28],
'salary': [50000, 60000, 75000, 55000],
'department': ['IT', 'HR', 'IT', 'Sales']
}
df = pd.DataFrame(data)
# Indexing and filtering
it_employees = df[df['department'] == 'IT']
high_earners = df.loc[df['salary'] > 55000, ['name', 'salary']]
# Adding calculated columns
df['annual_bonus'] = df['salary'] * 0.10
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 40, 100], labels=['Young', 'Mid', 'Senior'])
print(df)
Code Example:
import pandas as pd
# Load data with missing values
df = pd.read_csv('sales_data.csv')
# Handle missing values
df['price'].fillna(df['price'].median(), inplace=True)
df['category'].fillna('Unknown', inplace=True)
df.dropna(subset=['customer_id'], inplace=True)
# Clean text data
df['product_name'] = df['product_name'].str.strip().str.lower()
df['product_name'] = df['product_name'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)
# Convert dates
df['order_date'] = pd.to_datetime(df['order_date'])
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
# Remove duplicates
df.drop_duplicates(subset=['order_id'], keep='first', inplace=True)
# Apply custom function
def categorize_price(price):
if price < 50:
return 'Low'
elif price < 100:
return 'Medium'
else:
return 'High'
df['price_category'] = df['price'].apply(categorize_price)
Code Example:
import pandas as pd
# Sample sales data
df = pd.read_csv('sales.csv')
# GroupBy aggregation
dept_stats = df.groupby('department').agg({
'salary': ['mean', 'min', 'max'],
'employee_id': 'count'
})
# Multiple groupby
sales_by_region_product = df.groupby(['region', 'product_category'])['sales'].sum()
# Pivot table
pivot = df.pivot_table(
values='sales',
index='product_category',
columns='quarter',
aggfunc='sum',
fill_value=0
)
# Rolling window (moving average)
df['sales_ma_7d'] = df.groupby('product_id')['sales'].transform(
lambda x: x.rolling(window=7, min_periods=1).mean()
)
# Cumulative sum
df['cumulative_sales'] = df.groupby('product_id')['sales'].cumsum()
Code Example:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Set style
sns.set_style('whitegrid')
# Load data
df = pd.read_csv('sales_data.csv')
# 1. Line plot - Sales trend over time
df.groupby('month')['sales'].sum().plot(kind='line', figsize=(10, 6))
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Total Sales ($)')
plt.show()
# 2. Bar plot - Sales by category
category_sales = df.groupby('category')['sales'].sum().sort_values(ascending=False)
category_sales.plot(kind='bar', figsize=(10, 6))
plt.title('Sales by Category')
plt.xlabel('Category')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.show()
# 3. Histogram - Price distribution
df['price'].hist(bins=30, figsize=(10, 6))
plt.title('Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()
# 4. Box plot - Salary by department
df.boxplot(column='salary', by='department', figsize=(10, 6))
plt.title('Salary Distribution by Department')
plt.suptitle('')
plt.show()
# 5. Heatmap - Correlation matrix
corr = df[['age', 'salary', 'years_experience']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.show()
Analyze customer purchase behavior and segmentation.
Requirements:
Key Skills: Data cleaning, aggregation, visualization
Analyze sales trends and forecast future performance.
Requirements:
Key Skills: Time series operations, rolling windows, plotting
Build automated data quality assessment tool.
Requirements:
Key Skills: Data validation, statistical analysis, reporting
After mastering Pandas, explore: