| name | data-table-analysis |
| description | Use this skill for converting researched facts or user-provided data into structured tables by writing code, then running Python/pandas calculations in the job-scoped sandbox. This skill is for numeric normalization, tabular analysis, rankings, growth rates, summary statistics, CSV/JSON generation, and markdown tables. Triggers: "compute table", "calculate growth", "normalize values", "extract figures", "rank companies", "QoQ", "YoY", "CAGR", "summary statistics", "CSV", "JSON", "markdown table", "standardize quarters", "standardize currencies", "compare over time" Outputs: Markdown tables, CSV text, JSON records, summary statistics, rankings, and data-quality notes.
|
Data Table Analysis Skill
Generate accurate, source-grounded tables and computed quantitative summaries using Python/pandas.
This skill produces text artifacts that can be read and included in the final report.
Required Execution Standard
To ensure the calculation is reproducible and useful, you MUST:
- Structure Inputs: Convert facts from research notes or the user request into explicit rows before running pandas.
- Preserve Provenance: Keep source URLs, filing names, or note references in the input table when available.
- Normalize Units: Convert currencies, magnitudes, periods, and date labels into consistent fields before comparing values.
- Compute Deterministically: Call the
execute tool to run Python/pandas for arithmetic, rankings, growth rates, aggregates, and formatting. Do not hand-compute these values in prose.
- Write Text Outputs: Save markdown, CSV, or JSON outputs to
/shared/... with descriptive filenames using write_file.
- Report Caveats: Include assumptions, missing values, restatements, estimated figures, or non-comparable metrics in the output notes.
Execution Flow
-
Gather candidate facts from researcher outputs, user-provided data, or source excerpts.
-
Create a normalized input table with one row per comparable observation. Prefer explicit CSV or JSON records embedded in the Python script. If the source rows are in /shared/..., call read_file first and embed the returned content in the script, or write a sandbox-local input file under /workspace. Sandbox code cannot open /shared/... directly.
-
Call the execute tool with a Python command or script that:
- imports pandas,
- builds a DataFrame from the normalized rows,
- validates data types,
- standardizes units and period labels,
- computes the requested metrics,
- prints markdown, CSV, JSON, and data-quality notes as text.
- uses
/workspace for any sandbox-local input or output files.
- does not read from or write to
/shared/... inside the sandbox process.
-
Inspect the execute output. If the code fails, fix the code and call execute again. Do not continue with hand-computed fallback tables unless the sandbox or pandas is unavailable.
-
Write final text artifacts from the successful execute output to /shared/... using write_file, for example:
/shared/capex_growth_table.md
/shared/capex_normalized.csv
/shared/capex_analysis.json
-
In the response or report, cite the original sources for the input figures. Computed columns should be clearly labeled as calculations.
Required Tool Use: For tasks that request calculated tables, growth rates, rankings, summary statistics, normalization, CSV, or JSON, this skill requires at least one execute call that runs Python/pandas before writing the final artifacts.
Input Normalization Guidelines
| Input Issue | Required Handling |
|---|
| Mixed magnitudes | Convert millions/billions/trillions into one numeric unit, such as USD billions. |
| Mixed currencies | Convert to one currency only when an exchange-rate source is available; otherwise keep currencies separate and flag the limitation. |
| Fiscal vs. calendar quarters | Preserve the reported fiscal period and add a normalized sortable period field when possible. |
| Company-specific definitions | Keep metric names explicit, such as "capital expenditures", "PP&E additions", or "cash capex". |
| Missing values | Use null/blank values, not zero, unless the source explicitly reports zero. |
| Approximate figures | Mark estimates with an is_estimate column or a notes field. |
| Conflicting figures | Keep both rows with source notes unless one source is clearly authoritative. |
Calculation Specifications
| Calculation | Formula / Logic Guide |
|---|
| QoQ Growth | (current_value / prior_quarter_value - 1) * 100 within each entity and metric. |
| YoY Growth | (current_value / value_four_quarters_ago - 1) * 100 within each entity and metric. |
| CAGR | (ending_value / beginning_value) ** (1 / years) - 1, only when periods are comparable. |
| Ranking | Sort by the normalized numeric value and include rank ties deterministically. |
| Share of Total | value / group_total * 100, computed within the relevant period or category. |
| Summary Stats | Include count, mean, median, min, max, and missing-value count when useful. |
Output Formats
The sandbox supports text outputs that should be saved through /shared/:
.md - Markdown tables and explanatory notes for report inclusion.
.csv - Normalized tabular data for reuse.
.json - Structured records, assumptions, and summary metrics.
Storage Note: Always use descriptive filenames (e.g., ai_capex_8q_growth.md) rather than generic names like output.md.
Example Code Templates
A. Normalize Rows and Compute QoQ/YoY
Use this when researched figures need growth calculations.
import pandas as pd
rows = [
{
"company": "ExampleCo",
"period": "FY2025-Q1",
"period_index": 202501,
"metric": "capital_expenditures",
"value_usd_billions": 12.4,
"source": "https://example.com/filing",
"notes": "",
},
]
df = pd.DataFrame(rows)
df = df.sort_values(["company", "metric", "period_index"])
df["qoq_growth_pct"] = (
df.groupby(["company", "metric"])["value_usd_billions"].pct_change(1) * 100
)
df["yoy_growth_pct"] = (
df.groupby(["company", "metric"])["value_usd_billions"].pct_change(4) * 100
)
display_cols = [
"company",
"period",
"metric",
"value_usd_billions",
"qoq_growth_pct",
"yoy_growth_pct",
"source",
"notes",
]
markdown_table = df[display_cols].to_markdown(index=False, floatfmt=".1f")
csv_text = df[display_cols].to_csv(index=False)
B. Rank Entities by Latest Comparable Period
Use this for company rankings or top-N comparisons.
import pandas as pd
df = pd.DataFrame(rows)
latest_period = df["period_index"].max()
latest = df[df["period_index"] == latest_period].copy()
latest = latest.sort_values(
["value_usd_billions", "company"],
ascending=[False, True],
)
latest["rank"] = range(1, len(latest) + 1)
ranking_table = latest[
["rank", "company", "period", "value_usd_billions", "source", "notes"]
].to_markdown(index=False, floatfmt=".1f")
C. Generate Data-Quality Notes
Use this to make limitations explicit before synthesis.
import pandas as pd
df = pd.DataFrame(rows)
notes = []
missing = df["value_usd_billions"].isna().sum()
if missing:
notes.append(f"{missing} rows have missing normalized values.")
if "is_estimate" in df.columns and df["is_estimate"].fillna(False).any():
notes.append("Some values are estimates and should be labeled as such.")
if df.duplicated(["company", "period", "metric"]).any():
notes.append("Some company-period-metric combinations have multiple source rows.")
data_quality_notes = "\n".join(f"- {note}" for note in notes) or "- No major data-quality issues identified."
Troubleshooting in the Sandbox
- Missing pandas: If
import pandas fails, report that the sandbox image needs pandas installed. Do not hand-compute large tables in prose.
- Sorting Periods: Do not sort fiscal quarters alphabetically. Create a numeric
period_index or date column.
- Percent Formatting: Keep computed growth as numeric values in CSV/JSON; format percentages only in markdown tables.
- Zero Division: If a prior period is zero or missing, leave growth blank/null and explain the limitation.