| name | analyze-results |
| description | Analyze EDSL Results objects - load by UUID or file path, export survey documentation, and generate analysis reports |
| allowed-tools | Read, Glob, Bash(python:*), Bash(pandoc *), Write, AskUserQuestion |
| user-invocable | true |
| arguments | UUID or path to results.json.gz file |
Analyze EDSL Results
Load an EDSL Results object from Expected Parrot (by UUID) or from a local file (results.json.gz), export documentation files, and generate a comprehensive analysis report.
Usage
/edsl-analyze-results <uuid-or-path>
Examples:
/edsl-analyze-results 123e4567-e89b-12d3-a456-426614174000
/edsl-analyze-results ./my_experiment/results.json.gz
After loading the results, the skill will ask you to choose an analysis focus (full analysis, summary statistics, cross-tabulation, or a specific custom focus).
IMPORTANT: Always ask this question, even if the user provided a query with the UUID. The question helps clarify what type of analysis they want. If they select "Specific focus", follow up to get their specific question or hypothesis.
Workflow
1. Parse the Input
Determine if the input is:
- UUID: A 36-character UUID (e.g., 123e4567-e89b-12d3-a456-426614174000)
- File path: A path ending in .json.gz or .json
If unclear, use AskUserQuestion to clarify.
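The input check above can be sketched as a small helper. This is only an illustration: the function name `classify_input` and the return labels are assumptions, not part of EDSL or of the skill itself.

```python
import re

# 8-4-4-4-12 hexadecimal pattern, 36 characters in total
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def classify_input(arg: str) -> str:
    """Return 'uuid', 'path', or 'unclear' for the skill argument (hypothetical helper)."""
    arg = arg.strip()
    if UUID_RE.match(arg):
        return "uuid"
    if arg.endswith((".json.gz", ".json")):
        return "path"
    # Anything else should trigger a clarifying question
    return "unclear"
```

An "unclear" result is the case where AskUserQuestion would be used.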
2. Load the Results
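from edsl import Results

# Option A: the input is a UUID, so pull the results from Expected Parrot
results = Results.pull("123e4567-e89b-12d3-a456-426614174000")

# Option B: the input is a local file, so load it instead (use one option, not both)
results = Results.load("path/to/results")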
3. Create Output Directory and Ask about Report Focus
Create a directory for the analysis outputs using sequential numbering:
import os
import glob

# Collect the numeric suffixes of existing analysis_<n> directories
existing = glob.glob("./analysis_*")
existing_nums = []
for d in existing:
    try:
        num = int(d.split("_")[-1])
        existing_nums.append(num)
    except ValueError:
        # Skip names such as analysis_final that have no numeric suffix
        pass

next_num = max(existing_nums, default=0) + 1
output_dir = f"./analysis_{next_num}"
os.makedirs(output_dir, exist_ok=True)
Always use AskUserQuestion to ask about the analysis focus.
This ensures the report is tailored to the user's needs. It's a free text question:
Question: "What would you like me to focus on in the analysis?"
Header: "Focus"
4. Export Documentation Files
Export three core documentation files:
import re

survey = results.survey

# 1. Survey documentation in markdown
survey_md = survey.to_markdown()
with open(f"{output_dir}/survey.md", "w") as f:
    f.write(survey_md)

# 2. Survey flow diagram, with HTML tags turned into line breaks
survey_mermaid = survey.to_mermaid()
survey_mermaid = re.sub(r'<b>|</b>|<br/>', '\n', survey_mermaid)
survey_mermaid = re.sub(r'\n+', '\n', survey_mermaid)
with open(f"{output_dir}/survey.mermaid", "w") as f:
    f.write(survey_mermaid)

# 3. Full results as CSV
results_csv = results.to_csv()
results_csv.write(f"{output_dir}/results.csv")
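To see what the two substitutions applied to the mermaid text do, here is a minimal sketch on a made-up node label. The sample string is hypothetical and is not taken from real `to_mermaid()` output; `clean_mermaid` is an illustrative name.

```python
import re


def clean_mermaid(text: str) -> str:
    """Replace <b>, </b> and <br/> tags with newlines, then collapse repeated newlines."""
    text = re.sub(r"<b>|</b>|<br/>", "\n", text)
    return re.sub(r"\n+", "\n", text)


# Hypothetical node label, for illustration only
sample = 'q0["<b>q0</b><br/>How are you?"]'
cleaned = clean_mermaid(sample)
print(cleaned)
```

Each tag becomes a line break rather than being deleted, so a label that mixed bold text and `<br/>` ends up spread over several lines.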
5. Initial Data Exploration
Before analysis, explore the data structure:
import pandas as pd
df = pd.read_csv(f"{output_dir}/results.csv")
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
answer_cols = [c for c in df.columns if c.startswith('answer.')]
print(f"Answer columns: {answer_cols}")
agent_cols = [c for c in df.columns if c.startswith('agent.')]
print(f"Agent columns: {agent_cols}")
scenario_cols = [c for c in df.columns if c.startswith('scenario.')]
print(f"Scenario columns: {scenario_cols}")
question_text_cols = [c for c in df.columns if c.startswith('question_text.')]
print(f"Question text columns: {question_text_cols}")
question_options_cols = [c for c in df.columns if c.startswith('question_options.')]
print(f"Question options columns: {question_options_cols}")
question_type_cols = [c for c in df.columns if c.startswith('question_type.')]
print(f"Question type columns: {question_type_cols}")
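The repeated prefix filters above could also be written as one pass that buckets every column by the part before the first dot. This is a sketch on a plain list of names; the helper name `group_columns_by_prefix` and the sample column names are assumptions for illustration.

```python
from collections import defaultdict


def group_columns_by_prefix(columns):
    """Group column names by the text before the first '.' (hypothetical helper)."""
    groups = defaultdict(list)
    for col in columns:
        prefix, sep, _ = col.partition(".")
        # Columns without a dot are collected under a catch-all key
        groups[prefix if sep else "(no prefix)"].append(col)
    return dict(groups)


# Illustrative column names following the prefixes described above
columns = [
    "answer.q1",
    "answer.q2",
    "agent.age",
    "scenario.framing",
    "question_text.q1_question_text",
]
groups = group_columns_by_prefix(columns)
for prefix, cols in groups.items():
    print(f"{prefix}: {cols}")
```

With a DataFrame, the same helper could be called as `group_columns_by_prefix(df.columns)`.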
6. Generate Analysis Report
Create a comprehensive report.md with:
Structure
# Results Analysis Report
## Study Design
### Questions
[For EACH question in the survey, show:]
#### Q1: [question_name] ([question_type])
**Template:** [The raw question template text from question_text.* column]
**Options:** [The options from question_options.* column, if applicable]
**Realized versions:**
[If the question uses scenario variables (Jinja2 templates like {{ var }}), show a table
of ALL unique realized question texts across scenario conditions. Group by scenario
variables to show what each condition looked like.]
| Scenario Condition | Realized Question Text |
|--------------------|----------------------|
| [condition 1] | [full realized text] |
| [condition 2] | [full realized text] |
...
[Repeat for each question]
### Scenario Variables
[Show ALL scenario variables and their unique values]
| Variable | Unique Values |
|----------|--------------|
| scenario.domain | health_insurance, software_platform, ... |
| scenario.framing | neutral, status_quo |
...
### Scenario Matrix
[Show the full crossing of scenario variables as a table, so the reader can see
every unique experimental condition. Include the count of observations per cell.]
### Agents / Models
[Show the models or agents used, their configuration (temperature, etc.),
and how many responses each produced]
## Data Summary
- Number of responses: N
- Agent traits collected: [list]
- Scenarios tested: [list]
## Detailed Results
### Q1: [Question Name]
[Response distribution table]
[Visualization: chart of the response distribution, placed directly after the table]
[Interpretation of results]
### Q2: [Question Name]
[Same pattern - table, visualization, interpretation together]
## Key Findings
[Main insights from the data]
## Cross-Tabulations (if applicable)
[Relationships between variables - only include agent breakdowns if agents have meaningful names, not UUIDs]
## Files Generated
| File | Description |
|------|-------------|
| [survey.md](survey.md) | Survey documentation |
| [survey.mermaid](survey.mermaid) | Survey flow diagram |
| [results.csv](results.csv) | Raw results data |
| [report.html](report.html) | This report (HTML) |
Generating the Study Design Section
The Study Design section is critical — it should make the report self-contained so a reader understands exactly what was asked without needing to open separate files.
For each question, extract the realized text from the data:
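for qt_col in question_text_cols:
    # Column names look like "question_text.<name>_question_text"
    q_name = qt_col.replace('question_text.', '').replace('_question_text', '')
    template = df[qt_col].iloc[0]  # raw template, before scenario substitution
    prompt_col = f'prompt.{q_name}_user_prompt'
    if prompt_col in df.columns:
        # Group only by scenario variables that actually vary;
        # grouping by an empty list would raise an error
        varying = [c for c in scenario_cols if df[c].nunique() > 1]
        if varying:
            for name, group in df.groupby(varying):
                # One realized question text per scenario condition
                realized_text = group[prompt_col].iloc[0]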
For scenario variables, enumerate all unique values:
for col in scenario_cols:
    unique_vals = df[col].dropna().unique()
For the scenario matrix, show the full experimental design:
if len(scenario_cols) >= 2:
    # Drop bookkeeping columns so only real experimental variables are crossed
    meaningful_scenario_cols = [c for c in scenario_cols
                                if not c.endswith('_index') and not c.endswith('_id')]
    design = df.groupby(meaningful_scenario_cols).size().reset_index(name='n_observations')
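For readers who want to see the shape of the resulting matrix without a DataFrame at hand, the sketch below builds the same kind of count table with `collections.Counter` instead of `groupby(...).size()`. The function name `scenario_matrix` and the sample rows are illustrative assumptions.

```python
from collections import Counter


def scenario_matrix(rows, variables):
    """Render a markdown table of observation counts per scenario condition."""
    # Count how many rows fall into each combination of variable values
    counts = Counter(tuple(row[v] for v in variables) for row in rows)
    header = "| " + " | ".join(variables) + " | n_observations |"
    divider = "| " + " | ".join("---" for _ in variables) + " | --- |"
    body = [
        "| " + " | ".join(str(v) for v in combo) + f" | {n} |"
        for combo, n in sorted(counts.items())
    ]
    return "\n".join([header, divider, *body])


# Made-up observations, for illustration only
rows = [
    {"domain": "health_insurance", "framing": "neutral"},
    {"domain": "health_insurance", "framing": "neutral"},
    {"domain": "health_insurance", "framing": "status_quo"},
    {"domain": "software_platform", "framing": "neutral"},
]
table = scenario_matrix(rows, ["domain", "framing"])
print(table)
```

As with the pandas version, combinations that never occur in the data do not appear as rows, so an incomplete crossing shows up as a missing line rather than a zero.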
IMPORTANT: When listing files in the report, always use relative hyperlinks (e.g., [survey.md](survey.md)) so users can click through to the files.
IMPORTANT:
- Do NOT include mermaid diagrams in the report (they often don't render correctly in HTML output). The mermaid file is still exported separately for reference.
- Only include per-agent analysis if agents have meaningful names (not UUIDs). Check if agent names look like UUIDs (36-character strings with hyphens in pattern 8-4-4-4-12) and skip agent breakdowns if so.
IMPORTANT: Place each visualization immediately after its corresponding question's data table, not in a separate section at the end. This keeps the analysis coherent and easy to follow.
Generate Visualizations (Inline with Questions)
For each question, generate and save a visualization, then reference it in the report immediately after the question's statistics:
import matplotlib.pyplot as plt

# `report` is the markdown string being assembled for report.md
for col in answer_cols:
    question_name = col.replace('answer.', '')
    value_counts = df[col].value_counts()
    report += f"### {question_name}\n\n"
    report += "[Response distribution table here]\n\n"
    # Bar charts become unreadable with many categories, so only plot up to 20
    if len(value_counts) <= 20:
        fig, ax = plt.subplots(figsize=(10, 6))
        value_counts.plot(kind='bar', ax=ax)
        ax.set_title(f'Response Distribution: {question_name}')
        ax.set_xlabel('Response')
        ax.set_ylabel('Count')
        plt.tight_layout()
        chart_path = f"{question_name}_distribution.png"
        plt.savefig(f'{output_dir}/{chart_path}', dpi=150)
        plt.close()
        # Reference the chart right after the table, using a relative path
        report += f"![{question_name} distribution]({chart_path})\n\n"
    report += "[Interpretation of this question's results]\n\n"
7. Save All Outputs
Ensure all files are saved to the output directory:
output_dir/
├── survey.md # Survey in markdown format
├── survey.mermaid # Survey flow diagram
├── results.csv # Full results data
├── report.md # Analysis report
├── report.html # Styled HTML report
├── *.png # Visualization files
└── analysis.py # Optional: reproducible analysis script
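One way to confirm that the required files are in place, once the HTML report has been generated, is a small existence check. This is a sketch; `missing_outputs` and `REQUIRED_FILES` are hypothetical names, and the optional analysis.py and the PNG charts are deliberately left out of the list.

```python
from pathlib import Path

# Files the workflow always produces (charts and analysis.py are optional)
REQUIRED_FILES = ["survey.md", "survey.mermaid", "results.csv", "report.md", "report.html"]


def missing_outputs(output_dir):
    """Return the required file names that are absent from output_dir."""
    base = Path(output_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list means every required file is present; any names returned point at a step that needs to be rerun.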
8. Generate HTML Report with Pandoc
After saving report.md, convert it to a styled HTML report using pandoc:
CSS_FILE="<discovered_css_path>"
pandoc "${output_dir}/report.md" \
-o "${output_dir}/report.html" \
--css="${CSS_FILE}" \
--standalone
Or in Python:
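import subprocess
import glob as g

# Look for the stylesheet anywhere below the current directory
css_matches = g.glob("**/assets/report.css", recursive=True)
css_file = css_matches[0] if css_matches else None

cmd = [
    "pandoc",
    f"{output_dir}/report.md",
    "-o", f"{output_dir}/report.html",
    "--standalone",
]
if css_file:
    # Only pass --css when a stylesheet was found; otherwise pandoc would receive "--css=None"
    cmd.append(f"--css={css_file}")
subprocess.run(cmd, check=True)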
print(f"Generated: {output_dir}/report.html")
Note: The mermaid diagram is exported separately as survey.mermaid but not embedded in the report due to rendering issues in HTML output.
Complete Example Script
"""
EDSL Results Analysis Script
"""
from edsl import Results
import pandas as pd
import matplotlib.pyplot as plt
import os
import re
from datetime import datetime
RESULTS_UUID = "123e4567-e89b-12d3-a456-426614174000"
results = Results.pull(RESULTS_UUID)
import glob

# Create the next sequentially numbered output directory
existing = glob.glob("./analysis_*")
existing_nums = []
for d in existing:
    try:
        num = int(d.split("_")[-1])
        existing_nums.append(num)
    except ValueError:
        pass
next_num = max(existing_nums, default=0) + 1
output_dir = f"./analysis_{next_num}"
os.makedirs(output_dir, exist_ok=True)

# Export survey documentation
survey = results.survey
with open(f"{output_dir}/survey.md", "w") as f:
    f.write(survey.to_markdown())

# Export the mermaid diagram with HTML tags turned into line breaks
survey_mermaid = survey.to_mermaid()
survey_mermaid = re.sub(r'<b>|</b>|<br/>', '\n', survey_mermaid)
survey_mermaid = re.sub(r'\n+', '\n', survey_mermaid)
with open(f"{output_dir}/survey.mermaid", "w") as f:
    f.write(survey_mermaid)

# Export the results as CSV
results_csv = results.to_csv()
results_csv.write(f"{output_dir}/results.csv")
df = pd.read_csv(f"{output_dir}/results.csv")
answer_cols = [c for c in df.columns if c.startswith('answer.')]
agent_cols = [c for c in df.columns if c.startswith('agent.')]
scenario_cols = [c for c in df.columns if c.startswith('scenario.')]
def is_uuid(s):
    """Check if a string looks like a UUID (8-4-4-4-12 hex pattern)."""
    if not isinstance(s, str):
        return False
    uuid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    return bool(re.match(uuid_pattern, s.lower()))

# Per-agent analysis is only useful when agent names are not auto-generated UUIDs
has_meaningful_agents = False
if 'agent.agent_name' in df.columns:
    agent_names = df['agent.agent_name'].dropna().unique()
    has_meaningful_agents = len(agent_names) > 0 and not all(is_uuid(str(name)) for name in agent_names)
question_text_cols = [c for c in df.columns if c.startswith('question_text.')]
question_options_cols = [c for c in df.columns if c.startswith('question_options.')]
question_type_cols = [c for c in df.columns if c.startswith('question_type.')]
prompt_cols = [c for c in df.columns if c.startswith('prompt.') and c.endswith('_user_prompt')]
model_cols = [c for c in df.columns if c.startswith('model.')]
study_design = ""
study_design += "## Study Design\n\n### Questions\n\n"
for qt_col in question_text_cols:
    q_name = qt_col.replace('question_text.', '').replace('_question_text', '')
    template = str(df[qt_col].iloc[0])
    qt_type_col = f'question_type.{q_name}_question_type'
    q_type = str(df[qt_type_col].iloc[0]) if qt_type_col in df.columns else 'unknown'
    qo_col = f'question_options.{q_name}_question_options'
    q_options = str(df[qo_col].iloc[0]) if qo_col in df.columns and df[qo_col].notna().any() else None
    study_design += f"#### {q_name} ({q_type})\n\n"
    study_design += f"**Template:** {template}\n\n"
    if q_options and q_options != 'nan':
        study_design += f"**Options:** {q_options}\n\n"
    # Show the realized question text for each scenario condition
    prompt_col = f'prompt.{q_name}_user_prompt'
    if prompt_col in df.columns and scenario_cols:
        meaningful_scenario_cols = [c for c in scenario_cols
                                    if not c.endswith('_index') and df[c].nunique() > 1]
        if meaningful_scenario_cols:
            unique_prompts = df.groupby(meaningful_scenario_cols)[prompt_col].first().reset_index()
            if len(unique_prompts) > 1:
                study_design += "**Realized versions by scenario:**\n\n"
                study_design += "| " + " | ".join(c.replace('scenario.', '') for c in meaningful_scenario_cols) + " | Question Text |\n"
                study_design += "| " + " | ".join("---" for _ in meaningful_scenario_cols) + " | --- |\n"
                for _, row in unique_prompts.iterrows():
                    conditions = " | ".join(str(row[c]) for c in meaningful_scenario_cols)
                    # Keep the table on one line and escape pipes inside the prompt
                    prompt_text = str(row[prompt_col]).replace('\n', ' ').replace('|', '\\|')
                    if len(prompt_text) > 200:
                        prompt_text = prompt_text[:200] + "..."
                    study_design += f"| {conditions} | {prompt_text} |\n"
                study_design += "\n"
    study_design += "\n"
if scenario_cols:
    study_design += "### Scenario Variables\n\n"
    study_design += "| Variable | # Unique | Values |\n"
    study_design += "|----------|----------|--------|\n"
    for col in scenario_cols:
        if not col.endswith('_index'):
            unique_vals = df[col].dropna().unique()
            vals_str = ", ".join(str(v) for v in sorted(unique_vals, key=str))
            if len(vals_str) > 150:
                vals_str = vals_str[:150] + "..."
            study_design += f"| {col.replace('scenario.', '')} | {len(unique_vals)} | {vals_str} |\n"
    study_design += "\n"
    # Full crossing of the scenario variables that actually vary
    meaningful_scenario_cols = [c for c in scenario_cols
                                if not c.endswith('_index') and not c.endswith('_id') and df[c].nunique() > 1]
    if len(meaningful_scenario_cols) >= 2:
        study_design += "### Scenario Matrix\n\n"
        design_matrix = df.groupby(meaningful_scenario_cols).size().reset_index(name='n_observations')
        study_design += design_matrix.to_markdown(index=False) + "\n\n"

study_design += "### Agents / Models\n\n"
if 'model.model' in df.columns:
    model_info = df.groupby('model.model').size().reset_index(name='n_responses')
    study_design += "| Model | Responses |\n"
    study_design += "|-------|-----------|\n"
    for _, row in model_info.iterrows():
        study_design += f"| {row['model.model']} | {row['n_responses']} |\n"
    study_design += "\n"
    # Report settings that are constant across all rows (temperature, etc.)
    config_cols = [c for c in model_cols if c not in ['model.model', 'model.model_index'] and df[c].nunique() == 1]
    if config_cols:
        study_design += "**Model configuration:** "
        configs = [f"{c.replace('model.', '')}={df[c].iloc[0]}" for c in config_cols]
        study_design += ", ".join(configs) + "\n\n"
if has_meaningful_agents:
    agent_info = df.groupby('agent.agent_name').size().reset_index(name='n_responses')
    study_design += "| Agent | Responses |\n"
    study_design += "|-------|-----------|\n"
    for _, row in agent_info.iterrows():
        study_design += f"| {row['agent.agent_name']} | {row['n_responses']} |\n"
    study_design += "\n"
report = f"""# Results Analysis Report

Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}

{study_design}
## Data Summary

- **Total responses**: {len(df)}
- **Questions**: {len(answer_cols)}
- **Agent traits**: {len(agent_cols)} ({', '.join(agent_cols) if agent_cols else 'None'})
- **Scenario variables**: {len(scenario_cols)} ({', '.join(scenario_cols) if scenario_cols else 'None'})

## Response Distributions

"""
for col in answer_cols:
    question_name = col.replace('answer.', '')
    value_counts = df[col].value_counts()
    report += f"### {question_name}\n\n"
    report += "| Response | Count | Percentage |\n"
    report += "|----------|-------|------------|\n"
    for val, count in value_counts.items():
        pct = count / len(df) * 100
        report += f"| {val} | {count} | {pct:.1f}% |\n"
    report += "\n"
    # Chart goes directly after the table; skip it when there are too many categories
    if len(value_counts) <= 20:
        fig, ax = plt.subplots(figsize=(10, 6))
        value_counts.plot(kind='bar', ax=ax)
        ax.set_title(f'Response Distribution: {question_name}')
        ax.set_xlabel('Response')
        ax.set_ylabel('Count')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        chart_path = f"{question_name}_distribution.png"
        plt.savefig(f'{output_dir}/{chart_path}', dpi=150)
        plt.close()
        report += f"![{question_name} distribution]({chart_path})\n\n"

if has_meaningful_agents:
    report += """## Analysis by Agent

"""
    for col in answer_cols:
        question_name = col.replace('answer.', '')
        # Row percentages: share of each response within each agent
        crosstab = pd.crosstab(df['agent.agent_name'], df[col], normalize='index') * 100
        report += f"### {question_name} by Agent\n\n"
        report += crosstab.to_markdown() + "\n\n"
report += """## Key Findings

[Add key findings based on the analysis]

## Methodology Notes

This analysis was generated from EDSL Results data. The survey was administered to AI agents
using the Expected Parrot platform.
"""

with open(f"{output_dir}/report.md", "w") as f:
    f.write(report)

import subprocess
import glob as g

css_matches = g.glob("**/assets/report.css", recursive=True)
css_file = css_matches[0] if css_matches else None
cmd = [
    "pandoc",
    f"{output_dir}/report.md",
    "-o", f"{output_dir}/report.html",
    "--standalone",
]
if css_file:
    # Only pass --css when a stylesheet was found; otherwise pandoc would receive "--css=None"
    cmd.append(f"--css={css_file}")
subprocess.run(cmd, check=True)
print(f"Analysis complete! Output saved to: {output_dir}/")
print(f" - survey.md")
print(f" - survey.mermaid")
print(f" - results.csv")
print(f" - report.md")
print(f" - report.html")
9. Ask about PowerPoint
Use AskUserQuestion to ask whether the user would like a PPTX slideshow as well. If they say yes, create a PPTX file based on report.md.
- Question: "Would you like a PowerPoint version of the results?"
- Options: Yes / No
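If the user says yes, one possible first step is to turn report.md into a slide outline, one slide per second-level heading. The sketch below only does that splitting; the function name `split_report_into_slides` and the sample text are assumptions, and writing the actual PPTX file is left to whichever presentation library is available.

```python
def split_report_into_slides(markdown_text):
    """Split markdown into slides: each '## ' heading starts a new slide (hypothetical helper)."""
    slides = []
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = {"title": line[3:].strip(), "body": []}
            slides.append(current)
        elif current is not None and line.strip():
            # Non-empty lines, including '###' subheadings, become slide body text
            current["body"].append(line.strip())
    return slides


# Abbreviated sample report, for illustration only
sample_report = (
    "# Results Analysis Report\n\n"
    "## Data Summary\n"
    "- Number of responses: N\n\n"
    "## Key Findings\n"
    "[Main insights from the data]\n"
)
slides = split_report_into_slides(sample_report)
for slide in slides:
    print(slide["title"], slide["body"])
```

Text before the first `## ` heading, such as the report title, is skipped here and could be used for a title slide instead.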
Output Files
| File | Description |
|------|-------------|
| survey.md | Human-readable survey documentation with questions, options, and rules |
| survey.mermaid | Mermaid diagram showing survey flow and skip logic |
| results.csv | Full results data in CSV format for analysis |
| report.md | Comprehensive analysis report with findings and visualizations |
| report.html | Styled HTML report (via pandoc with Expected Parrot CSS) |
| *.png | Charts and visualizations referenced in the report |
| analysis.py | (Optional) Reproducible Python script for the analysis |
Tips
- Check survey.mermaid separately to understand skip logic before analyzing
- Look for patterns in agent traits vs. responses (only if agents have meaningful names, not UUIDs)
- Compare responses across scenarios (if scenarios were used)
- The answer.* columns contain question responses
- The agent.* columns contain agent trait values
- The scenario.* columns contain scenario variable values
- Use comment.* columns to see free-text explanations (if available)
- Per-agent breakdowns are automatically skipped when agent names are UUIDs (not meaningful for analysis)
Common Analysis Patterns
Cross-tabulation by Scenario
pd.crosstab(df['scenario.condition'], df['answer.question_name'], normalize='index')
Agent Trait Analysis
import re

def is_uuid(s):
    """Return True if s looks like a UUID (8-4-4-4-12 hex pattern)."""
    uuid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    return bool(re.match(uuid_pattern, str(s).lower()))

# Only break down by agent when names are meaningful (not auto-generated UUIDs)
if not all(is_uuid(name) for name in df['agent.agent_name'].dropna().unique()):
    df.groupby('agent.agent_name')['answer.question_name'].value_counts(normalize=True)
Response Correlation
df[[c for c in answer_cols if df[c].dtype in ['int64', 'float64']]].corr()