skill-autoctx-hillclimb
// Guides the agent to perform hill-climbing iterations to improve a ContextSet based on evaluation results.
| Field | Value |
| --- | --- |
| name | skill-autoctx-hillclimb |
| description | Guides the agent to perform hill-climbing iterations to improve a ContextSet based on evaluation results. |
This skill guides the process of improving a ContextSet iteratively by analyzing evaluation failures (Gap Analysis) and applying automated structural refinements (Mutations).
> [!IMPORTANT]
> **Constraint**: In this workflow, you are ONLY allowed to use the `templates` and `facets` context types. Do not attempt to use other context types.
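You can check proposed items against this constraint mechanically before applying any mutation. The sketch below is minimal and rests on two assumptions: each proposed item is a dict with a `type` field, and that field uses the type names that appear later in this workflow (`template`, `facet`, `value_search`). The item layout and `disallowed_items` are hypothetical and do not reflect the documented ContextSet schema.

```python
# Hypothetical helper: the item layout (dicts with a "type" key) is an
# assumption for illustration, not the documented ContextSet schema.
ALLOWED_TYPES = {"template", "facet"}


def disallowed_items(items: list) -> list:
    """Return the items whose "type" is outside the types allowed in this workflow."""
    return [item for item in items if item.get("type") not in ALLOWED_TYPES]


proposed = [
    {"type": "facet", "name": "Users by Year"},
    {"type": "template", "name": "Product Sales"},
    {"type": "value_search", "name": "Product names"},
]
print([item["name"] for item in disallowed_items(proposed)])  # ['Product names']
```

Drop any item this check flags, or rework it as a template or facet, before it reaches the mutation step.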
Follow these steps exactly in order:
1. Verify that the `autoctx/experiments/` directory and `autoctx/state.md` exist. If either is missing, warn the user that the workspace might not be initialized and suggest running `/autoctx:init`.
2. Check for the `eval_reports/` folder. If it is missing, suggest running the Evaluation workflow first.
3. Read `autoctx/state.md` to identify the active experiment in `autoctx/experiments/`.
4. Check the `autoctx/experiments/<experiment_name>/hillclimb/` folder for files matching `improved_context_v*.json`.
5. Determine the loop version `vN` by finding the maximum existing N and using N+1. If the folder is empty, start at `v1`.
6. Identify the base context:
   - For `v1`:
     - Check `autoctx/state.md` to see if a specific base context path was recorded for this experiment (e.g., during the Evaluation setup for user-provided contexts).
     - If no path is recorded in `state.md`, default to the baseline generated by Bootstrap in the experiment folder.
   - For `vN` (where N > 1), the base context is `improved_context_v(N-1).json`.
7. **Validation**:
   - List the run folders in `eval_reports/`. If multiple folders exist, find the most recent one by modified time. Prefer the latest run by default, but also list the other available runs (peek into their `summary.csv` or `configs.csv` to show timestamps and metrics for context). Ask the user to confirm the selection.
   - Verify that `eval_reports/<job_id_folder>/` contains the expected files (e.g., `scores.csv`, `summary.csv`). If they are missing or empty, STOP and inform the user.
8. **Read Evaluation Results**: Use the `read_evaluation_result` MCP tool, passing the path to `eval_reports/<job_id_folder>/`.
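The version, base-context and report-selection rules in the steps above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the skill's implementation:

- All helper names are hypothetical.
- The baseline file name `baseline_context.json` is borrowed from the `state.md` example.
- In the real workflow, the agent applies these rules through file reads and tool calls.

```python
import re
from pathlib import Path
from typing import List, Optional

VERSION_RE = re.compile(r"^improved_context_v(\d+)\.json$")
# Files the Validation step looks for; illustrative, not exhaustive.
EXPECTED_REPORT_FILES = ("scores.csv", "summary.csv")
# Assumed name of the Bootstrap baseline (taken from the state.md example).
BASELINE_NAME = "baseline_context.json"


def next_version(hillclimb_dir: Path) -> int:
    """Max existing N among improved_context_vN.json files, plus one (1 if none)."""
    if not hillclimb_dir.is_dir():
        return 1
    versions = []
    for path in hillclimb_dir.glob("improved_context_v*.json"):
        match = VERSION_RE.match(path.name)
        if match:  # skip files like improved_context_vX.json
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


def base_context_path(exp_dir: Path, version: int, recorded: Optional[Path] = None) -> Path:
    """v1 uses the path recorded in state.md or the Bootstrap baseline; vN uses v(N-1)."""
    if version == 1:
        return recorded if recorded is not None else exp_dir / BASELINE_NAME
    return exp_dir / "hillclimb" / f"improved_context_v{version - 1}.json"


def latest_report_dir(eval_reports: Path) -> Optional[Path]:
    """Most recently modified run folder; only a default to confirm with the user."""
    runs = [p for p in eval_reports.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def missing_report_files(report_dir: Path) -> List[str]:
    """Expected files that are absent or empty (either case means STOP)."""
    missing = []
    for name in EXPECTED_REPORT_FILES:
        path = report_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

`latest_report_dir` only proposes a default. The workflow still lists the other runs and asks the user to confirm the selection.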
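9. **Generate Gap Analysis Report (Batched)**:
   - Process the failed queries in batches of 10, incrementing the `offset` (0, 10, 20, ...) until all failed queries are analyzed.
   - For the first batch, write the `# Gap Analysis Report - vN` header and the `## Summary` section, followed by the analysis of the first batch under `## Failed Queries Detail`.
   - For each subsequent batch, append the analysis under the existing `## Failed Queries Detail` section.

   Use the following structure for the report: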
```markdown
# Gap Analysis Report - vN

## Summary
- **Total Queries**: 10
- **Passed**: 7
- **Failed**: 3
- **Pass Rate**: 70%

## Failed Queries Detail

### Query 1: "How many users registered in 2023?"
- **Error Category**: `[FilterError]`
- **Expected SQL**: `SELECT count(*) FROM users WHERE year = 2023`
- **Actual SQL**: `SELECT count(*) FROM users` (missing filter)
- **Root Cause**: The LLM did not know about the `year` column or how to filter by year for this entity.
- **Proposed Mutation**: Add a facet for "Users by Year".

### Query 2: "Show me top selling products"
- **Error Category**: `[OrderingError]`
- **Expected SQL**: `SELECT name FROM products ORDER BY sales DESC LIMIT 5`
- **Actual SQL**: `SELECT name FROM products LIMIT 5`
- **Root Cause**: Missing ordering instruction in the context.
- **Proposed Mutation**: Update the "Product Sales" template to include ordering.

### Query 3: "Get users older than 30"
- **Error Category**: `[GoldenDataError]`
- **Expected SQL**: `SELECT * FROM users WHERE age >> 30` (typo: `>>` instead of `>` in the golden SQL)
- **Actual SQL**: `SELECT * FROM users WHERE age > 30`
- **Root Cause**: Incorrect operator in the golden dataset; `>>` is not the intended comparison.
- **Proposed Mutation**: None. Flag to the user to fix the evaluation dataset.
```
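The Summary counts follow directly from the evaluation results: Failed = Total - Passed and Pass Rate = Passed / Total (7 / 10 = 70% in the example above). The sketch below renders the report header, the Summary section and one failed-query entry in the structure shown above. The helper names are hypothetical. Rounding to a whole percent, and reporting 0% for an empty run, are assumptions.

```python
def render_report_header(version: int, total: int, passed: int) -> str:
    """Render the title and Summary section, ending with the details heading."""
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    failed = total - passed
    # Whole-percent pass rate; 0% when there are no queries (assumption).
    pass_rate = round(100 * passed / total) if total else 0
    return "\n".join([
        f"# Gap Analysis Report - v{version}",
        "",
        "## Summary",
        f"- **Total Queries**: {total}",
        f"- **Passed**: {passed}",
        f"- **Failed**: {failed}",
        f"- **Pass Rate**: {pass_rate}%",
        "",
        "## Failed Queries Detail",
        "",
    ])


def render_failed_query(index: int, question: str, category: str, expected_sql: str,
                        actual_sql: str, root_cause: str, mutation: str) -> str:
    """Render one entry under ## Failed Queries Detail."""
    return "\n".join([
        f'### Query {index}: "{question}"',
        f"- **Error Category**: `[{category}]`",
        f"- **Expected SQL**: `{expected_sql}`",
        f"- **Actual SQL**: `{actual_sql}`",
        f"- **Root Cause**: {root_cause}",
        f"- **Proposed Mutation**: {mutation}",
        "",
    ])
```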
10. **Save Report**: You MUST write the report file to `autoctx/experiments/<experiment_name>/hillclimb/gap_analysis_vN.md`. When processing in batches, keep appending to this file until all failed queries are documented. Do not merely output the report in chat; it must exist on the file system.
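Here is a minimal sketch of the batched save. The first batch (offset 0) creates the file with the header and Summary, and every later batch is appended, so the report exists on disk after the first batch. `batch_offsets` and `save_batch` are hypothetical helpers. The batch size of 10 is inferred from the offsets 0, 10, 20, ... and is an assumption.

```python
from pathlib import Path
from typing import List

# Batch size inferred from the offsets 0, 10, 20, ... (assumption).
BATCH_SIZE = 10


def batch_offsets(failed_count: int, batch_size: int = BATCH_SIZE) -> List[int]:
    """Offsets to request so that every failed query is analyzed exactly once."""
    return list(range(0, failed_count, batch_size))


def save_batch(report_path: Path, offset: int, batch_markdown: str, header: str = "") -> None:
    """Create the report on the first batch (offset 0) and append every later batch.

    Writing at offset 0 truncates any earlier file for this loop, so a re-run
    starts from a clean report instead of duplicating entries.
    """
    report_path.parent.mkdir(parents=True, exist_ok=True)
    first = offset == 0
    with report_path.open("w" if first else "a", encoding="utf-8") as f:
        if first:
            f.write(header)
        f.write(batch_markdown)


# Usage outline:
# for offset in batch_offsets(failed_count):
#     batch_md = ...  # analysis of failed queries [offset, offset + BATCH_SIZE)
#     save_batch(report_path, offset, batch_md, header if offset == 0 else "")
```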
11. **Log in State Tracking**:
    - Update `autoctx/state.md` to record the mapping for Loop vN (Base Context <-> Eval Report <-> Gap Analysis).
12. **Human-in-the-Loop Review**:
    - Verify that `gap_analysis_vN.md` exists and contains findings. Verify that the base ContextSet file exists. If either is missing, STOP and inform the user.
    - Read `gap_analysis_vN.md` to identify what needs to be fixed.
13. **Apply Mutations**:
    - Prefer general fixes over query-specific ones (e.g., a `facet` for a column definition rather than a specific template for every query using that column).
    - Context items can be of the `template`, `facet`, and `value_search` types; in this workflow, create only `template` and `facet` items (see the constraint above).
    - Use `autoctx/experiments/<experiment_name>/hillclimb/improved_context_vN.json` as the output path for the new ContextSet.
    - Follow the `context-generation-guide` skill to produce the final parameterized items.
    - Test the SQL of each new or changed item with `<source>-execute-sql` (use dummy values for placeholders) to verify syntax.
    - Confirm table and column names with `<source>-list-schemas`.
    - Call the `mutate_context_set` MCP tool, passing the new file path as `file_path` and the mutations as `mutations_json`, to mutate the context set.
    - Update `autoctx/state.md` to include the output path of `improved_context_vN.json` for Loop vN.
14. **Share the Result**:
    - Read `autoctx/tools.yaml` (or `db_config.yaml`) to fetch the project, location, and instance/cluster details for the active database.
    - Call the `generate_upload_url` tool, passing the extracted values, to get the direct console link for the user.
    - Present the path to `improved_context_vN.json` and the generated console link together in a single clear message.
    - For the next iteration, repeat this workflow as loop `N+1`.

Upon successful completion, the workspace must contain:

- `autoctx/experiments/<experiment_name>/hillclimb/gap_analysis_vN.md`
- `autoctx/experiments/<experiment_name>/hillclimb/improved_context_vN.json`
- An updated `autoctx/state.md` summarizing the run loop.

## State Tracking (`autoctx/state.md`)

When updating `autoctx/state.md`, append to or update the Hill-Climbing Run Log section:
```markdown
# Context Authoring Experiment State Tracking

## Active Experiment: my-exp-1

## Hill-Climbing Run Log

### Loop: v1
- **Base Context**: `baseline_context.json`
- **Eval Report Path**: `autoctx/experiments/my-exp-1/eval_reports/<job_id_uuid>/` (containing `configs.csv`, `evals.csv`, etc.)
- **Gap Analysis**: `autoctx/experiments/my-exp-1/hillclimb/gap_analysis_v1.md`
- **Mutated Context**: `autoctx/experiments/my-exp-1/hillclimb/improved_context_v1.json`
```
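Each loop touches the run log twice: once after the gap analysis, and once more when the mutated context path is known. An "append or update" helper therefore needs to replace an existing `### Loop: vN` entry rather than duplicate it. The sketch below rests on these assumptions:

- Each entry is a heading followed by `- ` bullet lines, as in the example above.
- New entries belong at the end of the file.
- `loop_entry` and `upsert_loop` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Optional

RUN_LOG_HEADING = "## Hill-Climbing Run Log"


def loop_entry(version: int, base_context: str, eval_report: str,
               gap_analysis: str, mutated_context: Optional[str] = None) -> str:
    """Render one run-log entry in the format shown above."""
    lines = [
        f"### Loop: v{version}",
        f"- **Base Context**: `{base_context}`",
        f"- **Eval Report Path**: `{eval_report}`",
        f"- **Gap Analysis**: `{gap_analysis}`",
    ]
    if mutated_context is not None:
        lines.append(f"- **Mutated Context**: `{mutated_context}`")
    return "\n".join(lines) + "\n"


def upsert_loop(state_path: Path, version: int, entry: str) -> None:
    """Replace the entry for this loop if present; otherwise append it to the run log."""
    text = state_path.read_text(encoding="utf-8") if state_path.exists() else ""
    # Heading line plus the bullet lines that follow it ("v1\n" never matches "v10").
    block = re.compile(rf"^### Loop: v{version}\n(?:- .*(?:\n|$))*", re.MULTILINE)
    if block.search(text):
        text = block.sub(lambda _: entry, text, count=1)
    else:
        if RUN_LOG_HEADING not in text:
            text = text.rstrip("\n") + ("\n\n" if text.strip() else "") + RUN_LOG_HEADING + "\n"
        # Assumes the run log is the last section of state.md.
        text = text.rstrip("\n") + "\n\n" + entry
    state_path.write_text(text, encoding="utf-8")
```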