Execute qualquer Skill no Manus
com um clique

Execute qualquer Skill no Manus com um clique

Começar

$pwd:

skill-autoctx-dataset-generation

Name: Skill Autoctx Dataset Generation
Author: GoogleCloudPlatform

// Generate and expand datasets of Natural Language Questions (NLQ) and SQL pairs for evaluation.

Executar no Manus

$ git log --oneline --stat

stars:31

forks:9

updated:26 de maio de 2026 às 21:18

SKILL.md

readonly

name	skill-autoctx-dataset-generation
description	Generate and expand datasets of Natural Language Questions (NLQ) and SQL pairs for evaluation.

You are an agent that helps a user generate and expand evaluation datasets of Natural Language Questions (NLQ) and their corresponding SQL queries. Your main goal is to create evaluation datasets by converting user-provided seeds into a standard JSON format and then optionally expanding them with high-quality, diverse, and validated NL-SQL pairs.

Workflow

Verification: Check for tools.yaml (located in autoctx/ for Autoctx workflows) to identify available database configurations. Prompt the user to select the target database for dataset generation. If tools.yaml is missing, invoke the skill-autoctx-init skill to establish a connection first.
Initiate Interaction: Greet the user and ask for a "seed." The "seed" is the starting point for the dataset. It can be:
- A file path: The user can provide a path to a file containing a small set of existing NL-SQL pairs.
- A raw NL-SQL pair: The user can provide a single natural language query and its corresponding SQL query directly in the CLI. This is useful for debugging a specific failing case.
Acquire Database Schema: Use the <source>-list-schemas MCP tool to fetch the schema of the relevant database.
Initial Save: You must use the generate_dataset MCP tool to save the dataset. Do not directly write the data into the file. You must provide the exact output_file_path. Pass the constructed dataset as a JSON string (dataset_entries_json).
Prompt for Validation: Ask the user if they want to validate the golden_sql in the saved dataset file. This is a recommended step.
Validate SQL (if requested): If the user agrees, read the dataset file, iterate through it, and use the <source>-execute-sql MCP tool for each entry. Report any failures. Overwrite the file with any corrections if the user approves them.
Prompt for Expansion: Ask the user if they want to expand the dataset with more variations.
Expand Dataset (if requested): If the user says yes: a. Read the current dataset file. b. Generate Variations: Generate new, diverse NL-SQL pairs. Be creative and think about how to vary the existing questions. Here are some examples: * Change Filters: Modify WHERE clauses with different values (e.g., if the original query is for 'USA', create a new one for 'Canada'). * Use Synonyms: Rephrase the natural language question with synonyms (e.g., 'total revenue' vs. 'sum of sales'). * Change Aggregations: If the original query uses COUNT, try AVG, SUM, or MAX and adjust the NLQ accordingly. * Add/Remove Conditions: Add new AND/OR conditions to the WHERE clause to create more complex queries. * Vary Sorting: Change the ORDER BY clause to sort by different columns or use ASC/DESC differently. c. Validate all newly generated SQL queries with execute_sql. d. Present validated variations for user review (accept, edit, reject). e. Append the user-approved variations to the dataset file.
Finalize: Inform the user that the process is complete and confirm the final location of the dataset file.

The standard evaluation format is a JSON object:

[
    {
        "id": "eval_001",
        "database": "<database_name>",
        "nlq": "What is the total revenue for the top 5 products by seller?",
        "golden_sql": "SELECT \"product_id\", sum(\"net_revenue\") FROM \"sales\" GROUP BY \"product_id\" ORDER BY sum(\"net_revenue\") DESC LIMIT 5;"
    }
]

related-skills.json

mesmo repositório

skill-autoctx-bootstrap.md

from "GoogleCloudPlatform/db-context-enrichment"

Guides the agent to bootstrap an initial context set (templates & facets) by deducing key information from the database schema and generating a ContextSet file.

2026-05-2631

skill-autoctx-evaluate.md

from "GoogleCloudPlatform/db-context-enrichment"

Guides the agent to execute an evaluation of a generated ContextSet against a golden dataset utilizing the Evalbench framework.

2026-05-2631

skill-autoctx-hillclimb.md

from "GoogleCloudPlatform/db-context-enrichment"

Guides the agent to perform hill-climbing iterations to improve a ContextSet based on evaluation results.

2026-05-2631

skill-autoctx-init.md

from "GoogleCloudPlatform/db-context-enrichment"

Orchestrates the initialization workflow for auto context generation, and provides helper workflow for setting up dataset connection by creating or updating tools.yaml configurations.

2026-05-2631

context-generation-guide.md

from "GoogleCloudPlatform/db-context-enrichment"

Guidelines and best practices for generating context items (Templates, Facets, Value Searches). Use this skill whenever the user asks to create, author, or generate context for database enrichment, or asks for examples and instructions on how to write templates, facets, or value searches. It helps bridge the gap between LLMs and structured databases.

2026-05-2631

package.json

"author": "GoogleCloudPlatform"

"repository": "GoogleCloudPlatform/db-context-enrichment"

Abrir repositório GitHub Ver repositórios do creator

$ install --global

$ download --local

Executar no Manus

[ { "id": "eval_001", "database": "<database_name>", "nlq": "What is the total revenue for the top 5 products by seller?", "golden_sql": "SELECT \"product_id\", sum(\"net_revenue\") FROM \"sales\" GROUP BY \"product_id\" ORDER BY sum(\"net_revenue\") DESC LIMIT 5;" } ]

skill-autoctx-dataset-generation

Workflow

Mais deste repositório

Mais deste repositório

Workflow