gcp-dataflow
Provides guidance for writing, packaging and executing Apache Beam pipelines on GCP using Cloud Dataflow. Use when: - Creating an Apache Beam Dataflow pipeline. - Creating a Google Flex Template.
| name | gcp-dataflow |
| description | Provides guidance for writing, packaging and executing Apache Beam pipelines on GCP using Cloud Dataflow. Use when: - Creating an Apache Beam Dataflow pipeline. - Creating a Google Flex Template. |
| license | Apache-2.0 |
| metadata | {"version":"v2","publisher":"google"} |
Expert guidance for writing and packaging Apache Beam pipelines to run on Google Cloud Dataflow.
Use this section when creating a new project for a Dataflow pipeline.
- `requirements.txt`, and other similar files where versions
  are specified.

Use this section when configuring a Dataflow Java pipeline project using Gradle.
- `com.github.johnrengelman.shadow`) unless the user explicitly requests a
  Fat Jar.
- `application` plugin for passing command-line parameters.
- `slf4j-api` version pulled transitively by Apache Beam.
- `slf4j-simple`, `logback-classic`, etc.) to exactly match the major/minor
  version of the resolved `slf4j-api`.

When creating new Dataflow pipeline projects, configure them as a Flex Template.
Flex Templates offer a hermetic and reproducible launch environment, and are
easy to launch with gcloud or with orchestrators like Cloud Composer.
Follow the Flex Templates section below.
- `--sdk_container_image`). Whenever
  configuring or suggesting a Dataflow Flex Template for a Python pipeline
  that requires extra dependencies (e.g., using `--requirements_file`,
  `--setup_file`, or `--extra_package`), YOU MUST recommend the Single
  Docker Image Configuration as detailed in
  `python_flex_template_reference.md`.
- `cloudbuild.yaml` out-of-the-box for
  building and pushing images unless local setup is explicitly requested.

When launching Python Pipelines without a Flex Template with
DataflowRunner, you MUST scan the pipeline project directory for the
following files:
- `requirements.txt`:
  - `--requirements_file` pipeline option.
- `setup.py`:
  - `--setup_file` pipeline option. This is
    critical if the pipeline uses local modules or packages.

When launching Python Pipelines with a Flex Template, if the Flex Template
image is also the SDK Container image (Single Docker Image Configuration),
then you MUST supply the image in the sdk_container_image parameter.
Confirm the launch command with the user.
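
The directory scan described above for launches without a Flex Template can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `dependency_options` and `direct_launch_args` are hypothetical, and the sketch assumes the dependency files sit at the top level of the project directory.

```python
from pathlib import Path
from typing import List


def dependency_options(project_dir: str) -> List[str]:
    """Return extra pipeline options for dependency files found in project_dir.

    Hypothetical helper: only the top level of the directory is checked.
    """
    root = Path(project_dir)
    options: List[str] = []

    # requirements.txt maps to the --requirements_file pipeline option.
    requirements = root / "requirements.txt"
    if requirements.is_file():
        options.append(f"--requirements_file={requirements}")

    # setup.py maps to the --setup_file pipeline option, which matters
    # when the pipeline uses local modules or packages.
    setup = root / "setup.py"
    if setup.is_file():
        options.append(f"--setup_file={setup}")

    return options


def direct_launch_args(
    project_dir: str, project: str, region: str, temp_location: str
) -> List[str]:
    """Assemble pipeline arguments for a DataflowRunner launch (no Flex Template)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        *dependency_options(project_dir),
    ]
```

The returned list can be appended to the pipeline's own arguments, and the assembled command should still be confirmed with the user before it is run.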
- `your-gcp-project-id`) for GCP
  resources when drafting run scripts or configs. Action: If values are
  unknown, proactively run commands like `gcloud config get-value project` to
  find active resources to pre-fill scripts for the user. Confirm the values
  with the user before proceeding.

YOU MUST use this section when the user asks about the performance of their Dataflow pipelines. It can be used to debug issues such as pipeline slowness and pipeline failures.
Understand User Request: Extract Job ID, Project ID, Transform Name (optional), and Time Window.
Transform Name Mapping: If the user requires transform-based debugging,
map user-provided Transform Names to the actual Dataflow stage or ptransform
and apply them as filters while querying:

This mapping can be extracted from `gcloud dataflow jobs describe JOB_ID --full --format="json(pipelineDescription.executionPipelineStage)"`.

- `name` property at the parent stage level. This
  matches "F[digit]" (e.g. "F6").
- `componentTransform` array, read
  precisely from `userName` or `originalTransform` (e.g.
  "RateLimitAndLog/ParMultiDo(RateLimitAndLog)") and use it as
  ptransform.
- `resource.labels.step_id="[Extracted ptransform name]"`.
- `metric.labels.ptransform="[Extracted ptransform name]"` or
  `metric.labels.stage="[Extracted stage_id]"`.

Query Telemetry:
Analysis:
Output: Provide a synthesized summary with symptoms, potential root
cause, and links to relevant code transforms (using file:///... format).
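
The transform name mapping step in the workflow above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions: `SAMPLE_DESCRIBE_JSON` is a hypothetical, heavily trimmed stand-in for the `gcloud dataflow jobs describe` output, the helper names are invented for this example, and the case-insensitive substring match is one possible matching rule, not something the workflow prescribes.

```python
import json
from typing import Dict, List

# Hypothetical, trimmed stand-in for the output of:
#   gcloud dataflow jobs describe JOB_ID --full \
#     --format="json(pipelineDescription.executionPipelineStage)"
SAMPLE_DESCRIBE_JSON = """
{
  "pipelineDescription": {
    "executionPipelineStage": [
      {
        "name": "F6",
        "componentTransform": [
          {
            "userName": "RateLimitAndLog/ParMultiDo(RateLimitAndLog)",
            "originalTransform": "RateLimitAndLog/ParMultiDo(RateLimitAndLog)"
          }
        ]
      },
      {
        "name": "F7",
        "componentTransform": [
          {"userName": "WriteOutput/Write", "originalTransform": "WriteOutput/Write"}
        ]
      }
    ]
  }
}
"""


def find_transforms(describe_json: str, user_name: str) -> List[Dict[str, str]]:
    """Map a user-provided transform name to stage and ptransform names."""
    job = json.loads(describe_json)
    stages = job.get("pipelineDescription", {}).get("executionPipelineStage", [])
    matches: List[Dict[str, str]] = []
    for stage in stages:
        for transform in stage.get("componentTransform", []):
            # Read the ptransform name from userName, else originalTransform.
            ptransform = transform.get("userName") or transform.get("originalTransform") or ""
            # Assumption: case-insensitive substring matching is good enough.
            if user_name.lower() in ptransform.lower():
                matches.append({"stage": stage.get("name", ""), "ptransform": ptransform})
    return matches


def build_filters(match: Dict[str, str]) -> Dict[str, str]:
    """Build the filter expressions named in the mapping step for one match."""
    ptransform = match["ptransform"]
    stage = match["stage"]
    return {
        "step_id": f'resource.labels.step_id="{ptransform}"',
        "ptransform": f'metric.labels.ptransform="{ptransform}"',
        "stage": f'metric.labels.stage="{stage}"',
    }
```

Calling `find_transforms(SAMPLE_DESCRIBE_JSON, "RateLimitAndLog")` returns one match for stage "F6", and `build_filters` turns that match into the three filter strings. If several transforms match, it is safer to confirm the intended one with the user than to pick the first.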
Follow this template to structure your response: