di-agent-flow-datastage

Name: Di Agent Flow Datastage
Author: IBM

Reference for creating DataStage (batch) flows with the watsonx.data integration SDK. The SDK is verbose and permits exhaustive stage and property access: use it to author flows pyflow's compact DSL can't express, and to edit or optimize existing DataStage flows, including ones bootstrapped with pyflow.

stars: 0

forks: 0

updated: 28 May 2026 at 19:54

SKILL.md

Create / Edit DataStage Flows

DataStage batch flows are authored by writing SDK-style Python code and submitting it through the create_or_update_datastage_flow tool. Authentication, project context, persistence, and compilation are handled automatically.

Where to look

SDK conventions (method signatures, stage config, link schemas, column types, connection binding, key rules, full example) → references/sdk-conventions.md
Stage selection → recommend_datastage_stages(subutterances=[...])
Stage property names and accepted values → datastage_property_lookup(requests=[{"stage": "..."}])
Per-stage deep-dive → di-agent-knowledge-engine-datastage skill stages/
Flow optimization → di-agent-knowledge-engine-datastage skill optimization/overview.md
Custom stages (C/C++, Java) → BuildopStage.md, JavaIntegrationStage.md
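
The two lookup tools in the list above take list-shaped arguments. The following is a minimal sketch of assembling those arguments, assuming the tools accept the keyword arguments exactly as written in the list. The helper names stage_selection_args and property_lookup_args are hypothetical, and the sample sub-utterances and stage names are placeholders, not verified stage types.

```python
from typing import Dict, List


def stage_selection_args(subutterances: List[str]) -> Dict[str, List[str]]:
    """Build kwargs for recommend_datastage_stages (hypothetical helper).

    Blank entries are dropped so each sub-utterance describes one step.
    """
    cleaned = [s.strip() for s in subutterances if s.strip()]
    return {"subutterances": cleaned}


def property_lookup_args(stage_names: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Build kwargs for datastage_property_lookup (hypothetical helper).

    Each stage becomes one {"stage": ...} request; duplicates are removed
    while first-seen order is kept.
    """
    unique = list(dict.fromkeys(stage_names))
    return {"requests": [{"stage": name} for name in unique]}


# Placeholder inputs; real stage names should come from the recommendation tool.
selection = stage_selection_args(
    ["read the orders table", " ", "sort rows by order date"]
)
lookup = property_lookup_args(["<stage A>", "<stage B>", "<stage A>"])
```

Splitting a request into one sub-utterance per step and looking up each recommended stage once keeps the calls small and avoids guessing property names.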

Versioning

Use duplicate_asset(asset_id=..., asset_type="datastage_flow", project_id=...) as a safety net before making changes that could break a flow.

Editing an existing flow: Before modifying, duplicate the flow as a backup with a timestamped name ("{name} [backup YYYY-MM-DD]"). Work on the original — the backup is your rollback point. Keep at most one backup per flow to avoid clutter (delete old ones via delete_asset).
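
The naming and one-backup-per-flow rules above can be sketched as two small helpers. This is an illustration only: backup_name and stale_backups are hypothetical names, "orders_daily" is a made-up flow name, and the list of existing asset names is passed in because the source does not say how assets are listed.

```python
from datetime import date
from typing import List, Optional


def backup_name(flow_name: str, on: Optional[date] = None) -> str:
    """Return "{name} [backup YYYY-MM-DD]" for the given (or today's) date."""
    day = on or date.today()
    return f"{flow_name} [backup {day.isoformat()}]"


def stale_backups(flow_name: str, existing_names: List[str], keep: str) -> List[str]:
    """List older backups of flow_name that are candidates for deletion.

    A name counts as a backup of the flow when it starts with
    "{flow_name} [backup " and ends with "]"; the one named `keep` is spared.
    """
    prefix = f"{flow_name} [backup "
    return [
        n for n in existing_names
        if n.startswith(prefix) and n.endswith("]") and n != keep
    ]


new_backup = backup_name("orders_daily", date(2026, 5, 28))
to_delete = stale_backups(
    "orders_daily",
    [
        "orders_daily",
        "orders_daily [backup 2026-05-01]",
        new_backup,
        "orders_weekly [backup 2026-05-01]",
    ],
    keep=new_backup,
)
```

The names returned by stale_backups would then be resolved to asset IDs and removed via delete_asset.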

Iterating on a new flow: After a successful create_or_update_datastage_flow, if the user requests further edits, snapshot the working state first via duplicate_asset. This way you can restore if subsequent changes break it.

Restoration: There is no single tool call for restoring a flow. Delete the broken flow, then duplicate the backup under the original name, or simply point the user to the backup.
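
The two-step restore can be sketched as follows. The tools are injected as callables so the example runs without a backend. Two points are assumptions not confirmed by the source: that delete_asset takes the same asset_id, asset_type, and project_id arguments as duplicate_asset, and that duplicate_asset accepts a name argument for the copy. Check the real tool schemas before relying on either.

```python
from typing import Any, Callable, Dict, List, Tuple


def restore_from_backup(
    delete_asset: Callable[..., Any],
    duplicate_asset: Callable[..., Any],
    *,
    broken_flow_id: str,
    backup_flow_id: str,
    original_name: str,
    project_id: str,
) -> Any:
    """Delete the broken flow, then copy the backup under the original name."""
    # Step 1: remove the broken flow so its name becomes free again.
    # ASSUMPTION: delete_asset accepts these keyword arguments.
    delete_asset(
        asset_id=broken_flow_id,
        asset_type="datastage_flow",
        project_id=project_id,
    )
    # Step 2: duplicate the backup under the original name.
    return duplicate_asset(
        asset_id=backup_flow_id,
        asset_type="datastage_flow",
        project_id=project_id,
        name=original_name,  # hypothetical parameter, not confirmed by the source
    )


# Stand-in tools that only record calls.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_delete(**kwargs: Any) -> None:
    calls.append(("delete", kwargs))


def fake_duplicate(**kwargs: Any) -> Dict[str, str]:
    calls.append(("duplicate", kwargs))
    return {"asset_id": "restored-id"}


result = restore_from_backup(
    fake_delete,
    fake_duplicate,
    broken_flow_id="flow-1",
    backup_flow_id="flow-1-bak",
    original_name="orders_daily",
    project_id="proj-1",
)
```

Deleting first matters in this sketch because the copy is meant to reuse the original name.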

Caveats: Duplicating a flow does NOT duplicate any jobs or schedules attached to it. Be cautious when editing flows with active job runs. Cross-flow references (e.g. sub-flows) are not preserved in the backup.

Guardrails

Fetch flows by flow_id, never by name: a fetch by name returns incomplete stage data
Never guess stage types, property names, or enum values — use recommend_datastage_stages / datastage_property_lookup
Bundle all changes into a single submission — successive submissions overwrite each other
Name collisions on create — ask user to confirm overwrite or rename, never retry automatically
After a successful create/update, surface the returned flow_link from the tool result as a clickable link so the user can open the flow in the UI
Stage property names in prose — use the User friendly name from datastage_property_lookup (e.g. "Number of rows (per partition)"), not the internal identifier (nrecs); show the internal name only in code blocks or when the user asks for the SDK property
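
The last two guardrails concern how results are presented to the user. A minimal sketch, assuming the tool result is a dict with a "flow_link" key and that a property lookup entry exposes "friendly_name" and "internal_name" fields. Those field names, both helper names, and the example URL are hypothetical.

```python
from typing import Dict


def flow_link_markdown(tool_result: Dict[str, str], label: str = "Open flow in the UI") -> str:
    """Render the returned flow_link as a clickable Markdown link.

    ASSUMPTION: the tool result is a dict with a "flow_link" key.
    """
    return f"[{label}]({tool_result['flow_link']})"


def property_label(entry: Dict[str, str], in_code: bool = False) -> str:
    """Pick the name to show for a stage property.

    ASSUMPTION: a lookup entry has "friendly_name" and "internal_name" keys;
    the real field names may differ.
    """
    return entry["internal_name"] if in_code else entry["friendly_name"]


link = flow_link_markdown({"flow_link": "https://example.invalid/flows/abc"})
entry = {"friendly_name": "Number of rows (per partition)", "internal_name": "nrecs"}
prose_name = property_label(entry)
code_name = property_label(entry, in_code=True)
```

Here prose_name is what would appear in an explanation to the user, while code_name is reserved for code blocks or explicit requests for the SDK property.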

related-skills.json

Same repository

di-agent-bug-report.md

from "IBM/ibm-watsonx-data-integration-skills"

Generates a Markdown bug report for an IBM watsonx.data integration session. User can invoke directly. The agent may propose it (and must wait for explicit acceptance) only after exhausting recovery options on a failure. Skip for non-watsonx.data integration sessions.

2026-05-28

di-agent-flow-pyflow.md

from "IBM/ibm-watsonx-data-integration-skills"

Complete API spec for pyflow, IBM's LLM-only Python DSL for authoring new batch or streaming flows on DataStage or StreamSets. The compact surface is built for high LLM authoring reliability — bootstrap here first, and fall back to the verbose engine-specific SDK only when pyflow cannot express a needed feature.

2026-05-28

di-agent-knowledge-engine-datastage.md

from "IBM/ibm-watsonx-data-integration-skills"

Q&A reference for the DataStage parallel engine — parallelism, partitioning theory, APT configuration files, concurrent job execution, restart/recovery, disk/resource tuning, dataset performance, flow optimization (partitioning/sorting/memory), and per-stage semantics. Use for conceptual engine questions and stage property lookups regardless of authoring tool.

2026-05-28

di-agent-knowledge-engine-streamsets.md

from "IBM/ibm-watsonx-data-integration-skills"

Reference for StreamSets Data Collector engines and StreamSets environments — StreamSets environment configuration, StreamSets engine deployment (Docker/Podman), StreamSets job execution, StreamSets engine communication methods (tunneling/direct), StreamSets high availability and failover, StreamSets monitoring and resource management. Use ONLY when the user explicitly mentions StreamSets, Data Collector, or a StreamSets-specific concern.

2026-05-28

di-agent-query-substrait.md

from "IBM/ibm-watsonx-data-integration-skills"

Use when the user asks to generate a Substrait query plan, create a Substrait plan from natural language, convert a data query to Substrait JSON, write a Substrait DSL pipeline, translate a data request into a query plan, process a test entry from a JSONL dataset. Covers writing DSL code, calling MCP tools (compile_substrait_dsl, get_substrait_dsl_examples, load_test_entry), and self-correcting on compile errors. Trigger on: "generate substrait", "generate substrait dsl", "generate functional plan", "generate fp", "process entry", "run entry", "generate dsl".

2026-05-28

package.json

"author": "IBM"

"repository": "IBM/ibm-watsonx-data-integration-skills"
