Name: Pipeline Script
Author: GuanceCloud

pipeline-script

Create, refine, and validate GuanceCloud/DataKit pipeline scripts from raw message samples. Use when Codex needs to author pipeline .p scripts, choose json/load_json versus grok extraction for text or JSON messages, run pipeline-go validation, or debug parsing/runtime/extraction errors.


Pipeline Script

Workflow

Treat the sample input as a single message value unless the user provides more point context. Pipeline's original input _ maps to the message field.
Inspect the message shape before writing code:
- Valid JSON object/array: prefer json(_, path, target_key) for a small fixed set of fields.
- JSON that needs repeated access, mutation, or conditionals: use data = load_json(_), then read data["field"].
- Plain text or log lines: use grok(_, "..."); add :int, :float, or :bool in grok captures when the type is known.
- Text with an embedded JSON fragment: grok the fragment into a temporary key, then use json(temp_key, ...) or load_json(temp_key).
- Text containing key=value segments: grok the surrounding structure, then use kv_split() on the segment so key order can vary.
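The shape inspection above can be sketched as a small triage helper. This is a minimal illustration in Python, not part of pipeline-check: the function name `classify_message`, the returned labels, and both regular expressions are hypothetical heuristics chosen for this example. It only suggests which branch of the list applies; the choice between json() and load_json() still depends on how the fields will be used.

```python
import json
import re

# Hypothetical heuristics for this sketch; neither pattern comes from DataKit.
KV_TOKEN = re.compile(r"\b[A-Za-z_][\w.-]*=\S+")   # loose "key=value" token
EMBEDDED_JSON = re.compile(r"\{.*\}", re.DOTALL)   # outermost {...} span


def classify_message(message: str) -> str:
    """Return a rough label for the message shape.

    "json"      -> whole message is a JSON object/array (json or load_json)
    "text+json" -> text with an embedded JSON object (grok, then json/load_json)
    "text+kv"   -> text with key=value segments (grok, then kv_split)
    "text"      -> plain text or log line (grok)
    """
    text = message.strip()

    # Whole message parses as a JSON object or array.
    try:
        parsed = json.loads(text)
    except ValueError:
        parsed = None
    if isinstance(parsed, (dict, list)):
        return "json"

    # Text that carries a parseable JSON object somewhere inside it.
    fragment = EMBEDDED_JSON.search(text)
    if fragment:
        try:
            if isinstance(json.loads(fragment.group(0)), dict):
                return "text+json"
        except ValueError:
            pass

    # Two or more key=value tokens suggest a kv segment.
    if len(KV_TOKEN.findall(text)) >= 2:
        return "text+kv"

    return "text"
```

A bare JSON scalar such as `42` is deliberately labelled "text" here, because the list above only prefers JSON extraction for objects and arrays.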
If function syntax is uncertain, query embedded docs before guessing:

./bin/pipeline-check --search-functions json --function-lang all
./bin/pipeline-check --function-doc grok

Write the script with stable field names. Keep high-cardinality values as fields unless the user asks for tags. Use default_time(time) after extracting a timestamp intended to become point time.
For non-trivial grok, split long regex fragments into several short add_pattern() definitions and validate from a .p file with --script. Keep every quoted pipeline string on one physical line; never wrap inside the quotes. Use inline --cmd only for short scripts or quick experiments.
Validate with the skill-local checker, always requiring the fields that prove extraction worked:

./bin/pipeline-check --script ./example.p --message 'raw message here' --require-key service

For inline drafts:

./bin/pipeline-check --cmd 'json(_, service)' --message '{"service":"api"}' --require-key service

If validation returns ok: false, fix the script and rerun. A grok mismatch does not necessarily raise an error; it can look like a successful run with missing fields. Pass --require-key for every expected output key and --expect key=value for important sample values so that such a mismatch fails the check.
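To avoid forgetting a key, the checker invocation can be assembled from a list of expected keys. The sketch below is a hypothetical Python helper (`build_check_command` is not part of the skill). It uses only the flags named in this document and assumes that --require-key and --expect may each be repeated, once per key; confirm that against the checker option list before relying on it.

```python
import shlex


def build_check_command(script_path, message, required_keys, expected=None):
    """Build an argv list for ./bin/pipeline-check.

    Assumption: --require-key and --expect can be passed multiple times.
    """
    argv = ["./bin/pipeline-check", "--script", script_path, "--message", message]
    for key in required_keys:
        argv += ["--require-key", key]          # one flag per expected key
    for key, value in (expected or {}).items():
        argv += ["--expect", f"{key}={value}"]  # pin important sample values
    return argv


argv = build_check_command(
    "./example.p",
    '{"service":"api","status":200}',
    required_keys=["service", "status"],
    expected={"service": "api"},
)
# shlex.join quotes the message so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the argv list directly to `subprocess.run` avoids shell quoting issues altogether; the joined string is only for display or copy-paste.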
Report the final .p script content and the checker execution result. Prefer result.extracted_fields, result.extracted_tags, result.time, and errors; the original input is available as result.extracted_fields.message. Inspect output.fields/output.tags only when the full transformed point is needed. Do not include shell line-continuation backslashes in the script itself. Do not claim the script is validated unless ./bin/pipeline-check exits successfully.
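The reporting step can be sketched as a small summarizer over the checker output. This Python example rests on assumptions that the text above does not confirm: that the checker prints a single JSON document, that `ok` and `errors` are top-level keys, and that `extracted_fields`, `extracted_tags`, and `time` sit under `result`. The helper name `summarize_check` and the sample payload are hypothetical.

```python
import json


def summarize_check(raw_output: str) -> dict:
    """Pull the fields worth reporting out of an assumed JSON checker report."""
    report = json.loads(raw_output)
    result = report.get("result") or {}

    # Copy so the original input can be separated from the extracted fields.
    fields = dict(result.get("extracted_fields") or {})
    original = fields.pop("message", None)

    return {
        "validated": report.get("ok") is True,  # only an explicit true counts
        "fields": fields,
        "tags": result.get("extracted_tags") or {},
        "time": result.get("time"),
        "errors": report.get("errors") or [],
        "original_message": original,
    }


# Hypothetical payload shaped after the key names mentioned in the text.
sample = json.dumps({
    "ok": True,
    "errors": [],
    "result": {
        "extracted_fields": {"message": '{"service":"api"}', "service": "api"},
        "extracted_tags": {},
        "time": None,
    },
})
summary = summarize_check(sample)
```

Treating anything other than an explicit `ok: true` as unvalidated matches the rule that a script is not reported as validated unless the checker succeeds; the process exit status should still be checked separately.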

References

Read pipeline-patterns.md when you need function syntax, extraction examples, or the full checker option list.

Read production-guidelines.md when the user asks for production-ready scripts, robust parsing, multiple samples, or avoiding overfitting.

Read common-log-examples.md when you need validated starter scripts for common JSON app logs, Nginx access logs, SSH/syslog lines, Java application logs, or logfmt messages.

Read troubleshooting.md when parser errors mention unknown escape sequence, grok brackets, copied shell continuations, missing time, or required keys not found.

Read datakit-pipeline-examples.md when /usr/local/datakit/pipeline is available and you need the built-in script inventory, shared patterns, caveats, or routing to narrower DataKit references.

For DataKit built-in examples, prefer the narrowest reference:

datakit-web-examples.md: nginx.p, apache.p, tomcat.p.
datakit-database-examples.md: mysql.p, mongodb.p, postgresql.p, dameng.p, kingbase.p, sqlserver.p.
datakit-middleware-examples.md: redis.p, elasticsearch.p, kafka.p, rabbitmq.p, solr.p, consul.p, jenkins.p, tdengine.p.

