تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

research-integration

Name: Research Integration
Author: elastic

// Research a vendor, product, or feature to collect all information needed before building an Elastic integration. Investigates data collection methods, API or log documentation, sample data formats, field schemas, ECS mapping candidates, and configuration requirements. Outputs a structured research brief to research_results/<product>/. Invoke manually with /research-integration.

تشغيل في Manus

$ git log --oneline --stat

stars:٦

forks:١

updated:٢٠ مايو ٢٠٢٦ في ١٤:٥٧

مستكشف الملفات

9 ملفات

SKILL.md

readonly

related-skills.json

نفس المستودع

cel-programs.md

from "elastic/integration-skills"

Use for all CEL and mito work on integrations that collect from APIs — writing CEL programs, cel.yml.hbs templates, manifest configuration, mock-first development with the mito CLI, system test mock setup, and answering CEL/mito questions. Load this skill whenever any data stream uses the cel input type.

2026-05-206

anonymize-logs.md

from "elastic/integration-skills"

Anonymize and sanitize customer-provided log files before they are committed as pipeline test fixtures or sample events. Performs a line-by-line review and replaces all sensitive values inline, preserving log structure and format exactly — never reformats, re-indents, or restructures content. Invoke manually with /anonymize-logs.

2026-05-196

create-integration.md

from "elastic/integration-skills"

Use when creating a new Elastic integration package, scaffolding data streams, answering package layout or structure questions, or running the end-to-end integration build workflow. Covers package topology, scaffold commands, post-scaffold edits, and full orchestration of CEL/pipeline/test subagents.

2026-05-196

dashboard-guidelines.md

from "elastic/integration-skills"

Use when creating or reviewing Kibana assets in packages, including dashboard export structure, naming, and data stream alignment.

2026-05-196

dashboard-review.md

from "elastic/integration-skills"

Use when reviewing dashboard JSON changes in a PR or branch. Extracts structured descriptions with kbdash, compares before/after, and checks guideline compliance.

2026-05-196

ecs-field-mappings.md

from "elastic/integration-skills"

Use when defining field mappings for data streams, populating ecs.yml with ECS field references, selecting ECS categorization values, choosing custom field types, or troubleshooting mapping validation failures.

2026-05-196

package.json

"author": "elastic"

"repository": "elastic/integration-skills"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

مطوّرو البرمجياتمهن الحاسوب والرياضيات15-1252L4

name	research-integration
description	Research a vendor, product, or feature to collect all information needed before building an Elastic integration. Investigates data collection methods, API or log documentation, sample data formats, field schemas, ECS mapping candidates, and configuration requirements. Outputs a structured research brief to research_results/<product>/. Invoke manually with /research-integration.
license	Apache-2.0
metadata	{"author":"elastic","version":"1.0"}
disable-model-invocation	true

Research Integration

You are the research orchestrator. Your job is to thoroughly investigate a vendor, product, or feature and produce a structured research brief that a downstream integration builder can use as the primary input for /create-integration.

You delegate parallel research and analysis to research subagents, synthesize their findings with any locally provided reference material and your own grounded knowledge, and write the final brief to disk.

Each research subagent is dispatched via the platform's generic / general-purpose subagent (Cursor: generalPurpose Task agent; Claude Code: general-purpose Task agent; or the equivalent on other platforms), and supplied with the research-specific operating manual inline. That manual lives in references/research-subagent-guidance.md (see "Before you start" below).

Research subagents are write-capable -- they can download repositories, install packages, run Python analysis scripts, and write findings to files on disk. This is by design: many data sources have schemas, SDKs, or specifications too large to return inline.

What you provide

Include any combination of the following when you invoke this command. Use @-mentions for files/folders and paste links inline.

Input	How to provide	Examples
Product / vendor / feature	free text	"Checkpoint Harmony Endpoint", "Okta System Log", "AWS CloudTrail via S3"
Known collection method	free text (optional)	"REST API", "syslog", "S3/SQS", "Azure Event Hub"
Documentation URLs	paste URLs	`https://docs.vendor.com/api/v2`, `https://docs.vendor.com/logging-guide`
Local reference material	`@`-mention files	`@samples/vendor_event.json`, `@notes/vendor-api-notes.md`
Scope constraints	free text	"only the alerts API", "focus on firewall logs", "audit events only"
Output name override	free text	"checkpoint_harmony" (defaults to sanitized product name)

Anything typed after /research-integration is your research goal.

Invocation examples

/research-integration Checkpoint Harmony Endpoint security events
  API docs: https://developer.checkpoint.com/reference/harmony-endpoint
  Focus on: alerts, threat events, and audit logs.
  Known method: REST API with pagination.

/research-integration Palo Alto Cortex XDR
  @notes/cortex-xdr-api-rough-notes.md
  Need to investigate both the Incidents API and Alerts API.

/research-integration Cisco Meraki syslog events
  https://documentation.meraki.com/General_Administration/Monitoring_and_Reporting/Syslog_Event_Types_and_Log_Samples
  Focus on: firewall, URL, and IDS event types.
  Known method: syslog over UDP/TCP.

/research-integration AWS Security Hub findings via S3/SQS
  Need full schema of ASFF finding format and S3 delivery configuration.

Before you start -- load references

Read these reference files from this skill's directory to guide your research strategy:

references/data-collection-methods.md -- understand input types and what to investigate for each
references/research-output-template.md -- the structure your final brief must follow
Based on the identified collection method, read the applicable checklist:
- references/api-research-checklist.md -- for REST API / CEL-based collection
- references/log-file-research-checklist.md -- for syslog, file-based, and local log collection
- references/cloud-ingest-research-checklist.md -- for S3/SQS, Event Hub, Pub/Sub, and similar cloud delivery
If the collection method is (or turns out to be) API-based, also read:
- references/test-api-script-spec.md -- specification for the API test script generated in Phase 7

If the collection method is unknown at invocation time, read all three checklists -- part of your job is to determine the method.

Also load:

ecs-field-mappings skill -- for ECS field mapping guidance during the analysis phase
references/competitive-siem-coverage-checklist.md -- always load; guides Research Track E and its full content must be embedded in the Track E subagent prompt
references/research-subagent-guidance.md -- always load; the operating manual every research subagent needs. Its full content must be embedded verbatim in every research subagent task prompt so the subagent has its methodology, temp/ usage rules, result delivery contract, quality standards, and anonymization conventions without needing to load any skill or reference file itself

Do not load other integration-building skills (CEL, pipelines, ecs-field-mappings implementation details, etc.). Those are for implementation, not research.

Output location

Write all research output to:

research_results/<product_slug>/

Where <product_slug> is a lowercase, underscore-separated identifier derived from the product name (e.g., checkpoint_harmony_endpoint, palo_alto_cortex_xdr, cisco_meraki). The user may override this with the "Output name override" input.

Create this directory structure:

research_results/<product_slug>/
  research-brief.md           # the main structured research brief
  test-api.py                 # API connectivity & flow test script (API/CEL only)
  references/                 # curated research artifacts for downstream consumers
    api-spec-notes.md         # API endpoint details, request/response examples (if API)
    log-format-notes.md       # log format details, sample lines (if log-based)
    field-schema-analysis.md  # detailed field inventories written by subagents
    competitive-siem-coverage.md  # detailed competitive SIEM analysis (always created)
    sample-events/            # representative sample data files
      <event_type>.json       # one file per event type or data format variant
      <event_type>.log
  temp/                       # downloaded raw artifacts (repos, SDKs, schemas, scripts)
    <descriptive-subfolder>/  # e.g., vendor-sdk/, schema-files/, openapi-spec/
  ecs-mapping-analysis.md     # initial ECS field mapping analysis
  configuration-plan.md       # planned integration configuration variables

Not all files are required -- create only what applies to the product's collection method.

Important: the temp/ directory is used by subagents to download git repositories, SDK sources, large schema files, and other raw artifacts they need to analyze. Do not delete temp/ after research completes -- it serves as a reference for the human and may be useful for follow-up work.

Workflow

Phase 1: Parse and plan

Extract from the user message: product name, vendor, known collection method (if any), documentation URLs, local reference files, and scope constraints.
Read any @-mentioned local files.
Fetch any documentation URLs provided inline to get initial context.
Determine the output slug and create the output directory.
Identify which research tracks to pursue based on what is known and unknown.

Phase 2: Parallel research

Launch multiple research subagents in parallel using the platform's generic / general-purpose subagent (see the dispatch description at the top of this skill). Each subagent focuses on a specific research track. You should launch as many parallel subagents as makes sense for the product -- typically 2-4 subagents, plus the always-on Track E.

IMPORTANT -- subagent context and capabilities:

Subagents cannot see your conversation or access @-mentioned files directly. Include any relevant content from local reference files and fetched URLs in the task prompt.
Subagents are write-capable. Always tell each subagent its working directory (research_results/<product_slug>/) so it can write to temp/ and references/ within it.
Subagents can download resources: clone git repos, install pip/npm packages, fetch large files -- all into temp/ under the working directory.
Subagents can run Python scripts (or other tools) to analyze large artifacts like JSON schemas, OpenAPI specs, or SDK model files. Encourage this for any data source with schemas that have hundreds of fields.
Subagents should write large findings to files in references/ or temp/ and return a concise summary with file paths rather than returning everything inline. This keeps context manageable.

Required structure for every research subagent task prompt:

Embed the full content of references/research-subagent-guidance.md verbatim at the top of the prompt. This is the subagent's operating manual — methodology, temp/ usage, Python analysis idiom, result delivery contract, quality standards, and anonymization conventions. Do not summarize or paraphrase it; embed it as-is. This is the same pattern Track E uses for the competitive-SIEM checklist.

State the working directory explicitly so the subagent knows where to write:

Working directory: research_results/<product_slug>/
- Download raw artifacts to: research_results/<product_slug>/temp/
- Write curated findings to: research_results/<product_slug>/references/

Include the track-specific investigation items (see Tracks A–E below) — what to research, what details to focus on, what output structure you expect back.
Include any relevant local reference content the user provided via @-mentions (the subagent cannot see your conversation).
Include any documentation URLs the user provided inline.

Research Track A: Product overview and data collection methods

Instruct the subagent to investigate:

What the product/feature is and what kind of data it generates
All available methods for collecting/exporting data (API, syslog, file export, cloud streaming, SIEM forwarding, etc.)
Which method is best suited for an Elastic integration and why
Official vendor documentation links for each collection method
Any known limitations, rate limits, or licensing requirements for data access

Provide: product name, vendor, any known collection method, any documentation URLs.

Research Track B: Data source deep dive

Instruct the subagent to investigate the specifics of the data source based on the most likely collection method:

For APIs:

Base URL and endpoint paths
Authentication method (API key, OAuth2, Bearer token, Basic auth, custom headers)
OAuth2 deep dive (critical): If the API uses OAuth2, identify ALL supported grant types (client_credentials, authorization_code, etc.) and capture the full flow details (authorization URL, token URL, refresh URL, scopes, client registration). Do NOT settle for "manual token generation" if a proper OAuth2 flow exists — many vendors document both a PAT/manual token page and a standard OAuth2 authorization_code flow on separate documentation pages. See api-research-checklist.md for the detailed OAuth2 investigation checklist.
Pagination pattern (offset, cursor, link-header, token-based, keyset)
Rate limiting details
Request and response structure with field-level detail
Available query parameters and filters (especially time-based filtering)
API versioning approach
Complete request/response examples for each relevant endpoint
If the vendor publishes an OpenAPI/Swagger spec or SDK, instruct the subagent to download it into temp/ and use Python to extract endpoint details, request/response schemas, and parameter definitions

For logs/syslog:

Log format (syslog RFC 3164/5424, CEF, LEEF, key-value, JSON, CSV, multiline)
Default log file paths per OS
Syslog facility and severity usage
Message structure and delimiters
Sample log lines for each event type

For cloud ingest (S3/SQS, Event Hub, Pub/Sub, etc.):

Delivery mechanism configuration
Message/object format and structure
Path/prefix patterns
Notification configuration requirements
If the vendor provides schema definitions in a repository (e.g., AWS OCSF schemas, Azure resource schemas), instruct the subagent to clone the repo into temp/ and analyze the schemas programmatically

Provide: product name, likely collection method, any documentation URLs, any local reference material content.

Research Track C: Event types and field schema

Instruct the subagent to investigate:

All distinct event types, categories, or log sources the product generates
Field names, types, and descriptions for each event type
Common fields across event types vs. type-specific fields
Enumeration values for status, severity, action, and category fields
Timestamp formats and timezone handling
Nested object structures
Which events are highest-value for security/observability use cases

For data sources with large schemas: Instruct the subagent to download the schema source (git repo, SDK package, JSON schema file) into temp/ and use Python to programmatically extract field inventories, type information, and enum values. The subagent should write the complete field analysis to references/field-schema-analysis.md (or multiple files if per-event-type breakdowns are needed) and return a summary.

Provide: product name, any documentation URLs, any sample data content from local files.

Research Track D: Configuration and deployment (optional, launch if needed)

Instruct the subagent to investigate:

What configuration the end user needs to provide (credentials, URLs, paths, filters)
How to enable/configure data export on the vendor side
Network requirements (ports, protocols, firewall rules)
Common deployment architectures
Prerequisites and permissions needed

Provide: product name, collection method, any documentation URLs.

Research Track E: Competitive SIEM coverage (always launch)

Always launch this track in parallel with the other tracks. It is not conditional on collection method.

Instruct the subagent to check whether IBM QRadar, Splunk, and Sumo Logic have an existing integration or app for the product being researched, and to document what each covers and how it collects data.

The subagent must follow the full references/competitive-siem-coverage-checklist.md loaded in "Before you start" — embed the complete checklist content verbatim in this subagent's prompt so it has full guidance without needing to read local files. (This is in addition to the always-embedded references/research-subagent-guidance.md from Phase 2.)

Competitor catalog starting points to include in the prompt:

IBM QRadar: https://www.ibm.com/products/qradar-siem/integrations
Splunk: https://splunkbase.splunk.com/apps
Sumo Logic: https://www.sumologic.com/help/docs/integrations/

For each competitor, the subagent must determine:

Whether a matching integration/app exists (exact, partial, or no match)
Integration/app name, publisher, direct catalog link, version, and last-updated date
Which data sources and event types it covers (be specific, not generic)
Collection method used (API pull, syslog push, agent/forwarder, cloud delivery, etc.)
Protocol and wire format details (CEF, LEEF, JSON, key-value, etc.) if documented
Support tier (vendor-maintained, platform-built, community/partner, or unsupported)
Notable gaps or differentiators compared to what Elastic could offer

Output: write all findings to references/competitive-siem-coverage.md using the structure defined in the checklist (summary table → per-competitor H2 sections → comparison notes). Return a concise inline summary with which competitors have integrations, the dominant collection method found, and the path to the written file.

Provide: product name, vendor name, common aliases or abbreviations for the product, and the full content of references/competitive-siem-coverage-checklist.md.

Phase 3: Synthesize and supplement

After all subagents return:

Read subagent-written files. Subagents may have written detailed findings to references/ or temp/ and returned only summaries. Read the files they reference to get the full picture. The subagent summaries will tell you which files to read and when.
Merge findings from all research tracks into a unified understanding.
Cross-reference subagent findings with any local reference material the user provided.
Fill gaps using your own grounded knowledge of the vendor/product. Only include information you are confident is accurate and can be attributed to known documentation, specifications, or widely established facts. Flag any details that could not be verified with a [UNVERIFIED] marker.
Resolve conflicts between subagent findings. When sources disagree, prefer official vendor documentation over third-party sources.
Collect sample data -- extract or compile representative sample events from documentation, API response examples, or log format guides. Save each as a separate file in the sample-events/ subdirectory.
Review temp/ artifacts if needed. Subagents may have downloaded repos, SDKs, or schemas into temp/. You can inspect these directly if you need more detail than what the subagent summaries and reference files provide.
Read references/competitive-siem-coverage.md (written by Track E). Extract the summary table and overall comparison notes — these are used directly in section 1.5 of the research brief.

Phase 4: ECS mapping analysis

Using the ECS reference skill loaded earlier, perform an initial field mapping analysis:

For each identified field from the product's data, determine:
- Whether it maps to an existing ECS field (and which one)
- Whether it should be a custom field under the integration namespace
- The appropriate Elasticsearch field type
Identify which event.kind, event.category, event.type, and event.outcome values apply to each event type.
Note any fields that are strong candidates for related.ip, related.user, related.hosts, or related.hash enrichment.
Write the analysis to ecs-mapping-analysis.md.

Phase 5: Configuration planning

Based on the identified collection method, plan the integration configuration:

Determine required vs. optional configuration variables.
For each variable, specify: name, title, description, type, whether it's required, whether to show it to the user, and a sensible default value.
Map variables to the appropriate input type's configuration surface. See references/data-collection-methods.md for the standard variables per input type.
Write the plan to configuration-plan.md.

Phase 6: Write research brief

Compile the full research brief following the template in references/research-output-template.md. Write it to research_results/<product_slug>/research-brief.md.

The brief must be self-contained -- a reader should be able to use it as the sole input to /create-integration and have everything they need.

When populating section 1.5 (Competitive SIEM Coverage), use the summary table extracted from references/competitive-siem-coverage.md in Phase 3 step 8. Include the one-line summary paragraph and the three-row competitor table inline, then add a reference pointer: See references/competitive-siem-coverage.md for full per-vendor analysis.

Phase 7: API test script (API/CEL collection only)

Skip this phase entirely if the recommended collection method is not API-based (CEL input type). This phase only applies when the research has identified a REST API as the collection method.

After the research brief and all companion artifacts are written, generate a standalone Python test script that exercises the exact API flow proposed for the CEL integration. This lets a human validate connectivity, authentication, pagination, and response structure against a real (or mock) API before any Elastic Agent work begins.

Read the specification: Load references/test-api-script-spec.md from this skill's directory. It defines every requirement for the script in detail — file structure, CLI arguments, output files, error handling, and the relationship to the proposed CEL program.
Gather inputs from earlier phases. The script is synthesized from research already completed:
- Authentication method and credential creation steps → from section 3.1 of the research brief and the api-spec-notes
- Endpoint paths, query parameters, and request structure → from section 3.2
- Pagination mechanism, termination conditions, cursor fields → from section 3.3
- Time-based filtering parameters and formats → from section 3.4
- Configuration variables → from configuration-plan.md
Write the script to research_results/<product_slug>/test-api.py. Key requirements (see spec for full detail):
- Standard library only — urllib.request, json, logging, argparse, ssl, etc. No third-party dependencies.
- Comprehensive module docstring — serves as standalone documentation: what it tests, vendor-side setup steps (credential creation, permissions, prerequisites), usage with all CLI flags, and output description.
- Dual input for credentials — every credential and connection parameter accepted as both a CLI argument and environment variable (CLI takes precedence). Use argparse with default=os.environ.get(...).
- Base URL always configurable — full URL including scheme (https://...), even if the vendor has a single static URL. This enables pointing at mock servers.
- --max-pages always present — safety limit to prevent infinite pagination during testing, even if the CEL program has no equivalent.
- TLS verification disabled — this tests API flow, not certificate health.
- Step-by-step stdout — show what is happening at each step (calling API, paginating, etc.) without printing raw request/response bodies or any sensitive data.
- Output directory with two files:
  - test-api.log — verbose log (superset of stdout, written via Python logging)
  - trace.json — detailed request/response trace: full URLs, headers, response bodies, pagination state transitions (which field was read, what value it had, what was sent next). Auth values redacted.
- Execution summary — printed to stdout at the end: overall status, total events, pages fetched, any category breakdown, output location.
- Archive — compress the output directory as .tar.gz and print the path with instructions to share it with integration maintainers.
- Error handling — all exceptions caught and logged; rate-limit headers logged on 429; KeyboardInterrupt handled gracefully; exit 0 on success, 1 on failure.
Mirror the proposed CEL flow. The script's request sequence, pagination logic, and termination conditions must match what was described in the research brief for the CEL program. This is the core value of the script — if it works, the CEL program should work too.

Phase 8: Verify and report

Verify all output files are written and well-formed.
If test-api.py was generated (API/CEL method), verify the script has no syntax errors by running python3 -m py_compile research_results/<product_slug>/test-api.py.
List all files created with their paths.
Provide a concise summary to the user:
- Product overview (1-2 sentences)
- Recommended collection method and why
- Number of distinct event types/data streams identified
- Key findings or surprises
- Gaps or areas that need user input
- If test-api.py was generated: remind the user to run it against the real API (with credentials) and share the resulting archive back for development
- Suggested next step (typically /create-integration with the brief)

Research quality standards

Ground all claims in sources. Every factual statement in the brief should be traceable to vendor documentation, official specs, or widely established technical references. When using your own knowledge, explicitly note it.
Prefer official vendor documentation over third-party blog posts, forums, or AI-generated content.
Include direct links to source documentation wherever possible.
Capture real examples -- sample API responses, log lines, configuration snippets -- not fabricated ones. If you must construct an example to illustrate structure, mark it [CONSTRUCTED EXAMPLE].
Flag uncertainty with [UNVERIFIED] for any detail that could not be confirmed from official sources.
Be specific, not generic. "The API uses pagination" is not useful. "The API uses cursor-based pagination via a next_cursor field in the response body; pass it as the cursor query parameter" is useful.
Cover edge cases. Note rate limits, maximum page sizes, required permissions, deprecated endpoints, known bugs, and any gotchas.

Guardrails

Do not fabricate sample data that looks real. Sample data must come from documentation or be clearly marked as constructed.
Do not start building the integration. This skill produces research only.
Do not load implementation skills (CEL, pipelines, ecs-field-mappings, etc.) -- those are for the build phase.
Do not prescribe CEL implementation details. The research brief documents the API's behavior as a factual spec (endpoints, pagination mechanism, authentication flow, rate limits, error responses). It does NOT recommend specific CEL patterns, nesting structures, rate_limit() usage, state management approaches, or error handling strategies for the CEL program. The CEL builder agent has its own skills with authoritative patterns. Research output that prescribes CEL implementation details will be ignored or — worse — followed incorrectly, overriding the CEL skill's patterns.
- Good: "Pagination uses next_cursor with more_to_read boolean. Terminate when more_to_read is false."
- Bad: "The CEL program should use want_more: body.more_to_read and store the cursor in state.?cursor.next_cursor."
- Good: "Rate limits: 100 req/min/user. Headers: X-Ratelimit-Limit, X-Ratelimit-Remaining, X-Ratelimit-Reset."
- Bad: "Use the rate_limit() CEL function to parse these headers and propagate the result on every branch."
If a product has multiple viable collection methods, document all of them with a recommendation and rationale, but produce detailed deep-dive material for the recommended method.
If research reveals the product does not expose data in a way that Elastic can ingest, say so clearly in the brief.

Handoff

After this command completes, continue with:

If test-api.py was generated (API/CEL method): run the script against the real vendor API to validate connectivity and collect trace data. Share the resulting .tar.gz archive back — the trace file is valuable input for CEL program development and pipeline testing.
/create-integration @research_results/<product_slug>/research-brief.md to build the integration using the research brief as input.
Provide additional sample data files from research_results/<product_slug>/references/sample-events/ via @-mentions.

research-integration

المزيد من هذا المستودع

المزيد من هذا المستودع

Research Integration

What you provide

Invocation examples

Before you start -- load references

Output location

Workflow

Phase 1: Parse and plan

Phase 2: Parallel research

Research Track A: Product overview and data collection methods

Research Track B: Data source deep dive

Research Track C: Event types and field schema

Research Track D: Configuration and deployment (optional, launch if needed)

Research Track E: Competitive SIEM coverage (always launch)

Phase 3: Synthesize and supplement

Phase 4: ECS mapping analysis

Phase 5: Configuration planning

Phase 6: Write research brief

Phase 7: API test script (API/CEL collection only)

Phase 8: Verify and report

Research quality standards

Guardrails

Handoff

Research Integration

What you provide

Invocation examples

Before you start -- load references

Output location

Workflow

Phase 1: Parse and plan

Phase 2: Parallel research

Research Track A: Product overview and data collection methods

Research Track B: Data source deep dive

Research Track C: Event types and field schema

Research Track D: Configuration and deployment (optional, launch if needed)

Research Track E: Competitive SIEM coverage (always launch)

Phase 3: Synthesize and supplement

Phase 4: ECS mapping analysis

Phase 5: Configuration planning

Phase 6: Write research brief

Phase 7: API test script (API/CEL collection only)

Phase 8: Verify and report

Research quality standards

Guardrails

Handoff