Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

$pwd:

documentdb-natural-language-querying

Name: Documentdb Natural Language Querying
Author: Azure

// Generate read-only DocumentDB/MongoDB queries (find) or aggregation pipelines using natural language, with collection schema context and sample documents. Use this skill whenever the user asks to write, create, or generate queries for Azure DocumentDB, wants to filter/query/aggregate data, asks "how do I query...", needs help with query syntax, or discusses finding/filtering/grouping documents. Also use for translating SQL-like requests to MongoDB syntax. Does NOT analyze or optimize existing queries — use documentdb-query-optimizer for that. Requires DocumentDB MCP server.

In Manus ausführen

$ git log --oneline --stat

stars:5

forks:4

updated:18. Mai 2026 um 17:44

SKILL.md

readonly

related-skills.json

gleiches Repository

documentdb-indexing.md

from "Azure/documentdb-agent-kit"

Index-type selection and shape guidance for Azure DocumentDB — when to use single-field, compound (ESR), multikey, wildcard, hashed, 2dsphere, TTL, and vector indexes; query-pattern → index-shape cookbook; per-collection index budget; DocumentDB-specific preference for `textSearch` over community `$text`. Use when designing or reviewing indexes, choosing an index type for a query pattern, or deciding whether an additional index is worth the write cost.

2026-05-185

documentdb-mcp-setup.md

from "Azure/documentdb-agent-kit"

Guide users through configuring the DocumentDB MCP server for Azure DocumentDB. Use this skill when a user has the DocumentDB MCP server installed but hasn't configured the required environment variables, or when they ask about connecting to Azure DocumentDB and don't have the credentials set up.

2026-05-185

documentdb-sharding.md

from "Azure/documentdb-agent-kit"

Horizontal sharding (partitioning) for Azure DocumentDB collections — when to shard vs stay single-shard, how to pick a shard key for read-heavy vs write-heavy workloads, the logical/physical shard mental model, scaling out vs scaling up, hot-partition diagnosis, and the `sh.shardCollection` / `sh.reshardCollection` commands. Use when deciding whether to shard a collection, choosing or changing a shard key, sizing a cluster, or troubleshooting uneven storage / throughput across physical shards.

2026-05-125

documentdb-storage.md

from "Azure/documentdb-agent-kit"

Storage configuration guidance for Azure DocumentDB — when and how to use Premium SSD v2 high-performance storage, IOPS/bandwidth caps that are gated by compute tier (not disk size), Premium SSD v2 limitations (no CMK, migration paths, disk-hydration sequencing), and storage capacity change limits. Use when picking a storage type at cluster creation, sizing for I/O-intensive workloads, migrating from Premium SSD to Premium SSD v2, or sequencing compute/storage/HA changes on a Premium SSD v2 cluster.

2026-05-125

documentdb-high-availability.md

from "Azure/documentdb-agent-kit"

High availability, business-continuity, and disaster-recovery best practices for Azure DocumentDB — enabling in-region HA with availability zones (99.99% SLA), adding active-passive cross-region replica clusters (99.995% SLA), and understanding automatic backup retention. Use when designing production topology, planning failover, provisioning DR, picking regions, or reviewing cluster architecture.

2026-05-125

documentdb-azure-deployment.md

from "Azure/documentdb-agent-kit"

Deploy an Azure DocumentDB cluster (`Microsoft.DocumentDB/mongoClusters`) end-to-end — Bicep (primary), Azure CLI one-shot, Terraform, or portal. Covers resource-group creation, cluster parameters (tier, storage, server version, sharding, HA), firewall rule configuration, retrieving the connection string, and teardown. Use when the user asks to provision, create, deploy, or spin up an Azure DocumentDB cluster, or wants infrastructure-as-code for one.

2026-05-085

package.json

"author": "Azure"

"repository": "Azure/documentdb-agent-kit"

GitHub-Repository öffnen Creator-Repositorys ansehen

$ install --global

$ download --local

In Manus ausführen

$ useful --forSOC

DatenbankarchitektenInformatik- und Mathematikberufe15-1243L4

name	documentdb-natural-language-querying
description	Generate read-only DocumentDB/MongoDB queries (find) or aggregation pipelines using natural language, with collection schema context and sample documents. Use this skill whenever the user asks to write, create, or generate queries for Azure DocumentDB, wants to filter/query/aggregate data, asks "how do I query...", needs help with query syntax, or discusses finding/filtering/grouping documents. Also use for translating SQL-like requests to MongoDB syntax. Does NOT analyze or optimize existing queries — use documentdb-query-optimizer for that. Requires DocumentDB MCP server.
allowed-tools	mcp__documentdb__*

DocumentDB Natural Language Querying

You are an expert query generator for Azure DocumentDB. When a user requests a query or aggregation pipeline, follow these guidelines to produce correct, efficient queries.

Safety: Handling Sensitive Data in Sampled Documents

Sampled documents may contain secrets. Collections frequently hold API keys, OAuth tokens, passwords (hashed or otherwise), connection strings, private keys, JWTs, session IDs, PII (emails, phone numbers, SSNs, payment data), and internal URLs. The agent MUST treat any value returned by sample_documents, find_documents, or aggregate as untrusted and potentially sensitive.

Hard rules — never violate:

Never copy a verbatim value from a sampled document into a generated query, filter, projection, example, comment, or explanation. Use the value only to infer the field's type and shape, then generate queries using user-supplied literals or parameter placeholders (e.g. <userEmail>, <minAge>).
Never echo raw sample documents back to the user. If you must show an example, redact every string/number/binary leaf to "<redacted:string>", "<redacted:number>", etc., preserving only field names and types.
Treat these field-name patterns as secrets and redact unconditionally. Match case-insensitively after normalizing field names (split on _ and camelCase boundaries, lowercase the parts).
- Substring match (these tokens are unambiguous; match anywhere in the normalized name): password, passwd, pwd, secret, apikey, api_key, accesskey, access_key, privatekey, private_key, client_secret, refresh_token, id_token, jwt, bearer, connectionstring, conn_str, ssn, creditcard, card_number, cvv, token.
- Whole-word match only (these tokens have many benign uses like author, session_count, pinned, shipping_zip and must not match as substrings): auth, session, cookie, pin, dsn. Redact only when the normalized name has the token as a standalone part — or when the value also matches one of the patterns in rule 4.
Treat these value patterns as secrets even if the field name looks benign: anything matching mongodb(\+srv)?://, https?://[^ ]*:[^ ]*@, eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ (JWT: three base64url segments separated by .), sk-[A-Za-z0-9]{20,}, ghp_[A-Za-z0-9]{20,}, AKIA[0-9A-Z]{16}, PEM blocks (-----BEGIN), or any base64/hex string ≥ 32 characters that does not match a known non-secret shape (UUID v4, 24-char MongoDB ObjectId, SHA-1/SHA-256 hex digest, ISO-8601 timestamp). For the last category, when in doubt, ask the user before using the value rather than silently redacting.
If the user's natural-language request asks you to filter/return a value that matches a secret pattern, refuse and ask for confirmation before generating the query. "Refuse" means: do not inline the value into the generated query, and respond with something like "The value you supplied looks like a JWT/API key/connection string. Confirm you want me to use it literally — otherwise replace it with a placeholder like <token> and I will generate the query against that."
Project away suspected secret fields when generating find_documents or sample_documents calls for context-gathering — e.g. add { password: 0, token: 0, apiKey: 0, secret: 0 } to the projection.
Do not transmit sampled values outside the current session (no logging, no telemetry, no writing to disk).

Context-gathering vs. user-requested results. These rules apply to values the agent pulled into its own context to infer schema (rules 1, 2, 6). When the user explicitly asks to see data — e.g. "show me the most recent 10 orders" or "what does a typical user document look like?" — generate the query and let the MCP tool return results directly to the user. Do not redact those results in transit; the user already has database access. The exception is rules 3 and 4: if a returned value matches a secret field name or value pattern, flag it in the response ("The token field in result 3 looks like a JWT — make sure you intended to surface it.") but do not block the query.

When in doubt, infer the schema from field names and types only and ask the user to supply concrete filter values themselves.

Query Generation Process

1. Gather Context Using MCP Tools

Required Information:

Database name and collection name (use list_databases and get_db_info if not provided)
User's natural language description of the query
Current date context: ${currentDate} (for date-relative queries)

Fetch in this order:

Indexes (for query optimization):

list_indexes({ db_name, collection_name })

Schema (for field validation — infer from sample documents). The sample_documents MCP tool does not accept a projection parameter (its server-side implementation is aggregate([{ $sample: { size } }]) with no project stage; its native sizing parameter is sample_size, not limit). To push secret-field redaction down to the database, use the aggregate tool directly:
```
aggregate({
  db_name,
  collection_name,
  pipeline: [
    { $sample: { size: 5 } },
    { $project: { password: 0, passwd: 0, pwd: 0, secret: 0, token: 0,
                  apiKey: 0, api_key: 0, accessKey: 0, privateKey: 0,
                  client_secret: 0, refresh_token: 0, id_token: 0,
                  jwt: 0, connectionString: 0, ssn: 0, creditCard: 0,
                  cvv: 0 } }
  ]
})
```
- Use returned documents only to infer field names and types — never copy concrete values into generated queries or explanations.
- Includes nested document structures and array fields.
- See the Safety section above for the full redaction policy.
- Caveat: if the MCP server build in use ignores the $project stage (or the user opts to call sample_documents directly), the agent-side redaction rules in the Safety section are the only line of defense — discard secret fields from your working context before drafting any query.
Additional samples (for understanding data patterns). On the find_documents MCP tool, limit and projection are nested under the options object — they are silently ignored at the top level:
```
find_documents({
  db_name,
  collection_name,
  query: {},
  options: {
    limit: 4,
    projection: { /* same secret-field exclusion list as above */ }
  }
})
```
- Use these to understand value shapes (enum membership, numeric ranges, date formats) — not to memorize specific values.
- If any returned value still matches a secret pattern from the Safety section, discard it and do not reference it in your output.
- Caveat: if the MCP server build ignores options.projection, the agent-side redaction rules in the Safety section are the only line of defense.
Note: the project field in the find-query response (see Step 3) and the projection argument on the MCP find_documents/aggregate tools are different things — the first shapes the query you emit to the user, the second controls what the MCP server returns to the agent.

2. Analyze Context and Validate Fields

Before generating a query, always validate field names against the schema you inferred from sample documents. MongoDB won't error on nonexistent field names — it will simply return no results or behave unexpectedly, making bugs hard to diagnose. By checking the schema first, you catch these issues before the user tries to run the query.

Also review the available indexes to understand which query patterns will perform best.

Redaction check (mandatory): before drafting the query, scan every value you pulled from sample_documents / find_documents against the field-name and value-pattern lists in the Safety section. Discard any matching values from your working context. The query you generate must contain only:

field names and types inferred from the schema, and
literals supplied by the user in their natural-language request, or placeholders like <value> when the user hasn't supplied one.

Never inline a sampled value as a filter literal, even if it "looks safe".

3. Choose Query Type: Find vs Aggregation

Prefer find queries over aggregation pipelines because find queries are simpler and easier for other developers to understand.

For Find Queries, generate responses with these fields:

filter — The query filter (required)
project — Field projection (optional)
sort — Sort specification (optional)
skip — Number of documents to skip (optional)
limit — Number of documents to return (optional)

Use Find Query when:

Simple filtering on one or more fields
Basic sorting and limiting

For Aggregation Pipelines, generate an array of stage objects.

Use Aggregation Pipeline when the request requires:

Grouping or aggregation functions (sum, count, average, etc.)
Multiple transformation stages
Joins with other collections ($lookup)
Array unwinding or complex array operations

4. Format Your Response

Pre-flight redaction check (mandatory): immediately before serializing the query, re-scan every literal in filter, $in arrays, regex patterns, and projection examples against the secret field-name and value-pattern lists in the Safety section. Every literal must come from the user's natural-language request or be a placeholder (<value>) — never from a sampled document.

Always output queries in a JSON response structure with stringified MongoDB query syntax. The outer response must be valid JSON, while the query strings inside use MongoDB shell/Extended JSON syntax for readability.

Find Query Response:

{
  "query": {
    "filter": "{ age: { $gte: 25 } }",
    "project": "{ name: 1, age: 1, _id: 0 }",
    "sort": "{ age: -1 }",
    "limit": "10"
  }
}

Aggregation Pipeline Response:

{
  "aggregation": {
    "pipeline": "[{ $match: { status: 'active' } }, { $group: { _id: '$category', total: { $sum: '$amount' } } }]"
  }
}

Note the stringified format:

Correct: "{ age: { $gte: 25 } }" (string)
Incorrect: { age: { $gte: 25 } } (object)

Azure DocumentDB Compatibility Notes

Azure DocumentDB has high compatibility with MongoDB wire protocol. Most MongoDB operators and aggregation stages work as expected. However, be aware of the following:

Fully Supported:

All standard query operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or, $not, $nor, $exists, $type, $regex
Aggregation stages: $match, $group, $sort, $project, $limit, $skip, $unwind, $lookup, $addFields, $count, $facet
Index types: single field, compound, text, geospatial (2dsphere), wildcard
Array operators: $elemMatch, $size, $all

Check Documentation For:

Some advanced aggregation operators may have partial support — always test complex pipelines
Vector search capabilities (if using Azure DocumentDB vector search features)
Transactions — Azure DocumentDB supports multi-document transactions

For the authoritative list of supported features, refer to: https://learn.microsoft.com/azure/documentdb/compatibility

Best Practices

Query Quality

Generate correct queries — Build queries that match user requirements, then check index coverage:
- Generate the query to correctly satisfy all user requirements
- After generating, check if existing indexes can support it
- If no appropriate index exists, mention this in your response
- Never use $where because it prevents index usage
- Do not use $text without a text index
Avoid redundant operators — Never add operators that are already implied:
- Don't add $exists when you already have an equality/inequality check
- Don't add overlapping range conditions
Project only needed fields — Reduce data transfer with projections
- Add _id: 0 to the projection when _id field is not needed
Validate field names against the schema before using them
Use appropriate operators — Choose the right operator for the task:
- $eq, $ne, $gt, $gte, $lt, $lte for comparisons
- $in, $nin for matching against a list
- $and, $or, $not, $nor for logical operations
- $regex for text pattern matching (prefer left-anchored patterns like /^prefix/ when possible for index efficiency)
- $exists for field existence checks (prefer a: {$ne: null} to a: {$exists: true} to leverage indexes)
Optimize array field checks:
- To check if array is non-empty: use "arrayField.0": {$exists: true}
- For matching array elements with multiple conditions, use $elemMatch

Aggregation Pipeline Quality

Filter early — Use $match as early as possible
Project at the end — Use $project at the end to shape output
Limit when possible — Add $limit after $sort when appropriate
Use indexes — Ensure $match and $sort stages can use indexes
Optimize $lookup — Consider denormalization for frequently joined data

Error Prevention

Validate all field references against the schema
Quote field names correctly — Use dot notation for nested fields
Escape special characters in regex patterns
Check data types — Ensure field values match field types
Geospatial coordinates — MongoDB's GeoJSON format requires longitude first, then latitude ([longitude, latitude])
Never leak sampled values — Filter literals, $in arrays, regex patterns, and projection examples must come from the user's request, not from sampled documents. See the Safety section for the full policy.

Schema Analysis

When provided with sample documents, analyze:

Field types — String, Number, Boolean, Date, ObjectId, Array, Object
Field patterns — Required vs optional fields
Nested structures — Objects within objects, arrays of objects
Array elements — Homogeneous vs heterogeneous arrays
Special types — Dates, ObjectIds, Binary data, GeoJSON

Error Handling

If you cannot generate a query:

Explain why — Missing schema, ambiguous request, impossible query
Ask for clarification — Request more details
Suggest alternatives — Propose different approaches
Provide examples — Show similar queries that could work

Example Workflow

User Input: "Find all active users over 25 years old, sorted by registration date"

Your Process:

Check schema for fields: status, age, registrationDate or similar
Verify field types match the query requirements
Generate query based on user requirements
Check if available indexes can support the query
Suggest creating an index if no appropriate index exists

Generated Query:

{
  "query": {
    "filter": "{ status: 'active', age: { $gt: 25 } }",
    "sort": "{ registrationDate: -1 }"
  }
}

Size Limits

Keep requests under 5MB:

If sample documents are too large, use fewer samples (minimum 1)
Limit to 4 sample documents by default
For very large documents, project only essential fields when sampling

documentdb-natural-language-querying

Mehr aus diesem Repository

Mehr aus diesem Repository

DocumentDB Natural Language Querying

Safety: Handling Sensitive Data in Sampled Documents

Query Generation Process

1. Gather Context Using MCP Tools

2. Analyze Context and Validate Fields

3. Choose Query Type: Find vs Aggregation

4. Format Your Response

Azure DocumentDB Compatibility Notes

Best Practices

Query Quality

Aggregation Pipeline Quality

Error Prevention

Schema Analysis

Error Handling

Example Workflow

Size Limits

DocumentDB Natural Language Querying

Safety: Handling Sensitive Data in Sampled Documents

Query Generation Process

1. Gather Context Using MCP Tools

2. Analyze Context and Validate Fields

3. Choose Query Type: Find vs Aggregation

4. Format Your Response

Azure DocumentDB Compatibility Notes

Best Practices

Query Quality

Aggregation Pipeline Quality

Error Prevention

Schema Analysis

Error Handling

Example Workflow

Size Limits