nlweb-dev-patterns

Stars: 32

Forks: 13

Updated: May 13, 2026, 04:49

NLWeb development patterns — the mixed-mode programming philosophy, FastTrack vs Analysis parallel paths, config file precedence and the `mode: development` override trap, in-stream NLWS headers vs HTTP headers, embedding/ingest determinism, debugging the LLM-call chain, neural scorer selection (NLWebScorer ModernBERT+GAM), and the A2A / AgentFinder / DataFinder / ModelRouter subsystems. Use when designing the internal architecture of an NLWeb deployment or solving cross-cutting concerns.


Source: OrcaQubits/agentic-commerce-skills-plugins


SKILL.md


NLWeb Development Patterns

Before writing code

Fetch live docs:

Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-systemmap.md for module layout.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-control-flow.md for the request lifecycle.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/life-of-a-chat-query.md for an end-to-end trace.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-configs-files.md for the config precedence rules.
Inspect core/baseHandler.py, core/router.py, core/retriever.py, core/ranking.py for current code paths.

Pattern: Mixed-Mode Programming

This is NLWeb's defining design choice: rather than one big LLM call per query, NLWeb makes many small calls, each with a strict JSON output schema (<returnStruc>), and the outputs feed Python control flow.

Implications:

Cost and latency scale with the number of call sites, not the size of any one call.
Failures are localized — one bad call doesn't poison the response.
Steerability is high — you can tune any single prompt without touching the rest.
Debugging is harder — you must trace which of N calls misbehaved.

When designing extensions, follow the same pattern: small, schema-constrained LLM calls, deterministic Python glue.
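A minimal sketch of one such call site, assuming a hypothetical `call_llm` callable that takes a prompt and returns raw text. The function name `detect_item_type`, the prompt wording, the `item_type` field, and the fallback value are illustrative and not taken from NLWeb's code.

```python
import json
from typing import Callable

# Hypothetical closed set of values the model is allowed to return.
ALLOWED_TYPES = {"Recipe", "Product", "Article"}
FALLBACK_TYPE = "Article"  # assumption: what to use when the call misbehaves


def detect_item_type(query: str, call_llm: Callable[[str], str]) -> str:
    """One small, schema-constrained LLM call; Python decides what happens next."""
    prompt = (
        "Classify the query. Reply with JSON only, shaped as "
        '{"item_type": "Recipe" | "Product" | "Article"}.\n'
        f"Query: {query}"
    )
    raw = call_llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK_TYPE  # failure stays local to this call site
    if not isinstance(parsed, dict):
        return FALLBACK_TYPE
    item_type = parsed.get("item_type")
    if isinstance(item_type, str) and item_type in ALLOWED_TYPES:
        return item_type
    return FALLBACK_TYPE


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider so the sketch runs offline."""
    return '{"item_type": "Recipe"}'


print(detect_item_type("vegan lasagna without cashews", fake_llm))
```

Because the output is validated against a closed set, a malformed reply degrades to the fallback instead of propagating into later steps.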

Pattern: FastTrack vs Analysis (Parallel Paths)

NLWebHandler runs two paths in parallel:

| Path | What it does | When it wins |
| --- | --- | --- |
| FastTrack | Immediate vector search → stream early results | Common queries with obvious retrieval matches |
| Analysis | Decontextualize → detect type → route via `ToolSelector` to a specific handler | Ambiguous queries, complex flows (compare, recipe substitution) |

Both paths stream into the same response. FastTrack results appear quickly; Analysis results appear when ready. The agent decides whether to render incrementally or wait.

Implications for handlers you write: if you write a slow, expensive handler, FastTrack will still beat you to first byte for simple queries. That's fine — it's the design.
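The two-path behaviour can be illustrated with plain `asyncio`. This is a sketch of the idea only: the coroutine names, the message shape, and the sleep durations are assumptions standing in for real retrieval and LLM work, not NLWebHandler's implementation.

```python
import asyncio


async def fast_track(query: str, out: asyncio.Queue) -> None:
    """Stands in for the immediate vector search."""
    await asyncio.sleep(0.01)
    await out.put({"path": "fasttrack", "results": [f"match for {query!r}"]})


async def analysis(query: str, out: asyncio.Queue) -> None:
    """Stands in for decontextualize, type detection, and tool routing."""
    await asyncio.sleep(0.1)
    await out.put({"path": "analysis", "results": [f"tool output for {query!r}"]})


async def handle(query: str) -> list[dict]:
    """Run both paths concurrently and collect messages in arrival order."""
    out: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.create_task(fast_track(query, out)),
        asyncio.create_task(analysis(query, out)),
    ]
    received = []
    for _ in tasks:
        received.append(await out.get())  # a real server would stream each one
    await asyncio.gather(*tasks)
    return received


messages = asyncio.run(handle("pasta recipes"))
print([m["path"] for m in messages])
```

Both producers write to one queue, which mirrors the single shared response stream: whichever path finishes first is delivered first.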

Pattern: Config File Precedence

There are 8 YAML config files in config/. Precedence (highest first):

1. Environment variables (always win)
2. Query-string params, but only when mode: development is set in config_webserver.yaml
3. YAML defaults

The mode: development override is a foot-gun in production. A query like ?write_endpoint=other_qdrant would silently switch the write target. Always set mode: production before deploying.
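The precedence order above can be expressed as a small resolver. This is a sketch under stated assumptions: `resolve_setting` is a hypothetical helper, and mapping a setting name to an upper-cased environment variable is an illustrative convention, not NLWeb's documented naming.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    yaml_defaults: Mapping[str, str],
    query_params: Mapping[str, str],
    mode: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the effective value: env var, then query param (dev only), then YAML."""
    env = os.environ if environ is None else environ
    env_key = name.upper()  # assumption: illustrative env var naming
    if env_key in env:
        return env[env_key]
    if mode == "development" and name in query_params:
        return query_params[name]  # the override that is risky in production
    return yaml_defaults.get(name)


defaults = {"write_endpoint": "qdrant_local"}
params = {"write_endpoint": "other_qdrant"}

print(resolve_setting("write_endpoint", defaults, params, "development", environ={}))
print(resolve_setting("write_endpoint", defaults, params, "production", environ={}))
```

With an empty environment, the first call yields the query-string value and the second yields the YAML default, which is the difference the mode setting controls.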

Pattern: "Headers" Are In-Stream Messages, Not HTTP Headers

NLWeb's "NLWS headers" mechanism is JSON message objects on the SSE channel, not HTTP response headers. Each carries a message_type:

| message_type | Carries |
| --- | --- |
| `license` | Content license terms |
| `data_retention` | How long the agent may cache |
| `cache_policy` | Caching directives |
| `usage_terms` | Acceptable use |
| `rate_limits` | Calls/sec, daily quota |
| `data_freshness` | Last index time |
| `api_version` | NLWeb release identifier |
| `ui_component` | Optional rendering hint |

Client parsing rule: buffer message objects until you see a results chunk or terminal marker. Don't assume the first chunk is data.
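A client following this rule might separate metadata messages from data as sketched below. The eight metadata types come from the table; the `result` and `complete` type names, the extra payload fields, and the `data:` line framing are assumptions for illustration and should be checked against the live stream.

```python
import json
from typing import Iterable

METADATA_TYPES = {
    "license", "data_retention", "cache_policy", "usage_terms",
    "rate_limits", "data_freshness", "api_version", "ui_component",
}
TERMINAL_TYPE = "complete"  # assumption: hypothetical terminal marker name


def split_stream(lines: Iterable[str]) -> tuple[dict, list]:
    """Collect in-stream 'header' messages separately from data messages."""
    headers: dict = {}
    results: list = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = json.loads(line[len("data:"):].strip())
        message_type = payload.get("message_type")
        if message_type in METADATA_TYPES:
            headers[message_type] = payload
        elif message_type == TERMINAL_TYPE:
            break
        else:
            results.append(payload)
    return headers, results


sample = [
    'data: {"message_type": "license", "terms": "CC-BY"}',
    "",
    'data: {"message_type": "api_version", "version": "example"}',
    'data: {"message_type": "result", "content": [1]}',
    'data: {"message_type": "complete"}',
]
headers, results = split_stream(sample)
print(sorted(headers), len(results))
```

The parser never assumes the first chunk is data: anything with a known metadata type is set aside, whatever its position in the stream.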

Pattern: Embedding/Ingest Determinism

The most common NLWeb bug: changing the embedding provider after ingest, getting empty or garbage results.

Rule: pick the embedding provider FIRST, configure the retrieval backend's vector dimension to match, ingest with that provider, query with that provider. Never change mid-stream without re-ingesting.

If you need to migrate embedding providers:

1. Choose a maintenance window
2. Configure the new provider as the preferred_provider
3. Run db_load.py --only-delete delete-site <site> for each site
4. Re-ingest with the new provider
5. Restart and verify
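A cheap guard for the "same provider, same dimension" rule is to compare what the index was built with against what the query side is about to use. The function below is a hypothetical pre-flight check, not part of NLWeb; how the ingest-time provider and index dimension are recorded is left to the deployment.

```python
from typing import Sequence


def check_embedding_alignment(
    ingest_provider: str,
    query_provider: str,
    index_dimension: int,
    query_vector: Sequence[float],
) -> list[str]:
    """Return a list of human-readable problems; an empty list means aligned."""
    problems = []
    if ingest_provider != query_provider:
        problems.append(
            f"provider mismatch: ingested with {ingest_provider!r}, "
            f"querying with {query_provider!r}"
        )
    if len(query_vector) != index_dimension:
        problems.append(
            f"dimension mismatch: index expects {index_dimension}, "
            f"query vector has {len(query_vector)}"
        )
    return problems


# Provider names and dimensions here are made-up placeholders.
print(check_embedding_alignment("provider_a", "provider_b", 1536, [0.0] * 768))
```

Matching dimensions alone do not prove alignment, since two providers can share a dimension while producing incompatible vectors, which is why the provider name is checked as well.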

Pattern: Debugging the LLM Call Chain

When /ask returns a bad answer, the bug is in one of these call sites:

| Call site | Symptom | Fix |
| --- | --- | --- |
| Decontextualize | Query rewritten wrong; off-topic results | Pre-compute `decontextualized_query`, log the prompt's output |
| Type detection | Wrong handler invoked | Pass `itemType` explicitly, or check `site_types.xml` |
| Tool selection | Right type, wrong tool | Adjust tool descriptions; set `tool_selection_enabled: false` to bypass |
| Ranking | Top results are off | Check embedding alignment first; then try `scorer=nlwebscorer` |
| Summarize / generate | Final answer is poor | Improve Schema.org source data; bump model tier |

Isolate by mode: mode=list skips summarize/generate. If list is bad, the issue is retrieval or ranking, not synthesis.
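The isolation step can be scripted by requesting the same query in each mode and comparing the outputs. In this sketch the `query` and `site` parameter names and the localhost base URL are assumptions; `mode` and its values come from the text above. The `likely_stage` helper is a hypothetical encoding of the rule, not an NLWeb utility.

```python
from urllib.parse import urlencode

MODES = ("list", "summarize", "generate")


def ask_url(base: str, query: str, site: str, mode: str) -> str:
    """Build an /ask URL for one mode (parameter names are assumptions)."""
    return f"{base}/ask?{urlencode({'query': query, 'site': site, 'mode': mode})}"


def isolation_urls(base: str, query: str, site: str) -> list[str]:
    """One URL per mode, so the same query can be compared across stages."""
    return [ask_url(base, query, site, mode) for mode in MODES]


def likely_stage(list_is_good: bool, generate_is_good: bool) -> str:
    """Apply the rule: a bad list points at retrieval or ranking, not synthesis."""
    if not list_is_good:
        return "retrieval or ranking"
    if not generate_is_good:
        return "summarize / generate"
    return "no fault isolated"


for url in isolation_urls("http://localhost:8000", "best pasta", "example"):
    print(url)
print(likely_stage(list_is_good=True, generate_is_good=False))
```

Only the URL construction is shown; fetching and judging the responses is left out because the response shape depends on the deployment.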

Pattern: NLWebScorer (Optional Neural Reranker)

The NLWebScorer/ subsystem provides a ModernBERT + GAM neural reranker as an alternative to LLM-based ranking. Activate via ?scorer=nlwebscorer on /ask. Configure checkpoints in config_*.yaml:

scorers:
  nlwebscorer:
    bert_checkpoint: ./checkpoints/modernbert.pt
    gam_checkpoint: ./checkpoints/gam.pt

Use cases:

Cost reduction (LLM-ranking is expensive at scale)
Latency reduction (BERT is faster than even small LLMs)
Reproducible ranking (no LLM stochasticity)

Tradeoff: it's domain-specific — you may need to fine-tune on your data. See docs/training-recipe-modernbert-gam.md.

Pattern: The Five Subsystems

NLWeb's repo isn't just one server. Five top-level folders are conceptually distinct:

| Subsystem | Purpose | When relevant |
| --- | --- | --- |
| `AskAgent/` | The core `/ask` and `/mcp` server | Always |
| `AgentFinder/` | Cross-site NLWeb discovery (federated `/who`) | Multi-site federations |
| `DataFinder/` | NL→SQL for enterprise sources (HubSpot, Dynamics, Jira) | Enterprise data, not vector-backed |
| `ModelRouter/` | Cost/quality routing across LLM providers | Cost optimization at scale |
| `NLWebScorer/` | Neural reranker (ModernBERT + GAM) | High-volume retrieval |

Most deployments use only AskAgent. The rest are opt-in.

Pattern: A2A and MCP as Co-Equal Bindings

NLWeb supports four bindings in parallel (REST, MCP, A2A, and a ChatGPT-specific AppSDK adapter):

| Binding | Path | Audience |
| --- | --- | --- |
| REST `/ask` | port 8000 | Browsers, custom clients |
| MCP `/mcp` | port 8000 | AI agents (Claude, Gemini, native MCP) |
| A2A | `webserver/a2a_wrapper.py`, route `a2a.py` | Google Agent-to-Agent protocol |
| AppSDK adapter | port 8100 | ChatGPT specifically |

All share the same backend pipeline. No data duplication. Choose by audience, not by feature.

Pattern: Conversation Memory Hooks

core/conversation_history.py persists exchanges per authenticated user. methods/conversation_search.py queries the persisted history.

Long-term memory (cross-conversation user preferences) is NOT shipped. Hook points to add it:

After response generation in NLWebHandler.respond() — extract durable facts, write to user profile
Before query in the same handler — load user profile, inject into the decontextualize prompt

This is intentional: NLWeb leaves opinionated personalization to the integrator.
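The two hook points could be wired roughly as follows. Everything here is hypothetical: the in-memory `profiles` store, the function names, and the injected text format are illustrative, and fact extraction (which would likely be its own small LLM call) is replaced by a plain list argument.

```python
from collections import defaultdict

# Hypothetical in-memory profile store; a real deployment would persist this.
profiles: dict[str, list[str]] = defaultdict(list)


def load_profile_context(user_id: str) -> str:
    """Hook before the query: text to inject into the decontextualize prompt."""
    facts = profiles.get(user_id, [])
    if not facts:
        return ""
    return "Known user preferences: " + "; ".join(facts)


def remember_facts(user_id: str, facts: list[str]) -> None:
    """Hook after response generation: store durable facts, skipping duplicates."""
    for fact in facts:
        if fact not in profiles[user_id]:
            profiles[user_id].append(fact)


remember_facts("alice", ["vegetarian", "allergic to nuts"])
print(load_profile_context("alice"))
```

Keeping both hooks as plain functions around the handler, rather than edits inside it, fits the advice elsewhere in this document to avoid modifying core files.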

Pattern: Idempotency and Retries

NLWeb doesn't define idempotency keys: /ask calls are read-side, so replays are safe. /mcp follows JSON-RPC 2.0 semantics: include id in every request, and retry with the same id if the connection drops mid-request (the server may deduplicate, if it implements that).

For db_load.py, idempotency is upsert by URL. Re-running on the same source updates existing records rather than duplicating.
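Upsert-by-URL semantics can be shown with a dictionary keyed on the URL. This is a conceptual sketch, not db_load.py's code; the record shape and the `upsert_by_url` name are assumptions.

```python
def upsert_by_url(index: dict, records: list[dict]) -> tuple[int, int]:
    """Insert new records and overwrite existing ones; return (inserted, updated)."""
    inserted = updated = 0
    for record in records:
        if record["url"] in index:
            updated += 1
        else:
            inserted += 1
        index[record["url"]] = record  # same key, so no duplicate is created
    return inserted, updated


index: dict = {}
batch = [
    {"url": "https://example.com/a", "name": "A"},
    {"url": "https://example.com/b", "name": "B"},
]
first_run = upsert_by_url(index, batch)
second_run = upsert_by_url(index, batch)
print(first_run, second_run, len(index))
```

Running the same batch twice leaves the index the same size, which is the property that makes re-running a load safe.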

Pattern: Schema.org as the Common Currency

Every result carries a schema_object. Agents pattern-match on @type to render appropriately. Design rule: any new tool or handler you write should preserve the schema_object in its output. Don't strip it down to text — that defeats the whole point of NLWeb.
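A handler that adds information without discarding the structured payload, plus a client that dispatches on @type, might look like this. The result shape beyond `schema_object`, and the names `annotate`, `schema_type`, and `render`, are illustrative assumptions.

```python
def _schema(result: dict) -> dict:
    """Return the result's schema_object if it is a dict, else an empty dict."""
    obj = result.get("schema_object")
    return obj if isinstance(obj, dict) else {}


def schema_type(result: dict) -> str:
    """Read @type, which in JSON-LD may be a string or a list of strings."""
    kind = _schema(result).get("@type", "Thing")
    if isinstance(kind, list):
        kind = kind[0] if kind else "Thing"
    return kind


def annotate(results: list[dict], note: str) -> list[dict]:
    """Hypothetical handler step: add a field, keep schema_object untouched."""
    return [{**result, "note": note} for result in results]


def render(result: dict) -> str:
    """Client-side dispatch on @type."""
    obj = _schema(result)
    kind = schema_type(result)
    if kind == "Recipe":
        return f"Recipe: {obj.get('name')} ({obj.get('totalTime', 'time unknown')})"
    if kind == "Product":
        return f"Product: {obj.get('name')}"
    return f"{kind}: {obj.get('name', result.get('url', ''))}"


item = {
    "url": "https://example.com/lasagna",
    "schema_object": {"@type": "Recipe", "name": "Lasagna", "totalTime": "PT1H"},
}
print(render(annotate([item], "reranked")[0]))
```

If `annotate` had returned only text, `render` would have nothing to dispatch on, which is the failure the design rule warns against.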

Pattern: Versioning

NLWeb publishes releases as dated Markdown files in docs/release_notes/, not as semver tags. When pinning a deployment:

Pin the git commit, not a tag
Read the release_notes entries from your pinned commit to the latest before upgrading
The MCP wrapper docstring explicitly warns "Backwards compatibility is not guaranteed" — re-test agent integrations on every upgrade

Pattern: Don't Modify Core Files

Most extensibility goes via:

config/*.yaml and XML files (preferred)
New files in methods/ (custom handlers)
New providers in llm_providers/, embedding_providers/, retrieval_providers/
aiohttp middleware in webserver/middleware/

Avoid editing core/baseHandler.py, core/router.py, etc. — they change frequently and your fork rots.

Pattern: Disable Defaults Aggressively

The default config enables three retrieval backends (qdrant_local, nlweb_west, shopify_mcp), the federated /who endpoint, and mode: development. For any non-demo deployment, set these:

# config_webserver.yaml
mode: production

# config_nlweb.yaml
who_endpoint_enabled: false

# config_retrieval.yaml
endpoints:
  nlweb_west: { enabled: false }
  shopify_mcp: { enabled: false }

These defaults make sense for hello-world demos. They are anti-patterns for production.
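A pre-deploy check can flag the three settings shown above. The sketch takes already-parsed config dictionaries to stay dependency-free; the key names come from the YAML snippet above, while `production_warnings` itself and the choice to treat a missing key as "enabled" are assumptions.

```python
DEMO_ENDPOINTS = ("nlweb_west", "shopify_mcp")


def production_warnings(webserver: dict, nlweb: dict, retrieval: dict) -> list[str]:
    """Return warnings for demo-oriented defaults that are still active."""
    warnings = []
    if webserver.get("mode") != "production":
        warnings.append("config_webserver.yaml: mode is not 'production'")
    # Assumption: a missing key is treated as enabled, to err on the safe side.
    if nlweb.get("who_endpoint_enabled", True):
        warnings.append("config_nlweb.yaml: who_endpoint_enabled is not false")
    endpoints = retrieval.get("endpoints", {})
    for name in DEMO_ENDPOINTS:
        if name in endpoints and endpoints[name].get("enabled", True):
            warnings.append(f"config_retrieval.yaml: endpoint {name} is enabled")
    return warnings


risky = production_warnings(
    {"mode": "development"},
    {},
    {"endpoints": {"nlweb_west": {"enabled": True}, "shopify_mcp": {"enabled": False}}},
)
for warning in risky:
    print(warning)
```

Run as part of a deploy script, a non-empty list can fail the build before a demo default reaches production.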

Always cross-reference with the latest docs/release_notes/ and the live core/ modules — patterns evolve and the code is the source of truth.

