
nlweb-setup


Updated May 13, 2026 at 04:49

Bootstrap a local NLWeb development environment from scratch — clone the repo, configure .env, install Python deps via `nlweb init-python`, run `nlweb init` for interactive LLM/retrieval selection, load sample Schema.org data, and verify with `nlweb check`. Use when starting a new NLWeb deployment from zero.


Source: OrcaQubits/agentic-commerce-skills-plugins


SKILL.md


NLWeb Setup

Before writing code

Fetch live docs first:

Fetch https://github.com/nlweb-ai/NLWeb (README) for the current minimum Python version and required deps.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-hello-world.md for the canonical hello-world flow.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-cli.md for current nlweb CLI flags.
Web-search site:github.com/nlweb-ai/NLWeb docs/release_notes and read the most recent dated release note — config keys and required env vars change between releases.
Identify the default write_endpoint and verify which backends are enabled by default in config/config_retrieval.yaml on main.

Conceptual Architecture

What "setup" produces

A working NLWeb dev environment has four parts:

Cloned repo + Python virtualenv with requirements installed.
.env file with provider credentials (OpenAI/Azure OpenAI key + retrieval backend secrets).
Sample data ingested into the local vector store (Qdrant local by default).
A running aiohttp server on port 8000 with /ask, /mcp, and /sites reachable.

Three Default-Enabled Backends — Watch Out

NLWeb ships with three retrieval backends enabled by default in config_retrieval.yaml:

qdrant_local (file-backed, fine for dev)
nlweb_west (Azure AI Search — requires Azure credentials)
shopify_mcp (queries Shopify's MCP endpoint, requires network)

For most local-dev cases, disable the latter two by setting enabled: false so you don't get connection errors at startup. The write_endpoint should point to qdrant_local for dev.
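A minimal Python sketch of that change, assuming the YAML has a top-level `write_endpoint` key and an `endpoints` mapping whose entries carry an `enabled` flag (confirm the real layout in config/config_retrieval.yaml on main). The helper `keep_only_local` is hypothetical and works on the plain dict that `yaml.safe_load` would return, so the sketch itself needs no third-party package:

```python
from __future__ import annotations

from typing import Any


def keep_only_local(cfg: dict[str, Any], keep: str = "qdrant_local") -> dict[str, Any]:
    """Enable only `keep` and make it the write endpoint (mutates and returns cfg).

    Assumed layout (hypothetical): {"write_endpoint": str, "endpoints": {name: {"enabled": bool, ...}}}
    """
    endpoints = cfg.get("endpoints") or {}
    if keep not in endpoints:
        raise KeyError(f"{keep!r} not found under 'endpoints'")
    for name, settings in endpoints.items():
        # Every backend except `keep` (e.g. nlweb_west, shopify_mcp) is switched off.
        settings["enabled"] = name == keep
    cfg["write_endpoint"] = keep
    return cfg


if __name__ == "__main__":
    # Illustrative input only; the shipped file has more keys per endpoint.
    sample = {
        "endpoints": {
            "qdrant_local": {"enabled": True},
            "nlweb_west": {"enabled": True},
            "shopify_mcp": {"enabled": True},
        }
    }
    print(keep_only_local(sample))
```

Writing the dict back with `yaml.safe_dump` discards the comments in the shipped file, so for a one-off change, editing the `enabled:` lines by hand is usually simpler.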

Setup Decision Checklist

LLM provider — OpenAI, Azure OpenAI (default), Anthropic, Gemini, Ollama (offline), Snowflake Cortex?
Embedding provider — must match between ingest and query; default is text-embedding-3-small on Azure OpenAI.
Retrieval write endpoint — Qdrant local for dev, Azure AI Search / Snowflake Cortex / pgvector for prod.
Data source — Schema.org JSON-LD on the site, RSS/Atom feed, sitemap.xml, or CSV?
Mode — development (allows query-string config overrides) or production in config_webserver.yaml?
OAuth — anonymous-only, or login-gated (GitHub/Google/Microsoft/Facebook)?

Project Layout (after setup)

NLWeb/                                 # cloned repo
├── AskAgent/python/
│   ├── app-aiohttp.py                 # main entry
│   ├── core/, methods/, webserver/    # core code
│   ├── llm_providers/, embedding_providers/, retrieval_providers/
│   └── data_loading/
├── config/
│   ├── config_llm.yaml
│   ├── config_embedding.yaml
│   ├── config_retrieval.yaml
│   ├── config_nlweb.yaml
│   ├── config_webserver.yaml
│   ├── config_oauth.yaml
│   ├── config_storage.yaml
│   ├── config_tools.yaml
│   ├── site_types.xml
│   └── prompts.xml
├── data/db/                           # qdrant_local file store
├── .env                               # YOUR credentials (gitignored)
└── docs/, scripts/, demo/, tests/
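After setup, a quick existence check against this layout catches a half-finished clone or a missing .env. This is a hypothetical helper: the path list is copied from the tree above and will need updating if files move on main (data/db/ is left out because it only appears after ingest):

```python
from __future__ import annotations

from pathlib import Path

# Copied from the layout above (hypothetical list; adjust if files move on main).
EXPECTED_PATHS = (
    "AskAgent/python/app-aiohttp.py",
    "config/config_llm.yaml",
    "config/config_embedding.yaml",
    "config/config_retrieval.yaml",
    "config/config_nlweb.yaml",
    "config/config_webserver.yaml",
    ".env",
)


def missing_paths(repo_root: str | Path, expected: tuple[str, ...] = EXPECTED_PATHS) -> list[str]:
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    gaps = missing_paths(".")
    print("layout OK" if not gaps else f"missing: {', '.join(gaps)}")
```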

Setup Sequence

git clone https://github.com/nlweb-ai/NLWeb && cd NLWeb
nlweb init-python (or manually: python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt)
nlweb init — interactive prompts walk through LLM + retrieval selection and write .env
Disable the unwanted default backends in config/config_retrieval.yaml (nlweb_west, shopify_mcp for local-only dev)
nlweb data-load <source> <site-name> — ingest sample content (use a small RSS feed for first run)
nlweb check — runs connectivity diagnostics; resolve any red flags
nlweb app — start the server, hit http://localhost:8000/
Test http://localhost:8000/ask?query=hello&site=<site-name>&streaming=false
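The same sequence as a hypothetical Python driver, with a dry-run default so it only prints the commands. It assumes `nlweb` is on PATH and that `data-load` takes the two positional arguments shown above; check docs/nlweb-cli.md before running it for real. Editing config_retrieval.yaml stays a manual step, and `nlweb app` is omitted because it blocks:

```python
from __future__ import annotations

import shlex
import subprocess


def setup_commands(source: str, site: str) -> list[list[str]]:
    """Non-blocking CLI steps from the sequence above, in order (hypothetical helper)."""
    return [
        ["nlweb", "init-python"],
        ["nlweb", "init"],  # interactive: prompts for LLM + retrieval choices, writes .env
        # Manual step goes here: disable nlweb_west / shopify_mcp in config_retrieval.yaml.
        ["nlweb", "data-load", source, site],
        ["nlweb", "check"],
    ]


def run(commands: list[list[str]], cwd: str = ".", dry_run: bool = True) -> None:
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
```

To leave room for the manual YAML edit, run the list in two halves: with `cmds = setup_commands("<rss-feed-url>", "<site-name>")`, call `run(cmds[:2], dry_run=False)`, edit the YAML, then `run(cmds[2:], dry_run=False)`.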

.env Conventions

NLWeb expects credentials via env vars (never YAML). Common keys (verify live):

OPENAI_API_KEY, AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT
ANTHROPIC_API_KEY, GEMINI_API_KEY
AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_API_KEY
QDRANT_API_KEY (only for remote Qdrant)
SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT
CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID
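To catch a missing credential before `nlweb check` does, a .env file can be linted with a few lines of standard-library Python. This is a sketch: the provider labels in `REQUIRED_KEYS` are hypothetical groupings of the keys listed above, not NLWeb config names, and the parser only handles plain `KEY=value` lines (python-dotenv covers the full syntax):

```python
from __future__ import annotations

# Hypothetical groupings of the keys listed above; verify names against the live docs.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "azure_ai_search": ["AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY"],
    "snowflake": ["SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT"],
    "cloudflare": ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skips blanks and comments, strips matching quotes."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: dict[str, str], providers: list[str]) -> list[str]:
    """Keys required by the chosen providers that are absent or empty."""
    return [key for p in providers for key in REQUIRED_KEYS[p] if not env.get(key)]
```

For example, `missing_keys(parse_env(Path(".env").read_text()), ["azure_openai", "azure_ai_search"])` returns the names still unset for an Azure-only setup.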

Verification Targets

After setup, these should work:

curl http://localhost:8000/sites → JSON list including your loaded site
curl 'http://localhost:8000/ask?query=test&site=<your-site>&streaming=false' → JSON with results
curl -X POST http://localhost:8000/mcp -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' → ask, list_sites, optionally who
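The same three checks from Python, using only urllib. A sketch assuming the default port and the query parameters shown above; `ask_url`, `tools_list_body`, and `smoke_test` are hypothetical names, and the MCP request sets a JSON Content-Type header, which the bare curl example omits:

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8000"


def ask_url(query: str, site: str, base: str = BASE) -> str:
    # urlencode escapes spaces and '&' inside the query text.
    params = urllib.parse.urlencode({"query": query, "site": site, "streaming": "false"})
    return f"{base}/ask?{params}"


def tools_list_body(request_id: int = 1) -> bytes:
    """JSON-RPC 2.0 payload for the MCP tools/list call."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}).encode()


def smoke_test(site: str, base: str = BASE, timeout: float = 30.0) -> None:
    mcp = urllib.request.Request(
        f"{base}/mcp",
        data=tools_list_body(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    for label, target in (("/sites", f"{base}/sites"), ("/ask", ask_url("test", site, base)), ("/mcp", mcp)):
        with urllib.request.urlopen(target, timeout=timeout) as resp:
            print(label, resp.status, resp.read(200).decode(errors="replace"))
```

`smoke_test("<your-site>")` prints the status and the first 200 bytes of each response; `urlopen` raises `HTTPError` on a 4xx/5xx, so the first failing endpoint stops the run. The generous timeout allows for the slow first request noted under Common Setup Failures.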

Common Setup Failures

nlweb check fails on Azure: usually AZURE_OPENAI_ENDPOINT is missing its trailing slash, or the deployment name is wrong.
Embedding dimension mismatch on retrieval: the data was loaded with a different embedding provider than the runtime config uses. Either re-ingest, or change preferred_provider in config_embedding.yaml back to the provider used at ingest.
Server starts but /ask returns no results: the site name in the query doesn't match the site value used during ingest, or the sites: allowlist in config_nlweb.yaml excludes it.
Slow first request: cold model loading, plus the /who endpoint pinging nlwm.azurewebsites.net. Set who_endpoint_enabled to false for offline dev.
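Two of these failures can be checked mechanically. The sketch below normalizes the Azure endpoint and compares a stored vector dimension against the runtime model. How you read `stored_dim` from your vector store is left to the caller, and `KNOWN_DIMS` lists the default output sizes of OpenAI embedding models (the text-embedding-3 models can be shortened via their `dimensions` parameter, which this ignores):

```python
from __future__ import annotations

from typing import Optional

# Default output sizes of OpenAI embedding models.
KNOWN_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def normalize_azure_endpoint(endpoint: str) -> str:
    """Return AZURE_OPENAI_ENDPOINT with exactly one trailing slash."""
    return endpoint.strip().rstrip("/") + "/"


def embedding_mismatch(stored_dim: int, runtime_model: str) -> Optional[str]:
    """Describe a dimension mismatch, or return None if the sizes agree."""
    expected = KNOWN_DIMS.get(runtime_model)
    if expected is None:
        return f"unknown model {runtime_model!r}; look up its output dimension"
    if expected != stored_dim:
        return (
            f"index holds {stored_dim}-d vectors but {runtime_model} yields {expected}-d; "
            "re-ingest or switch preferred_provider back"
        )
    return None
```

A matching dimension is necessary but not sufficient: text-embedding-3-small and text-embedding-ada-002 both produce 1536-d vectors that are not interchangeable, so record which model performed the ingest.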

Always re-verify against the latest hello-world doc — the exact env-var names and CLI flags change.
