deepsearch

Deep research skill: iterative multi-hop investigation using SearXNG (meta-search) and Firecrawl (web exploration). Handles both quick web searches and structured comparative research. Use when researching information on the internet, debugging issues, finding solutions, conducting comparative studies, or answering questions requiring cross-domain context.

Install command

npx skills add https://github.com/led8/.pi --skill deepsearch

Copy and paste this command into Claude Code to install the skill.

Source

led8/.pi

Stars: 0

Forks: 0

Last updated: 3 June 2026 at 19:28

DeepSearch — Unified Research Skill

Iterative multi-hop research combining meta-search discovery (SearXNG) and deep web exploration (Firecrawl). Replaces separate web-search, research, research-deep, and research-report skills.

Prerequisites — Auto-start

Before any research, always ensure services are up. Run this check at the start of every deepsearch invocation:

# Health check + auto-start if needed
curl -sf "http://localhost:8888/search?q=test&format=json" > /dev/null 2>&1 || \
  (cd ~/code/led8/ai/spark/agent-deepsearch && make up-searxng && sleep 5)

curl -sf -X POST "http://localhost:3002/v1/scrape" -H "Content-Type: application/json" -d '{"url":"https://example.com"}' > /dev/null 2>&1 || \
  (cd ~/code/led8/ai/spark/agent-deepsearch && make up-firecrawl && sleep 20)

This is not optional — run it silently every time, don't ask the user.

Modes

Default mode: Iterative research

/deepsearch <query>

Multi-hop investigation with confidence-based iteration. For debugging, technical questions, finding solutions, understanding topics.

Outline mode: Structured comparative research

/deepsearch --outline <topic>

Generates research framework (items + fields), then deep-researches each item systematically. For technology selection, benchmarks, comparative studies.

Core Algorithm

REPEAT until confidence = HIGH or max_iterations (3) reached:
  1. Memory check (agent-memory)
  2. Decompose query into sub-questions
  3. SearXNG search → discover URLs + snippets
  4. Firecrawl scrape → read key pages, follow interesting links
  5. Extract + rank results by relevance
  6. Detect pivots (findings contradicting assumptions)
  7. Build synthesis → evaluate confidence
  8. IF confidence < threshold → refine queries → iterate
  9. Store key findings in memory
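The loop above can be sketched in Python as follows. This is a minimal illustration, not the skill's implementation: `research_loop`, `run_iteration`, and `refine` are hypothetical names, and the two callables stand in for steps 1 to 7 and step 8 respectively. The cap of 3 iterations comes from the Confidence Evaluation section.

```python
from typing import Callable

# Ordering of the three confidence levels used by the skill.
LEVELS = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def research_loop(
    query: str,
    run_iteration: Callable[[str, int], tuple[str, str]],
    refine: Callable[[str, str], str],
    max_iterations: int = 3,
) -> dict:
    """Iterate until confidence is HIGH or the iteration budget is spent.

    run_iteration(query, n) is a hypothetical stand-in for search, scrape,
    ranking and synthesis; it returns (synthesis, confidence_level).
    refine(query, synthesis) is a hypothetical stand-in for query refinement.
    """
    synthesis, confidence = "", "LOW"
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        synthesis, confidence = run_iteration(query, iterations)
        if LEVELS[confidence] >= LEVELS["HIGH"]:
            break
        if iterations < max_iterations:
            # Only refine when another pass will actually run.
            query = refine(query, synthesis)
    return {"synthesis": synthesis, "confidence": confidence, "iterations": iterations}
```

Passing the search and refinement steps in as callables keeps the control flow separate from the network calls, so the stopping rule can be checked without SearXNG or Firecrawl running.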

Tool Reference

SearXNG — Discovery

Basic search:

curl -s "http://localhost:8888/search?q=YOUR+QUERY&format=json" | jq '.results[:10]'

With filters:

# Time range: day, week, month, year
curl -s "http://localhost:8888/search?q=QUERY&format=json&time_range=month"

# Language
curl -s "http://localhost:8888/search?q=QUERY&format=json&language=en"

# Categories: general, science, it, news, files, images
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=it"

# Specific engines
curl -s "http://localhost:8888/search?q=QUERY&format=json&engines=github,stackoverflow"

# Pagination
curl -s "http://localhost:8888/search?q=QUERY&format=json&pageno=2"

Response structure:

{
  "results": [
    {"title": "...", "url": "...", "content": "...", "engine": "google", "score": 0.95}
  ],
  "suggestions": ["related query 1", "related query 2"],
  "infoboxes": [{"infobox": "...", "content": "...", "urls": [...]}],
  "number_of_results": 1234
}
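A response with this shape can be reduced to a short ranked list before deciding which pages to scrape. The sketch below assumes only the fields shown above (`results`, `title`, `url`, `content`, `score`); `top_results` is a hypothetical helper name, and treating a missing `score` as 0 is an assumption.

```python
def top_results(response: dict, limit: int = 10) -> list[dict]:
    """Return the highest-scoring results that have a URL, best first."""
    usable = [r for r in response.get("results", []) if r.get("url")]
    # Assumption: a result without a score sorts last (treated as 0.0).
    ranked = sorted(usable, key=lambda r: r.get("score", 0.0), reverse=True)
    return [
        {"title": r.get("title", ""), "url": r["url"], "snippet": r.get("content", "")}
        for r in ranked[:limit]
    ]
```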

Engine routing by research type:

Type	Engines param
General web	`google,bing,duckduckgo,brave`
IT/Dev	`github,stackoverflow,dockerhub`
Academic	`arxiv,semantic+scholar,google+scholar,pubmed`
Forums/Community	`reddit`
News	`google+news`
Knowledge	`wikipedia,wikidata`
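The routing table can be turned into a small URL builder. This is a sketch under a few assumptions: the dictionary keys (`general`, `it`, and so on) are hypothetical labels, and the `+` in names such as `semantic+scholar` is read as a URL-encoded space, so the names are stored with spaces and encoded by `urlencode`. Commas come out as `%2C`, which a standard query-string parser decodes back to commas.

```python
from urllib.parse import urlencode

# Engine lists copied from the routing table; keys are hypothetical labels.
ENGINES_BY_TYPE = {
    "general": ["google", "bing", "duckduckgo", "brave"],
    "it": ["github", "stackoverflow", "dockerhub"],
    "academic": ["arxiv", "semantic scholar", "google scholar", "pubmed"],
    "forums": ["reddit"],
    "news": ["google news"],
    "knowledge": ["wikipedia", "wikidata"],
}


def build_search_url(
    query: str,
    research_type: str = "general",
    base: str = "http://localhost:8888",
    **filters: str,
) -> str:
    """Build a SearXNG JSON search URL for the given research type.

    Extra keyword arguments (for example time_range="month") are appended
    as additional query parameters.
    """
    params = {
        "q": query,
        "format": "json",
        "engines": ",".join(ENGINES_BY_TYPE[research_type]),
    }
    params.update(filters)
    return f"{base}/search?{urlencode(params)}"
```

For example, `build_search_url("rate limiting", "forums", time_range="month")` yields a URL that can be passed straight to `curl -s`.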

Firecrawl — Exploration

Scrape a single page (markdown output):

curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}' | jq '.data.markdown'

Scrape with only main content (no nav/footer):

curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"], "onlyMainContent": true}'

Extract links from a page:

curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["links"]}' | jq '.data.links'

Crawl a site (follow links recursively):

# Start crawl (async)
CRAWL_ID=$(curl -s -X POST "http://localhost:3002/v1/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 10, "maxDepth": 2}' | jq -r '.id')

# Check status
curl -s "http://localhost:3002/v1/crawl/$CRAWL_ID" | jq '.status, .completed, .total'

# Get results when done
curl -s "http://localhost:3002/v1/crawl/$CRAWL_ID" | jq '.data[].markdown'
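Because the crawl is asynchronous, the status check usually runs in a polling loop. The sketch below takes the HTTP call as a callable so the loop itself stays testable; `wait_for_crawl` and `fetch_status` are hypothetical names, and the status strings `completed` and `failed` are assumptions about what the crawl endpoint returns.

```python
import time
from typing import Callable


def wait_for_crawl(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll a crawl job until it finishes and return each page's markdown.

    fetch_status is a hypothetical callable that performs
    GET /v1/crawl/$CRAWL_ID and returns the decoded JSON body.
    """
    for _ in range(max_polls):
        status = fetch_status()
        state = status.get("status")
        if state == "completed":  # assumed terminal success value
            return [page.get("markdown", "") for page in status.get("data", [])]
        if state == "failed":  # assumed terminal failure value
            raise RuntimeError("crawl failed")
        sleep(poll_seconds)
    raise TimeoutError("crawl did not complete within the polling budget")
```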

Map a site (discover all URLs without scraping):

curl -s -X POST "http://localhost:3002/v1/map" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' | jq '.links'

Resilient Scraping (403 / cookie walls)

Why this matters: this self-hosted Firecrawl only has two real engines — fetch (plain HTTP) and playwright (headless browser). The cloud-only anti-bot engines (fire-engine, chrome-cdp, tlsclient, stealth proxy) and actions (click-to-dismiss cookie banners) are NOT available. So you cannot bypass aggressive bot protection by clicking. Instead, use the tiered strategy below and fall back gracefully — never loop on a blocked URL.

Failure signatures — detect and react

Signature	Meaning	Action
`"success": false` + `"All scraping engines failed"`	Anti-bot blocked all engines (often a 403)	Go to next tier
`statusCode: 403` (or 401/429)	Forbidden / rate-limited / auth wall	Go to next tier
`markdown` is near-empty or only a cookie/consent banner	Cookie-consent wall returned instead of content	Go to next tier
SSL/TLS error message	Bad certificate	Retry with `skipTlsVerification: true`

Hard rule: max 2 retries per URL. If still blocked, drop it and pick a different source (source diversity beats fighting one wall).
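The signature table can be expressed as a small classifier over the decoded scrape response. This is a heuristic sketch, not Firecrawl behaviour: `classify_scrape` is a hypothetical name, the location of `statusCode` under `data.metadata` is an assumption, and the length thresholds for "near-empty" and "consent banner" are arbitrary illustrative values.

```python
def classify_scrape(response: dict, min_chars: int = 200) -> str:
    """Map a scrape response to 'ok', 'retry_skip_tls', or 'next_tier'."""
    error = str(response.get("error", "")).lower()
    # SSL/TLS problems get one retry with skipTlsVerification enabled.
    if "ssl" in error or "tls" in error or "certificate" in error:
        return "retry_skip_tls"
    if response.get("success") is False:
        return "next_tier"

    data = response.get("data") or {}
    # Assumption: the HTTP status is exposed as data.metadata.statusCode.
    status = (data.get("metadata") or {}).get("statusCode")
    if status in (401, 403, 429):
        return "next_tier"

    markdown = (data.get("markdown") or "").strip()
    if len(markdown) < min_chars:
        return "next_tier"  # near-empty page
    # Short pages that mention cookies or consent are treated as a consent wall.
    if len(markdown) < 5 * min_chars and any(
        word in markdown.lower() for word in ("cookie", "consent")
    ):
        return "next_tier"
    return "ok"
```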

Tier 1 — basic (default)

curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "URL", "formats": ["markdown"], "onlyMainContent": true}' | jq '.data.markdown'

Tier 2 — force browser render (on 403 / thin content / cookie wall)

Forces the Playwright engine (waitFor is only supported there locally), waits for JS to render, and sends a realistic desktop User-Agent:

curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "URL",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "waitFor": 3000,
    "skipTlsVerification": true,
    "removeBase64Images": true,
    "headers": {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"}
  }' | jq '.data.markdown'

Tier 3 — external fallbacks (still blocked)

Try in order, then give up on this URL:

# 1. Reader proxy (renders + strips boilerplate, bypasses many walls)
curl -s "https://r.jina.ai/URL"

# 2. Archived snapshot via Firecrawl
curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://web.archive.org/web/2/URL", "formats": ["markdown"], "onlyMainContent": true}' | jq '.data.markdown'

3. If all fail: use the SearXNG snippet (`.results[].content`) as the source and pick a different URL covering the same sub-question.

Optional infra lever: routing the Playwright service through a PROXY_SERVER (see agent-deepsearch docs/scraping-resilience.md) is the single biggest fix for IP-based 403s.
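The tiered strategy amounts to trying a list of fetchers in order and stopping at the first one that returns usable content. The sketch below assumes each tier is wrapped in a callable that returns markdown or `None`; `fallback_urls` and `scrape_with_tiers` are hypothetical names, and the 200-character usability threshold is an illustrative assumption.

```python
from typing import Callable, Optional


def fallback_urls(url: str) -> list[str]:
    """Tier 3 targets: the reader proxy, then the archived snapshot."""
    return [f"https://r.jina.ai/{url}", f"https://web.archive.org/web/2/{url}"]


def scrape_with_tiers(
    url: str,
    tiers: list[Callable[[str], Optional[str]]],
    min_chars: int = 200,
) -> Optional[str]:
    """Try each tier once, in order; return the first usable markdown.

    Returns None when every tier fails, which signals the caller to fall
    back to the SearXNG snippet and pick a different URL.
    """
    for tier in tiers:
        try:
            content = tier(url)
        except Exception:
            # In this sketch a failing tier is simply skipped.
            content = None
        if content and len(content.strip()) >= min_chars:
            return content
    return None
```

Each tier runs at most once per URL, so a blocked page cannot cause a loop.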

Default Mode Workflow

1. Memory Check (always first)

agent-memory memory get-context --query "<topic>" --include-long-term 2>/dev/null

High-quality results → narrow scope, skip known parts
No results → full research

2. Query Decomposition

Parse the research goal into atomic sub-questions:

Query: "Best approach for rate limiting in FastAPI"
→ Sub-questions:
   1. What built-in rate limiting does FastAPI/Starlette offer?
   2. What third-party libraries exist?
   3. What are Redis-based vs in-memory tradeoffs?
   4. What do production deployments recommend?

Rules:

3-7 sub-questions per query
Each should be answerable independently
Identify which sub-questions depend on others (run these sequentially) and which are independent (these can run in parallel)
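One way to separate sequential from parallel sub-questions is a topological sort that groups them into batches: everything in a batch is independent, and batches run in order. The sketch uses Python's standard `graphlib`; `plan_batches` and the short sub-question labels are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-questions into batches that can each run in parallel.

    dependencies maps a sub-question to the set of sub-questions that must
    be answered first. Batches are returned in execution order.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For the FastAPI example, if the tradeoff question depends on knowing the libraries, and the production question depends on both the built-in options and the tradeoffs, the plan has three batches: the two independent questions first, then tradeoffs, then production.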

3. Search & Explore (iterative)

For each sub-question:

Step 3a — Discover (SearXNG):

curl -s "http://localhost:8888/search?q=SUBQUESTION&format=json&engines=RELEVANT_ENGINES" | jq '.results[:10] | .[] | {title, url, content}'

Step 3b — Explore (Firecrawl): Pick the 2-3 most promising URLs from results:

curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "PROMISING_URL", "formats": ["markdown"], "onlyMainContent": true}' | jq '.data.markdown'

If a scrape returns 403, near-empty content, or a cookie wall, escalate using the tiers in Resilient Scraping (Tier 2 browser render → Tier 3 fallbacks → different source). Never retry the same URL more than twice.

Step 3c — Follow links: If a page references something relevant, explore it:

# Get links from current page
curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "CURRENT_PAGE", "formats": ["links"]}' | jq '.data.links[] | select(contains("relevant-keyword"))'

# Scrape the interesting linked page
curl -s -X POST "http://localhost:3002/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url": "LINKED_PAGE", "formats": ["markdown"], "onlyMainContent": true}'

4. Pivot Detection

Signal	Action
Info contradicts assumption	Change search direction
Resource deprecated/removed	Search for alternatives and for archived copies
Feature not supported	Compare alternatives immediately
Single-source claim	Seek corroboration

5. Confidence Evaluation

Level	Criteria
HIGH	All sub-questions answered, multiple sources agree, no gaps
MEDIUM	Most answered, some gaps, single-source claims
LOW	Key questions unanswered, conflicting info, major gaps

If LOW or MEDIUM: refine queries, add new sub-questions, iterate. Maximum 3 iterations before presenting best available answer.
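The criteria in the table are qualitative, so any code version has to pick thresholds. The sketch below is one possible reading, with loudly assumed numbers: "most answered" is taken as at least 60% coverage, and any unresolved conflict forces LOW. `evaluate_confidence` and its parameters are hypothetical.

```python
def evaluate_confidence(
    answered: int,
    total: int,
    single_source_claims: int,
    conflicts: int,
) -> str:
    """Return 'HIGH', 'MEDIUM', or 'LOW' for the current synthesis.

    Thresholds are illustrative assumptions, not values from the skill.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    coverage = answered / total
    # HIGH: everything answered, every claim corroborated, no conflicts.
    if coverage == 1.0 and single_source_claims == 0 and conflicts == 0:
        return "HIGH"
    # MEDIUM: most answered (assumed >= 60%), gaps or single-source claims allowed.
    if coverage >= 0.6 and conflicts == 0:
        return "MEDIUM"
    return "LOW"
```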

6. Memory Storage

Store non-obvious findings:

agent-memory memory remember --kind fact --subject "TOPIC" --predicate "KEY_FINDING" --object "DETAIL" --confidence 0.9 2>/dev/null

Store only: gotchas, decisions, constraints, non-obvious facts. Don't store: obvious facts, session logs, temporary info.
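When the finding text comes from scraped pages, building the command as an argument list avoids shell-quoting mistakes. The sketch below only reuses the flags shown in the command above; `remember_command` is a hypothetical helper, and the 0 to 1 range check on `--confidence` is an assumption.

```python
import shlex


def remember_command(
    subject: str, predicate: str, detail: str, confidence: float = 0.9
) -> list[str]:
    """Build the argv for storing one fact with agent-memory."""
    # Assumption: confidence is a probability-like value between 0 and 1.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return [
        "agent-memory", "memory", "remember",
        "--kind", "fact",
        "--subject", subject,
        "--predicate", predicate,
        "--object", detail,
        "--confidence", str(confidence),
    ]


def as_shell(argv: list[str]) -> str:
    """Render the argv as a safely quoted shell command line."""
    return shlex.join(argv)
```

The list form can be passed directly to `subprocess.run`, while `as_shell` is handy for logging the exact command.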

Outline Mode Workflow (`--outline`)

Step 1: Generate Framework

Based on topic + model knowledge, produce:

Items list: research objects to investigate
Fields framework: dimensions to compare

Present to user for confirmation (add/remove items/fields).

Step 2: Web Supplement

Search SearXNG for missing items:

curl -s "http://localhost:8888/search?q=TOPIC+comparison+2024+2025&format=json&time_range=year" | jq '.results[:15]'

Supplement items and fields based on findings.

Step 3: Generate Outline Files

Save to ./{topic_slug}/:

outline.yaml:

topic: "Research topic"
items:
  - name: "Item 1"
    category: "Category"
    description: "Brief description"
execution:
  batch_size: 5
  output_dir: ./results

fields.yaml:

field_categories:
  - category: "basic_info"
    fields:
      - name: "field_name"
        description: "What this field captures"
        detail_level: "moderate"
        required: true

Step 4: Deep Research (per item)

For each item, run the full iterative flow (steps 1-6 of default mode) targeting that specific item + all fields.

Output structured JSON per item to {output_dir}/{item_slug}.json.

Validate:

python ~/.pi/agent/skills/deepsearch/scripts/validate_json.py -f fields.yaml -j results/item.json

Batch checkpoint every batch_size items — pause and ask user for approval.
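The per-item output path and the batch checkpoints can be derived from `outline.yaml` values with a few helpers. This is a sketch: `slugify`, `output_path`, and `batched` are hypothetical names, and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption, since the skill does not define how `{item_slug}` is formed.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def output_path(output_dir: str, item_name: str) -> str:
    """Path of the JSON file for one item: {output_dir}/{item_slug}.json."""
    return f"{output_dir.rstrip('/')}/{slugify(item_name)}.json"


def batched(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches; pause for approval after each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

With `batch_size: 5`, seven items produce a batch of five followed by a batch of two, with a user checkpoint after each.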

Step 5: Report

After all items complete, synthesize into a markdown report covering all fields, all items, with comparison tables where applicable.

Output Format

Default mode output:

## Summary
[2-3 sentences: what was found, recommended path forward]

## Findings

### [Sub-question 1]
[Answer with evidence]
- Source: [url]

### [Sub-question 2]
[Answer with evidence]
- Source: [url]

## Confidence: [HIGH/MEDIUM/LOW]
[Why this level]

## Gaps
- [Unanswered questions, if any]

## Sources
1. [title](url) — [engine, relevance note]
2. [title](url) — [engine, relevance note]
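The template above can be filled from structured findings with a small renderer. The sketch assumes hypothetical dictionary keys (`question`, `answer`, `source`, `title`, `url`, `note`) and writes "- None" when there are no gaps, which is an assumption about how an empty Gaps section should look.

```python
def render_report(
    summary: str,
    findings: list[dict],
    confidence: str,
    reason: str,
    gaps: list[str],
    sources: list[dict],
) -> str:
    """Render the default-mode report as markdown text."""
    lines = ["## Summary", summary, "", "## Findings", ""]
    for finding in findings:
        lines += [
            f"### {finding['question']}",
            finding["answer"],
            f"- Source: {finding['source']}",
            "",
        ]
    lines += [f"## Confidence: {confidence}", reason, "", "## Gaps"]
    lines += [f"- {gap}" for gap in gaps] or ["- None"]
    lines += ["", "## Sources"]
    # \u2014 mirrors the separator used in the template's Sources entries.
    lines += [
        f"{i}. [{s['title']}]({s['url']}) \u2014 {s['note']}"
        for i, s in enumerate(sources, start=1)
    ]
    return "\n".join(lines)
```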

Outline mode output:

Per-item JSON + final markdown report with comparison tables.

Key Principles

Memory first — always check before searching
Discover then explore — SearXNG finds URLs, Firecrawl reads them
Follow the links — interesting pages reference other interesting pages
Iterate on confidence — don't stop at surface-level results
Pivot on contradictions — adapt strategy when assumptions break
Source diversity — multiple weak sources > single strong source
Rank semantically — relevance > authority > freshness > depth
Store durably — non-obvious findings go to memory for future use
