web-research

Correct protocol for crawling and researching external websites. Use this skill — always — before fetching any sub-page of a website you haven't visited before. Triggers on: "research this site", "check their website", "find out what X does", "scrape this competitor", "look at their features", "crawl this URL", "fetch pages from", "browse this site", "check what pages exist at", or any time you need information from multiple pages of the same domain. The rule is absolute: never guess or construct URL paths — always extract real hrefs from the homepage first.

Stars: 3

Forks: 0

Updated: June 2, 2026, 20:59

Source

alunadev

alunadev/ald-skills

SKILL.md

name

web-research

description

Web Research — Correct Crawling Protocol

The Core Rule

Never construct or guess URL paths. Always extract real hrefs first.

This is not a preference — it is a hard rule with zero exceptions. URLs that look logical (/gestion-academica/, /facturacion-ventas/, /soluciones/) return 404 when the real paths are /gestion/, /facturacion/, /productos/. The site owns its URL structure, not you.

The Correct Protocol (always follow this order)

Step 1 — Fetch the root/homepage first

WebFetch(url: "https://domain.com/", prompt: "Extract EVERY href link on this page — all navigation links, footer links, button links, and any internal URL. I need the exact full URLs as they appear in the HTML so I can crawl them. List them all.")

This gives you the real URL structure of the site. Never skip this step.
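
The WebFetch call above leaves link extraction to a prompt. As a rough illustration of the same step done deterministically, the sketch below pulls every anchor href out of already-fetched HTML using only Python's standard library. The names HrefCollector and extract_links are hypothetical and are not part of the skill or of WebFetch; the sketch also assumes the links are present in the static HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, de-duplicated URLs from the page's <a href> attributes."""
    parser = HrefCollector()
    parser.feed(html)
    seen, links = set(), []
    for href in parser.hrefs:
        # Skip non-page targets and pure in-page anchors.
        if href.startswith(("mailto:", "tel:", "javascript:", "#")):
            continue
        # Resolve relative paths against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Resolving against base_url matters because many sites emit relative hrefs; the returned list holds the exact paths the site itself publishes.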

Step 2 — Read the extracted URLs

From the homepage fetch, you now have a list of real paths. Use these — only these — for subsequent fetches.
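
One way to make "only these" mechanical is to treat the extracted list as an allowlist and refuse anything outside it. This is a minimal sketch under that assumption; same_site_allowlist and require_known are hypothetical helper names introduced here for illustration.

```python
from urllib.parse import urlsplit


def same_site_allowlist(extracted_urls, root_url):
    """Keep only extracted URLs that live on the same host as root_url."""
    root_host = urlsplit(root_url).netloc.lower()
    return {u for u in extracted_urls if urlsplit(u).netloc.lower() == root_host}


def require_known(url, allowlist):
    """Return url unchanged if it was seen in a real href; otherwise raise."""
    if url not in allowlist:
        raise ValueError(f"Refusing to fetch a URL that was never extracted: {url}")
    return url
```

Calling require_known before every fetch turns a guessed path into an immediate error instead of a wasted request.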

Step 3 — Fetch the real pages in parallel

Now that you have confirmed URLs, fetch multiple pages simultaneously:

WebFetch(url: "https://domain.com/real-path-1/", prompt: "...")
WebFetch(url: "https://domain.com/real-path-2/", prompt: "...")
WebFetch(url: "https://domain.com/real-path-3/", prompt: "...")
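
Outside an agent runtime, the same fan-out could be sketched with a thread pool. The fetch argument below is a placeholder for whatever function actually retrieves a page (it is an assumption, not WebFetch itself), and fetch_all is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL concurrently.

    Returns a dict mapping each URL to its result, or to the exception it raised.
    """
    urls = list(urls)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed page should not abort the rest
                results[url] = exc
    return results
```

Capturing exceptions per URL keeps a single 404 from discarding the pages that did load.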

Step 4 — If a page 404s

Check the homepage href list again. Do not guess an alternative path. If the information isn't at a known URL, it may not exist as a separate page — look for it on the pages that did return 200.
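
As a sketch of "look for it on the pages that did return 200", the helper below searches text that has already been fetched instead of requesting a new, guessed path. The pages structure (URL mapped to a status code and body text) and the name find_in_fetched_pages are assumptions made for this example.

```python
def find_in_fetched_pages(term, pages):
    """Return URLs of already-fetched pages whose text mentions term.

    pages maps URL -> (status_code, text). Matching is case-insensitive,
    and only pages that returned 200 are considered.
    """
    needle = term.lower()
    return [
        url
        for url, (status, text) in pages.items()
        if status == 200 and needle in text.lower()
    ]
```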

Red Flags — Stop immediately if you catch yourself doing any of these

Thought	What's actually happening
"The module is called X so the URL is probably /X/"	Guessing. Stop. Fetch homepage first.
"I'll try /about/, /about-us/, /quienes-somos/"	Still guessing — just with more attempts.
"It's a Spanish site so it must be /gestion-academica/"	Name ≠ URL. Stop.
"The nav said 'Soluciones' so the URL is /soluciones/"	Nav labels ≠ URL paths. Stop.
"Let me try a few variations until one works"	This is URL brute-forcing. Stop.

Why This Happens (root cause)

The instinct to construct URLs from topic names is a pattern-matching shortcut that works for well-known sites (GitHub, MDN, npm) where the URL structure is canonical and documented. For arbitrary company websites, CMS-built sites, or any site you haven't studied, the shortcut fails: guessed paths return 404s that waste tool calls, and the real content goes unvisited.

The fix is mechanical: homepage → extract hrefs → use only real URLs.

Bonus: When the homepage itself is content-sparse

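Some sites render navigation with JavaScript (single-page apps), so the fetched HTML may contain few or no links. If the homepage fetch returns very few links, try the following. The sitemap and robots.txt paths are web-wide conventions rather than guesses about this site's structure, so requesting them does not break the core rule.

Try /sitemap.xml, which many sites provide and which lists the site's pages
Try /sitemap_index.xml
Try /robots.txt, which often references the sitemap URL
Look for a footer or nav section in what was returned, since even partial HTML may contain hrefs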

WebFetch(url: "https://domain.com/sitemap.xml", prompt: "List all URLs in this sitemap.")
WebFetch(url: "https://domain.com/robots.txt", prompt: "Extract any sitemap URLs or disallowed paths listed here.")
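
If the raw sitemap or robots.txt text is available, the URLs can also be pulled out without a prompt. The following is a minimal sketch using Python's standard library; sitemap_locs and robots_sitemaps are hypothetical names, and the sketch assumes a well-formed sitemap XML document.

```python
import xml.etree.ElementTree as ET


def sitemap_locs(xml_text):
    """Return the text of every <loc> element, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        # Tags look like "{namespace}loc"; compare only the local name.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def robots_sitemaps(robots_text):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        # Split on the first colon only, so the URL's own "https:" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls
```

A sitemap index file uses the same loc element for its child sitemaps, so sitemap_locs returns those child URLs when given /sitemap_index.xml.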

Example — Correct execution

# WRONG ❌
WebFetch("https://alexiaeducaria.com/gestion-academica/")  → 404
WebFetch("https://alexiaeducaria.com/facturacion-ventas/") → 404
WebFetch("https://alexiaeducaria.com/que-es-alexia/")      → 404

# CORRECT ✅
Step 1: WebFetch("https://alexiaeducaria.com/", "Extract every href")
→ Returns real URLs: /gestion/, /aprendizaje/, /alex-ia/, /entorno-unico/, ...

Step 2: Fetch in parallel using ONLY those real URLs:
WebFetch("https://alexiaeducaria.com/gestion/")      → 200 ✅
WebFetch("https://alexiaeducaria.com/aprendizaje/")  → 200 ✅
WebFetch("https://alexiaeducaria.com/alex-ia/")      → 200 ✅

One-line summary

Homepage first. Extract hrefs. Visit only real URLs. Never guess.

Other skills in this repository

email-builder

alunadev/ald-skills

Builds complete bilingual email drafts from intent and audience. Use when the user asks to write, rewrite, localize, polish, or generate an email, campaign email, outreach message, customer update, follow-up, announcement, invite, sales email, internal note, or lifecycle email and wants clear subject lines, preheaders, body copy, CTAs, variants, or English and Spanish versions.

2026-06-04 · 3 stars

maintaining-brand-identity

alunadev/ald-skills

Provides the single source of truth for brand guidelines, design tokens, technology choices, and voice/tone. Use when applying brand colors, defining typography scale, configuring CSS custom properties, or writing copy that must match a specific brand voice. Triggers on: design tokens, brand colors, color palette, typography, font family, voice & tone, CSS variables, brand consistency, design system initialization.

2026-06-02 · 3 stars

skill-creator

alunadev/ald-skills

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

2026-06-02 · 3 stars

taste-redesign

alunadev/ald-skills

Upgrades existing UIs to premium quality by auditing generic AI patterns and applying high-end design standards. Use when UI looks flat, generic, or "AI-slop". Triggers on: "improve the design", "looks generic", "not polished enough", "redesign this", "elevate the UI", "it looks boring", "make it better looking", "apply taste", "design review". Works with any CSS framework. Source: github.com/leonxlnx/taste-skill (redesign-skill variant).

2026-06-02 · 3 stars

prompt-clarifier

alunadev/ald-skills

Enriches vague, low-detail prompts into structured, agent-optimized XML before execution. INVOKE IMMEDIATELY — before any tool use or file reads — when you detect any of these signals: prompt under 10 words with no file path or error message; vague action verbs with no object ("fix the bug", "make it better", "clean this up", "refactor this", "optimize performance", "improve the UI", "add authentication", "add payments", "add notifications", "build the feature"); CLARIFIER_ADVISORY in your context window; user says "clarify", "help me describe this", "enrich this prompt", "structure my request". Also triggers on: "make this work", "it's broken", "it looks bad", "add X" with no further detail, "implement Y" with no constraints. Do NOT trigger on: prompts ending with ?, prompts containing error messages or stack traces, prompts with specific file paths, prompts already containing acceptance criteria or success metrics.

2026-03-18 · 3 stars

brainstorming

alunadev/ald-skills

Expert Socratic discovery skill for exploring ideas, architecture decisions, and technical design before writing any code. Use this skill — proactively and always first — when requirements are vague, when the user is debating between technical approaches, when the problem is unclear, or when a design decision could have significant architectural consequences. Triggers on: "should we use X or Y", "help me think through", "I'm not sure how to approach this", "what's the best architecture for", "trade-offs between", "design before I code", "let's think through this", "I have an idea", "how should I structure", "help me design", "I want to build X but not sure how", "what approach would you recommend", "is this the right way to". Produces a validated design doc with 2-3 implementation options, trade-offs, and a recommendation. Always use before planning when the approach is not yet locked.

2026-03-18 · 3 stars
