| name | geo-article-pipeline |
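| description | Runs the full GEO article generation pipeline: keyword input (keyword-fetcher skill) -> topic generation -> product search -> article generation -> article audit -> execution report. Supports batch keyword input and automatically handles deduplication, e-commerce suitability checks, product relevance filtering, quality auditing, and retries, producing traceable GEO-optimized articles. Use when user mentions "运行pipeline", "批量生成文章", "关键词生成文章", "article pipeline", "run the pipeline", "generate articles from keywords". |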
| version | 2.0.0 |
GEO Article Pipeline
Automated pipeline that transforms keyword lists into GEO-optimized articles through a 7-step process with quality auditing.
Pipeline Architecture
Step 1: keyword-fetcher → Keywords from ODPS
↓
Step 2: geo-blog-topic-generator → Topics per keyword
↓
Step 3: query-product-search → Relevant products per keyword
↓
Step 4: Merge & Filter → pipeline_config.json
↓
Step 5: geo-content-generator → Final articles (.md)
↓
Step 6: Execution Report → pipeline_report.json
↓
Step 7: article-quality-auditor → Quality audit with auto-retry
When to Use
Use this skill when:
- User provides a list of keywords and wants to generate articles at scale
- User wants to run an ODPS SQL query to fetch keywords and generate articles
- User has a CSV/Excel file with keywords and wants batch article generation
- User wants to reproduce the article pipeline with new keyword sets
- User asks to "run the pipeline", "批量生成文章", "关键词转文章"
Prerequisites
- QoderWork environment with MCP tools enabled
- keyword-fetcher skill installed (for ODPS keyword input)
- geo-blog-topic-generator skill installed
- query-product-search skill installed
- geo-content-generator skill installed
- article-quality-auditor skill installed
- odps-runner.py available at ~/.odps_runner.py or in PATH (for ODPS input mode)
- ODPS config at ~/.odps_config.json (for ODPS input mode)
Quick Start
Mode A: ODPS SQL Input (Recommended)
Run the pipeline with this ODPS SQL:
SELECT final_keywords, language, search_volume, keyword_difficulty
FROM ae_usergrowth.ae_seo_keywords_pool_wcp_finally
WHERE ds = '20260324' AND source_type = 'semrush'
AND (informational = 1 OR commercial = 1)
AND search_volume > 100 AND keyword_difficulty < 40
LIMIT 20
This invokes the keyword-fetcher skill internally to fetch and parse keywords.
Mode B: CSV/Excel Input
Run the pipeline with this CSV file: /path/to/keywords.csv
Expected CSV columns: keyword, language, search_volume (optional), keyword_difficulty (optional)
Mode C: Direct Keyword List
Run the pipeline with these keywords: ["keyword1", "keyword2", "keyword3"]
Default language: en, default target_country: US
Execution Steps
Step 1: Fetch Keywords (keyword-fetcher)
If ODPS SQL:
- Invoke keyword-fetcher skill with the SQL query
- The skill executes the query, deduplicates, and validates keywords
- Output saved to step1_keywords/keywords.json
If CSV/Excel:
- Read the file
- Extract columns: keyword, language, search_volume, keyword_difficulty
- Deduplicate by (keyword.lower(), language.lower())
- Set target_country: "IT" if language="it", else "US"
- Save to step1_keywords/keywords.json
If direct keywords:
- Create keyword list with default language="en", target_country="US"
- Save to step1_keywords/keywords.json
Output format (step1_keywords/keywords.json):
{
  "keywords": [
    {
      "keyword": "light bulb holders",
      "language": "en",
      "target_country": "US",
      "search_volume": 110,
      "keyword_difficulty": 6
    }
  ],
  "source": "odps|file|direct",
  "source_file": "path_or_sql"
}
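The CSV branch of Step 1 can be sketched as below. This is a minimal illustration, not the skill's actual parser: the function name `load_keywords` is hypothetical, and falling back to `en` when the language column is empty is an assumption borrowed from the direct-keyword mode.

```python
import csv
import io


def load_keywords(csv_text: str) -> list[dict]:
    """Parse CSV text into deduplicated keyword records (hypothetical helper)."""
    seen = set()
    keywords = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        keyword = (row.get("keyword") or "").strip()
        if not keyword:
            continue  # skip blank rows
        # Assumption: a missing language falls back to "en".
        language = (row.get("language") or "en").strip().lower()
        key = (keyword.lower(), language)  # dedup key described in Step 1
        if key in seen:
            continue
        seen.add(key)
        record = {
            "keyword": keyword,
            "language": language,
            "target_country": "IT" if language == "it" else "US",
        }
        # Optional numeric columns are copied only when present.
        for col in ("search_volume", "keyword_difficulty"):
            value = (row.get(col) or "").strip()
            if value:
                record[col] = int(float(value))
        keywords.append(record)
    return keywords


sample = (
    "keyword,language,search_volume\n"
    "Light Bulb Holders,en,110\n"
    "light bulb holders,EN,110\n"
    "portalampada,it,\n"
)
result = {
    "keywords": load_keywords(sample),
    "source": "file",
    "source_file": "inline sample",
}
```

The second sample row is dropped because it matches the first after lowercasing, which leaves two records in `result["keywords"]`.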
Step 2: Generate Topics (geo-blog-topic-generator)
For each keyword in step1_keywords/keywords.json:
- Invoke geo-blog-topic-generator skill with the keyword
- The skill returns topics with ecommerce_suitable flag
- Save each keyword's result to step2_topics/{keyword_slug}.json
- Filter: Skip keywords where ecommerce_suitable == false
Subagent strategy: Process keywords in parallel (max 5 concurrent subagents)
Output format (step2_topics/{keyword}.json):
{
  "keyword": "light bulb holders",
  "ecommerce_suitable": true,
  "topics": [
    {
      "引流关键词": "light bulb holders",
      "PAA问题": "What are the different types of light bulb holders?",
      "文章主题": "灯座类型全面指南",
      "文章标题": "Light Bulb Holders Complete Guide Types and Sizes 2026",
      "PAA排序值": 1
    }
  ]
}
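One way to picture Step 2's bounded parallelism and the ecommerce_suitable filter is the sketch below. It uses a thread pool as a stand-in for subagents; `run_step2`, `keyword_slug`, and `fake_generate_topics` are hypothetical names, and the slug rule (lowercase, non-alphanumeric runs collapsed to `_`) is an assumption, since the document does not define how {keyword_slug} is derived.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def keyword_slug(keyword: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", keyword.lower()).strip("_")


def run_step2(keywords, generate_topics, max_workers=5):
    """Call generate_topics for each keyword with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(generate_topics, keywords))  # order preserved
    # One output per keyword, keyed by the slug used for the file name.
    outputs = {keyword_slug(r["keyword"]): r for r in results}
    # Only keywords flagged as e-commerce suitable continue to Step 3.
    kept = [r["keyword"] for r in results if r.get("ecommerce_suitable")]
    return outputs, kept


def fake_generate_topics(keyword):
    """Stand-in for the geo-blog-topic-generator skill (illustration only)."""
    return {
        "keyword": keyword,
        "ecommerce_suitable": keyword != "latest movie news",
        "topics": [],
    }


outputs, kept = run_step2(
    ["light bulb holders", "latest movie news"], fake_generate_topics
)
```

Every keyword still gets an output record, so a skipped keyword remains traceable. Note that this ASCII-only slug rule would drop accented characters in non-English keywords.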
Step 3: Search Products (query-product-search)
For each keyword that passed Step 2 filter:
- Invoke query-product-search skill with the keyword
- Save results to step3_products/{keyword_slug}.json
- Filter: Keep only products with relevance_score >= 0.5
- Filter: Skip keywords with 0 relevant products
Subagent strategy: Process keywords in parallel (max 5 concurrent subagents)
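The two Step 3 filters can be expressed as a short sketch. The field name `relevance_score` comes from the step description; the helper names and the `title` field in the sample data are hypothetical.

```python
def filter_products(products, min_relevance=0.5):
    """Keep only products at or above the relevance threshold."""
    return [p for p in products if p.get("relevance_score", 0.0) >= min_relevance]


def keywords_with_products(products_by_keyword, min_relevance=0.5):
    """Drop keywords that have no relevant products left after filtering."""
    kept = {}
    for keyword, products in products_by_keyword.items():
        relevant = filter_products(products, min_relevance)
        if relevant:
            kept[keyword] = relevant
    return kept


sample_products = {
    "light bulb holders": [
        {"title": "E27 ceramic holder", "relevance_score": 0.82},
        {"title": "Phone case", "relevance_score": 0.10},
    ],
    "obscure keyword": [
        {"title": "Unrelated item", "relevance_score": 0.30},
    ],
}
relevant_by_keyword = keywords_with_products(sample_products)
```

Here "obscure keyword" is skipped entirely because none of its products reach 0.5.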
Step 4: Merge & Select Articles
- Read all step2_topics/*.json and step3_products/*.json
- For each keyword that has both topics AND relevant products:
  - Take the top-ranked topic (PAA排序值=1) as primary article
  - Optionally take 2nd/3rd ranked topics for additional articles
  - Assign article_id: article_001, article_002, ...
- Build pipeline_config.json:
[
  {
    "article_id": "article_001",
    "keyword": "light bulb holders",
    "language": "en",
    "target_country": "US",
    "topic": { ... from step2 ... },
    "products": [ ... from step3, relevance >= 0.5 ... ],
    "source_keyword_rank": 1,
    "search_volume": 110,
    "keyword_difficulty": 6
  }
]
- Print selection summary: "Selected N articles from M keywords"
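The merge-and-select logic might look roughly like the sketch below. `build_pipeline_config` is a hypothetical name, the inputs are assumed to be already loaded and filtered, and `source_keyword_rank` is left out because the document does not define how it is computed.

```python
def build_pipeline_config(keywords, topics_by_keyword, products_by_keyword,
                          max_per_keyword=2):
    """Merge Step 2 topics and Step 3 products into article configs."""
    config = []
    for kw in keywords:
        name = kw["keyword"]
        topics = topics_by_keyword.get(name, [])
        products = products_by_keyword.get(name, [])
        if not topics or not products:
            continue  # a keyword needs both topics and relevant products
        # Lowest PAA排序值 first, so rank 1 becomes the primary article.
        ranked = sorted(topics, key=lambda t: t["PAA排序值"])
        for topic in ranked[:max_per_keyword]:
            config.append({
                "article_id": f"article_{len(config) + 1:03d}",
                "keyword": name,
                "language": kw["language"],
                "target_country": kw["target_country"],
                "topic": topic,
                "products": products,
                "search_volume": kw.get("search_volume"),
                "keyword_difficulty": kw.get("keyword_difficulty"),
            })
    return config


keywords = [
    {"keyword": "light bulb holders", "language": "en", "target_country": "US",
     "search_volume": 110, "keyword_difficulty": 6},
    {"keyword": "no products kw", "language": "en", "target_country": "US"},
]
topics_by_keyword = {
    "light bulb holders": [
        {"PAA排序值": 2, "文章标题": "B"},
        {"PAA排序值": 1, "文章标题": "A"},
        {"PAA排序值": 3, "文章标题": "C"},
    ],
    "no products kw": [{"PAA排序值": 1, "文章标题": "X"}],
}
products_by_keyword = {"light bulb holders": [{"relevance_score": 0.9}]}

config = build_pipeline_config(keywords, topics_by_keyword, products_by_keyword)
summary = f"Selected {len(config)} articles from {len(keywords)} keywords"
print(summary)
```

The default of two articles per keyword mirrors max_articles_per_keyword in the configuration table.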
Step 5: Generate Articles (geo-content-generator)
For each article in pipeline_config.json:
- Extract: article_id, keyword, language, target_country, topic, products
- Invoke geo-content-generator skill with the article config
- Generate filename: article_{ID}_{keyword_slug}_{topic_type}.md
  - Types: guide, comparison, installation, safety, picks, fabric, etc.
- Save to step4_articles/
- Log: word count, tables count, schema type
Subagent strategy: Process articles in parallel (max 4 concurrent subagents)
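The filename pattern and the per-article log values can be illustrated with a small sketch. Both helpers are hypothetical: the slug rule is an assumption, {ID} is read as the numeric part of article_id (consistent with the example names in the directory listing), and tables are counted by looking for markdown separator rows, which is only a heuristic.

```python
import re


def article_filename(article_id: str, keyword: str, topic_type: str) -> str:
    """Build article_{ID}_{keyword_slug}_{topic_type}.md (assumed slug rule)."""
    number = article_id.removeprefix("article_")  # Python 3.9+
    slug = re.sub(r"[^a-z0-9]+", "_", keyword.lower()).strip("_")
    return f"article_{number}_{slug}_{topic_type}.md"


def article_stats(markdown: str) -> dict:
    """Return the word count and a heuristic markdown table count."""
    tables = 0
    for line in markdown.splitlines():
        stripped = line.strip()
        # A separator row such as |---|---| marks one table.
        if (stripped and set(stripped) <= set("|-: ")
                and "-" in stripped and "|" in stripped):
            tables += 1
    return {"word_count": len(markdown.split()), "tables": tables}


name = article_filename("article_001", "light bulb holders", "guide")
stats = article_stats("Intro text here\n\n| a | b |\n|---|---|\n| 1 | 2 |\n")
```

Whitespace splitting counts table cells and pipes as words, so the word count is an approximation rather than a prose-only figure.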
Step 6: Generate Execution Report
Run the Python report generator:
python scripts/geo_article_pipeline.py report \
  --pipeline-dir /path/to/pipeline \
  --output pipeline_report.json
Or generate manually by collecting stats:
- Count files in each step directory
- Calculate word counts for articles
- Build the full traceability mapping
Report output (pipeline_report.json):
{
  "report_generated_at": "2026-04-07T23:00:00+00:00",
  "pipeline_version": "2.0.0",
  "summary": {
    "total_input_keywords": 20,
    "keywords_with_usable_topics": 18,
    "total_topics_generated": 45,
    "keywords_with_relevant_products": 15,
    "total_relevant_products": 72,
    "total_articles_generated": 10,
    "total_article_words": 14500,
    "avg_words_per_article": 1450,
    "articles_passed_audit": 8,
    "articles_failed_audit": 2,
    "articles_after_retry": 1,
    "traceability_enabled": true
  },
  "steps": {
    "step1_keywords": { "status": "completed", "output_count": 20, ... },
    "step2_topics": { "status": "completed", "output_topics": 45, ... },
    "step3_products": { "status": "completed", "total_products_found": 72, ... },
    "step4_articles": { "status": "completed", "total_articles": 10, ... },
    "step7_audit": { "status": "completed", "passed": 8, "failed": 2, "retried": 1, ... }
  },
  "traceability": {
    "article_mappings": [ ... from pipeline_config.json ... ]
  }
}
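For the manual route, the article-related summary fields can be derived as in the sketch below. It is not the logic of scripts/geo_article_pipeline.py, which is not shown here; `build_article_summary` is a hypothetical helper, and rounding the average to the nearest integer is an assumption.

```python
from datetime import datetime, timezone


def build_article_summary(word_counts: list[int]) -> dict:
    """Compute the article-related summary fields from per-article word counts."""
    total = sum(word_counts)
    count = len(word_counts)
    return {
        "total_articles_generated": count,
        "total_article_words": total,
        # Assumption: average rounded to the nearest integer, 0 if no articles.
        "avg_words_per_article": round(total / count) if count else 0,
    }


report = {
    # ISO 8601 UTC timestamp in the same shape as the sample report.
    "report_generated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    "pipeline_version": "2.0.0",
    "summary": build_article_summary([1400, 1500]),
}
```

With the sample report's numbers, 14500 words across 10 articles gives the same 1450 average.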
Step 7: Quality Audit (article-quality-auditor)
For each article in step4_articles/:
- Invoke article-quality-auditor skill with the article + corresponding product data from step3_products/
- The skill executes a 12-item quality audit (A/B/C/D categories)
- WebSearch verification for EEAT claims
- Save audit report to step7_audit/{article_id}_audit.json
Auto-Retry Logic:
- If audit result is fail:
  - Extract feedback_for_retry.merged_instruction from audit report
  - Re-invoke geo-content-generator with the feedback injected into the prompt
  - Save retried article as step4_articles/{article_id}_retry_{N}.md
  - Re-run audit on the retried article
  - Repeat up to 3 times
- If still failing after 3 retries → mark as manual_review
  - Add to step7_audit/manual_review_queue.json
Subagent strategy: Process articles in parallel (max 3 concurrent audits)
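The auto-retry loop can be sketched as follows. The `audit` and `regenerate` callables are hypothetical stand-ins for the article-quality-auditor and geo-content-generator skills; only the `result` and `feedback_for_retry.merged_instruction` fields are taken from the audit report format.

```python
def audit_with_retry(article, audit, regenerate, max_retries=3):
    """Audit an article, regenerating with feedback until it passes or retries run out."""
    report = audit(article)
    history = [report]
    retry_count = 0
    while report["result"] == "fail" and retry_count < max_retries:
        retry_count += 1
        instruction = report["feedback_for_retry"]["merged_instruction"]
        article = regenerate(article, instruction)  # feedback goes into the prompt
        report = audit(article)
        history.append(report)
    # Still failing after the last retry: hand over to manual review.
    result = "manual_review" if report["result"] == "fail" else report["result"]
    return {"result": result, "retry_count": retry_count,
            "article": article, "audit_history": history}


def make_fake_audit(pass_on_call):
    """Fake auditor that starts passing on the given call number (illustration only)."""
    calls = {"n": 0}

    def audit(article):
        calls["n"] += 1
        ok = calls["n"] >= pass_on_call
        return {"result": "pass" if ok else "fail",
                "feedback_for_retry": {"merged_instruction": "cite a real source"}}
    return audit


def fake_regenerate(article, instruction):
    """Fake generator that appends the feedback as a marker."""
    return article + f"\n<!-- revised: {instruction} -->"


fixed = audit_with_retry("draft", make_fake_audit(3), fake_regenerate)
stuck = audit_with_retry("draft", make_fake_audit(99), fake_regenerate)
```

In this sketch `fixed` passes after two retries, while `stuck` exhausts all three and is marked manual_review with four audit reports in its history.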
Output format (step7_audit/{article_id}_audit.json):
{
  "article_id": "article_001",
  "audit_timestamp": "2026-04-10T12:00:00Z",
  "retry_count": 0,
  "result": "pass|fail|manual_review",
  "overall_score": 85,
  "categories": {
    "A": [ { "item": "A1", "name": "EEAT 虚构检测", "pass": true, ... } ],
    "B": [...],
    "C": [...],
    "D": [...]
  },
  "feedback_for_retry": {
    "merged_instruction": "具体修改指示...",
    "failed_items": ["A1", "D1"],
    "priority": "high|medium|low"
  }
}
Manual Review Queue (step7_audit/manual_review_queue.json):
{
  "queue": [
    {
      "article_id": "article_003",
      "reason": "连续 3 次重试 A1 失败:无法验证专家引言真实性",
      "audit_history": [...],
      "flagged_claims": [...]
    }
  ]
}
Directory Structure
{output_dir}/
├── step1_keywords/
│ └── keywords.json # Deduplicated keyword list (from keyword-fetcher)
├── step2_topics/
│ ├── keyword_1.json # Topics for keyword 1
│ └── keyword_2.json # Topics for keyword 2
├── step3_products/
│ ├── keyword_1.json # Products for keyword 1
│ └── keyword_2.json # Products for keyword 2
├── step4_articles/
│ ├── article_001_slug_guide.md # Generated article 1
│ ├── article_002_slug_review.md # Generated article 2
│ └── article_003_slug_guide_retry_1.md # Retried article
├── step7_audit/
│ ├── article_001_audit.json # Audit report for article 1
│ ├── article_002_audit.json # Audit report for article 2
│ └── manual_review_queue.json # Articles needing manual review
├── pipeline_config.json # Merged article configuration
├── pipeline_report.json # Execution report with traceability
├── pipeline_schema.json # Pipeline schema reference
└── scripts/
└── geo_article_pipeline.py # Input parser & report generator
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| max_articles | None | Maximum total articles to generate (None = generate all) |
| max_articles_per_keyword | 2 | Max articles per keyword (primary + secondary topics) |
| min_product_relevance | 0.5 | Minimum product relevance score to include |
| min_search_volume | 0 | Minimum search volume filter |
| max_keyword_difficulty | 100 | Maximum keyword difficulty filter |
| parallel_keywords | 5 | Max parallel subagents for keyword processing |
| parallel_articles | 4 | Max parallel subagents for article generation |
| parallel_audits | 3 | Max parallel subagents for audit processing |
| max_audit_retries | 3 | Max retry attempts for failed articles |
| skip_ecommerce_unsuitable | true | Skip keywords where ecommerce_suitable=false |
| skip_no_products | true | Skip keywords with 0 relevant products |
| enable_audit | true | Enable Step 7 quality auditing |
| auto_retry_on_fail | true | Automatically retry failed articles with feedback |
| output_dir | current dir | Root directory for pipeline outputs |
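The parameters above could be carried in code as a dataclass whose defaults mirror the table. This is a sketch only: the class name `PipelineConfig` and the `validate` checks are assumptions, not part of the skill.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Defaults copied from the configuration table (hypothetical container)."""
    max_articles: Optional[int] = None  # None = generate all
    max_articles_per_keyword: int = 2
    min_product_relevance: float = 0.5
    min_search_volume: int = 0
    max_keyword_difficulty: int = 100
    parallel_keywords: int = 5
    parallel_articles: int = 4
    parallel_audits: int = 3
    max_audit_retries: int = 3
    skip_ecommerce_unsuitable: bool = True
    skip_no_products: bool = True
    enable_audit: bool = True
    auto_retry_on_fail: bool = True
    output_dir: str = "."  # current directory

    def validate(self) -> None:
        """Assumed sanity checks: positive parallelism, non-negative retries."""
        for name in ("parallel_keywords", "parallel_articles", "parallel_audits"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")
        if self.max_audit_retries < 0:
            raise ValueError("max_audit_retries must not be negative")


defaults = PipelineConfig()
defaults.validate()
# Override a single parameter without mutating the defaults.
faster = replace(defaults, parallel_articles=6)
```

Freezing the dataclass keeps one run's settings immutable, and `dataclasses.replace` produces a modified copy for an individual run.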
Troubleshooting
Issue: Step 2 returns ecommerce_suitable=false for all keywords
- The keywords may be non-purchaseable (movies, news, persons, abstract concepts)
- Try more product-oriented keywords (e.g., "best X for Y", "X review", "how to choose X")
Issue: Step 3 returns 0 products for a keyword
- The keyword may be too specific or may not match the AliExpress product catalog
- Try broadening the keyword or checking the rewritten query in the output
Issue: Article generation is slow
- Increase parallel_articles if resources allow (max recommended: 6)
- Each article takes ~30-60 seconds to generate; batch of 10 takes ~2-3 min with parallelization
Issue: Articles are too short
- Check that topic and product configs are properly populated
- Ensure geo-content-generator skill is using the full article template
Issue: Articles fail audit (Step 7)
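- Check step7_audit/{article_id}_audit.json for specific failed items
- Category A failures: usually fabricated EEAT content or compliance-statement problems; add real, verifiable citations
- Category B failures: check that PAA questions are answered completely and that product highlights are accurate
- Auto-retry injects the feedback into the generation prompt; check whether the failed items were actually fixed
- Articles that still fail after 3 retries go to manual_review_queue.json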
Issue: Audit WebSearch verification fails
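- Make sure the MCP WebSearch tool is available
- Some emerging products or standards may return few search results; mark these claims as unverified rather than failing them outright
- The manual review queue records detailed flagged_claims for later verification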