Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

github-research

Explore and analyze GitHub repositories related to a research topic. Reads deep-research output, discovers repos from multiple sources, deeply analyzes code, and produces integration blueprints.

Exécuter dans Manus

Aperçu

Explore and analyze GitHub repositories related to a research topic. Reads deep-research output, discovers repos from multiple sources, deeply analyzes code, and produces integration blueprints.

Commande d'installation

npx skills add https://github.com/lingzhi227/agent-research-skills --skill github-research

Copiez et collez cette commande dans Claude Code pour installer le skill

Source

lingzhi227/agent-research-skills

Étoiles90

Forks16

Mis à jour23 février 2026 à 21:26

Explorateur de fichiers

15 fichiers

SKILL.md

readonly

Plus depuis ce dépôt

même dépôt

deep-research

lingzhi227/agent-research-skills

Conduct systematic academic literature reviews in 6 phases, producing structured notes, a curated paper database, and a synthesized final report. Output is organized by phase for clarity.

2026-02-2790

citation-management

lingzhi227/agent-research-skills

Manage BibTeX citations for LaTeX papers. Harvest missing citations from a draft using Semantic Scholar, validate cite keys against .bib files, deduplicate entries, and format bibliography. Use when working with references, BibTeX, or citations.

2026-02-2090

data-analysis

lingzhi227/agent-research-skills

Generate statistical analysis code with 4-round review. Select appropriate statistical tests, interpret results, and produce analysis reports with p-values, effect sizes, and confidence intervals. Use when analyzing experimental data for a paper.

2026-02-2090

experiment-code

lingzhi227/agent-research-skills

Write ML experiment code with iterative improvement. Generate training/evaluation pipelines, debug errors, and optimize results through code reflection. Use when implementing experiments for a research paper.

2026-02-2090

figure-generation

lingzhi227/agent-research-skills

Generate publication-quality scientific figures using matplotlib/seaborn with a three-phase pipeline (query expansion, code generation with execution, VLM visual feedback). Handles bar charts, line plots, heatmaps, training curves, ablation plots, and more. Use when the user needs figures, plots, or visualizations for a paper.

2026-02-2090

idea-generation

lingzhi227/agent-research-skills

Generate novel research ideas with iterative refinement and novelty checking against literature. Score ideas on Interestingness, Feasibility, and Novelty. Use when brainstorming research directions or validating idea novelty.

2026-02-2090

Source

lingzhi227

lingzhi227/agent-research-skills

Ouvrir le dépôt GitHub Voir les dépôts du créateur

Commande d'installation

Téléchargement

Exécuter dans Manus

Utile pourSOC

Développeurs de logicielsProfessions informatiques et mathématiques15-1252L4

name	github-research
description	Explore and analyze GitHub repositories related to a research topic. Reads deep-research output, discovers repos from multiple sources, deeply analyzes code, and produces integration blueprints.
argument-hint	["deep-research-output-dir"]

GitHub Research Skill

Trigger

Activate this skill when the user wants to:

"Find repos for [topic]", "GitHub research on [topic]"
"Analyze open-source code for [topic]"
"Find implementations of [paper/technique]"
"Which repos implement [algorithm]?"
Uses /github-research <deep-research-output-dir> slash command

Overview

This skill systematically discovers, evaluates, and deeply analyzes GitHub repositories related to a research topic. It reads deep-research output (paper database, phase reports, code references) and produces an actionable integration blueprint for reusing open-source code.

Installation: ~/.claude/skills/github-research/ — scripts, references, and this skill definition. Output: ./github-research-output/{slug}/ relative to the current working directory. Input: A deep-research output directory (containing paper_db.jsonl, phase reports, code_repos.md, etc.)

6-Phase Pipeline

Phase 1: Intake     → Extract refs, URLs, keywords from deep-research output
Phase 2: Discovery  → Multi-source broad GitHub search (50-200 repos)
Phase 3: Filtering  → Score & rank → select top 15-30 repos
Phase 4: Deep Dive  → Clone & deeply analyze top 8-15 repos (code reading)
Phase 5: Analysis   → Per-repo reports + cross-repo comparison
Phase 6: Blueprint  → Integration/reuse plan for research topic

Output Directory Structure

github-research-output/{slug}/
├── repo_db.jsonl                     # Master repo database
├── phase1_intake/
│   ├── extracted_refs.jsonl          # URLs, keywords, paper-repo links
│   └── intake_summary.md
├── phase2_discovery/
│   ├── search_results/               # Raw JSONL from each search
│   └── discovery_log.md
├── phase3_filtering/
│   ├── ranked_repos.jsonl            # Scored & ranked subset
│   └── filtering_report.md
├── phase4_deep_dive/
│   ├── repos/                        # Cloned repos (shallow)
│   ├── analyses/                     # Per-repo analysis .md files
│   └── deep_dive_summary.md
├── phase5_analysis/
│   ├── comparison_matrix.md          # Cross-repo comparison
│   ├── technique_map.md              # Paper concept → code mapping
│   └── analysis_report.md
└── phase6_blueprint/
    ├── integration_plan.md           # How to combine repos
    ├── reuse_catalog.md              # Reusable components catalog
    ├── final_report.md               # Complete compiled report
    └── blueprint_summary.md

Scripts Reference

All scripts are Python 3, stdlib-only, located in ~/.claude/skills/github-research/scripts/.

Script	Purpose	Key Flags
`extract_research_refs.py`	Parse deep-research output for GitHub URLs, paper refs, keywords	`--research-dir`, `--output`
`search_github.py`	Search GitHub repos via `gh api`	`--query`, `--language`, `--min-stars`, `--sort`, `--max-results`, `--topic`, `--output`
`search_github_code.py`	Search GitHub code for implementations	`--query`, `--language`, `--filename`, `--max-results`, `--output`
`search_paperswithcode.py`	Search Papers With Code for paper→repo mappings	`--paper-title`, `--arxiv-id`, `--query`, `--output`
`repo_db.py`	JSONL repo database management	subcommands: `merge`, `filter`, `score`, `search`, `tag`, `stats`, `export`, `rank`
`repo_metadata.py`	Fetch detailed metadata via `gh api`	`--repos`, `--input`, `--output`, `--delay`
`clone_repo.py`	Shallow-clone repos for analysis	`--repo`, `--output-dir`, `--depth`, `--branch`
`analyze_repo_structure.py`	Map file tree, key files, LOC stats	`--repo-dir`, `--output`
`extract_dependencies.py`	Extract and parse dependency files	`--repo-dir`, `--output`
`find_implementations.py`	Search cloned repo for specific code patterns	`--repo-dir`, `--patterns`, `--output`
`repo_readme_fetch.py`	Fetch README without cloning	`--repos`, `--input`, `--output`, `--max-chars`
`compare_repos.py`	Generate comparison matrix across repos	`--input`, `--output`
`compile_github_report.py`	Assemble final report from all phases	`--topic-dir`

Phase 1: Intake

Goal: Extract all relevant references, URLs, and keywords from the deep-research output.

Steps

Create output directory structure:

SLUG=$(echo "$TOPIC" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
mkdir -p github-research-output/$SLUG/{phase1_intake,phase2_discovery/search_results,phase3_filtering,phase4_deep_dive/{repos,analyses},phase5_analysis,phase6_blueprint}

Extract references from deep-research output:

python ~/.claude/skills/github-research/scripts/extract_research_refs.py \
  --research-dir <deep-research-output-dir> \
  --output github-research-output/$SLUG/phase1_intake/extracted_refs.jsonl

Review extracted refs: Read the generated JSONL. Note:
- GitHub URLs found directly in reports
- Paper titles and arxiv IDs (for Papers With Code lookup)
- Research keywords and themes (for GitHub search queries)
Write intake summary: Create phase1_intake/intake_summary.md with:
- Number of direct GitHub URLs found
- Number of papers with potential code links
- Key research themes extracted
- Planned search queries for Phase 2

Checkpoint

extracted_refs.jsonl exists with entries
intake_summary.md written
Search strategy documented

Phase 2: Discovery

Goal: Cast a wide net to find 50-200 candidate repos from multiple sources.

Steps

Search by direct URLs: Any GitHub URLs from Phase 1 → fetch metadata:

python ~/.claude/skills/github-research/scripts/repo_metadata.py \
  --repos owner1/name1 owner2/name2 ... \
  --output github-research-output/$SLUG/phase2_discovery/search_results/direct_urls.jsonl

Search Papers With Code: For each paper with an arxiv ID:

python ~/.claude/skills/github-research/scripts/search_paperswithcode.py \
  --arxiv-id 2401.12345 \
  --output github-research-output/$SLUG/phase2_discovery/search_results/pwc_2401.12345.jsonl

Search GitHub by keywords (3-8 queries based on research themes):

python ~/.claude/skills/github-research/scripts/search_github.py \
  --query "multi-agent LLM coordination" \
  --min-stars 10 --sort stars --max-results 50 \
  --output github-research-output/$SLUG/phase2_discovery/search_results/gh_query1.jsonl

Search GitHub code (for specific implementations):

python ~/.claude/skills/github-research/scripts/search_github_code.py \
  --query "class MultiAgentOrchestrator" \
  --language python --max-results 30 \
  --output github-research-output/$SLUG/phase2_discovery/search_results/code_query1.jsonl

Fetch READMEs for repos that lack descriptions:

python ~/.claude/skills/github-research/scripts/repo_readme_fetch.py \
  --input <repos.jsonl> \
  --output github-research-output/$SLUG/phase2_discovery/search_results/readmes.jsonl

Merge all results into master database:

python ~/.claude/skills/github-research/scripts/repo_db.py merge \
  --inputs github-research-output/$SLUG/phase2_discovery/search_results/*.jsonl \
  --output github-research-output/$SLUG/repo_db.jsonl

Write discovery log: Create phase2_discovery/discovery_log.md with search queries used, results per source, total unique repos found.

Rate Limits

GitHub search API: 30 requests/minute (authenticated)
Papers With Code API: No strict limit but be respectful (1 req/sec)
Add --delay 1.0 to batch operations when needed

Checkpoint

repo_db.jsonl populated with 50-200 repos
discovery_log.md with search details

Phase 3: Filtering

Goal: Score and rank repos, select top 15-30 for deeper analysis.

Steps

Enrich metadata for all repos:

python ~/.claude/skills/github-research/scripts/repo_metadata.py \
  --input github-research-output/$SLUG/repo_db.jsonl \
  --output github-research-output/$SLUG/repo_db.jsonl \
  --delay 0.5

Score repos (quality + activity scores):

python ~/.claude/skills/github-research/scripts/repo_db.py score \
  --input github-research-output/$SLUG/repo_db.jsonl \
  --output github-research-output/$SLUG/repo_db.jsonl

LLM relevance scoring: Read through the top ~50 repos (by quality_score) and assign relevance_score (0.0-1.0) based on:
- Direct relevance to research topic
- Implementation completeness
- Code quality signals (from README, description)
- Update the relevance scores:
```
python ~/.claude/skills/github-research/scripts/repo_db.py tag \
  --input github-research-output/$SLUG/repo_db.jsonl \
  --ids owner/name --tags "relevance:0.85"
```

Compute composite scores and rank:

python ~/.claude/skills/github-research/scripts/repo_db.py score \
  --input github-research-output/$SLUG/repo_db.jsonl \
  --output github-research-output/$SLUG/repo_db.jsonl
python ~/.claude/skills/github-research/scripts/repo_db.py rank \
  --input github-research-output/$SLUG/repo_db.jsonl \
  --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
  --by composite_score

Select top repos: Filter to top 15-30:

python ~/.claude/skills/github-research/scripts/repo_db.py filter \
  --input github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
  --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
  --max-repos 30 --not-archived

Write filtering report: Create phase3_filtering/filtering_report.md:
- Stats before/after filtering
- Score distributions
- Top 30 repos with scores and rationale

Scoring Formula

activity_score = sigmoid((days_since_push < 90) * 0.4 + has_recent_commits * 0.3 + open_issues_ratio * 0.3)
quality_score  = normalize(log(stars+1) * 0.3 + log(forks+1) * 0.2 + has_license * 0.15 + has_readme * 0.15 + not_archived * 0.2)
composite_score = relevance * 0.4 + quality * 0.35 + activity * 0.25

Checkpoint

ranked_repos.jsonl with 15-30 repos
filtering_report.md with scoring details

Phase 4: Deep Dive

Goal: Clone and deeply analyze the top 8-15 repos.

Steps

Select repos for deep dive: Take top 8-15 from ranked list.

Clone each repo (shallow):

python ~/.claude/skills/github-research/scripts/clone_repo.py \
  --repo owner/name \
  --output-dir github-research-output/$SLUG/phase4_deep_dive/repos/

Analyze structure for each cloned repo:

python ~/.claude/skills/github-research/scripts/analyze_repo_structure.py \
  --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
  --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_structure.json

Extract dependencies:

python ~/.claude/skills/github-research/scripts/extract_dependencies.py \
  --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
  --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_deps.json

Find implementations: Search for key algorithms/concepts from research:

python ~/.claude/skills/github-research/scripts/find_implementations.py \
  --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
  --patterns "class Transformer" "def forward" "attention" \
  --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_impls.jsonl

Deep code reading: For each repo, READ the key source files identified by structure analysis. Write a per-repo analysis in phase4_deep_dive/analyses/{name}_analysis.md:
- Architecture overview
- Key algorithms implemented
- Code quality assessment
- API / interface design
- Dependencies and requirements
- Strengths and limitations
- Reusability assessment (how easy to extract components)
Write deep dive summary: phase4_deep_dive/deep_dive_summary.md

IMPORTANT: Actually Read Code

Do NOT just summarize READMEs. You must:

Read the main source files (entry points, core modules)
Understand the actual implementation approach
Identify specific functions/classes that implement research concepts
Note code patterns, design decisions, and trade-offs

Checkpoint

Repos cloned in repos/
Per-repo analysis files in analyses/
deep_dive_summary.md written

Phase 5: Analysis

Goal: Cross-repo comparison and technique-to-code mapping.

Steps

Generate comparison matrix:

python ~/.claude/skills/github-research/scripts/compare_repos.py \
  --input github-research-output/$SLUG/phase4_deep_dive/analyses/ \
  --output github-research-output/$SLUG/phase5_analysis/comparison.json

Write comparison matrix: Create phase5_analysis/comparison_matrix.md:
- Table comparing repos across dimensions (language, LOC, stars, framework, license, tests)
- Dependency overlap analysis
- Strengths/weaknesses per repo
Write technique map: Create phase5_analysis/technique_map.md:
- Map each paper concept / research technique → specific repo + file + function
- Identify gaps (techniques with no implementation found)
- Note alternative implementations of the same concept
Write analysis report: phase5_analysis/analysis_report.md:
- Executive summary of findings
- Key insights from code analysis
- Recommendations for which repos to use for which purposes

Checkpoint

comparison_matrix.md with repo comparison table
technique_map.md mapping concepts to code
analysis_report.md with findings

Phase 6: Blueprint

Goal: Produce an actionable integration and reuse plan.

Steps

Write integration plan: phase6_blueprint/integration_plan.md:
- Recommended architecture for combining repos
- Step-by-step integration approach
- Dependency resolution strategy
- Potential conflicts and how to resolve them
Write reuse catalog: phase6_blueprint/reuse_catalog.md:
- For each reusable component: source repo, file path, function/class, what it does, how to extract it
- License compatibility matrix
- Effort estimates (easy/medium/hard to integrate)

Compile final report:

python ~/.claude/skills/github-research/scripts/compile_github_report.py \
  --topic-dir github-research-output/$SLUG/

Write blueprint summary: phase6_blueprint/blueprint_summary.md:
- One-page executive summary
- Top 5 repos and why
- Recommended next steps

Checkpoint

integration_plan.md complete
reuse_catalog.md with component catalog
final_report.md compiled
blueprint_summary.md as executive summary

Quality Conventions

Repos are ranked by composite score: relevance × 0.4 + quality × 0.35 + activity × 0.25
Deep dive requires reading actual code, not just READMEs
Integration blueprint must map paper concepts → specific code files/functions
Incremental saves: Each phase writes to disk immediately
Checkpoint recovery: Can resume from any phase by checking what outputs exist
All scripts are stdlib-only Python — no pip installs needed
gh CLI is required for GitHub API access (must be authenticated)
Deduplication by repo_id (owner/name) across all searches
Rate limit awareness: Respect GitHub search API limits (30 req/min)

Error Handling

If gh is not installed: warn user and provide installation instructions
If a repo is archived/deleted: skip gracefully, note in log
If clone fails: skip, note in log, continue with remaining repos
If Papers With Code API is down: skip, rely on GitHub search only
Always write partial progress to disk so work is not lost

References

See references/phase-guide.md for detailed phase execution guidance
Deep-research skill: ~/.claude/skills/deep-research/SKILL.md
Paper database pattern: ~/.claude/skills/deep-research/scripts/paper_db.py