| name | homepage-generator |
| description | Generate a fact-checked academic personal homepage from a CV, optionally augmented by an existing manual homepage and an assets directory. Produces editable structured source files (profile.yml + publications.bib + bio.md + news.md) and a single-file HTML page. Uses Codex MCP for independent factual review against DBLP / arXiv. Optionally uses Gemini multimodal for screenshot critique when available. Use when the user says '做个学术主页', '从CV生成主页', 'aris-homepage', 'generate academic homepage from CV', 'PhD homepage', 'GitHub Pages personal site', or wants a fact-checked academic site. |
| argument-hint | init --from-cv <cv.docx|cv.pdf|cv.txt> [--from-repos owner/repo,...] [--include-private] [--manual-homepage <url>] [--assets-dir <path>] [--out <dir>] [--force|--merge] | finalize | render --persona theory-minimal [--out <html>] [--override-all] [--no-audit] [--offline] | check [--strict] | doctor |
| allowed-tools | Bash(*), Read, Write, Edit, WebFetch, mcp__codex__codex |
/homepage-generator — fact-checked academic homepage from CV
The only personal-site generator that fact-checks your CV before publishing.
Cross-model adversarial review: the LLM that drafts your homepage never grades it — a fresh Codex thread audits every claim against DBLP / arXiv before the HTML is signed off.
When to use
Generate a single-file HTML academic homepage. Optimized for PhD candidates, postdocs, and early-career researchers with public publications. v1 ships the theory-minimal persona (text-heavy academic page in the Zhxie / Avicenna lineage); active-researcher (paper thumbnails + news ticker variant) is planned.
Use when the user says 做个学术主页 ("make an academic homepage") · generate homepage from CV · aris-homepage · PhD personal site · GitHub Pages homepage.
Do NOT use for: portfolio sites needing image galleries; newsletter-funnel sites needing audience metrics; pure blog sites (use Jekyll/Hugo); tenured-faculty pages with student/teaching as top-level sections (use academicpages).
Public demo
A real-world dogfood example: https://wanshuiyin.github.io/ — homepage generated by this skill from a CV + the maintainer's previous manual page. Use it as a style and capability reference; do not copy any names, affiliations, advisors, awards, paper titles, or filenames into your own examples or tests (see Privacy below).
Quick start
aris-homepage init --from-cv ./cv.pdf --out ./site
cd ./site
aris-homepage finalize
$EDITOR profile.yml publications.bib bio.md news.md EXTRACTION_REVIEW.md
aris-homepage render --persona theory-minimal
Input model — three sources for the LLM agent
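The init CLI handles only the CV → text conversion (plus, from v1.1, --from-repos). The manual homepage and the assets directory are consumed by the calling LLM agent when it fills extraction.json. Supplying the CV, the manual homepage, and the assets directory together gives the best results: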
| Input | How to supply | Purpose |
|---|---|---|
| CV | --from-cv cv.docx/pdf/txt on the CLI | The factual source — identity, education, jobs, publications, awards |
| GitHub repos (v1.1) | --from-repos owner/repo,owner/repo2 on the CLI | The project-evidence source — stars / releases / topics / README per repo; merged into News + featured projects (issue #2) |
| Manual homepage | Provide URL in the prompt; the agent uses WebFetch | The editorial source — section ordering, topic groupings, tone, link priorities, photo URL |
| Assets directory | Provide path in the prompt; the agent inspects + copies into assets/ | The visual source — headshot, paper thumbnails, project logos |
Reconciliation rule: the CV is authoritative for facts (paper venues, dates, author lists), the manual homepage is authoritative for how you present yourself (what to group, what to surface, what voice), and the assets folder provides visuals. If the three sources conflict, do not silently merge — write the conflict to EXTRACTION_REVIEW.md for user resolution.
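The reconciliation rule can be pictured as a small merge function. This is only an illustrative sketch, not the skill's implementation: the name `reconcile`, the flat-dict input shape, and the `FACT_FIELDS` set are all assumptions, and so is the choice to keep a provisional value while the conflict waits for the user.

```python
# Hypothetical sketch of the reconciliation rule; names and shapes are illustrative.
FACT_FIELDS = {"venue", "year", "authors"}  # assumed: fields where the CV is authoritative


def reconcile(cv: dict, homepage: dict) -> tuple:
    """Merge two flat dicts; return (merged, conflicts) without silently resolving clashes."""
    merged, conflicts = {}, []
    for key in sorted(set(cv) | set(homepage)):
        in_cv, in_hp = key in cv, key in homepage
        if in_cv and in_hp and cv[key] != homepage[key]:
            # Never merge silently: record the clash for EXTRACTION_REVIEW.md.
            conflicts.append(f"{key}: CV={cv[key]!r} vs homepage={homepage[key]!r}")
            # Provisional value follows the authority rule until the user decides.
            merged[key] = cv[key] if key in FACT_FIELDS else homepage[key]
        else:
            merged[key] = cv[key] if in_cv else homepage[key]
    return merged, conflicts


merged, conflicts = reconcile(
    {"venue": "NeurIPS", "year": 2026},
    {"venue": "ICML", "tone": "first-person"},
)
print(conflicts)  # prints ["venue: CV='NeurIPS' vs homepage='ICML'"]
```

Each string in `conflicts` would become one line in EXTRACTION_REVIEW.md, so the user sees every disagreement rather than an already-merged value.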
If you have no manual homepage yet: skip it. The generator falls back to CV-only structure with sensible academic defaults.
Coming in v1.1: native CLI flags --manual-homepage URL and --assets-dir PATH will fetch + stage these inputs automatically. For v1, the calling agent handles them.
Commands
aris-homepage init --from-cv <file> [--from-repos owner/repo,...] [--include-private] [--out DIR] [--force|--merge]
aris-homepage finalize [--out DIR]
aris-homepage render --persona theory-minimal [--out index.html] [--override-all] [--no-audit] [--offline]
aris-homepage check [--strict]
aris-homepage doctor
Generated editable source files
After finalize, your working dir contains these editable files. Edit them in your IDE; they are the source of truth — re-run render after each change.
| File | Role |
|---|---|
| profile.yml | Structured facts: identity, affiliations, education, research, links, awards, talks, teaching, featured projects, publication metadata, audit overrides |
| publications.bib | BibTeX entries; the source of truth for papers |
| bio.md | 1-3 paragraph self-introduction in Markdown |
| news.md | Reverse-chronological news bullets; supports inline <img> for embedded badges |
| assets/ | Optional local images (photo, paper thumbnails); remote https:// URLs are also accepted in profile.yml |
| EXTRACTION_REVIEW.md | LLM extraction confidence flags; read this before the first render |
| .aris-homepage/ | Internal cache (DBLP responses, extraction handoff JSON); safe to delete |
| audit-report.md | Generated by render / check; your evidence trail |
Schema reference
profile.yml has many optional fields; the complete reference lives in PROFILE_SCHEMA.md in this skill directory. Keep that as the single source of truth for fields.
Core schema groups (read PROFILE_SCHEMA.md for the exact field shapes):
- identity: name, name_native (bilingual), title, email, wechat, office, photo (local path or remote URL)
- affiliations: current + past arrays with role / institution / department / start / end
- education · research (summary + interests) · links (scholar / github / dblp / orcid / etc.)
- featured_projects: first-class section for flagship OSS work, with logo, stats grid, link cluster, sub-projects, open problems
- awards · talks · teaching · blogs_tutorials (rendered combined with talks)
- professional_services: conference reviewer / journal reviewer / editorial board list
- selected_publications: flat list OR ordered topic groups ([{group: "Topic Title", keys: [bibkey1, ...]}])
- publications: preamble (intro sentence before first H3)
- publications_meta.<bibkey>: thumbnail, description (blue blurb box), awards (list of badges), co_first (equal-contribution markers), links (arXiv / paper / code / slides / openreview / etc.; any key supported)
- audit.overrides.<bibkey>: per-paper, per-field bypass with required reason and optional expires: YYYY-MM-DD
- ship: persona, accent_color, lang, awards_heading (override "Awards" → custom string)
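Because selected_publications accepts two shapes, a renderer would likely normalize them into one before building HTML. The sketch below shows one way to do that. The function name `normalize_selected` and the `(title, keys)` tuple output are assumptions made for illustration; PROFILE_SCHEMA.md remains the reference for the real field shapes.

```python
from typing import Optional


def normalize_selected(value: list) -> list:
    """Return [(group_title_or_None, [bibkeys])] for either accepted shape."""
    if not value:
        return []
    if all(isinstance(item, str) for item in value):
        # Flat list of bibkeys: treat it as one untitled group.
        return [(None, list(value))]
    groups = []
    for item in value:
        if not isinstance(item, dict) or "group" not in item or "keys" not in item:
            raise ValueError(f"expected a mapping with group and keys, got {item!r}")
        title: Optional[str] = item["group"]
        groups.append((title, list(item["keys"])))
    return groups


flat = normalize_selected(["example2026paper", "example2025paper"])
grouped = normalize_selected([{"group": "Topic Title", "keys": ["example2026paper"]}])
```

Rejecting a mixed list (some strings, some mappings) is a design choice of this sketch; it surfaces schema mistakes early instead of guessing the intended grouping.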
Fact-check protocol
Runs automatically during every render (unless --no-audit). Three outcomes per claim:
| Outcome | Trigger | Effect |
|---|---|---|
| PASS (silent) | Title fuzzy-matches DBLP; venue + year + author set agree | No mention in audit-report |
| WARN (soft) | DBLP returns 0 hits OR ≥2 ambiguous; arXiv-only paper; award has no external URL | Render proceeds; logged in audit-report |
| FAIL (hard) | DBLP venue ≠ profile venue; year mismatch; author list missing user; fabricated award badge; missing bibkey in publications.bib | HTML still renders but verdict = BLOCKED; user must fix the claim or override it (per-field in profile.yml, or --override-all) to ship |
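The DBLP-related rows of the outcome table can be sketched as a single classifier. This is a simplified illustration under stated assumptions: the `classify` name, the dict shapes for `claim` and `hits`, and the 0.9 similarity cutoff are hypothetical, and award and bibkey checks are left out.

```python
from difflib import SequenceMatcher

TITLE_THRESHOLD = 0.9  # assumed cutoff; the skill's real fuzzy-match rule is not documented here


def classify(claim: dict, hits: list) -> str:
    """Return "PASS", "WARN", or "FAIL" for one publication claim against DBLP hits."""
    def similar(a: str, b: str) -> bool:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= TITLE_THRESHOLD

    matches = [h for h in hits if similar(claim["title"], h["title"])]
    if len(matches) != 1:
        return "WARN"  # 0 hits, or 2+ ambiguous hits: soft warning
    hit = matches[0]
    if hit["venue"] != claim["venue"] or hit["year"] != claim["year"]:
        return "FAIL"  # venue or year mismatch
    if claim["user"] not in hit["authors"]:
        return "FAIL"  # author list is missing the user
    return "PASS"


claim = {"title": "An Example Paper on Sparse Attention", "venue": "NeurIPS",
         "year": 2026, "user": "Jane Doe"}
hit = {"title": "An example paper on sparse attention.", "venue": "NeurIPS",
       "year": 2026, "authors": ["Jane Doe", "John Roe"]}
print(classify(claim, [hit]))
print(classify(claim, [dict(hit, venue="ICML")]))
```

The second call mirrors the venue-swap case: the title still matches, but the venue disagrees, so the claim hard-fails.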
Overrides come in two layers:
- Per-paper, per-field in profile.yml: audit.overrides.<bibkey>.<field>: true with required reason: and optional expires: date
- CLI emergency: aris-homepage render --override-all (every override loudly logged)
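A per-field override is only meaningful while it carries a reason and has not expired. The check below sketches that rule. The function name `override_active`, the dict shape of one override entry, and the decision to treat the expiry date as inclusive are assumptions, not documented behaviour.

```python
from datetime import date


def override_active(override: dict, today: date) -> bool:
    """True if an override entry may bypass the audit on the given day."""
    reason = override.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        return False  # a reason is required
    expires = override.get("expires")
    if expires is None:
        return True  # no expiry set
    if isinstance(expires, str):
        expires = date.fromisoformat(expires)  # YYYY-MM-DD
    return today <= expires  # assumed: the expiry day itself still counts


entry = {"venue": True, "reason": "DBLP lists the workshop track", "expires": "2026-12-31"}
print(override_active(entry, date(2026, 6, 1)))
print(override_active(entry, date(2027, 1, 1)))
```

Accepting both a string and a `date` object for `expires` matters in practice because YAML loaders commonly parse an unquoted YYYY-MM-DD value into a date.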
Honest scope of fact-check: catches venue/year/author mismatch and fabricated award claims. Does NOT verify: workshop papers without DBLP entries, industry tech reports, blog/talk content, OSS star counts, or arbitrary claims in the bio. Treat the audit as a diagnostic floor, not a guarantee.
Cross-model review — what's automated vs optional
Two distinct review layers; do not confuse them:
Layer 1 — automated factual audit (always runs)
aris-homepage render and aris-homepage check run a deterministic Python pipeline that queries DBLP (with a 4-attempt backoff + local cache at .aris-homepage/dblp-cache.json) and falls back to arXiv hints. No external LLM needed. This is the floor of fact-check, and it works with zero AI-runtime dependencies beyond Python + the calling shell.
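The retry-plus-cache behaviour described above could look roughly like this. It is a sketch under assumptions: `cached_query`, its parameters, the exponential delay schedule, and the flat JSON cache layout are hypothetical, and the network call is passed in as `fetch` so the logic stays testable offline.

```python
import json
import time
from pathlib import Path
from typing import Callable


def cached_query(query: str, fetch: Callable[[str], dict], cache_path: Path,
                 attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Return a cached response if present; otherwise fetch with exponential backoff."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if query in cache:
        return cache[query]  # cache hit: no network traffic
    for attempt in range(attempts):
        try:
            result = fetch(query)
            break
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # waits of 1s, 2s, 4s between attempts
    cache[query] = result
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache))
    return result
```

Catching `OSError` covers `urllib.error.URLError` and socket timeouts, since both derive from it. A real `fetch` would wrap a request to the DBLP search endpoint and parse the JSON response.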
Layer 2 — optional adversarial LLM review (recommended for high-stakes)
If the calling agent has access to Codex MCP (mcp__codex__codex), run a fresh-thread Codex review after render to scrutinize: bio prose tone, claim phrasing, sub-project list, schema consistency. Codex acts as the cross-family reviewer (ARIS's adversarial-review invariant).
If the calling agent has access to Gemini (mcp__gemini__analyzeFile or mcp__gemini-cli__ask-gemini with model: auto-gemini-3), additionally use it to critique a Chrome-headless screenshot of the rendered HTML for visual issues (layout collisions, font sizes, image proportions).
Minimum required runtime: Python + the calling shell. The skill renders + fact-checks fully without Codex or Gemini. Codex strengthens the review; Gemini adds visual-design feedback. Neither is required to generate or ship the homepage.
| Runtime | What you get |
|---|---|
| Python only | Layer-1 DBLP fact-check; full render |
| + Codex MCP | + Adversarial LLM review of prose / claims / schema |
| + Gemini multimodal | + Visual-design critique of rendered screenshot |
Pipeline
                ┌────────────────────────────────────────────┐
cv.{pdf,docx} ─►│ Step 1: extract → cv.txt                   │
                │ Step 1b: if --manual-homepage, WebFetch    │
                │ Step 1c: if --assets-dir, link to workspace│
                └─────────────────┬──────────────────────────┘
                                  ▼
                ┌──────────────────────────────────────────┐
                │ Step 2: LLM agent fills extraction.json  │
                │         (JSON-schema-constrained output) │
                └─────────────────┬────────────────────────┘
                                  ▼
                ┌──────────────────────────────────────────┐
                │ Step 3: aris-homepage finalize           │
                │   → profile.yml + publications.bib       │
                │   + bio.md + news.md + EXTRACTION_REVIEW │
                └─────────────────┬────────────────────────┘
                                  │
                        ✋ USER EDITS IN IDE ✋
                                  │
                                  ▼
                ┌──────────────────────────────────────────┐
                │ Step 4: render (with Layer-1 DBLP audit) │
                │   ↳ Python DBLP/arXiv fact-check         │
                │   ↳ Python builds per-section HTML       │
                │   ↳ inject into homepage-<persona>.html  │
                │   ↳ (optional) Codex MCP adversarial pass│
                │   ↳ (optional) Gemini screenshot critique│
                └─────────────────┬────────────────────────┘
                                  ▼
                ┌──────────────────────────────────────────┐
                │ index.html + audit-report.md             │
                └──────────────────────────────────────────┘
Pipeline: dependencies
- Python 3.10+ and pyyaml (pip install pyyaml; may need --break-system-packages on modern macOS)
- BibTeX: parsed by a bundled stdlib parser; no bibtexparser dependency
- DOCX: textutil on macOS (bundled) OR python-docx (pip install python-docx)
- PDF: pdftotext (install via brew install poppler / apt install poppler-utils)
- DBLP: official https://dblp.org/search/publ/api (rate-limited; 4-attempt backoff + local cache in .aris-homepage/dblp-cache.json)
- Codex MCP (optional): used for the Layer-2 cross-model review. The Layer-1 DBLP audit and the render run without it; --no-audit skips the audit entirely.
aris-homepage doctor checks all of the above.
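An environment check of this kind can be approximated with the standard library alone. The sketch below is not the real `doctor` command; the function name, the report keys, and the set of checks are assumptions drawn from the dependency list above.

```python
import importlib.util
import shutil
import sys


def doctor() -> dict:
    """Map each dependency to True/False; a sketch of what an environment check might cover."""
    def has_module(name: str) -> bool:
        return importlib.util.find_spec(name) is not None

    # DOCX extraction needs either the macOS textutil binary or the python-docx package.
    docx_ok = shutil.which("textutil") is not None or has_module("docx")
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "pyyaml": has_module("yaml"),  # pyyaml installs the "yaml" module
        "docx (textutil or python-docx)": docx_ok,
        "pdf (pdftotext)": shutil.which("pdftotext") is not None,
    }


for name, ok in doctor().items():
    print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Note that the import names differ from the package names: pyyaml is imported as `yaml` and python-docx as `docx`.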
Privacy and generic examples
All examples in this skill must be generic unless explicitly marked as the public demo URL (wanshuiyin.github.io).
Never include in examples, schema docs, or tests:
- maintainer / collaborator / advisor / student names
- institution-specific names (university, department, lab)
- local filesystem paths (/Users/..., ~/...)
- private CV filenames
- email addresses, phone numbers, WeChat IDs, office locations
- copied publication lists, awards, employment history, unpublished project names
- API keys or credentials
Use placeholders:
- Dr. Example Researcher · Jane Doe
- Example University · Department of CS
- cv.pdf · assets/photo.jpg
- https://example.github.io/
- example2026paper (bibkey)
- advisor@example.edu
The public demo at wanshuiyin.github.io is the only exception — it's an authorized, named real-world example of generator output, not a source to copy data from.
Acceptance criteria (v1 done)
- aris-homepage init --from-cv produces editable scaffolding from any user's CV (single-file .docx or .pdf).
- aris-homepage render --persona theory-minimal produces a single HTML file ≤500KB (no images) or ≤2MB (with photo + thumbnails inline), or smaller still when images are referenced via remote URLs.
- Fact-check correctly hard-fails on a corrupted profile.yml (e.g., venue swap NeurIPS↔ICML) and passes when corrected.
- The HTML is publishable on GitHub Pages / Netlify / S3 / any static host without a build step.
- aris-homepage doctor accurately reports environment readiness.
- No personal info from the maintainer's dogfood leaks into shipped examples or tests.
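The size criterion above is easy to turn into an automated check. The sketch below is one possible reading, not part of the skill: it assumes decimal units (1 KB = 1000 bytes), assumes inline images appear as `data:image/` URIs in `src` attributes, and uses a hypothetical function name.

```python
import re
from pathlib import Path

KB = 1000  # assumed decimal units; the criteria do not say KB vs KiB


def within_budget(html_path: Path) -> bool:
    """Check a rendered page against the 500KB / 2MB single-file size budget."""
    html = html_path.read_text(encoding="utf-8")
    # Assumed heuristic: inline images show up as data URIs in src attributes.
    has_inline_images = re.search(r"""src=["']data:image/""", html) is not None
    limit = 2000 * KB if has_inline_images else 500 * KB
    return html_path.stat().st_size <= limit
```

A test suite could call `within_budget(Path("index.html"))` after each render and fail the build when the page outgrows its budget.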
What's deferred (v1.1+)
- active-researcher template (placeholder exists; theory-minimal is the only fully-shipping persona)
- Builder-Engineer / PM personas (v2; these target non-academic users)
- Multi-page output (sidebar nav for sites with 50+ pubs)
- Bilingual side-by-side mode (lang: bilingual)
- Automated --manual-homepage editorial-extraction helpers (currently the calling LLM agent reads the fetched HTML and reconciles)
- Auto-thumbnail downscaling and WebP conversion
Related
- skills/interview-cheatsheet/SKILL.md: sister skill for ML interview cheat sheets (shared cross-model review pattern)
- skills/render-html/SKILL.md: Markdown → single-file HTML primitive
- tools/aris_homepage.py: implementation
- tools/templates/homepage-theory-minimal.html: template
- PROFILE_SCHEMA.md (sibling file): complete schema reference