| name | auto-paper-improvement-loop |
| description | Autonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says "改论文", "improve paper", "论文润色循环", "auto improve", or wants to iteratively polish a generated paper. |
| argument-hint | ["paper-directory"] |
| allowed-tools | Bash(*), Read, Write, Edit, Grep, Glob, Agent, mcp__codex__codex, mcp__codex__codex-reply |
Auto Paper Improvement Loop: Review → Fix → Recompile
Autonomously improve the paper at: $ARGUMENTS
Context
This skill is designed to run after Workflow 3 (/paper-plan → /paper-figure → /paper-write → /paper-compile). It takes a compiled paper and iteratively improves it through external LLM review.
Unlike /auto-review-loop (which iterates on research — running experiments, collecting data, rewriting narrative), this skill iterates on paper writing quality — fixing theoretical inconsistencies, softening overclaims, adding missing content, and improving presentation.
For control, robotics, and systems papers targeting IEEE Transactions, treat this as a reviewer-style journal polishing loop, not a conference-style cosmetic pass. The default priority is:
- claim-evidence alignment
- theorem-assumption-proof consistency
- comparison fairness and adequacy
- IEEE Trans writing discipline
- final formatting and compilation hygiene
Constants
- MAX_ROUNDS = 2: Two rounds of review→fix→recompile. Empirically, Round 1 catches structural issues (4→6/10) and Round 2 catches remaining presentation issues (6→7/10), with diminishing returns beyond 2 rounds for writing-only improvements.
- REVIEWER_MODEL = gpt-5.4: Model used via Codex MCP for paper review.
- REVIEW_LOG = PAPER_IMPROVEMENT_LOG.md: Cumulative log of all rounds, stored in the paper directory.
- TARGET_VENUE = IEEE_TRANS: Default venue family. Supported: IEEE_TAC, IEEE_TSMC, IEEE_TCYB, IEEE_TIE, IEEE_TNNLS, IEEE_TCSI, IEEE_ACCESS, AUTOMATICA, ICLR, NeurIPS, ICML.
- HUMAN_CHECKPOINT = false: When true, pause after each round's review and present score + weaknesses to the user. The user can approve fixes, provide custom modification instructions, skip specific fixes, or stop early. When false (default), runs fully autonomously.
- AUTO_PROCEED = true: When false, stop after each review round even if HUMAN_CHECKPOINT = false.
💡 Override: /auto-paper-improvement-loop "paper/" — venue: IEEE_TAC, human checkpoint: true
Inputs
- Compiled paper: paper/main.pdf + LaTeX source files
- All section .tex files: concatenated for the review prompt
- Optional claim artifacts: CLAIMS_FROM_RESULTS.md, PAPER_PLAN.md, AUTO_REVIEW.md, findings.md
Project Automation Policy
Before acting, resolve automation defaults in this precedence order:
1. Inline command arguments
2. PROJECT_AUTOMATION.md in the project root
3. CLAUDE.md in the project root
4. The constants in this skill
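The precedence order above can be sketched as a small shell lookup. This is a minimal sketch, not part of the skill: the function name `resolve_setting`, its argument order, and the assumption that both Markdown files store settings as plain `KEY = value` lines are hypothetical.

```shell
# Hypothetical helper: resolve one automation setting by precedence.
# Usage: resolve_setting KEY INLINE_VALUE DEFAULT [PROJECT_ROOT]
resolve_setting() {
  local key="$1" inline="$2" default="$3" root="${4:-.}"
  local file value

  # 1. An inline command argument wins outright.
  if [ -n "$inline" ]; then
    printf '%s\n' "$inline"
    return 0
  fi

  # 2-3. Project files, in order. Assumed line format: "KEY = value".
  for file in "$root/PROJECT_AUTOMATION.md" "$root/CLAUDE.md"; do
    [ -f "$file" ] || continue
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" "$file" | head -n 1)
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done

  # 4. Fall back to the constant defined in this skill.
  printf '%s\n' "$default"
}
```

For example, `resolve_setting HUMAN_CHECKPOINT "" false .` would print the value found in PROJECT_AUTOMATION.md if one is set there, then fall through to CLAUDE.md, and finally to the skill default `false`.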
For control, robotics, and systems papers, a good default is:
AUTO_PROCEED = false
HUMAN_CHECKPOINT = true
That keeps the loop autonomous at the edit level while still requiring your approval at each reviewer gate.
Before sending the paper to REVIEWER_MODEL, read:
- ../shared-references/agent-role-charter.md
- ../shared-references/anti-ai-writing.md
- ../shared-references/writing-principles.md
- ../shared-references/model-routing-policy.md
Routing:
- gpt-5.4 owns the reviewer gate
- Sonnet owns local rewrite application, citation cleanup, notation cleanup, and other mechanical fixes
- Opus is used only if the review forces a story-level rewrite, claim contraction, or final whole-paper integration pass
State Persistence (Compact Recovery)
If the context window fills up mid-loop, Claude Code auto-compacts. To recover, this skill writes PAPER_IMPROVEMENT_STATE.json after each round:
{
  "current_round": 1,
  "threadId": "019ce736-...",
  "last_score": 6,
  "status": "in_progress",
  "timestamp": "2026-03-13T21:00:00"
}
On startup: if PAPER_IMPROVEMENT_STATE.json exists with "status": "in_progress" AND timestamp is within 24 hours, read it + PAPER_IMPROVEMENT_LOG.md to recover context, then resume from the next round. Otherwise (file absent, "status": "completed", or older than 24 hours), start fresh.
After each round: overwrite the state file. On completion: set "status": "completed".
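The write and resume rules above can be sketched in shell. This is a minimal sketch under stated assumptions: the helper names (`write_state`, `state_field`, `should_resume`) are hypothetical, the state file is treated as flat JSON parsed with `sed` rather than a real JSON parser, the timestamp is assumed to be local time, and `date -d` assumes GNU (or BusyBox) `date`.

```shell
# Hypothetical helpers for PAPER_IMPROVEMENT_STATE.json (flat JSON only).
STATE_FILE="${STATE_FILE:-PAPER_IMPROVEMENT_STATE.json}"

# Overwrite the state file after a round.
# Usage: write_state ROUND THREAD_ID SCORE STATUS
write_state() {
  printf '{\n  "current_round": %s,\n  "threadId": "%s",\n  "last_score": %s,\n  "status": "%s",\n  "timestamp": "%s"\n}\n' \
    "$1" "$2" "$3" "$4" "$(date +%Y-%m-%dT%H:%M:%S)" > "$STATE_FILE"
}

# Read one string-valued field (e.g. status, threadId, timestamp).
state_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$STATE_FILE" | head -n 1
}

# Succeeds only when an in-progress state younger than 24 hours exists.
should_resume() {
  [ -f "$STATE_FILE" ] || return 1
  [ "$(state_field status)" = "in_progress" ] || return 1
  local ts started now
  ts=$(state_field timestamp | tr 'T' ' ')
  started=$(date -d "$ts" +%s 2>/dev/null) || return 1  # assumes GNU-style date -d
  now=$(date +%s)
  [ $((now - started)) -lt 86400 ]
}
```

A caller would run `should_resume && echo "resume" || echo "start fresh"` on startup, and `write_state 2 "$THREAD_ID" 7 completed` once the final round finishes.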
Workflow
Step 0: Preserve Original
cp paper/main.pdf paper/main_round0_original.pdf
Step 1: Collect Paper Text
Concatenate all section files into a single text block for the review prompt:
for f in paper/sections/*.tex; do
  # Quote "$f" so paths containing spaces survive word splitting
  echo "% === $(basename "$f") ==="
  cat "$f"
done > /tmp/paper_full_text.txt
Also gather any structured context that constrains valid edits:
- PAPER_PLAN.md for claims-evidence mapping
- CLAIMS_FROM_RESULTS.md for supported claim boundaries
- AUTO_REVIEW.md or findings.md for known weaknesses already diagnosed
If these files exist, include their essential conclusions in the review briefing so the reviewer judges the paper against the actual supported claims, not an inflated reading of the prose.
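Gathering whichever of these files exist can be sketched as below. The function name `collect_context` and the directory argument are hypothetical, and the sketch only concatenates the raw files under labelled headers; condensing them to their essential conclusions is still left to the agent.

```shell
# Hypothetical helper: print each optional context file that exists,
# under a header in the same style as the section concatenation.
# Usage: collect_context DIR
collect_context() {
  local dir="${1:-.}" name
  for name in PAPER_PLAN.md CLAIMS_FROM_RESULTS.md AUTO_REVIEW.md findings.md; do
    if [ -f "$dir/$name" ]; then
      printf '%% === %s ===\n' "$name"
      cat "$dir/$name"
      printf '\n'
    fi
  done
}
```

Files that are absent are skipped silently, so the review briefing simply contains less context rather than failing.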
Step 2: Round 1 Review
Send the full paper text to GPT-5.4 xhigh:
mcp__codex__codex:
model: gpt-5.4
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are reviewing a [TARGET_VENUE] paper. Please provide a detailed, structured review.
## Full Paper Text:
[paste concatenated sections]
## Optional Structured Context
[paste claims-from-results summary, paper plan highlights, prior reviewer conclusions if available]
## Review Instructions
Please act as a senior professor in multi-agent control, robotics, and dynamical
systems, with extensive journal reviewer and associate-editor experience. Also act
like a hard scientific writing editor who can detect AI-looking prose immediately.
If TARGET_VENUE is an IEEE Transactions journal, use the standards of a strong
control / robotics / systems journal reviewer rather than a conference reviewer. Provide:
1. **Overall Score** (1-10, where 6 = weak accept, 7 = accept)
2. **Summary** (2-3 sentences)
3. **Strengths** (bullet list, ranked)
4. **Weaknesses** (bullet list, ranked: CRITICAL > MAJOR > MINOR)
5. **For each CRITICAL/MAJOR weakness**: A specific, actionable fix
6. **Missing References** (if any)
7. **Verdict**: Ready for submission? Yes / Almost / No
Focus on:
- theoretical rigor
- claims vs evidence alignment
- writing clarity and self-containedness
- notation consistency
- AI-like phrasing, generic transitions, and hype language
- whether the prose sounds like a field-native journal manuscript rather than LLM output
If TARGET_VENUE is IEEE_TRANS or one of IEEE_TAC / IEEE_TSMC / IEEE_TCYB / IEEE_TIE / IEEE_TNNLS / IEEE_TCSI / IEEE_ACCESS, explicitly evaluate:
- whether every main contribution is backed by a theorem, proposition, simulation, or comparison
- whether assumptions are stronger than necessary or insufficiently justified
- whether proof flow has hidden gaps, missing definitions, or informal leaps
- whether simulation scenarios actually validate the claimed theory
- whether comparisons to classical or representative baselines are missing
- whether Abstract / Introduction / Conclusion read like an IEEE Transactions paper rather than AI-generated prose
- whether claims are overstated relative to what is proved and simulated
Do not ask for impossible fixes. Prefer the minimum grounded fix that materially improves publishability.
When flagging AI-like writing, quote the concrete sentence pattern and suggest a more field-native rewrite direction.
Save the threadId for Round 2.
Step 2b: Human Checkpoint (if enabled)
Skip only if HUMAN_CHECKPOINT = false AND AUTO_PROCEED = true.
Present the review results and wait for user input:
📋 Round 1 review complete.
Score: X/10 — [verdict]
Key weaknesses (by severity):
1. [CRITICAL] ...
2. [MAJOR] ...
3. [MINOR] ...
Reply "go" to implement all fixes, give custom instructions, "skip 2" to skip specific fixes, or "stop" to end.
Parse the user's response the same way as /auto-review-loop: approve / custom instructions / skip / stop.
If AUTO_PROCEED = false, this checkpoint is mandatory even when HUMAN_CHECKPOINT = false.
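The gate condition and reply handling can be sketched as follows. Both function names are hypothetical, and the reply classification is an assumption based on the prompt text above ("go", "skip N", "stop", anything else treated as custom instructions); the actual parsing rules live in /auto-review-loop.

```shell
# Hypothetical helpers for the checkpoint gate.

# Succeeds when the loop must pause for the user.
# Usage: checkpoint_required HUMAN_CHECKPOINT AUTO_PROCEED
checkpoint_required() {
  # Skipped only when HUMAN_CHECKPOINT = false AND AUTO_PROCEED = true.
  if [ "$1" = "false" ] && [ "$2" = "true" ]; then
    return 1
  fi
  return 0
}

# Classify a user reply as: approve, stop, "skip <ids>", or custom.
classify_reply() {
  local reply
  reply=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  case "$reply" in
    go)      echo "approve" ;;
    stop)    echo "stop" ;;
    skip\ *) echo "skip ${reply#skip }" ;;
    *)       echo "custom" ;;
  esac
}
```

With the defaults (HUMAN_CHECKPOINT = false, AUTO_PROCEED = true) `checkpoint_required` fails and the loop proceeds; any other combination pauses for input.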
Step 3: Implement Round 1 Fixes
When fixes touch disjoint files or disjoint paper slices, they may be applied in parallel. Keep the review gate itself serial.
Parse the review and implement fixes by severity:
Priority order:
- CRITICAL fixes (assumption mismatches, internal contradictions)
- MAJOR fixes (overclaims, missing content, notation issues)
- MINOR fixes (if time permits)
Trans-first priority override: if the target is an IEEE Transactions journal, apply this ordering inside CRITICAL/MAJOR:
- claim-theorem-simulation mismatches
- unjustified or missing assumptions
- missing baseline/comparison framing
- weak introduction story or contribution bullets
- conclusion, abstract, and language polish
Common fix patterns:
| Issue | Fix Pattern |
|---|---|
| Assumption-model mismatch | Rewrite assumption to match the model, add formal proposition bridging the gap |
| Overclaims | Soften language: "validate" → "demonstrate practical relevance", "comparable" → "qualitatively competitive" |
| Missing metrics | Add quantitative table with honest parameter counts and caveats |
| Theorem not self-contained | Add "Interpretation" paragraph listing all dependencies |
| Notation confusion | Rename conflicting symbols globally, add Notation paragraph |
| Missing references | Add to references.bib, cite in appropriate locations |
| Theory-practice gap | Explicitly frame theory as idealized; add synthetic validation subsection |
| Missing theorem-to-simulation mapping | Add a short bridge sentence or paragraph in Intro / Main Results / Simulation identifying which result validates which claim |
| IEEE-style abstract too generic | Rewrite into problem → method → theorem/result → validation structure; remove hype and background filler |
| Introduction reads like conference paper | Expand related-work synthesis, surface the technical gap earlier, and make contributions concrete and falsifiable |
| Missing limitations / scope boundary | Add an honest remark in Introduction, Discussion, or Conclusion about what is not claimed |
| AI-like prose | Remove filler transitions, repeated sentence templates, hype adjectives, and generic motivation; replace with paper-specific technical content |
Step 4: Recompile Round 1
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round1.pdf
Verify: 0 undefined references, 0 undefined citations.
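One way to run this verification is to count the standard LaTeX warnings in the log. A minimal sketch, assuming the usual warning wording ("Reference ... undefined", "Citation ... undefined"); the function name `count_undefined` is hypothetical. TeX wraps long log lines, so a warning with a very long label may be split across two lines and missed by this pattern.

```shell
# Hypothetical helper: count undefined references and citations in a LaTeX log.
# Usage: count_undefined LOGFILE   (prints "<refs> <cites>")
count_undefined() {
  local log="$1" refs cites
  # grep -c prints 0 yet exits non-zero on no match, hence "|| true".
  refs=$(grep -c -E 'Warning: Reference .* undefined' "$log" 2>/dev/null || true)
  cites=$(grep -c -E 'Warning: Citation .* undefined' "$log" 2>/dev/null || true)
  # A missing log yields empty output, reported here as 0.
  printf '%s %s\n' "${refs:-0}" "${cites:-0}"
}
```

Running `count_undefined main.log` from the paper directory should print `0 0` before the loop moves on to Round 2.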
Step 5: Round 2 Review
Use mcp__codex__codex-reply with the saved threadId:
mcp__codex__codex-reply:
threadId: [saved from Round 1]
model: gpt-5.4
config: {"model_reasoning_effort": "xhigh"}
prompt: |
[Round 2 update]
Since your last review, we have implemented:
1. [Fix 1]: [description]
2. [Fix 2]: [description]
...
Please re-score and re-assess. Same format:
Score, Summary, Strengths, Weaknesses, Actionable fixes, Verdict.
Pay special attention to whether any remaining claims are still too strong for the available theory and simulations.
Step 5b: Human Checkpoint (if enabled)
Skip only if HUMAN_CHECKPOINT = false AND AUTO_PROCEED = true. Same as Step 2b — present Round 2 review, wait for user input.
Step 6: Implement Round 2 Fixes
Same process as Step 3. Typical Round 2 fixes:
- Add controlled synthetic experiments validating theory
- Further soften any remaining overclaims
- Formalize informal arguments (e.g., truncation → formal proposition)
- Strengthen limitations section
Step 7: Recompile Round 2
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round2.pdf
Step 8: Format Check
After the final recompilation, run a format compliance check:
PAGES=$(pdfinfo paper/main.pdf | awk '/^Pages:/ {print $2}')
echo "Pages: $PAGES"
# grep -c already prints 0 when nothing matches (but exits non-zero),
# so "|| echo 0" would print the count twice; use "|| true" instead.
OVERFULL=$(grep -c "Overfull" paper/main.log 2>/dev/null || true)
echo "Overfull hbox warnings: ${OVERFULL:-0}"
grep "Overfull" paper/main.log 2>/dev/null | head -10
UNDERFULL=$(grep -c "Underfull" paper/main.log 2>/dev/null || true)
echo "Underfull hbox warnings: ${UNDERFULL:-0}"
BADNESS=$(grep -c "badness" paper/main.log 2>/dev/null || true)
echo "Badness warnings: ${BADNESS:-0}"
Auto-fix patterns:
| Issue | Fix |
|---|---|
| Overfull hbox in equation | Wrap in \resizebox or break across lines with the split or aligned environment |
| Overfull hbox in table | Reduce font (\small/\footnotesize) or use \resizebox{\linewidth}{!}{...} |
| Overfull hbox in text | Rephrase sentence or add \allowbreak / \- hints |
| Over page limit | Move content to appendix, compress tables, reduce figure sizes |
| Underfull hbox (loose) | Rephrase for better line filling or add \looseness=-1 |
If any overfull hbox > 10pt is found, fix it and recompile before documenting.
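The 10pt threshold can be checked mechanically by parsing the width that TeX reports in each warning. A minimal sketch, assuming the standard log wording `Overfull \hbox (12.5pt too wide) ...`; the function name `overfull_over` is hypothetical.

```shell
# Hypothetical helper: list Overfull \hbox warnings wider than a threshold (pt).
# Usage: overfull_over LOGFILE [THRESHOLD_PT]
overfull_over() {
  local log="$1" limit="${2:-10}"
  [ -f "$log" ] || return 0
  awk -v limit="$limit" '
    /^Overfull \\hbox \(/ {
      # Third field looks like "(12.5pt"; strip the paren and the unit.
      width = $3
      sub(/^\(/, "", width)
      sub(/pt$/, "", width)
      if (width + 0 > limit + 0) print
    }
  ' "$log"
}
```

If `overfull_over paper/main.log` prints anything, those lines identify the paragraphs or alignments to fix before recompiling.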
For IEEE Transactions venues, also check:
- abstract length and style
- presence of Index Terms
- appendix placement and proof references
- readability of all figures and tables in two-column format
- whether the manuscript stays within a typical journal page range instead of drifting into unnecessary length
Step 9: Document Results
Create PAPER_IMPROVEMENT_LOG.md in the paper directory:
# Paper Improvement Log
## Score Progression
| Round | Score | Verdict | Key Changes |
|-------|-------|---------|-------------|
| Round 0 (original) | X/10 | No/Almost/Yes | Baseline |
| Round 1 | Y/10 | No/Almost/Yes | [summary of fixes] |
| Round 2 | Z/10 | No/Almost/Yes | [summary of fixes] |
## Round 1 Review & Fixes
<details>
<summary>GPT-5.4 xhigh Review (Round 1)</summary>
[Full raw review text, verbatim]
</details>
### Fixes Implemented
1. [Fix description]
2. [Fix description]
...
## Round 2 Review & Fixes
<details>
<summary>GPT-5.4 xhigh Review (Round 2)</summary>
[Full raw review text, verbatim]
</details>
### Fixes Implemented
1. [Fix description]
2. [Fix description]
...
## PDFs
- `main_round0_original.pdf` — Original generated paper
- `main_round1.pdf` — After Round 1 fixes
- `main_round2.pdf` — Final version after Round 2 fixes
Step 10: Summary
Report to user:
- Score progression table
- Number of CRITICAL/MAJOR/MINOR issues fixed per round
- Final page count
- Remaining issues (if any)
Feishu Notification (if configured)
After each round's review AND at final completion, check ~/.claude/feishu.json:
- After each round: send review_scored with the message "Round N: X/10 — [key changes]"
- After the final round: send pipeline_done with the score progression table and final page count
- If the config is absent or mode is "off": skip entirely (no-op)
Output
paper/
├── main_round0_original.pdf # Original
├── main_round1.pdf # After Round 1
├── main_round2.pdf # After Round 2 (final)
├── main.pdf # = main_round2.pdf
└── PAPER_IMPROVEMENT_LOG.md # Full review log with scores
Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Preserve all PDF versions: the user needs to compare progression.
- Save the FULL raw review text: do not summarize or truncate GPT-5.4 responses.
- Use mcp__codex__codex-reply for Round 2 to maintain conversation context.
- Always recompile after fixes: verify 0 errors before proceeding.
- Do not fabricate experimental results: synthetic validation must describe methodology, not invent numbers.
- Respect the paper's claims: soften overclaims rather than adding unsupported new claims.
- Global consistency: when renaming notation or softening claims, check ALL files (abstract, intro, method, experiments, theory sections, conclusion, tables, figure captions).
- For IEEE Trans targets, theory and validation outrank prose polish: do not spend the loop on cosmetics while claim support is still weak.
- Do not silently add new contributions: only sharpen, narrow, justify, or reorganize existing supported contributions.
- Prefer explicit scope boundaries to vague optimism: a precise limitation is better than an inflated claim.
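For the large file handling rule, the chunked heredoc fallback might look like the sketch below. The target path and the chunk contents are placeholders chosen for illustration. The point it shows is that only the first chunk uses `>`; later chunks must use `>>`, otherwise each write would discard the previous one. Quoting the delimiter (`'EOF'`) keeps `$` and backslashes in LaTeX or Markdown content from being expanded.

```shell
# Illustrative only: placeholder path and placeholder content.
out="${TMPDIR:-/tmp}/PAPER_IMPROVEMENT_LOG.example.md"

# First chunk: '>' creates or truncates the file.
cat << 'EOF' > "$out"
# Paper Improvement Log
## Score Progression
EOF

# Later chunks: '>>' appends, so earlier chunks are kept.
cat << 'EOF' >> "$out"
## Round 1 Review & Fixes
EOF
```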
Typical Score Progression
Typical trajectories differ by venue family:
IEEE Trans-style control / robotics / systems paper
| Round | Score | Key Improvements |
|---|---|---|
| Round 0 | 4-6/10 | Baseline: weak claim-evidence mapping, assumptions unclear, simulations undersold or misaligned |
| Round 1 | 6-7/10 | Fixed assumptions, narrowed claims, strengthened Intro / Main Results / Simulation connections |
| Round 2 | 7-8/10 | Added scope boundaries, comparison framing, theorem interpretation, cleaner IEEE journal prose |
Conference-style paper
| Round | Score | Key Improvements |
|---|---|---|
| Round 0 | 4-6/10 | Baseline: structure, overclaims, notation issues |
| Round 1 | 6-7/10 | Main content fixes |
| Round 2 | 7-8/10 | Presentation and compliance fixes |
For control, robotics, and systems papers, the useful target is usually not "more polish" but "fewer unjustified claims and cleaner theorem-to-evidence closure."