| name | paper-summary |
| description | Write structured research paper summaries in a background → problem → method → results format. Use when the user asks to summarize a paper, add a paper entry to a reading list/awesome list, or explain a research method. Triggers on phrases like "summarize this paper", "add this paper to <file>", "write up <paper name>", or when given an arxiv/conference URL with instructions to integrate it into notes. |
Paper Summary Skill
Write compact, technically substantive paper entries that a reader can skim in ~30 seconds and still walk away with (a) what existed before, (b) what was broken, (c) how the method fixes it, and (d) whether it worked.
When to use
- User asks to summarize a paper, add it to a markdown reading list, or explain a research technique
- User provides a paper URL (arXiv, USENIX, OpenReview, etc.) with instructions like "add this to X"
- User asks you to "explain the key ideas of X" for note-taking purposes
Format
Produce four blocks in this order. Each block has a fixed role — do not reorder or merge them. Use a markdown bullet hierarchy (paper title at top level, subsections as nested bullets).
- <Paper Title> [[Venue'YY/MM](url)]
  - Background: <existing techniques and their limitations>
  - Key problem & insight: <what's broken and the core insight that fixes it>
  - Proposed method — <name> with N components:
    1. **<Component 1 name>**: <one-line idea>. <high-level mechanism in 1-3 lines>
    2. **<Component 2 name>**: ...
    3. **<Component 3 name>**: ...
  - Results: <headline numbers, vs. which baselines>
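The template above can be sketched as a small renderer. This is only an illustration: `Component`, `PaperSummary`, and `render_entry` are hypothetical names that are not part of the skill, and the two-space nesting is an assumption about the target file's bullet convention.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str  # the paper's own component name, not a paraphrase
    idea: str  # one-line idea followed by a short mechanism


@dataclass
class PaperSummary:
    title: str
    venue: str  # e.g. "FAST'26"
    url: str
    background: str
    problem_insight: str
    method_name: str
    components: list[Component] = field(default_factory=list)
    results: str = ""


def render_entry(s: PaperSummary) -> str:
    """Render the four blocks in their fixed order as nested markdown bullets."""
    lines = [
        f"- {s.title} [[{s.venue}]({s.url})]",
        f"  - Background: {s.background}",
        f"  - Key problem & insight: {s.problem_insight}",
        # The dash below is copied from the template so rendered entries match it.
        f"  - Proposed method — {s.method_name} with {len(s.components)} components:",
    ]
    for i, c in enumerate(s.components, start=1):
        lines.append(f"    {i}. **{c.name}**: {c.idea}")
    lines.append(f"  - Results: {s.results}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; not a real paper.
    demo = PaperSummary(
        title="Example Paper",
        venue="Venue'26",
        url="https://example.org/paper",
        background="X is fast but rigid; Y is flexible but expensive",
        problem_insight="the failure mode, then the insight that fixes it",
        method_name="ExampleMethod",
        components=[Component("PartA", "idea and mechanism")],
        results="headline numbers vs. named baselines",
    )
    print(render_entry(demo))
```

Keeping the blocks as separate fields makes it hard to reorder or merge them by accident, which is the constraint the format imposes.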
Block rules
1. Background / existing techniques
- Name the prior art explicitly (give acronyms if the field uses them: PDC/PIC, GRPO, RoPE, etc.)
- State each one's limitation in one clause, not a paragraph
- If there are two competing approaches, contrast them (e.g., "X is fast but rigid; Y is flexible but expensive")
- Skip generic motivation ("LLMs are important..."). Start from the technical baseline
2. Key problem & insight
- One sentence identifying the specific failure mode the paper targets
- One sentence stating the core insight (the "aha") that makes the method possible
- This is the block where a reader decides whether to keep reading — make it concrete
3. Proposed method
- Lead with the method name and a count of its components ("CacheSlide with three core components")
- For each component:
  - Bold the component name (use the paper's actual name, not a paraphrase — grep the paper if needed)
  - Start with a one-line high-level idea (what it does, why it exists)
  - Follow with the mechanism in 1-3 lines: the key variables, the decision rule, or the equation
- When the mechanism has discrete steps (e.g., pretraining/inference phases, top-k selection + blending), use a sub-bullet list — but keep each sub-bullet ≤1 line
- Do NOT describe every algorithm line. Pick the parts that distinguish the method from the baseline
- If the paper has equations that are central (e.g., a loss function, a novel advantage formulation, a position encoding rule), include them with $\LaTeX$ — but only the ones a reader needs to understand the idea
4. Results
- Report the headline numbers with units (latency X×, accuracy +N points, throughput Y×)
- Name the baselines the paper beats (e.g., "vs. vLLM prefix caching", "vs. DAPO", "over GRPO")
- Skip ablations unless the ablation itself is the main contribution
Style rules
- Explain the insight. Don't copy text from the paper or stitch several of its sentences together; understand the insight and explain it in your own words.
- Be concise, not terse. A reader should be able to understand the mechanism, not just memorize its name. But cut any sentence that only restates what a well-named component already implies.
- Use the paper's actual terminology. If the paper calls it "WCA (Weighted Correction Attention)", do not paraphrase it as "selective token correction." Open the paper and check.
- Avoid marketing language. No "novel", "cutting-edge", "state-of-the-art" unless quoting a benchmark result. No "seamlessly", "elegantly", "robustly".
- Prefer verbs over nouns. "Ranks tokens by deviation and blends the top-k" > "performs a ranking-based blending mechanism".
- Equations inline, not blocks. Use $...$ for short formulas. Only use $$...$$ if the formula is longer than a line and central to the method.
- No emojis. Don't add them unless the surrounding file already uses them.
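The marketing-language rule above lends itself to a mechanical check. The following is a minimal sketch, assuming a plain word-list match is good enough; `MARKETING_WORDS` and `find_marketing_words` are hypothetical names, and the list contains only the words named in the rule.

```python
import re

# Words the style rules ask to avoid; extend as needed.
MARKETING_WORDS = (
    "novel", "cutting-edge", "state-of-the-art",
    "seamlessly", "elegantly", "robustly",
)

# Match whole words only, so "novelty" or "non-novel" are not flagged.
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(w) for w in MARKETING_WORDS) + r")(?![\w-])",
    re.IGNORECASE,
)


def find_marketing_words(text: str) -> list[tuple[int, str]]:
    """Return (line_number, word) pairs for each flagged word, 1-indexed."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    draft = "A novel cache manager.\nRanks tokens by deviation and blends the top-k."
    for lineno, word in find_marketing_words(draft):
        print(f"line {lineno}: consider removing '{word}'")
```

Hits are prompts for review rather than errors, since the rule allows these words when quoting a benchmark result.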
Length guidance
- Background + Key problem: 2-4 bullets total
- Proposed method: 1 bullet per component, each with an idea line + ≤3 mechanism sub-bullets
- Results: 1 bullet
- Total target: 8-15 lines for a typical systems/ML paper. Go longer only if the method has >3 components or equations are essential.
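The line budget above can be checked with a few lines of code. This sketch assumes a "line" means a non-blank line of the markdown source, so a long bullet that wraps on screen still counts as one. `check_length` is a hypothetical helper, and it cannot tell whether equations are essential, so that exception is left to the writer.

```python
def check_length(entry: str, n_components: int, lo: int = 8, hi: int = 15) -> tuple[int, str]:
    """Count non-blank lines and classify the entry as 'short', 'ok', or 'long'.

    Entries above `hi` lines are tolerated when the method has more than
    three components, mirroring the length guidance.
    """
    n = sum(1 for line in entry.splitlines() if line.strip())
    if n < lo:
        return n, "short"
    if n > hi and n_components <= 3:
        return n, "long"
    return n, "ok"


if __name__ == "__main__":
    draft = "\n".join(f"- bullet {i}" for i in range(12))
    print(check_length(draft, n_components=3))
```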
Process
- Fetch the paper. Use WebFetch on the arXiv HTML or the PDF. If WebFetch returns binary PDF metadata, try WebFetch on the HTML mirror (arxiv.org/html/<id>) or download via curl and use Read with pages:
- Find the real component names. Skim the method section and the algorithm pseudocode. Do not invent names.
- Identify the baseline being beaten. This tells you what goes in the Background block.
- Draft all four blocks. Then re-read and cut anything that's restating the obvious.
- Check against the file's existing style. If the target markdown file uses a different bullet convention, match it.
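For the fetch step, it can help to derive the candidate URLs from whatever the user pasted. The sketch below is an assumption-laden helper, not part of any tool: `arxiv_candidates` is a hypothetical name, it handles only new-style numeric arXiv IDs, and it performs no network access. The HTML mirror may not exist for every paper, so the PDF URL is kept as a fallback.

```python
import re

# New-style arXiv IDs such as 2401.01234 or 2401.01234v2 (old-style IDs are not handled).
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def arxiv_candidates(url_or_id: str) -> list[str]:
    """Return URLs to try in order: HTML mirror first, then the PDF."""
    match = _ARXIV_ID.search(url_or_id)
    if match is None:
        raise ValueError(f"no arXiv ID found in {url_or_id!r}")
    paper_id = match.group(0)
    return [
        f"https://arxiv.org/html/{paper_id}",
        f"https://arxiv.org/pdf/{paper_id}",
    ]


if __name__ == "__main__":
    # The ID below is a placeholder, not a reference to a specific paper.
    for url in arxiv_candidates("https://arxiv.org/abs/2401.01234v2"):
        print(url)
```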
Anti-patterns
- Listing every contribution in the abstract. Papers oversell. Pick the 2-3 that actually matter.
- Copy-pasting the abstract's framing. Abstracts are written for acceptance; summaries are written for recall. Re-frame.
- "The authors propose..." Drop it. Start with the technique name.
- Deep-diving one component while ignoring the others. Balance coverage across the components — readers need the whole shape of the method.
- Claiming results without numbers. "Outperforms baselines" is useless. "3.1-4.3× lower latency vs. vLLM" is useful.
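Two of the anti-patterns above are easy to detect mechanically: a Results line with no numbers, and the "The authors propose..." framing. The following is a rough sketch under stated assumptions: `check_anti_patterns` is a hypothetical helper, the "present" and "introduce" variants are additions of this sketch rather than part of the list, and "has a digit" is only a proxy for "reports headline numbers".

```python
import re

_HAS_NUMBER = re.compile(r"\d")
# "propose" comes from the anti-pattern list; the other verbs are assumed variants.
_AUTHOR_FRAMING = re.compile(r"\bthe authors (propose|present|introduce)\b", re.IGNORECASE)


def check_anti_patterns(entry: str) -> list[str]:
    """Return a human-readable problem description for each detected anti-pattern."""
    problems = []
    for line in entry.splitlines():
        # Drop leading bullet markers and indentation before inspecting the text.
        text = line.strip().lstrip("- ").strip()
        if text.lower().startswith("results:") and not _HAS_NUMBER.search(text):
            problems.append("Results line has no numbers")
        if _AUTHOR_FRAMING.search(text):
            problems.append("Drop the 'The authors propose...' framing")
    return problems


if __name__ == "__main__":
    print(check_anti_patterns("  - Results: Outperforms baselines"))
    print(check_anti_patterns("  - Results: 3.1-4.3× lower latency vs. vLLM"))
```

The other anti-patterns (overselling, unbalanced coverage, abstract framing) need judgment and are not covered by this check.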
Example (reference)
- CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving [[FAST'26](https://www.usenix.org/system/files/fast26-liu-yang.pdf)]
  - Background:
    - PDC (Position-Dependent Caching): KV tied to absolute positions, reuse only on exact prefix matches
    - PIC (Position-Independent Caching): strips position encoding, reusable anywhere but loses attention fidelity
    - RoPE: high positional sensitivity — any shift invalidates cached keys
    - CoPE: content-gated position encoding, less sensitive to shifts
  - Key problem & insight: agent prompts have reusable segments that maintain consistent *relative* ordering despite absolute shifts (RPDC pattern). PDC/PIC don't exploit this; CoPE can, if positions are locked to a learned template rather than recomputed from live context
  - Proposed method — CacheSlide with three components:
    1. **CCPE (Chunked Contextual Position Encoding)**: pretrain a template $e^*$ of the most frequent CoPE encoding per task; at inference, reuse chunks get positions from $e^*[i]$ (pinned), recompute chunks get live CoPE
    2. **WCA (Weighted Correction Attention)**: token-level gate on top of CCPE — rank tokens in a reuse chunk by $d_i = \|K^{\text{new}}_i - K^{\text{cache}}_i\|$, top-k (~5-17%) get blended $K_i \leftarrow \alpha K^{\text{new}}_i + (1-\alpha) K^{\text{cache}}_i$, rest use cache as-is, applied every $\tau$ layers
    3. **SLIDE (KV cache manager)**: make WCA's I/O pattern SSD-friendly — relocate updated tokens to fresh pages (sequential writes), spill clean pages first, reclaim scratch pages during decode
  - Results: 3.1-4.3× latency reduction, 3.5-5.8× throughput improvement over state-of-the-art baselines
This example shows all four blocks at the target length — use it as a template, not a script.