// Use AI to merge individual page HTML files into a unified chapter document. Creates continuous document format for improved reading experience and semantic consistency.
| name | ai-chapter-consolidate |
| description | Use AI to merge individual page HTML files into a unified chapter document. Creates continuous document format for improved reading experience and semantic consistency. |
This skill uses AI to intelligently merge individual page HTML files into a single, continuous chapter document. Rather than simple concatenation, the AI:
The result is a unified chapter document in the continuous format (single page-container, single page-content).
Collect all page HTML files for chapter
04_page_XX.html files for all pages in chapterExtract content from each page
<main class="page-content">Prepare consolidation inputs for AI
Invoke AI consolidation
Process AI output
Save consolidated document
output/chapter_XX/chapter_artifacts/chapter_XX.htmlPer-page HTML files (validated by previous gate):
output/chapter_XX/page_artifacts/page_16/04_page_16.html (Chapter opening)output/chapter_XX/page_artifacts/page_17/04_page_17.html (Continuation)output/chapter_XX/page_artifacts/page_18/04_page_18.html (Continuation)Chapter metadata (from analysis):
The prompt sent to Claude:
You are merging individual page HTML documents into a single, continuous chapter.
INPUT PAGES:
Page 1 (Opening - include chapter header):
[HTML content from page 1]
Page 2 (Continuation):
[HTML content from page 2]
Page 3 (Continuation):
[HTML content from page 3]
... (all pages)
TASK:
Merge these pages into a single HTML document that reads as one continuous chapter.
REQUIREMENTS:
1. Structure:
- Create single <div class="page-container"> wrapping everything
- Create single <main class="page-content"> for all content
- Remove page-break indicators or comments
- Create truly continuous document (no paginated elements)
2. Chapter Header:
- Keep chapter header from Page 1 (chapter number, title)
- Remove chapter headers/titles from continuation pages
- Keep section navigation if present on Page 1
- Remove duplicate navigation from other pages
3. Content Preservation:
- Include ALL text content from all pages
- Preserve exact wording (no paraphrasing)
- Maintain all lists, paragraphs, tables
- Include all semantic classes
- Keep all HTML structure
4. Heading Hierarchy:
- Normalize heading levels across merged pages
- Page 1 h1 = Chapter title (stays as h1)
- First section in each page = h2 (main sections)
- Sub-sections = h3 or h4 as needed
- Ensure no hierarchy jumps (h1 โ h3 without h2)
- Number consecutive headings logically
5. Content Flow:
- Remove page-specific headers/footers
- Merge seamlessly so content flows naturally
- No artificial breaks or transitions
- Paragraphs continue logically
- Lists maintain coherence
6. Exhibits and Images:
- Preserve all tables and figures
- Keep exhibit titles and captions
- Include all images with proper paths
- Maintain table of contents if present
7. CSS Classes:
- Preserve all semantic classes (section-heading, paragraph, etc.)
- Keep consistent class usage throughout
- Ensure classes match chapter opening page style
- Do not add or remove classes
8. Metadata:
- Include title tag: "Chapter N: Title - Pages X-Y"
- Keep meta charset and viewport
- Link stylesheet: <link rel="stylesheet" href="../../styles/main.css">
OUTPUT:
Return ONLY a single, valid HTML5 document:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Chapter [N]: [Title] - Pages [X-Y]</title>
<link rel="stylesheet" href="../../styles/main.css">
</head>
<body>
<div class="page-container">
<main class="page-content">
<!-- All content from all pages, merged seamlessly -->
</main>
</div>
</body>
</html>
VALIDATION:
## Page Content Extraction Logic
Before sending to AI, extract content strategically:
### Page 1 (Opening):
- **Include**: Entire page HTML content
- **Reason**: Contains chapter header, navigation, first section
- **Preserve**: All elements (header, nav, dividers, content)
### Pages 2-N (Continuation):
- **Extract**: Only content after chapter header
- **Skip**: Chapter number, chapter title, section navigation
- **Preserve**: Section headings, paragraphs, lists, exhibits
- **Include**: All semantic content sections
### Example extraction:
```html
<!-- Page 1: Keep everything -->
<div class="chapter-header">
<span class="chapter-number">2</span>
<h1 class="chapter-title">Rights in Real Estate</h1>
</div>
<nav class="section-navigation">...</nav>
<h2 class="section-heading">REAL PROPERTY RIGHTS</h2>
<p class="paragraph">...</p>
<!-- Page 2: Skip header, keep content -->
<!-- <div class="chapter-header">...</div> SKIPPED -->
<!-- <nav class="section-navigation">...</nav> SKIPPED -->
<h4 class="subsection-heading">Physical characteristics.</h4>
<p class="paragraph">...</p>
<ul class="bullet-list">...</ul>
<!-- Page 3: Continue same pattern -->
<h4 class="subsection-heading">Interdependence.</h4>
<p class="paragraph">...</p>
Path: output/chapter_XX/chapter_artifacts/chapter_XX.html
Structure:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Chapter 2: Rights in Real Estate - Pages 16-29</title>
<link rel="stylesheet" href="../../styles/main.css">
</head>
<body>
<div class="page-container">
<main class="page-content">
<!-- Chapter header (from page 1) -->
<div class="chapter-header">...</div>
<nav class="section-navigation">...</nav>
<hr class="section-divider">
<!-- Page 1 content -->
<h2 class="section-heading">REAL PROPERTY RIGHTS</h2>
<p class="paragraph">...</p>
<!-- Page 2 content (seamlessly merged) -->
<h4 class="subsection-heading">Physical characteristics.</h4>
<p class="paragraph">...</p>
<ul class="bullet-list">...</ul>
<!-- Page 3 content (continuing flow) -->
<h4 class="subsection-heading">Interdependence.</h4>
<p class="paragraph">...</p>
<!-- ... more content from remaining pages ... -->
<!-- Final page content -->
<h2 class="section-heading">REGULATIONS AND LICENSING</h2>
<p class="paragraph">...</p>
</main>
</div>
</body>
</html>
Path: output/chapter_XX/chapter_artifacts/consolidation_log.json
{
"chapter": 2,
"title": "Rights in Real Estate",
"book_pages": "16-29",
"pdf_indices": "15-28",
"consolidated_at": "2025-11-08T14:35:00Z",
"pages_merged": 14,
"pages_included": [
{
"page": 16,
"book_page": 17,
"status": "opening_chapter",
"content_type": "header_navigation_content"
},
{
"page": 17,
"book_page": 18,
"status": "continuation",
"content_type": "subsections_paragraphs"
},
{
"page": 18,
"book_page": 19,
"status": "continuation",
"content_type": "subsections_paragraphs_list"
}
// ... all pages
],
"content_statistics": {
"total_headings": {
"h1": 1,
"h2": 4,
"h3": 0,
"h4": 12
},
"total_paragraphs": 156,
"total_lists": 12,
"total_list_items": 42,
"total_tables": 3,
"total_images": 5,
"total_words": 12547
},
"ai_model": "claude-3-5-sonnet-20241022",
"consolidation_notes": "Successfully merged 14 pages into continuous format"
}
Execute consolidation via Python wrapper:
cd Calypso/tools
# Run consolidation
python3 consolidate_chapter.py \
--chapter 2 \
--pages 15-28 \
--output "../output" \
--mapping "../analysis/page_mapping.json"
# Or invoke directly via Claude API:
# The orchestrator sends the AI prompt with all page contents
Before passing to next gate:
File created
chapter_XX.html existsStructure validated
<div class="page-container"><main class="page-content">Content completeness
Heading hierarchy
Metadata logged
โ All pages merged into single document โ Chapter header preserved from page 1 โ Duplicate headers removed from continuation pages โ Content flows naturally (continuous format) โ Heading hierarchy is correct โ All text content preserved โ Semantic classes maintained โ Ready for semantic validation
If page HTML is incomplete:
If heading hierarchy is ambiguous:
If content appears duplicated:
Once consolidation completes:
To test consolidation on Chapter 2:
# Input: 14 individual page HTML files (pages 16-29)
# Process: AI merges into single continuous chapter
# Output: chapter_02.html (single, unified document)
# Verify:
# - File size is sum of all pages
# - Content flows logically
# - Heading hierarchy makes sense
# - No duplicate sections