| name | rrwrite-assemble |
| description | Assembles all manuscript sections into a complete manuscript with validation and metadata generation |
| arguments | [{"name":"target_dir","description":"Output directory containing manuscript sections","default":"manuscript"}] |
| allowed-tools | null |
| context | fork |
Manuscript Assembly Protocol
Purpose
Combine all drafted manuscript sections into a complete manuscript file with validation, metadata generation, and quality checks.
Prerequisites
Required files in {target_dir}:
abstract.md
introduction.md
methods.md
results.md
discussion.md
availability.md (optional)
Recommended:
literature_citations.bib - For citation validation
outline.md - For structure verification
Workflow
Phase 1: Pre-Assembly Validation
-
Check Required Sections:
cd {target_dir}
required_sections=("abstract.md" "introduction.md" "results.md" "discussion.md" "methods.md")
missing=()
for section in "${required_sections[@]}"; do
if [ ! -f "$section" ]; then
missing+=("$section")
fi
done
if [ ${#missing[@]} -gt 0 ]; then
echo "Error: Missing required sections: ${missing[*]}"
exit 1
fi
-
Verify Section Completion:
Check workflow state to ensure all sections are marked as completed:
import json
from pathlib import Path
state_file = Path("{target_dir}/.rrwrite/state.json")
if state_file.exists():
with open(state_file) as f:
state = json.load(f)
sections = state.get("workflow_status", {}).get("drafting", {}).get("sections", {})
incomplete = [s for s, data in sections.items() if data.get("status") != "completed"]
if incomplete:
print(f"Warning: Incomplete sections: {incomplete}")
Phase 2: Section Assembly
-
Combine Sections in Order:
For Nature format: Abstract → Introduction → Results → Discussion → Methods → Availability
cd {target_dir}
> manuscript_full.md
cat >> manuscript_full.md << 'EOF'
**Target Journal:** Nature
**Date:** $(date +%Y-%m-%d)
---
EOF
echo "# Abstract" >> manuscript_full.md
echo "" >> manuscript_full.md
cat abstract.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md
echo "# Introduction" >> manuscript_full.md
echo "" >> manuscript_full.md
cat introduction.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md
echo "# Results" >> manuscript_full.md
echo "" >> manuscript_full.md
cat results.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md
echo "# Discussion" >> manuscript_full.md
echo "" >> manuscript_full.md
cat discussion.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md
echo "# Methods" >> manuscript_full.md
echo "" >> manuscript_full.md
cat methods.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md
if [ -f "availability.md" ]; then
echo "# Data and Code Availability" >> manuscript_full.md
echo "" >> manuscript_full.md
cat availability.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md
fi
echo "# References" >> manuscript_full.md
echo "" >> manuscript_full.md
echo "[References will be generated from literature_citations.bib]" >> manuscript_full.md
echo "✓ Manuscript assembled: manuscript_full.md"
Phase 3: Metadata Generation
- Calculate Statistics:
import re
from pathlib import Path
from datetime import datetime
import json
target_dir = Path("{target_dir}")
manuscript_file = target_dir / "manuscript_full.md"
with open(manuscript_file) as f:
content = f.read()
text_only = re.sub(r'```.*?```', '', content, flags=re.DOTALL)
text_only = re.sub(r'^#.*$', '', text_only, flags=re.MULTILINE)
text_only = re.sub(r'\[.*?\]', '', text_only)
words = len(text_only.split())
citations = re.findall(r'\[([a-zA-Z0-9,\s]+)\]', content)
unique_citations = set()
for cite_group in citations:
for cite in cite_group.split(','):
cite = cite.strip()
if cite and cite[0].islower():
unique_citations.add(cite)
sections = re.findall(r'^# (.+)$', content, flags=re.MULTILINE)
section_words = {}
current_section = None
current_text = []
for line in content.split('\n'):
if line.startswith('# '):
if current_section:
section_text = ' '.join(current_text)
section_text = re.sub(r'\[.*?\]', '', section_text)
section_words[current_section] = len(section_text.split())
current_section = line[2:].strip()
current_text = []
else:
current_text.append(line)
if current_section:
section_text = ' '.join(current_text)
section_text = re.sub(r'\[.*?\]', '', section_text)
section_words[current_section] = len(section_text.split())
manifest = {
"version": "1.0",
"generated": datetime.now().isoformat(),
"manuscript_file": "manuscript_full.md",
"statistics": {
"total_words": words,
"total_sections": len(sections),
"unique_citations": len(unique_citations),
"section_breakdown": section_words
},
"sections_included": sections,
"validation": {
"all_required_sections": True,
"word_count_target": 3000,
"within_limit": words <= 3500
}
}
manifest_file = target_dir / "manifest.json"
with open(manifest_file, 'w') as f:
json.dump(manifest, f, indent=2)
print(f"✓ Generated manifest: {manifest_file}")
print(f" Total words: {words}")
print(f" Citations: {len(unique_citations)}")
print(f" Sections: {len(sections)}")
Phase 4: Quality Checks
-
Check Citation Validity:
import re
from pathlib import Path
target_dir = Path("{target_dir}")
manuscript_file = target_dir / "manuscript_full.md"
bib_file = target_dir / "literature_citations.bib"
with open(manuscript_file) as f:
content = f.read()
cited_keys = set()
for cite_group in re.findall(r'\[([a-zA-Z0-9,\s]+)\]', content):
for cite in cite_group.split(','):
cite = cite.strip()
if cite and cite[0].islower():
cited_keys.add(cite)
if bib_file.exists():
with open(bib_file) as f:
bib_content = f.read()
bib_keys = set(re.findall(r'@\w+\{([^,]+),', bib_content))
missing = cited_keys - bib_keys
if missing:
print(f"⚠ Warning: Citations not in .bib file: {missing}")
else:
print(f"✓ All {len(cited_keys)} citations found in .bib file")
else:
print("⚠ Warning: No literature_citations.bib file found")
-
Check Word Count Compliance:
import json
from pathlib import Path
target_dir = Path("{target_dir}")
manifest_file = target_dir / "manifest.json"
with open(manifest_file) as f:
manifest = json.load(f)
total_words = manifest["statistics"]["total_words"]
target = manifest["validation"]["word_count_target"]
if total_words > target * 1.2:
print(f"⚠ Warning: Manuscript is {total_words - target} words over target ({total_words}/{target})")
print(f" Recommendation: Trim {total_words - target} words before submission")
elif total_words < target * 0.8:
print(f"⚠ Warning: Manuscript is {target - total_words} words under target ({total_words}/{target})")
else:
print(f"✓ Word count within acceptable range: {total_words}/{target}")
Phase 5: State Update
- Update Workflow State:
import sys
from pathlib import Path
sys.path.insert(0, str(Path('scripts').resolve()))
from rrwrite_state_manager import StateManager
import json
target_dir = "{target_dir}"
manager = StateManager(output_dir=target_dir, enable_git=False)
manifest_file = Path(target_dir) / "manifest.json"
with open(manifest_file) as f:
manifest = json.load(f)
manager.update_workflow_stage(
"assembly",
status="completed",
file=f"{target_dir}/manuscript_full.md",
manifest_file=f"{target_dir}/manifest.json",
sections_included=manifest["statistics"]["total_sections"],
total_word_count=manifest["statistics"]["total_words"],
validation_warnings=0
)
print("✓ Workflow state updated")
Output Files
The assembly process generates:
-
{target_dir}/manuscript_full.md
- Complete manuscript with all sections
- Section headers and separators
- References placeholder
-
{target_dir}/manifest.json
- Assembly metadata
- Word counts per section
- Citation statistics
- Validation results
Validation Criteria
The manuscript is considered successfully assembled if:
✅ All required sections present and combined
✅ Word count statistics calculated
✅ Citations extracted and validated against .bib file
✅ Manifest generated with complete metadata
✅ Workflow state updated to mark assembly as completed
Display Summary
After successful assembly, display:
============================================================
MANUSCRIPT ASSEMBLY COMPLETE
============================================================
Output: {target_dir}/manuscript_full.md
Statistics:
• Total words: [X]
• Target: [Y] (Nature: 3000 words)
• Status: [Within limit / Over by X words]
• Citations: [N] unique citations
• Sections: [M] sections included
Section Breakdown:
• Abstract: [X] words
• Introduction: [X] words
• Results: [X] words
• Discussion: [X] words
• Methods: [X] words
• Availability: [X] words
Next Steps:
1. Review manuscript_full.md
2. Run critique:
/rrwrite-critique-manuscript --target-dir {target_dir} --file manuscript_full.md
3. Address critique feedback
4. Trim to target word count if needed
============================================================
Error Handling
Missing sections:
Error: Missing required sections: [list]
Please draft all sections before assembly:
/rrwrite-draft-section --target-dir {target_dir} --section [name]
No sections found:
Error: No manuscript sections found in {target_dir}
Expected files: abstract.md, introduction.md, results.md, discussion.md, methods.md
Invalid target directory:
Error: Target directory not found: {target_dir}
Please specify a valid manuscript directory
Notes
- Assembly does NOT modify individual section files
- Original sections remain unchanged in {target_dir}
- Manifest can be regenerated by re-running assembly
- Word counts exclude markdown formatting and citations
- Nature format uses Abstract → Intro → Results → Discussion → Methods order
- For other journals (PLOS, Bioinformatics), adjust section order as needed