| name | lucy-ng:CASE |
| description | Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations. Use when dereplication returned no matches, the compound is known to be novel, or you want to solve the structure from first principles. |
lucy-ng:CASE
Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations.
Purpose
This skill performs FULL Computer-Assisted Structure Elucidation (CASE) without dereplication. Use this when:
- Dereplication already returned no matches
- You know the compound is novel/not in databases
- You want to solve the structure from first principles
- You're evaluating AI-based CASE methodology
Domain Knowledge
Reference: For NMR background, peak picking strategy, symmetry detection,
dereplication scoring, LSD reference, and ranking interpretation,
see the main skill document: skill/SKILL.md
This skill focuses on the CASE procedure (step-by-step execution). The main skill
document contains all shared domain knowledge.
Prerequisites
lucy --version || pip install lucy-ng
lucy lsd check
Required Data
| Data | Essential? | Purpose |
|---|---|---|
| Molecular formula | YES | From user (HRMS) |
| 13C spectrum | YES | All carbon positions |
| HSQC | YES | Direct C-H correlations |
| HMBC | YES | Long-range correlations |
| DEPT-135 | Recommended | Multiplicities (CH, CH2, CH3) |
| COSY | Optional | H-H correlations |
Workflow
Supervisor integration: When running under supervisor control, write CASE-PROGRESS.md after each LSD iteration (see Step 7c). This enables the supervisor to detect loops and provide diagnostic guidance.
Step 0: Setup Documentation
mkdir -p analysis
Document all steps in analysis/ as you proceed.
Step 1: Request Molecular Formula
Always ask the user:
"Please provide the molecular formula for this unknown compound (typically from HRMS)."
Calculate key values from formula:
- Total carbons
- Total hydrogens
- Heteroatoms (N, O, S, etc.)
- Degree of unsaturation: DBE = (2C + 2 + N - H) / 2
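The formula-derived values above can be computed with a short script. This is a minimal sketch, assuming a plain formula string with no brackets, charges, or isotope labels; the helper names `parse_formula` and `degree_of_unsaturation` are illustrative and not part of lucy. Counting halogens together with H is an added assumption that the formula above does not state.

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C16H10N2O2' (no brackets or charges)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def degree_of_unsaturation(formula: str) -> float:
    """DBE = (2C + 2 + N - H) / 2, with halogens counted like H (assumption)."""
    atoms = parse_formula(formula)
    monovalent = sum(atoms.get(x, 0) for x in ("H", "F", "Cl", "Br", "I"))
    return (2 * atoms.get("C", 0) + 2 + atoms.get("N", 0) - monovalent) / 2


print(parse_formula("C16H10N2O2"))
print(degree_of_unsaturation("C16H10N2O2"))
```

A non-integer result usually means the formula is mistyped or refers to an ion, so it is worth confirming the formula with the user before continuing.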
Step 2: Identify Available Experiments
for dir in */; do
  if [ -f "$dir/acqus" ]; then
    nuc=$(grep "##\$NUC1=" "$dir/acqus" | head -1)
    pp=$(grep "##\$PULPROG=" "$dir/acqus" | head -1)
    echo "Exp $dir: $nuc | $pp"
  fi
done
Map experiments:
- 1H: zg*
- 13C: zgdc*, zgpg*
- DEPT: dept*
- HSQC: hsqc*
- HMBC: hmbc*
- COSY: cosy*
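The pulse-program mapping can be applied automatically. This is a minimal sketch, assuming the input is the PULPROG value taken from the acqus line (for example `<zgpg30>`); `classify_pulprog` is a hypothetical helper, not a lucy command. Because zg* also matches zgdc* and zgpg*, the 13C patterns are tested before the 1H pattern.

```python
from fnmatch import fnmatch

# Order matters: the generic 1H pattern "zg*" must be tested last.
PULPROG_PATTERNS = [
    ("13C", ("zgdc*", "zgpg*")),
    ("DEPT", ("dept*",)),
    ("HSQC", ("hsqc*",)),
    ("HMBC", ("hmbc*",)),
    ("COSY", ("cosy*",)),
    ("1H", ("zg*",)),
]


def classify_pulprog(pulprog: str) -> str:
    """Map a pulse-program name to an experiment label, or 'unknown'."""
    name = pulprog.strip().strip("<>").lower()
    for label, patterns in PULPROG_PATTERNS:
        if any(fnmatch(name, pattern) for pattern in patterns):
            return label
    return "unknown"


for example in ("<zg30>", "<zgpg30>", "<dept135>", "<noesygpph>"):
    print(example, "->", classify_pulprog(example))
```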
Step 3: Analyze Symmetry
Compare expected vs observed signals:
lucy analyze symmetry <data_dir> <formula>
Or manually:
- Count peaks in 13C spectrum
- Compare to carbons in formula
- If observed < expected → molecule has symmetry
Document:
## Symmetry Analysis
- Expected carbons (from formula): X
- Observed 13C signals: Y
- Interpretation: [No symmetry / C2 symmetry / etc.]
Step 4: Pick 13C Peaks
lucy pick 1d <13c_experiment>
Or from peaklist.xml if binary data is poor:
- Extract F1 values from <Peak1D F1="..."/> tags
- List all carbon shifts
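The peaklist.xml fallback can be scripted with the standard library. This is a minimal sketch: only the `Peak1D` tag and its `F1` attribute come from the text above, while the wrapper elements and the `intensity` attribute in `SAMPLE` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical peaklist content; only <Peak1D F1="..."/> is taken from this document.
SAMPLE = """
<PeakList>
  <PeakList1D>
    <Peak1D F1="152.5" intensity="0.8"/>
    <Peak1D F1="187.8" intensity="0.4"/>
    <Peak1D F1="135.7" intensity="1.0"/>
  </PeakList1D>
</PeakList>
"""


def carbon_shifts_from_peaklist(xml_text: str) -> list[float]:
    """Return all Peak1D F1 values as floats, sorted from high to low ppm."""
    root = ET.fromstring(xml_text)
    shifts = [float(peak.attrib["F1"]) for peak in root.iter("Peak1D") if "F1" in peak.attrib]
    return sorted(shifts, reverse=True)


print(carbon_shifts_from_peaklist(SAMPLE))
```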
Document all peaks with proposed assignments:
| # | Shift (ppm) | Type (if known) |
|---|---|---|
| 1 | 187.8 | Carbonyl? |
| 2 | 152.5 | C-N? |
| ... | ... | ... |
Step 5: Pick HSQC Peaks
Get raw HSQC peaks:
lucy pick hsqc <hsqc_exp> --format json
Apply DEPT-guided filtering (see skill/SKILL.md Section 3):
- Pick DEPT-135 peaks:
lucy pick 1d <dept135_exp> --format json
- Match HSQC carbon positions to DEPT carbons within ±1.5 ppm
- Extract multiplicities from DEPT sign (positive = CH/CH3, negative = CH2)
- If DEPT-90 available, disambiguate CH vs CH3
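The DEPT-guided matching can be sketched as follows. The input shapes (a list of HSQC carbon shifts and a list of `(shift, intensity)` DEPT-135 peaks) are assumptions; the JSON layout produced by lucy is not described here, so adapt the loading step to the real output. `assign_multiplicities` is a hypothetical helper.

```python
def assign_multiplicities(hsqc_carbons, dept135_peaks, tol=1.5):
    """Label each HSQC carbon from the sign of the nearest DEPT-135 peak.

    hsqc_carbons:  iterable of 13C shifts (ppm) seen in HSQC
    dept135_peaks: iterable of (shift_ppm, signed_intensity)
    tol:           matching window in ppm (the +/-1.5 ppm used in this step)
    """
    labels = {}
    for carbon in hsqc_carbons:
        candidates = [
            (abs(carbon - shift), intensity)
            for shift, intensity in dept135_peaks
            if abs(carbon - shift) <= tol
        ]
        if not candidates:
            labels[carbon] = "unmatched"
            continue
        _, intensity = min(candidates)  # closest DEPT peak wins
        labels[carbon] = "CH/CH3" if intensity > 0 else "CH2"
    return labels


dept = [(128.5, 10.0), (30.0, -8.0)]
print(assign_multiplicities([128.4, 29.9, 55.0], dept))
```

Carbons labeled "unmatched" deserve a manual look: they may be HSQC artifacts or DEPT peaks lost in noise. Splitting "CH/CH3" further requires DEPT-90, as noted above.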
Document:
- Which carbons are protonated (have HSQC signals)
- Which are quaternary (no HSQC signal)
- Multiplicities (CH, CH2, CH3)
Step 6: Pick HMBC Peaks
Get raw HMBC peaks:
lucy pick hmbc <hmbc_exp> --format json
Apply cross-validation filtering (see skill/SKILL.md Section 3):
- Validate each HMBC peak:
- Carbon position exists in 13C peaks (±1.5 ppm)
- Proton position exists in HSQC peaks (±0.1 ppm)
- Retain only validated correlations
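The cross-validation rule can be expressed as a small filter. This is a sketch under the assumption that peaks are available as plain `(carbon_ppm, proton_ppm)` tuples; `validate_hmbc` is a hypothetical helper, and the tolerances are the ones listed above.

```python
def validate_hmbc(hmbc_peaks, c13_shifts, hsqc_proton_shifts, c_tol=1.5, h_tol=0.1):
    """Keep HMBC peaks whose carbon matches a 13C peak and whose proton matches an HSQC proton."""

    def near(value, references, tol):
        return any(abs(value - ref) <= tol for ref in references)

    return [
        (carbon, proton)
        for carbon, proton in hmbc_peaks
        if near(carbon, c13_shifts, c_tol) and near(proton, hsqc_proton_shifts, h_tol)
    ]


raw = [(187.9, 7.52), (100.0, 7.50), (152.4, 3.00)]
print(validate_hmbc(raw, c13_shifts=[187.8, 152.5], hsqc_proton_shifts=[7.5, 6.9]))
```

In this example the second peak is dropped because no 13C signal lies near 100.0 ppm, and the third because no HSQC proton lies near 3.00 ppm.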
Document all HMBC correlations:
| Carbon (ppm) | Proton (ppm) | Notes |
|---|---|---|
| 187.8 | 7.5 | Carbonyl to aromatic H |
| ... | ... | ... |
Step 7: Generate LSD Input
Write the LSD file directly using skill knowledge:
Reference:
- skill/SKILL.md Section 5 (LSD Reference)
- skill/diagnostic/SKILL.md Section 1 (LSD Command Reference)
Build the LSD file manually:
; LSD input for <FORMULA>
; Atom definitions (MULT atom# element hybridization H-count)
MULT 1 C 2 0 ; Carbonyl carbon, sp2, 0H (quaternary)
MULT 2 C 2 1 ; Aromatic CH, sp2, 1H
MULT 3 N 3 1 ; Amine nitrogen, sp3, 1H (NH)
MULT 4 O 2 0 ; Carbonyl oxygen, sp2, 0H
...
; HSQC correlations (MUST come before HMBC)
HSQC 2 2 ; C2 has H2 attached
HSQC 5 5 ; C5 has H5 attached
...
; HMBC correlations
HMBC 1 2 ; C1 correlates to H2
HMBC 1 5 ; C1 correlates to H5
...
; Heteroatom constraints (optional but helpful)
BOND 1 4 ; C1 bonded to O4 (carbonyl)
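Critical checks before running: the sp2 atom count is EVEN, the hydrogen count matches the formula, and all HSQC lines come before the HMBC lines.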
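The sp2-parity and hydrogen-budget checks that this skill relies on can be automated before each run. This is a minimal sketch, assuming every MULT line uses plain integers in the layout shown above (MULT atom# element hybridization H-count); lines with alternative values in parentheses are not handled. `precheck_lsd` is a hypothetical helper.

```python
def precheck_lsd(lsd_text: str, expected_h: int) -> dict:
    """Count sp2 atoms and hydrogens declared on MULT lines of an LSD input."""
    sp2_count = 0
    h_total = 0
    for raw in lsd_text.splitlines():
        fields = raw.split(";", 1)[0].split()  # drop trailing ';' comments
        if len(fields) >= 5 and fields[0] == "MULT":
            hybridization, h_count = int(fields[3]), int(fields[4])
            if hybridization == 2:
                sp2_count += 1
            h_total += h_count
    return {
        "sp2_count": sp2_count,
        "sp2_even": sp2_count % 2 == 0,
        "h_total": h_total,
        "h_matches": h_total == expected_h,
    }


example = """
MULT 1 C 2 0 ; Carbonyl carbon
MULT 2 C 2 1 ; Aromatic CH
MULT 3 N 3 1 ; Amine nitrogen
MULT 4 O 2 0 ; Carbonyl oxygen
"""
print(precheck_lsd(example, expected_h=2))
```

The four-atom fragment above reports an odd sp2 count, which is expected for an excerpt; a complete file should report an even count.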
Step 7b: Iterative HMBC Addition (Minimize Solutions)
CRITICAL: Do NOT add all HMBC correlations at once!
Adding too many HMBC correlations often leads to 0 solutions (over-constrained) due to:
- Noise artifacts in the HMBC spectrum
- Long-range correlations (⁴J+) that exceed LSD's default 2-3 bond assumption
- Overlapping or incorrectly assigned peaks
Strategy: Gradually add HMBC correlations until solutions reach a minimum > 0
- Start with high-confidence correlations only (5-7 strongest peaks)
- Run LSD and check solution count
- Add 1-2 more correlations at a time
- Re-run LSD after each addition
- Stop when solutions are minimized but still > 0
Workflow example:
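# Start from the base file that holds only the high-confidence correlations
cp compound_base.lsd compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# Add one correlation, re-run, and compare the solution count
echo "HMBC 4 9" >> compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
echo "HMBC 5 9" >> compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution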
Tracking table (recommended):
| HMBC Count | Correlations Added | Solutions | Action |
|---|---|---|---|
| 5 | Base set | 47 | Add more |
| 7 | + C1→H7, C2→H10 | 12 | Add more |
| 8 | + C8→H10 | 6 | Add more |
| 9 | + C6→H9 | 6 | Add more |
| 10 | + C4→H9 | 5 | Add more |
| 11 | + C5→H9 | 1 | STOP - Ideal! |
| 12 | + C3→H4 | 0 | Remove last |
Key principles:
- Ideal: 1 solution — uniquely determined structure
- Acceptable: 2-10 solutions — can rank by 13C prediction
- 0 solutions — over-constrained, remove last correlation(s)
- Never use ELIM to "fix" 0 solutions — it masks the real problem
Prioritize correlations by:
- Intensity (stronger peaks are more reliable)
- Proximity to known fragment assignments
- Correlations that connect unassigned regions
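The add-and-check strategy of Step 7b can be written as a loop. This is a sketch only: `count_solutions` is a hypothetical callable that would write the LSD file, run the solver, and return the solution count; it is not a lucy or LSD function. Candidates are assumed to be pre-sorted by the priorities listed above.

```python
from typing import Callable, Sequence

Correlation = tuple[int, int]  # (carbon atom number, proton atom number)


def add_hmbc_iteratively(
    base: Sequence[Correlation],
    candidates: Sequence[Correlation],
    count_solutions: Callable[[list[Correlation]], int],
):
    """Add candidates one at a time, dropping any that lead to 0 solutions."""
    accepted = list(base)
    best = count_solutions(accepted)
    log = [(len(accepted), None, best)]  # rows for the tracking table
    for correlation in candidates:
        if best == 1:
            break  # uniquely determined structure: stop here
        trial = accepted + [correlation]
        solutions = count_solutions(trial)
        log.append((len(trial), correlation, solutions))
        if solutions == 0:
            continue  # over-constrained: remove the last correlation
        accepted, best = trial, solutions
    return accepted, best, log
```

The returned `log` maps directly onto the tracking table, and each row is also a natural point to write the progress entry described in Step 7c.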
Step 7c: Write Progress Checkpoint (CASE-PROGRESS.md)
After EVERY LSD iteration (including the baseline run), append an iteration entry to CASE-PROGRESS.md in the compound's working directory. This file is read by the supervisor agent to monitor progress, detect loops, and provide diagnostic guidance.
First iteration: Create the file with header section:
# CASE Progress Log
**Compound:** <compound_path>
**Formula:** <molecular_formula>
**Started:** <timestamp>
Each iteration: Append a new section:
---
## Iteration N: <brief description>
**Time:** <timestamp>
**LSD file:** <filename>.lsd
**Solution count:** <count>
**Constraints added:**
- <constraint and reasoning>
**Constraints removed:**
- <constraint and reasoning> (or "None")
**Why:** <natural language explanation of strategy for this iteration>
**Constraint effectiveness:** <% reduction from previous, or "baseline", or "over-constrained (0 solutions)">
**Confidence:** <qualitative assessment: too many solutions / converging / stuck / etc.>
**HMBC correlations used:** X/Y
**Notes:**
- sp2 count: <N> (<even/odd>) <check/warning>
- H budget: <matches/mismatch>
- <other observations>
Rules:
- NEVER overwrite the file — always append new iteration sections
- Include ALL required fields in every iteration entry
- The "Why" field must explain reasoning, not just state what was done
- The "Constraints added/removed" must list each constraint individually with reasoning
- If recovering from 0 solutions, document which correlations were removed and why
For the complete format specification with examples, see skill/supervisor/SKILL.md Section 7.
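Appending an entry in the format above can be scripted so that no field is forgotten. This is a minimal sketch: the helper names and the keys of the `entry` dictionary are assumptions chosen to mirror the field list above, and skill/supervisor/SKILL.md remains the authoritative format.

```python
from datetime import datetime
from pathlib import Path


def effectiveness(previous, current):
    """Describe the change in solution count relative to the previous run."""
    if current == 0:
        return "over-constrained (0 solutions)"
    if previous is None:
        return "baseline"
    if previous == 0:
        return "recovered from 0 solutions"
    return f"{100 * (previous - current) / previous:.0f}% reduction from previous"


def bullets(items):
    """Render a list as markdown bullets, or '- None' when it is empty."""
    return "\n".join(f"- {item}" for item in items) or "- None"


def append_iteration(log_path, compound, formula, entry):
    """Append one iteration section; write the header only if the file is new."""
    path = Path(log_path)
    now = datetime.now().isoformat(timespec="seconds")
    text = ""
    if not path.exists():
        text += (f"# CASE Progress Log\n**Compound:** {compound}\n"
                 f"**Formula:** {formula}\n**Started:** {now}\n")
    text += (
        f"---\n## Iteration {entry['n']}: {entry['description']}\n"
        f"**Time:** {now}\n"
        f"**LSD file:** {entry['lsd_file']}\n"
        f"**Solution count:** {entry['solutions']}\n"
        f"**Constraints added:**\n{bullets(entry['added'])}\n"
        f"**Constraints removed:**\n{bullets(entry['removed'])}\n"
        f"**Why:** {entry['why']}\n"
        f"**Constraint effectiveness:** {effectiveness(entry['previous'], entry['solutions'])}\n"
        f"**Confidence:** {entry['confidence']}\n"
        f"**HMBC correlations used:** {entry['hmbc_used']}\n"
        f"**Notes:**\n{bullets(entry['notes'])}\n"
    )
    with path.open("a", encoding="utf-8") as handle:  # append mode never overwrites
        handle.write(text)
```

Calling `append_iteration("CASE-PROGRESS.md", compound, formula, entry)` after every LSD run keeps the file append-only, as the rules above require.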
Step 8: Run LSD Solver
lucy lsd run compound.lsd
Or directly:
LSD compound.lsd
For solution count interpretation and troubleshooting, see skill/SKILL.md Section 5 (LSD Reference).
Step 9: Convert to SMILES
outlsd 5 < compound.sol > solutions.smi
Step 10: Rank Solutions
# Rank against a 13C experiment directory
lucy lsd rank solutions.smi --spectrum <13c_exp>
# Or rank against an explicit list of observed shifts
lucy lsd rank solutions.smi --shifts "187.8,152.5,135.7,..."
For MAE score interpretation and ranking guidance, see skill/SKILL.md Section 6 (Ranking and Prediction).
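For orientation, a mean absolute error between predicted and observed 13C shifts can be computed as below. This is a sketch, not the method used by `lucy lsd rank`: pairing the two lists in sorted order is an assumption made here for simplicity, and the 3 ppm window is borrowed from the "Within 3ppm" column of the report table.

```python
def shift_mae(predicted, observed, tol=3.0):
    """Return (MAE in ppm, number of carbons within tol) using sorted-order pairing."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same number of shifts")
    if not predicted:
        raise ValueError("at least one shift is required")
    errors = [abs(p - o) for p, o in zip(sorted(predicted), sorted(observed))]
    return sum(errors) / len(errors), sum(error <= tol for error in errors)


mae, within = shift_mae([186.0, 150.5, 137.7], [187.8, 152.5, 135.7])
print(f"MAE = {mae:.2f} ppm, {within} carbons within 3 ppm")
```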
Step 11: Analyze J-Coupling Path Lengths
After solving, use lucy lsd analyze to compute the actual J-coupling path lengths for all HMBC correlations:
lucy lsd analyze compound.sol compound.lsd
This command:
- Parses the OUTLSD section of the .sol file to extract molecular connectivity
- Builds a graph from atom neighbors
- Uses BFS shortest path to compute bonds between carbon and proton-bearing carbon
- Reports nJ = path_length + 1 for each HMBC correlation
Example output:
Solution 2: 9× ²J 11× ³J (all ²J/³J, no ELIM needed)
HMBC Correlations:
-------------------------------------------------------
C# H# C (ppm) Path J-coupling
-------------------------------------------------------
1 7 131.29 1 ²J_CH
1 10 131.29 1 ²J_CH
2 7 124.71 2 ³J_CH
...
Interpretation:
- All ²J/³J correlations: Structure is consistent with standard HMBC without ELIM
- Contains ⁴J+ correlations: May explain why ELIM was needed
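The path-length rule described in this step (BFS shortest path, then nJ = path_length + 1) can be illustrated on a toy graph. This is a sketch under assumptions: the connectivity is given as a plain adjacency dictionary of heavy atoms, the second number of each HMBC pair is the carbon that bears the proton, and parsing of the .sol file is left out because its layout is not described here.

```python
from collections import deque


def bond_distance(neighbors, start, goal):
    """Shortest number of bonds between two heavy atoms, or None if disconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, distance = queue.popleft()
        if atom == goal:
            return distance
        for nxt in neighbors.get(atom, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


def hmbc_coupling_orders(neighbors, hmbc_pairs):
    """Map each (carbon, proton-bearing carbon) pair to n in nJ_CH (path + 1)."""
    orders = {}
    for carbon, h_carbon in hmbc_pairs:
        distance = bond_distance(neighbors, carbon, h_carbon)
        orders[(carbon, h_carbon)] = None if distance is None else distance + 1
    return orders


# Toy chain 1-2-3-4: the extra bond is the C-H bond on the proton-bearing carbon.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(hmbc_coupling_orders(chain, [(1, 2), (1, 3), (1, 4)]))
```

In the toy chain the pair (1, 4) comes out as a four-bond coupling, the kind of correlation flagged above as a possible reason why ELIM was needed.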
JSON output for PDF generation:
lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json
Generate structure images with LSD atom numbering:
lucy lsd analyze compound.sol compound.lsd --draw solution_{n}.png
This generates a 2D structure image where each atom is labeled with its LSD index (C1, C2, ..., O11), making the HMBC table directly readable against the structure.
Generate publication-quality correlation diagrams with arrows:
For visualizing HMBC correlations directly on the structure with curved arrows and J-coupling labels:
lucy visualize correlations \
--sol compound.sol \
--lsd-file compound.lsd \
--show-atom-numbers \
--show-j-coupling \
-o analysis/hmbc_diagram.svg
This creates a publication-quality SVG diagram showing:
- Clean 2D structure (from the solved .sol file)
- Red atom number annotations positioned away from the structure
- Curved HMBC arrows connecting correlating atoms
- ²J/³J labels on arrows indicating coupling path length
Include the correlation diagram next to the HMBC table in your PDF report - it provides an immediate visual representation of how the HMBC correlations connect the molecular fragments.
Step 12: Report Results
## CASE Results
**Molecular Formula:** [formula]
**Degree of Unsaturation:** [DBE]
### Data Used
- 13C: [X] signals
- HSQC: [Y] correlations (Z protonated carbons)
- HMBC: [N] correlations
- Symmetry: [description]
### LSD Results
- Solutions found: [count]
- ELIM used: [Yes/No]
### Top Candidates
**Rank 1:** MAE = X.XX ppm ([Quality])
[SMILES]
- Key features: [description]
**Rank 2:** MAE = X.XX ppm ([Quality])
[SMILES]
- Differs from #1 in: [description]
### Confidence Assessment
[High/Medium/Low] - [reasoning]
### Recommendation
[Final structure proposal or need for additional data]
Step 13: Generate PDF Report
Always generate a PDF report with rendered structures and formatted tables at the end of every CASE analysis.
python3 << 'EOF'
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle
from reportlab.lib.enums import TA_CENTER
import io
doc = SimpleDocTemplate(
"analysis/CASE_Report.pdf",
pagesize=A4,
rightMargin=0.75*inch,
leftMargin=0.75*inch,
topMargin=0.75*inch,
bottomMargin=0.75*inch
)
styles = getSampleStyleSheet()
title_style = ParagraphStyle('CustomTitle', parent=styles['Heading1'],
fontSize=20, spaceAfter=30, alignment=TA_CENTER)
heading_style = ParagraphStyle('CustomHeading', parent=styles['Heading2'],
fontSize=14, spaceBefore=20, spaceAfter=10)
normal_style = styles['Normal']
story = []
story.append(Paragraph("CASE Structure Elucidation Report", title_style))
story.append(Spacer(1, 0.25*inch))
story.append(Paragraph("Summary", heading_style))
summary_data = [
["Molecular Formula", "<FORMULA>"],
["Molecular Weight", "<MW> Da"],
["Degree of Unsaturation (DBE)", "<DBE>"],
["LSD Solutions Found", "<COUNT>"],
]
summary_table = Table(summary_data, colWidths=[2.5*inch, 3*inch])
summary_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
('PADDING', (0, 0), (-1, -1), 8),
]))
story.append(summary_table)
story.append(Spacer(1, 0.3*inch))
story.append(Paragraph("13C NMR Data", heading_style))
c13_data = [
["#", "Shift (ppm)", "Multiplicity", "Assignment"],
]
c13_table = Table(c13_data, colWidths=[0.4*inch, 1.2*inch, 1.2*inch, 2.5*inch])
c13_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
('PADDING', (0, 0), (-1, -1), 6),
]))
story.append(c13_table)
story.append(Spacer(1, 0.3*inch))
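def smiles_to_image(smiles, size=(400, 300)):
    """Render a SMILES string to an in-memory PNG for ReportLab."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    AllChem.Compute2DCoords(mol)
    img = Draw.MolToImage(mol, size=size)
    img_buffer = io.BytesIO()
    img.save(img_buffer, format='PNG')
    img_buffer.seek(0)
    return img_buffer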
story.append(Paragraph("Structure Candidates", heading_style))
story.append(Paragraph("Ranking Comparison", heading_style))
rank_data = [
["Rank", "Structure", "MAE (ppm)", "Quality", "Within 3ppm"],
]
rank_table = Table(rank_data, colWidths=[0.5*inch, 2.5*inch, 1*inch, 0.8*inch, 1*inch])
rank_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
('PADDING', (0, 0), (-1, -1), 6),
]))
story.append(rank_table)
doc.build(story)
print("PDF report generated: analysis/CASE_Report.pdf")
EOF
CRITICAL: Use data from the successful analysis
Do NOT re-pick peaks for the PDF. Extract all data directly from the LSD file that produced successful solutions. The LSD file contains the exact peaks and correlations that were used.
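Pulling the report rows straight from the LSD file can look like the sketch below. It assumes the MULT, HSQC, and HMBC line layouts shown in Step 7 and treats the text after `;` as the annotation; chemical shifts are only available if they were recorded in those comments. `lsd_tables` is a hypothetical helper.

```python
def lsd_tables(lsd_text: str) -> dict:
    """Collect MULT, HSQC, and HMBC lines of an LSD input as table rows."""
    atoms, hsqc, hmbc = [], [], []
    for raw in lsd_text.splitlines():
        code, _, comment = raw.partition(";")
        fields = code.split()
        note = comment.strip()
        if len(fields) >= 5 and fields[0] == "MULT":
            number, element, hybridization, h_count = fields[1:5]
            atoms.append([f"{element}{number}", f"sp{hybridization}", h_count, note])
        elif len(fields) >= 3 and fields[0] == "HSQC":
            hsqc.append([f"C{fields[1]}", f"H{fields[2]}", note])
        elif len(fields) >= 3 and fields[0] == "HMBC":
            hmbc.append([f"C{fields[1]}", f"H{fields[2]}", note])
    return {"atoms": atoms, "hsqc": hsqc, "hmbc": hmbc}


example_lsd = """
; LSD input for <FORMULA>
MULT 1 C 2 0 ; Carbonyl carbon, sp2, 0H (quaternary)
MULT 4 O 2 0 ; Carbonyl oxygen, sp2, 0H
HSQC 2 2 ; C2 has H2 attached
HMBC 1 2 ; C1 correlates to H2
"""
for name, rows in lsd_tables(example_lsd).items():
    print(name, rows)
```

Each returned list can be placed after a header row and passed to a ReportLab `Table`, in the same way as `c13_data` in the script above.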
The PDF report must include complete tables of ALL data used:
- Summary table — formula, MW, DBE, solution count, recommended structure
- Complete 13C NMR table — ALL carbons used in the LSD file:
  - Carbon number (C1, C2, ...)
  - Chemical shift (ppm)
  - Multiplicity (C, CH, CH2, CH3) from DEPT
  - Hybridization (sp2/sp3)
  - H-count
  - Assignment/interpretation
- Complete HSQC table — ALL direct C-H correlations from the LSD file:
  - Every HSQC command in the LSD file becomes a row
  - Include carbon identity, shift, multiplicity, and proton chemical shift if known
- HMBC Correlation Diagram (placed ABOVE the HMBC table):
  - Generate the diagram FIRST before the HMBC table:
    lucy visualize correlations --sol compound.sol --lsd-file compound.lsd \
      --show-atom-numbers -o analysis/hmbc_diagram.svg
  - Convert SVG to PNG for ReportLab embedding:
    import cairosvg
    cairosvg.svg2png(url='analysis/hmbc_diagram.svg',
                     write_to='analysis/hmbc_diagram.png', scale=2.0)
  - The diagram shows:
    - Clean 2D structure with explicit atom labels (C, H, O)
    - Red curved arrows connecting HMBC-correlating atoms
    - Atom numbers matching the LSD file numbering
    - Optimized layout to avoid overlaps between arrows and labels
  - Include as a centered Image in the PDF, full page width (~6 inches)
- Complete HMBC table (placed BELOW the diagram) — ALL long-range correlations from the LSD file:
- Excluded signals section — Document WHY certain peaks were not used:
  - Solvent peaks (e.g., CDCl3 at 77 ppm)
  - Noise/artifacts
  - Duplicate signals from overlapping peaks
  - Signals that couldn't be assigned confidently
- Structure candidates — Rendered 2D images (RDKit) with SMILES and MAE scores
- Ranking comparison table — All candidates with MAE, quality rating, carbons within tolerance
- Recommended structure — Larger image with SMILES and InChI, plus reasoning if not Rank #1
Required dependencies:
CRITICAL: Install missing dependencies - do NOT fall back to suboptimal solutions (like text placeholders instead of images).
pip install reportlab
pip install cairosvg
# System Cairo library (Homebrew on macOS; use your system package manager elsewhere)
brew install cairo
Before generating the PDF, verify all dependencies are working:
from reportlab.platypus import SimpleDocTemplate
from rdkit import Chem
from rdkit.Chem import Draw
import cairosvg
If cairosvg import fails with "no library called cairo", install the system Cairo library as shown above.
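The dependency check can be run as a single script that reports every failing import at once. This is a minimal sketch; `missing_modules` is a hypothetical helper. `OSError` is caught alongside `ImportError` because a missing system Cairo library surfaces as an `OSError` when cairosvg is imported.

```python
import importlib


def missing_modules(names=("reportlab", "rdkit", "cairosvg")):
    """Return (module name, error message) for every module that fails to import."""
    failures = []
    for name in names:
        try:
            importlib.import_module(name)
        except (ImportError, OSError) as exc:
            failures.append((name, str(exc)))
    return failures


for name, reason in missing_modules():
    print(f"MISSING {name}: {reason}")
```

An empty result means PDF generation can proceed; otherwise install what is reported instead of falling back to placeholders.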
Troubleshooting
For detailed troubleshooting guidance, see skill/SKILL.md Section 5 (LSD Reference) and Section 6 (Ranking and Prediction).
Quick checklist for 0 solutions, in order: (1) the sp2 count is EVEN, (2) the hydrogen count matches the formula, (3) the HMBC correlations are correct. Only after all three checks pass, try ELIM 1 0 as a last resort.
Quick Reference
mkdir -p analysis
lucy pick 1d ./2
lucy pick hsqc ./5 ./3 --dept90 ./4
lucy pick hmbc ./6 ./2 ./5 --dept135 ./3
lucy lsd generate . C16H10N2O2 -o analysis/compound.lsd
cd analysis && LSD compound.lsd
outlsd 5 < compound.sol > solutions.smi
lucy lsd rank solutions.smi --spectrum ../2
lucy lsd analyze compound.sol compound.lsd --draw structure_{n}.png
IMPORTANT: Always generate a PDF report at the end of every CASE analysis (Step 13).