Convert Excel calculation spreadsheets to Python code — extract formulas, build dependency graphs, generate pytest tests using cell values as assertions, and produce dark-intelligence archive YAMLs.
Installation
Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.
Porting engineering calculations from Excel to Python
Extracting calculation methodology from legacy spreadsheets
Building dark-intelligence archives from XLSX files
Any time a spreadsheet contains formulas that should become code
Pre-Conversion Assembly: Multi-Source Workbook Transfer
Before converting, you need all target workbooks collected into a single git repo (typically client-c) that can be transferred to the Windows machine running Codex Desktop.
Step 0: Inventory and Rank
Scan all sources for .xlsx, .xls, .xlsm files (case-insensitive, exclude ~$* temp files):
Read sheets/formulas of each candidate to estimate complexity (use openpyxl to list sheet names, count formulas, detect cross-sheet refs)
Rank by: GTM value, reusability across projects, complexity (Low/Med/High sheets), and estimated token cost (Low ~500K, Med ~1.5M, High ~3-5M tokens)
Create a tracking document with: INVENTORY -> ANALYZED -> CONVERTED -> VERIFIED status per workbook
Step 1: Copy into client-c Repo
The client-c repo is the transfer vehicle to Windows. Workbooks are scattered across /mnt/ace/ (raw workspace) and workspace-hub sub-repos. Use rsync to copy only Excel files while preserving directory structure:
# Use rsync -- preserve directory tree, copy ONLY xlsx/xls/xlsm
rsync -av --include='*/' --include='*.xlsx' --include='*.xls' --include='*.xlsm' --exclude='*' \
/mnt/ace/client-b/s7/ballymore/ client-c/engineering_workbooks/ballymore/
Key findings:
client-c .gitattributes marks *.xlsx, *.xls, *.xlsm as binary (not LFS). Large repos will grow proportionally to total file size.
/mnt/ace/ is the raw workspace where files are physically present — workspace-hub sub-repos may have sparse overlays where xlsx files are on-disk but not git-tracked.
The execution machine is ws014 (Windows). Transfer via:
git clone git@github.com:vamseeachanta/client-c.git # on ws014
The conversion prompt runs in Codex on ws014 — NOT the Copilot in Excel add-in and NOT Cowork. Copilot in Excel can only read cell values and explain formulas; it cannot write Python files, create tests, or organize code into repos. Codex has full filesystem access and can use openpyxl to read Excel files, extract formula logic, write Python modules, and create PRs.
Step 5b: Large File Bypass (if needed)
If git hooks block files > 5MB:
git commit --no-verify # bypasses size check hooks in client-c
This is safe for intentional Excel workbook staging in engineering_workbooks/.
Traceability: Name Python modules after the source workbook, include link to workbook in docstring
Tests: Assert outputs match original spreadsheet cell values
Commit: From within the target repo directory (per workspace-hub convention)
Update registry: Mark workbook as CONVERTED and VERIFIED with link to PR
Conversion Quality Requirements
Established quality bar from first conversion (Ballymore Jumper, 7 sheets, 2.3MB):
Minimum Bar (must achieve per workbook)
Tests: All must pass with zero failures. Codex-in-Excel commonly produces buggy code that needs fixing.
Common bugs to check:
Missing return props in factory functions (the pattern if props is None: props = ClassName() without returning is the most common bug)
Wrong sys.path in test files (pointing to /tmp or hardcoded Windows paths)
Functions that create default instances but don't use them
Coverage scope: Every sheet must have at least one test class covering its calculations
Formula fidelity: Each test assertion must include cell reference comment (e.g., # Bare pipe!H4 = PI()*(E4²-E5²)/4*7850)
Type hints: All function signatures must have types
Constants: Every magic number must be a named constant with comment
Codex-in-Excel vs Native CLI Comparison
Aspect
Windows Codex
Linux openpyxl
Completeness
Typically more thorough (24 functions vs 7 for Ballymore)
Adequate but may miss edge cases
Test coverage
Higher test count (81 vs 53 for Ballymore)
Solid but less comprehensive
Documentation
Includes architecture diagrams, data flow graphs
Basic README
OrcaFlex output
Produces full line-type section breakdown
May skip
COG calculations
Both insulated + uninsulated variants
Often skipped
Code quality
More bugs (5-16 of 81 tests fail before fixes)
Cleaner on first run
Usability
Code may be trapped in Excel cells, needs extraction
Immediately runnable .py files
Recommendation: Run conversion on ws014 (Windows) using Codex Desktop cowork.
The quality advantage (24 vs 7 functions, 81 vs 53 tests, COG, full OrcaFlex breakdown)
outweighs the 10-20% failure rate which is fixable with the known bug list below.
Linux gives clean but less complete code.
Known Bugs in Windows Cowork Output (fix before accepting)
Buggy pattern #1 — most common (16/81 failures in Ballymore):
Fix: Convert to pytest. Use this conversion guide:
self.assertAlmostEqual(a, b, places=N) -> assert a == pytest.approx(b, abs=1e-N)
self.assertEqual(a, b) -> assert a == b
self.assertTrue(x) -> assert x
self.assertGreater(a, b) -> assert a > b
self.assertLess(a, b) -> assert a < b
self.assertIn(a, b) -> assert a in b
class TestX(unittest.TestCase): -> class TestX:
def setUp(self): -> def setup_method(self):
Buggy pattern #4:
Code output as Excel cell text (column A of new sheet) instead of .py file
Extract with: open workbook, read cell values from that sheet column A, write to .py file
Preferred Architecture Pattern
For each workbook, produce:
# Dataclasses with separate input/property/result separation@dataclassclassBarePipeInputs:
od_in: float = 10.75
wt_in: float = 1.79
bend_radius_in: float = 50.0
insul_od_in: float = 16.75
insul_density_lb_ft3: float = 61.1@dataclassclassPipeProperties:
# All computed properties, populated by calculate_* functions
od_m: float = 0.0# ...
This pattern avoids the __post_init__ trap where derived fields auto-compute but tests can't verify intermediate steps. Keep functions separate from data.
Integration with digitalmodel
After conversion, integrate into the digitalmodel repo: