| name | phototransduction |
| description | Reconstruct documents from photos/screenshots as structured markdown. Use when user shares photos, screenshots, or images of slides/papers/emails/screens and wants them captured as text. Common triggers — "OCR this", "reconstruct from photos", "transcribe slides", "photo to markdown", "took photos", "took a photo", "synced to iCloud", "photos on my phone", "screenshot of an email", "took screenshots", "photos in iCloud Photos", "captured an email/slide/page", "process the photos I just took". |
| triggers | ["photos of documents","photos of email","screenshots of email","OCR","reconstruct from photos","transcribe slides","photo to markdown","document from camera","took photos","took a photo","synced to iCloud","iCloud Photos","photos on my phone","screenshots","captured an email","capture verbatim from photo"] |
| tools | ["rhodopsin.py","Read","Write","Bash","Edit"] |
| epistemics | [] |
# Phototransduction
Convert photos of documents (slides, papers, screens) into structured markdown with frontmatter.
Biology: phototransduction converts absorbed photons into molecular signals. Here: photos → structured internal text.
## When to use
- User took photos of a presentation, Word doc, or screen
- User wants document content captured as searchable, linkable markdown
- User says "OCR", "reconstruct", "transcribe slides", "photo to markdown"
## Procedure
### 1. Acquire photos

```bash
rhodopsin.py today
rhodopsin.py date 2026-04-18
rhodopsin.py recent 30
rhodopsin.py export UUID1 UUID2 UUID3...
rhodopsin.py batch YYYY-MM-DD HH:MM HH:MM
```
Prefer `batch DATE HH:MM HH:MM` over selective `export UUID...` for document-photo audits. Selective export silently skips iCloud-only photos (it only warns `[!] not available locally`); batch forces a sync of the full time range and exports every photo, numbered sequentially. Failure example (Slot 67, 8 May 2026): selective export pulled 14 of 23 photos; the missing IMG_3936 contained a new §9 Data-led governance section, and the wrong claim "§9 doesn't exist" was asserted against the incomplete set until the user pushed for a full re-export. Rule of thumb: coverage-sensitive audits use `batch`; targeted single-photo lookups use `export`.
If the photos are already on the local machine (e.g., in /tmp/), skip to step 2.
If accessing a Mac over SSH, run rhodopsin.py on the Mac side or use an AppleScript export.
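A minimal acquisition wrapper sketch for coverage-sensitive audits; it assumes rhodopsin.py exits nonzero on failure and writes JPEGs into the working directory (both are assumptions about the tool, not confirmed behavior):

```python
import subprocess
from pathlib import Path

def acquire_batch(date: str, start: str, end: str, workdir: str = ".") -> list[Path]:
    """Run a full time-range batch export and return the exported files.

    Uses batch rather than selective export, so iCloud-only photos are
    synced instead of silently skipped.
    """
    subprocess.run(["rhodopsin.py", "batch", date, start, end],
                   cwd=workdir, check=True)
    exported = sorted(Path(workdir).glob("*.jpg"))
    print(f"Exported {len(exported)} photos for {date} {start}-{end}")
    return exported

# Usage: acquire_batch("2026-05-08", "09:00", "10:30")
```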
### 2. Auto-rotate (CRITICAL: EXIF is not enough)
Always rotate images to the correct orientation before reading. Misrotated input was the single biggest source of transcription errors: reading rotated text produces garbled output that looks plausible but is substantially wrong.
EXIF rotation is necessary but NOT sufficient. When the source document was displayed sideways on screen (e.g., a Word/PDF page rotated within the viewer) and then photographed in landscape, the EXIF says "normal" but the content is sideways. `sips --rotate 0` will not fix this; it only applies EXIF metadata.
Mandatory workflow for doc photos:

```bash
# Apply EXIF orientation metadata (a 0-degree rotate bakes the flag into pixels)
sips -r 0 image.jpg
# Produce both 90-degree candidates; pick the readable one by eye
sips -r 90 image.jpg --out image_cw.jpg
sips -r -90 image.jpg --out image_ccw.jpg
# Batch alternative: ImageMagick auto-orient in place
for f in *.jpg; do convert "$f" -auto-orient "$f"; done
```
```bash
~/germline/.venv/bin/python3 -c "
from PIL import Image
for f in ['IMG_001.jpg', 'IMG_002.jpg']:
    img = Image.open(f)
    img.rotate(90, expand=True).save(f.replace('.jpg','_ccw.jpg'))   # PIL rotates CCW for +90
    img.rotate(-90, expand=True).save(f.replace('.jpg','_cw.jpg'))
"
```
Stop rule: if you cannot read the document text upright after EXIF rotation, do NOT transcribe. Rotate +90° and -90°, save both, pick the readable version. This is mandatory, not optional.
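For the stop rule, a minimal candidate-generator sketch, assuming Pillow is available (the helper name rotation_candidates is illustrative, not part of this skill's tooling):

```python
from pathlib import Path
from PIL import Image, ImageOps

def rotation_candidates(path: Path) -> list[Path]:
    """Write EXIF-upright, +90, and -90 variants; pick the readable one by eye."""
    base = ImageOps.exif_transpose(Image.open(path))  # apply EXIF orientation first
    out = []
    for deg, tag in [(0, "exif"), (90, "ccw"), (-90, "cw")]:
        dest = path.with_name(f"{path.stem}_{tag}{path.suffix}")
        base.rotate(deg, expand=True).save(dest)
        out.append(dest)
    return out
```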
Lesson from 2026-04-25: A 12-photo OpCo paper transcription produced material errors on multiple bullets (Sponsor comments, Value Streams, RMG&R) because the photos were of a sideways-displayed Word doc. EXIF said "normal", content was 90° off. Single-pass model vision produced fluent reconstructions that read like the source but weren't. Rotation to upright fixed all of them.
### 3. Identify document boundaries
Photos may cover multiple documents. Before reading, scan all images to identify clusters:
- Check timestamps — bursts with gaps indicate different documents
- Check visual style — different templates, orientations, or formats
- Note page numbers if visible (e.g., "p 3 of 7")
Group photos by document. Process each document separately.
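A minimal clustering sketch for the timestamp check, assuming EXIF DateTime is present and that a gap of more than a few minutes separates documents (the 5-minute threshold is an assumption to tune):

```python
from datetime import datetime, timedelta
from pathlib import Path
from PIL import Image

def cluster_by_time(files: list[Path],
                    gap: timedelta = timedelta(minutes=5)) -> list[list[Path]]:
    """Group photos into candidate documents by bursts in capture time."""
    def taken(f: Path) -> datetime:
        raw = Image.open(f).getexif().get(306)  # 306 = EXIF DateTime tag
        if raw:
            return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")
        return datetime.fromtimestamp(f.stat().st_mtime)  # fallback: file mtime

    if not files:
        return []
    ordered = sorted(files, key=taken)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if taken(cur) - taken(prev) > gap:
            clusters.append([])   # gap found: start a new document cluster
        clusters[-1].append(cur)
    return clusters
```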
### 4. Read and reconstruct
Read images in order. For each page:
- Read the image with the Read tool
- Transcribe ALL visible text; don't paraphrase or summarize
- Preserve structure: headings, bullets, tables, numbered lists
- Note anything unclear with `[?]` markers
- If text is dense or hard to read, flag for Apple OCR verification
Time-zone label cluster (mandatory for invite/calendar/scheduling screenshots). Outlook, Google Calendar, and similar tools display both the auto-adjusted local time AND a meta-line stating the original creation timezone (e.g., "This meeting has been adjusted to reflect your current time zone. It was initially created in the following time zone: (UTC+00:00) Dublin, Edinburgh, Lisbon, London"). The displayed time is the viewer's local time, NOT the creator's timezone. Single-pass label-skipping that asserts the displayed time is in the original timezone is the failure shape (Slot 29 instance, 2026-05-04).
- DO: read every timezone-related label in the field block, both the displayed time and any "auto-adjusted / originally created in / reflect your current time zone" hints, before asserting any time.
- DO NOT: derive the timezone from one label when a sibling label flips the interpretation.
Key lesson: Dense rotated text is where errors concentrate. If photos were taken at an angle or the document was displayed sideways on screen, even after rotation the text quality may be poor. Flag these pages for user verification via Apple Live Text (camera OCR on iPhone/iPad is more accurate than model vision on rotated photos).
### 5. Structure as markdown
Write the reconstructed document with frontmatter:
```markdown
---
title: "Document Title (reconstruction)"
date: YYYY-MM-DD
type: deliverable|reference
author: Original Author
source: Photos taken YYYY-MM-DD HH:MM TZ
original_format: Word document|PowerPoint deck, N pages/slides
status: reconstructed
pii: false
tags: [relevant, tags]
---

# Document Title

[reconstructed content]

---

**Related:**
- [[linked-document-1]] — relationship
- [[linked-document-2]] — relationship
```
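A minimal writer sketch matching the template above (the helper name write_reconstruction and the filename slug scheme are illustrative; only the frontmatter fields come from this skill):

```python
from datetime import date
from pathlib import Path

def write_reconstruction(title: str, body: str, source: str,
                         original_format: str, out_dir: Path) -> Path:
    """Write a reconstructed document using the frontmatter template above."""
    fm = "\n".join([
        "---",
        f'title: "{title} (reconstruction)"',
        f"date: {date.today().isoformat()}",
        "type: reference",
        f"source: {source}",
        f"original_format: {original_format}",
        "status: reconstructed",
        "pii: false",
        "---",
    ])
    out = out_dir / (title.lower().replace(" ", "-") + ".md")
    out.write_text(f"{fm}\n\n# {title}\n\n{body}\n")
    return out
```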
### 6. Verify with Apple OCR
For any page where confidence is low (rotated text, dense paragraphs, small fonts):
- Ask the user to open the photo on their iPhone/iPad
- Use Apple Live Text (long-press on text in Photos app) to copy the text
- User pastes the OCR text into the conversation
- Compare against reconstruction and fix discrepancies
This step is not optional for dense text pages. Model vision on rotated/angled photos produces plausible but wrong text — errors that look like bad writing rather than bad OCR.
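A minimal comparison sketch using Python's difflib to surface discrepancies between the pasted Live Text output and the reconstruction (word-level granularity and the file names are assumptions):

```python
import difflib

def compare_ocr(reconstruction: str, live_text: str) -> list[str]:
    """Word-level unified diff between the reconstruction and Apple OCR text."""
    a, b = reconstruction.split(), live_text.split()
    return list(difflib.unified_diff(a, b, "reconstruction", "live_text", lineterm=""))

# Usage:
# for line in compare_ocr(open("page3.md").read(), open("page3_livetext.txt").read()):
#     print(line)
```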
### 7. Save and interlink
- Save to `~/epigenome/chromatin/immunity/` (private, not public)
- Add frontmatter with source provenance
- Add **Related:** wikilinks to connected documents
- Commit to the epigenome repo and push
## Anti-patterns (learned 2026-04-18)
- Never read rotated images without rotating first. The model produces fluent-sounding but wrong text. "Phishing is a single system" was actually "What is missing is the operating model."
- Never assume transcription errors are the author's writing problems. Review comments based on bad OCR waste everyone's time. Verify before critiquing.
- Never substitute numbers from other sources. "86 in pilot, 1,319 in ideation" came from a different document; the actual text said "66 in pilot, 162 in POC." Keep transcription and analysis separate.
- Check for missing pages. Count page numbers if visible. Compare photo count against page count; one index off means one missing page (see the sketch after this list).
- Duplicate photos of the same page exist, at different angles and zoom levels. Don't double-count them as separate pages.
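A minimal missing-page check, assuming "p X of Y" markers were already transcribed per page (the regex and marker format are assumptions):

```python
import re

def missing_pages(page_texts: list[str]) -> set[int]:
    """Compare 'p X of Y' markers found in transcripts against the full range."""
    seen: set[int] = set()
    total = 0
    for text in page_texts:
        m = re.search(r"p\.?\s*(\d+)\s*of\s*(\d+)", text, re.IGNORECASE)
        if m:
            seen.add(int(m.group(1)))
            total = max(total, int(m.group(2)))
    return set(range(1, total + 1)) - seen

# Usage: missing_pages(transcripts) -> e.g. {5} means page 5 has no photo
```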
## CLI enhancement needed
`rhodopsin.py export` should auto-rotate based on EXIF orientation. Currently it converts HEIC→JPEG but doesn't fix rotation. Add `sips --rotate 0` (which applies EXIF metadata) after conversion.
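A minimal sketch of that fix in Python instead of shelling out to sips, using Pillow's ImageOps.exif_transpose (the hook point inside rhodopsin.py's HEIC→JPEG conversion, and the helper name normalize_orientation, are assumptions):

```python
from pathlib import Path
from PIL import Image, ImageOps

def normalize_orientation(jpeg_path: Path) -> None:
    """Bake the EXIF orientation flag into the pixels after HEIC->JPEG conversion."""
    img = Image.open(jpeg_path)
    upright = ImageOps.exif_transpose(img)  # rotate/flip per EXIF, then clear the tag
    upright.save(jpeg_path)
```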