daily-sourcelibrary

Updated: 2026-02-11 11:28

Batch OCR and translation for Source Library using Gemini Batch API (50% cheaper). Process books from the roadmap queue.

Source: JDerekLomas/claude-code-skills (GitHub)

Source Library Batch Processing

Process historical Latin texts using Gemini Batch API for 50% cost savings.

Current Status (2025-12-27)

6 books complete:

Euclid (Elements) - 278 pages
Plato (Complete Works) - 718 pages
Ptolemy (Cosmographia) - 276 pages
Boethius (Consolation of Philosophy) - 294 pages
Plotinus (Enneads) - 544 pages
Copernicus (De revolutionibus) - 422 pages

Results at: ~/translate/data/pipeline/results/

Quick Commands

cd ~/translate && source .venv/bin/activate && set -a && source .env && set +a

Check batch job status on Gemini

python3 -c "
import json
import os
from pathlib import Path
from google import genai

client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

# Scan the last 10 batch_*.json files (sorted by name) and skip the _keys sidecar files
for f in sorted(Path('data/pipeline/batch_jobs').glob('batch_*.json'))[-10:]:
    if '_keys' in f.name:
        continue
    data = json.loads(f.read_text())
    gname = data.get('gemini_job_name')
    if gname:
        # Look the job up on Gemini and print its state without the enum prefix
        r = client.batches.get(name=gname)
        status = str(r.state).replace('JobState.JOB_STATE_', '')
        print(f\"{data['book_id'][:25]:<25} {data['job_type']:<8} {status}\")
"

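The status check above is a one-shot query. If you would rather wait for a job to finish, a polling loop along these lines may help. This is a minimal sketch, not part of the pipeline: `wait_for_batch`, `state_name`, and the polling defaults are hypothetical names and values, and the terminal state list is assumed from the `JOB_STATE_` prefix the command strips.

```python
import time

# States assumed to be terminal; names follow the JOB_STATE_ prefix
# that the status command above strips off.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}


def state_name(state):
    """Return the bare state name for an enum member or a plain string."""
    name = getattr(state, "name", None) or str(state)
    return name.split(".")[-1]


def wait_for_batch(get_job, job_name, poll_seconds=60, max_polls=1440):
    """Call get_job(name=...) repeatedly until the job reaches a terminal state."""
    for _ in range(max_polls):
        job = get_job(name=job_name)
        name = state_name(job.state)
        if name in TERMINAL_STATES:
            return name
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} still running after {max_polls} polls")
```

With the client from the command above, this would be called as `wait_for_batch(client.batches.get, gname)`.
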
Check local results

python3 -c "
from pathlib import Path
for d in sorted(Path('data/pipeline/results').iterdir()):
    if d.is_dir():
        ocr = len(list(d.glob('ocr_*.md')))
        trans = len(list(d.glob('trans_*.md')))
        print(f'{d.name}: {ocr} OCR, {trans} Trans')
"

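The counts above show how many pages each book has, but not which pages are still missing a translation. A small sketch of that check follows. It assumes that an OCR file and its translation share the same suffix after the `ocr_` and `trans_` prefixes, which the original does not state; `missing_translations` is a hypothetical helper.

```python
from pathlib import Path


def missing_translations(book_dir):
    """Return page keys that have an ocr_*.md file but no trans_*.md file."""
    book_dir = Path(book_dir)
    # Strip the prefix so both sets hold comparable page keys.
    ocr = {p.stem[len("ocr_"):] for p in book_dir.glob("ocr_*.md")}
    trans = {p.stem[len("trans_"):] for p in book_dir.glob("trans_*.md")}
    return sorted(ocr - trans)
```

Run it against one directory under `data/pipeline/results/` to list the pages that still need a translation batch.
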
Process next book from roadmap

python -m src.pipeline.daily_run

Process specific book

python -m src.pipeline.daily_run --book <ia_identifier>

Collect results from completed batch

python3 -c "
import json
import os
from pathlib import Path
from google import genai

client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

job_id = 'batch_YYYYMMDD_HHMMSS'  # Replace with actual job ID
job_file = Path(f'data/pipeline/batch_jobs/{job_id}.json')
data = json.loads(job_file.read_text())
result = client.batches.get(name=data['gemini_job_name'])

# keys[i] names the output file for the i-th response
keys_file = Path(f'data/pipeline/batch_jobs/{job_id}_keys.json')
keys = json.loads(keys_file.read_text())
results_dir = Path(f'data/pipeline/results/{data[\"book_id\"]}')
results_dir.mkdir(parents=True, exist_ok=True)

# The inline responses may be missing while the job is still running, so fall back to []
responses = (result.dest.inlined_responses if result.dest else None) or []
for i, resp in enumerate(responses):
    if i < len(keys) and resp.response and resp.response.candidates:
        text = resp.response.candidates[0].content.parts[0].text
        (results_dir / f'{keys[i]}.md').write_text(text, encoding='utf-8')
        print(f'Collected {keys[i]}')
"

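The collection command above silently skips any response that has no candidates. To see which pages need resubmitting, the pairing step can be pulled into a function that also reports the failures. This is a sketch under the same assumption as the command, namely that keys and responses line up by position; `pair_results` is a hypothetical helper, not part of the pipeline.

```python
def pair_results(keys, responses):
    """Match keys to responses by position; return (collected, failed)."""
    collected, failed = {}, []
    for i, key in enumerate(keys):
        resp = responses[i] if i < len(responses) else None
        inner = getattr(resp, "response", None)
        candidates = getattr(inner, "candidates", None) or []
        text = None
        if candidates:
            # Walk the same path as the command above, tolerating gaps.
            content = getattr(candidates[0], "content", None)
            parts = getattr(content, "parts", None) or []
            text = parts[0].text if parts else None
        if text:
            collected[key] = text
        else:
            failed.append(key)
    return collected, failed
```

Pass it the `keys` list and `result.dest.inlined_responses` from the command above; the `failed` list holds the keys to retry.
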
Roadmap Queue

Next books to process (from data/roadmap.json):

Ptolemy - Almagest (needs a non-encrypted source)
Apollonius - Conics
Ficino - Platonic Theology
Pico - Oration on Dignity of Man
Corpus Hermeticum
Kepler - Mysterium Cosmographicum
Galileo - Sidereus Nuncius

Pipeline Architecture

Internet Archive  -->  PDF Download  -->  PNG Extraction
                                              |
                                              v
                       Split Detection (pages 10, 20, 25)
                                              |
                          Single pages?  -----+
                              |               |
                              v               v
                      OCR Batch Submit    Flag for manual split
                              |
                              v
                     Translation Batch
                              |
                              v
                      Collect Results
                              |
                              v
                    ~/translate/data/pipeline/results/
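
The diagram samples pages 10, 20, and 25 and routes a book either to OCR or to manual splitting, but it does not say how a double-page scan is recognised. The sketch below shows one way the decision could work, assuming a landscape aspect ratio signals a two-page spread. The function name, the `page_sizes` mapping, and the 1.2 ratio are all illustrative assumptions, not the pipeline's actual rule.

```python
SAMPLE_PAGES = (10, 20, 25)  # page numbers named in the diagram


def needs_manual_split(page_sizes, sample_pages=SAMPLE_PAGES, ratio=1.2):
    """Return True if any sampled page looks like a two-page spread.

    page_sizes maps a 1-based page number to (width, height) in pixels.
    The width/height test and the default ratio are assumptions.
    """
    for n in sample_pages:
        size = page_sizes.get(n)
        if size is None:
            continue  # book is shorter than this sample page
        width, height = size
        if width > ratio * height:
            return True
    return False
```

A book for which this returns False would go on to OCR batch submission; one for which it returns True would be flagged for manual split.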

Cost Savings

Gemini Batch API: 50% cheaper than real-time

Not subject to real-time rate limits (batch jobs have their own quotas)
1-24 hour turnaround
Results retained ~30 days
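
The 50% figure is easy to turn into a budget estimate. The sketch below does the arithmetic for the 2,532 pages of the six completed books. The per-page price is a placeholder argument, since the original gives no prices; `batch_cost` and `estimate` are hypothetical helpers.

```python
BATCH_DISCOUNT = 0.5  # "50% cheaper than real-time", per the section above


def batch_cost(realtime_cost):
    """Apply the batch discount to a real-time cost."""
    return realtime_cost * (1 - BATCH_DISCOUNT)


def estimate(pages, realtime_cost_per_page):
    """Return (realtime_total, batch_total, savings) for a page count."""
    realtime = pages * realtime_cost_per_page
    batch = batch_cost(realtime)
    return realtime, batch, realtime - batch


# 278 + 718 + 276 + 294 + 544 + 422 pages from the status list
COMPLETED_PAGES = 2532
```

Call `estimate(COMPLETED_PAGES, price)` with your own measured cost per page to compare the two modes.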

Troubleshooting

Proxy errors

unset ALL_PROXY HTTP_PROXY HTTPS_PROXY
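
If the scripts are launched from Python rather than a shell, the same cleanup can be done in-process before the client is created. A minimal sketch follows; `clear_proxy_env` is a hypothetical helper, and clearing the lowercase variants is an addition that the original command does not include.

```python
import os

PROXY_VARS = ("ALL_PROXY", "HTTP_PROXY", "HTTPS_PROXY")


def clear_proxy_env(env=None, include_lowercase=True):
    """Remove proxy variables from env (default: os.environ); return removed names."""
    env = os.environ if env is None else env
    names = list(PROXY_VARS)
    if include_lowercase:
        # Not in the original command; many tools also read these.
        names += [n.lower() for n in PROXY_VARS]
    removed = []
    for name in names:
        if name in env:
            del env[name]
            removed.append(name)
    return removed
```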

Missing API key

set -a && source .env && set +a
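
A missing key otherwise surfaces as a bare `KeyError` from `os.environ['GEMINI_API_KEY']`. A small guard at the top of a script can give a clearer message. This is a sketch; `require_api_key` is a hypothetical helper.

```python
import os


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from env, or raise with a hint on how to load it."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; run: set -a && source .env && set +a"
        )
    return key
```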

Large book (>1GB payload)

Split into multiple batches of ~180 pages each.
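
The split itself is plain chunking. A minimal sketch, using the ~180 page figure above as the default; `chunk_pages` is a hypothetical helper and the pipeline may split differently.

```python
def chunk_pages(pages, batch_size=180):
    """Split a sequence of pages into consecutive batches of at most batch_size."""
    pages = list(pages)
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]
```

For the 718-page Plato volume this gives four batches: three of 180 pages and one of 178.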
