daily-sourcelibrary

Updated: February 11, 2026, 11:28

Batch OCR and translation for Source Library using Gemini Batch API (50% cheaper). Process books from the roadmap queue.

Source: JDerekLomas/claude-code-skills (GitHub)

Source Library Batch Processing

Process historical Latin texts using Gemini Batch API for 50% cost savings.

Current Status (2025-12-27)

6 books complete:

Euclid (Elements) - 278 pages
Plato (Complete Works) - 718 pages
Ptolemy (Cosmographia) - 276 pages
Boethius (Consolation of Philosophy) - 294 pages
Plotinus (Enneads) - 544 pages
Copernicus (De revolutionibus) - 422 pages

Results at: ~/translate/data/pipeline/results/

Quick Commands

Activate the virtualenv and load .env (run first)

cd ~/translate && source .venv/bin/activate && set -a && source .env && set +a

Check batch job status on Gemini

python3 -c "
import json, os
from pathlib import Path
from google import genai

client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

# Drop the *_keys.json companion files first, then keep the last 10 job files by name
job_files = [f for f in sorted(Path('data/pipeline/batch_jobs').glob('batch_*.json')) if '_keys' not in f.name]
for f in job_files[-10:]:
    data = json.loads(f.read_text())
    gname = data.get('gemini_job_name')
    if gname:
        # Ask Gemini for the current state of this batch job
        r = client.batches.get(name=gname)
        status = str(r.state).replace('JobState.JOB_STATE_', '')
        print(f\"{data['book_id'][:25]:<25} {data['job_type']:<8} {status}\")
"

Check local results

python3 -c "
from pathlib import Path
for d in sorted(Path('data/pipeline/results').iterdir()):
    if d.is_dir():
        ocr = len(list(d.glob('ocr_*.md')))
        trans = len(list(d.glob('trans_*.md')))
        print(f'{d.name}: {ocr} OCR, {trans} Trans')
"

Process next book from roadmap

python -m src.pipeline.daily_run

Process specific book

python -m src.pipeline.daily_run --book <ia_identifier>

Collect results from completed batch

python3 -c "
from google import genai
from pathlib import Path
import json, os
client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

job_id = 'batch_YYYYMMDD_HHMMSS'  # Replace with actual job ID
job_file = Path(f'data/pipeline/batch_jobs/{job_id}.json')
data = json.loads(job_file.read_text())
result = client.batches.get(name=data['gemini_job_name'])

# Inline responses are only present once the job has succeeded
if result.state.name != 'JOB_STATE_SUCCEEDED':
    raise SystemExit(f'Job not ready: {result.state.name}')

keys_file = Path(f'data/pipeline/batch_jobs/{job_id}_keys.json')
keys = json.loads(keys_file.read_text())
results_dir = Path(f'data/pipeline/results/{data[\"book_id\"]}')
results_dir.mkdir(parents=True, exist_ok=True)

# This script relies on responses arriving in request order: response i belongs to keys[i]
for i, resp in enumerate(result.dest.inlined_responses):
    if i < len(keys) and resp.response and resp.response.candidates:
        text = resp.response.candidates[0].content.parts[0].text
        (results_dir / f'{keys[i]}.md').write_text(text, encoding='utf-8')
        print(f'Collected {keys[i]}')
"
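
The collection loop above silently skips any request that came back without a response, so it can help to know which keys are missing before resubmitting. The following is a minimal sketch of that bookkeeping, not part of the original pipeline. The helper name `pair_results` and the sample key names (modeled on the `ocr_*.md` file pattern) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pair_results(
    keys: Sequence[str], texts: Sequence[Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Pair request keys with response texts by position.

    Returns the collected texts keyed by request key, plus the keys that
    have no usable response (None, empty, or past the end of `texts`).
    """
    collected: Dict[str, str] = {}
    missing: List[str] = []
    for i, key in enumerate(keys):
        text = texts[i] if i < len(texts) else None
        if text:
            collected[key] = text
        else:
            missing.append(key)
    return collected, missing


if __name__ == "__main__":
    # Hypothetical keys and texts; real keys come from the *_keys.json file.
    keys = ["ocr_0001", "ocr_0002", "ocr_0003"]
    texts = ["Liber primus", None]
    collected, missing = pair_results(keys, texts)
    print("collected:", sorted(collected))
    print("missing:", missing)
```

The `missing` list is what would need to go into a follow-up batch.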

Roadmap Queue

Next books to process (from data/roadmap.json):

Ptolemy - Almagest (need non-encrypted source)
Apollonius - Conics
Ficino - Platonic Theology
Pico - Oration on Dignity of Man
Corpus Hermeticum
Kepler - Mysterium Cosmographicum
Galileo - Sidereus Nuncius
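
As a rough illustration of how "next book" selection from the queue could work, here is a sketch that skips blocked entries such as the Almagest. The schema of data/roadmap.json is not shown in this document, so the `title` and `status` fields and the `next_book` helper are hypothetical.

```python
from typing import Dict, List, Optional


def next_book(roadmap: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first entry whose status is 'queued', or None if there is none.

    Entries without a 'status' field are treated as queued (an assumption).
    """
    for entry in roadmap:
        if entry.get("status", "queued") == "queued":
            return entry
    return None


if __name__ == "__main__":
    # Hypothetical entries; the real file layout may differ.
    roadmap = [
        {"title": "Ptolemy - Almagest", "status": "blocked"},  # needs a non-encrypted source
        {"title": "Apollonius - Conics", "status": "queued"},
        {"title": "Ficino - Platonic Theology", "status": "queued"},
    ]
    book = next_book(roadmap)
    print(book["title"] if book else "queue is empty")
```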

Pipeline Architecture

Internet Archive  -->  PDF Download  -->  PNG Extraction
                                              |
                                              v
                       Split Detection (pages 10, 20, 25)
                                              |
                          Single pages?  -----+
                              |               |
                              v               v
                      OCR Batch Submit    Flag for manual split
                              |
                              v
                     Translation Batch
                              |
                              v
                      Collect Results
                              |
                              v
                    ~/translate/data/pipeline/results/
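
The diagram samples pages 10, 20, and 25 to decide whether a scan contains single pages or two-page spreads, but it does not say how the decision is made. The sketch below assumes a simple aspect-ratio heuristic (a scan clearly wider than tall is a spread) and a majority vote over the sampled pages. The function names and the 1.2 threshold are illustrative assumptions, not the pipeline's actual logic.

```python
from typing import Dict, Sequence, Tuple

SAMPLE_PAGES = (10, 20, 25)  # page numbers named in the diagram


def looks_like_spread(width: int, height: int, ratio_threshold: float = 1.2) -> bool:
    """Assumed heuristic: width/height above the threshold means a two-page spread."""
    return width / height > ratio_threshold


def needs_manual_split(
    sizes: Dict[int, Tuple[int, int]],
    sample_pages: Sequence[int] = SAMPLE_PAGES,
) -> bool:
    """Return True when more than half of the sampled pages look like spreads.

    `sizes` maps page number to (width, height) in pixels. Sample pages that
    are absent (for example in a very short book) are ignored.
    """
    sampled = [sizes[p] for p in sample_pages if p in sizes]
    if not sampled:
        return False
    spreads = sum(looks_like_spread(w, h) for w, h in sampled)
    return spreads * 2 > len(sampled)


if __name__ == "__main__":
    portrait = {p: (1200, 1800) for p in SAMPLE_PAGES}   # single pages
    landscape = {p: (2400, 1800) for p in SAMPLE_PAGES}  # two-page spreads
    print(needs_manual_split(portrait), needs_manual_split(landscape))
```

Working from pixel dimensions keeps the check cheap, since no OCR call is needed before deciding which branch of the diagram a book takes.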

Cost Savings

Gemini Batch API: 50% cheaper than real-time

Batch requests have their own rate limits, separate from real-time calls
1-24 hour turnaround
Results retained ~30 days
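
To make the 50% figure concrete, the arithmetic can be sketched as follows. The page counts come from the completed-books list above. The tokens-per-page value and the price per million tokens are placeholders chosen only to show the calculation, not real measurements or real Gemini prices.

```python
from typing import Tuple


def estimate_cost(
    pages: int,
    tokens_per_page: int,
    price_per_million_tokens: float,
    batch_discount: float = 0.5,
) -> Tuple[float, float]:
    """Return (real_time_cost, batch_cost) in the currency of the given price."""
    total_tokens = pages * tokens_per_page
    real_time = total_tokens / 1_000_000 * price_per_million_tokens
    return real_time, real_time * (1 - batch_discount)


if __name__ == "__main__":
    completed_pages = 278 + 718 + 276 + 294 + 544 + 422  # the six completed books
    # Placeholder inputs: 1,000 tokens per page at 1.00 per million tokens.
    real_time, batch = estimate_cost(
        completed_pages, tokens_per_page=1_000, price_per_million_tokens=1.0
    )
    print(f"{completed_pages} pages: real-time {real_time:.3f}, batch {batch:.3f}")
```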

Troubleshooting

Proxy errors

unset ALL_PROXY HTTP_PROXY HTTPS_PROXY

Missing API key

set -a && source .env && set +a

Large book (>1GB payload)

Split into multiple batches of ~180 pages each.
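
A minimal sketch of that split, assuming the pages are available as an ordered list of file names. The `split_into_batches` helper and the `page_NNNN.png` naming are hypothetical, and 180 is the approximate batch size suggested above.

```python
from typing import List, Sequence


def split_into_batches(pages: Sequence[str], batch_size: int = 180) -> List[List[str]]:
    """Split an ordered page list into consecutive batches of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(pages[i:i + batch_size]) for i in range(0, len(pages), batch_size)]


if __name__ == "__main__":
    # 718 pages, the size of the Plato volume listed under Current Status.
    pages = [f"page_{n:04d}.png" for n in range(1, 719)]
    batches = split_into_batches(pages)
    print([len(b) for b in batches])  # [180, 180, 180, 178]
```

Each batch can then be submitted as its own job, with its own keys file, so the collection step works unchanged.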
