بنقرة واحدة
PDF 파일 작업 — 생성/편집/병합/분할/회전/워터마크/암호화/폼 채우기/이미지 추출/OCR. .pdf 파일 언급 또는 PDF 산출물 요청 시 사용.
التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.
القائمة
PDF 파일 작업 — 생성/편집/병합/분할/회전/워터마크/암호화/폼 채우기/이미지 추출/OCR. .pdf 파일 언급 또는 PDF 산출물 요청 시 사용.
التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.
استنادا إلى تصنيف SOC المهني
Git 브랜치 전략·커밋 컨벤션·머지 정책 가이드라인. Conventional Commits, GitFlow vs Trunk-based vs Feature Branch 선택 기준, 브랜치 네이밍 규칙(Jira 이슈 키 연계), 리베이스 vs 머지 vs squash 전략, destructive 작업 안전 가이드(force-push/reset/branch 삭제). git 에이전트(git-workflow-expert)가 실행 시 참조하는 SSOT.
Atlassian Jira (Cloud) 워크플로우 가이드라인 — Epic/Task 생성, 상태 전이, 브랜치 생성 컨벤션 검증, 시안 사전 승인 정책. 모든 write/update/delete 작업은 시안 제시 후 사용자 명시 OK 대기. 프로젝트별 커스터마이징(PROJECT_KEY / ASSIGNEE / COMPONENT / customfield) 가이드 포함. Jira CLI/MCP/REST 4-tier fallback 설계.
Create, edit, read .docx Word documents with formatting (TOC, headings, page numbers, letterheads, tables). Also for tracked changes, find-and-replace, image insertion, or converting content to polished Word format. Triggers: 'Word doc', '.docx', 'report', 'memo', 'letter', 'template'. Do NOT use for PDFs, spreadsheets, or Google Docs.
C99 embedded firmware — HAL Ops Table pattern, FreeRTOS static tasks/queues/semaphores, ISR-safe ring buffer, no-malloc policy, Unity test with mock drivers, CMake HOST/STM32 dual-build. For writing drivers (GPIO/UART/SPI/I2C), HAL abstraction, RTOS tasks, or MCU firmware (STM32/ESP32).
Create, modify, and manage Claude Code skills. Covers SKILL.md YAML frontmatter, skill-rules.json trigger patterns (keywords/intent/path/content), enforcement levels (block/suggest/warn), hook mechanisms (UserPromptSubmit/PreToolUse), session tracking, and 500-line rule with resources/ progressive disclosure.
Spring Boot 3 + Java 21 백엔드 — Controller→Service→Repository 계층, JPA Entity(Lombok+@PrePersist), Spring Security JWT Filter(jjwt 0.12.x + Refresh Rotation), GlobalExceptionHandler, ApiResponse<T>, @Transactional(readOnly), @PreAuthorize 메서드 인가.
| name | |
| description | PDF 파일 작업 — 생성/편집/병합/분할/회전/워터마크/암호화/폼 채우기/이미지 추출/OCR. .pdf 파일 언급 또는 PDF 산출물 요청 시 사용. |
| license | Proprietary. LICENSE.txt has complete terms |
| triggers | ["PDF","pdf",".pdf","merge PDF","split PDF","PDF form","OCR","PDF encrypt","extract text from PDF","create PDF"] |
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.
from pypdf import PdfReader, PdfWriter
# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")
# Extract text
text = ""
for page in reader.pages:
text += page.extract_text()
from pypdf import PdfWriter, PdfReader
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
reader = PdfReader(pdf_file)
for page in reader.pages:
writer.add_page(page)
with open("merged.pdf", "wb") as output:
writer.write(output)
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
writer = PdfWriter()
writer.add_page(page)
with open(f"page_{i+1}.pdf", "wb") as output:
writer.write(output)
reader = PdfReader("document.pdf")
meta = reader.metadata
print(f"Title: {meta.title}")
print(f"Author: {meta.author}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")
reader = PdfReader("input.pdf")
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90) # Rotate 90 degrees clockwise
writer.add_page(page)
with open("rotated.pdf", "wb") as output:
writer.write(output)
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
for page in pdf.pages:
text = page.extract_text()
print(text)
with pdfplumber.open("document.pdf") as pdf:
for i, page in enumerate(pdf.pages):
tables = page.extract_tables()
for j, table in enumerate(tables):
print(f"Table {j+1} on page {i+1}:")
for row in table:
print(row)
import pandas as pd
with pdfplumber.open("document.pdf") as pdf:
all_tables = []
for page in pdf.pages:
tables = page.extract_tables()
for table in tables:
if table: # Check if table is not empty
df = pd.DataFrame(table[1:], columns=table[0])
all_tables.append(df)
# Combine all tables
if all_tables:
combined_df = pd.concat(all_tables, ignore_index=True)
combined_df.to_excel("extracted_tables.xlsx", index=False)
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")
# Add a line
c.line(100, height - 140, 400, height - 140)
# Save
c.save()
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []
# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))
body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())
# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))
# Build PDF
doc.build(story)
IMPORTANT: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.
Instead, use ReportLab's XML markup tags in Paragraph objects:
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()
# Subscripts: use <sub> tag
chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])
# Superscripts: use <super> tag
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])
For canvas-drawn text (not Paragraph objects), manually adjust font the size and position rather than using Unicode subscripts/superscripts.
# Extract text
pdftotext input.pdf output.txt
# Extract text preserving layout
pdftotext -layout input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf
# Split
pdftk input.pdf burst
# Rotate
pdftk input.pdf rotate 1east output rotated.pdf
# Requires: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path
# Convert PDF to images
images = convert_from_path('scanned.pdf')
# OCR each page
text = ""
for i, image in enumerate(images):
text += f"Page {i+1}:\n"
text += pytesseract.image_to_string(image)
text += "\n\n"
print(text)
from pypdf import PdfReader, PdfWriter
# Create watermark (or load existing)
watermark = PdfReader("watermark.pdf").pages[0]
# Apply to all pages
reader = PdfReader("document.pdf")
writer = PdfWriter()
for page in reader.pages:
page.merge_page(watermark)
writer.add_page(page)
with open("watermarked.pdf", "wb") as output:
writer.write(output)
# Using pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix
# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
# Add password
writer.encrypt("userpassword", "ownerpassword")
with open("encrypted.pdf", "wb") as output:
writer.write(output)
| Task | Best Tool | Command/Code |
|---|---|---|
| Merge PDFs | pypdf | writer.add_page(page) |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Create PDFs | reportlab | Canvas or Platypus |
| Command line merge | qpdf | qpdf --empty --pages ... |
| OCR scanned PDFs | pytesseract | Convert to image first |
| Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md |
"H₂O" — Unicode subscript characters render as black boxes in built-in fonts"H<sub>2</sub>O" in Paragraph objectsPdfReader("500page.pdf") — loads all pages at once, causes memory spikesreader = PdfReader("large.pdf")
for page in reader.pages[start:end]: # Process in batches
text = page.extract_text()
writer.encrypt("userpassword", "ownerpassword")