pdf-verification-cli
Verify PDF page count and content using command-line tools when Python libraries are unavailable
| name | pdf-verification-cli |
| description | Verify PDF page count and content using command-line tools when Python libraries are unavailable |
When verifying PDF files during task execution, Python libraries like PyPDF2 may not be available in the environment. This skill provides a reliable alternative using standard command-line tools from the poppler-utils package.
pdfinfo - Extract PDF Metadata
Use pdfinfo to get page count and other metadata:
# Get full PDF info
pdfinfo document.pdf
# Get only the page count line (anchored so a title containing "Pages" does not match)
pdfinfo document.pdf | grep '^Pages:'
# Extract page count as a number
pdfinfo document.pdf | awk '/^Pages:/ {print $2}'
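When more than the page count is needed, the `Key: value` lines that pdfinfo prints can be parsed in one pass. The following is a minimal sketch, not part of the original skill: `parse_pdfinfo` is a hypothetical helper name, and `SAMPLE` is made-up text in the shape pdfinfo prints, so real output will differ.

```python
def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's 'Key:   value' lines into a dictionary of strings."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates keep their colons.
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon at all
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical sample in the shape pdfinfo prints; real values will differ.
SAMPLE = """Title:          Quarterly Report
Producer:       ExampleWriter
CreationDate:   Mon Jan  1 10:30:00 2024
Pages:          4
Page size:      612 x 792 pts (letter)
"""

info = parse_pdfinfo(SAMPLE)
page_count = int(info["Pages"])
print(page_count)
```

Looking up the exact `Pages` key avoids confusing it with neighbouring fields such as `Page size`.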
Key metadata fields:
- Pages: Number of pages in the PDF
- Title: Document title
- Author: Document author
- Creator: Application that created the PDF
- Producer: Application that processed the PDF
- CreationDate: When the PDF was created
- ModDate: Last modification date
pdftotext - Extract Text Content
Use pdftotext to inspect the actual content of the PDF:
# Extract all text to stdout
pdftotext document.pdf -
# Extract text to a file
pdftotext document.pdf output.txt
# Extract text from specific page range
pdftotext -f 1 -l 3 document.pdf output.txt
# Preserve layout (rough formatting)
pdftotext -layout document.pdf output.txt
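pdftotext ends each page with a form feed character (unless `-nopgbrk` is passed), so text extracted to stdout can be split back into pages. The sketch below assumes that default behaviour; `split_pages` and `blank_pages` are hypothetical helper names, and `sample` is made-up text standing in for real pdftotext output.

```python
def split_pages(text: str) -> list:
    """Split pdftotext output into per-page strings on form feed characters."""
    pages = text.split("\f")
    # The last page also ends with a form feed, which leaves an empty trailing item.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def blank_pages(text: str) -> list:
    """Return 1-based numbers of pages that contain no extractable text."""
    return [n for n, page in enumerate(split_pages(text), start=1) if not page.strip()]


# Made-up stand-in for pdftotext output: four pages, the third one empty.
sample = "Intro text\n\fChecklist\n\f\fReferences\n\f"
print(len(split_pages(sample)))
print(blank_pages(sample))
```

Pages that come back blank are often scanned images with no text layer, which is worth knowing before concluding that required content is missing.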
# Check if tools are installed
which pdfinfo
which pdftotext
# Or test with --help
pdfinfo --help 2>&1 | head -1
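The same availability check can be done from Python before shelling out. This is a small sketch using the standard library's `shutil.which`; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("pdfinfo", "pdftotext")):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("pdfinfo and pdftotext are available")
```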
# Debian/Ubuntu
apt-get update && apt-get install -y poppler-utils
# RHEL/CentOS/Fedora
yum install -y poppler-utils
# or
dnf install -y poppler-utils
# macOS (with Homebrew)
brew install poppler
# Verify page count matches expected
EXPECTED_PAGES=4
ACTUAL_PAGES=$(pdfinfo document.pdf | awk '/^Pages:/ {print $2}')
# String comparison avoids a shell error when pdfinfo fails and ACTUAL_PAGES is empty
if [ "$ACTUAL_PAGES" = "$EXPECTED_PAGES" ]; then
  echo "✓ Page count verified: $ACTUAL_PAGES pages"
else
  echo "✗ Page count mismatch: expected $EXPECTED_PAGES, got ${ACTUAL_PAGES:-unknown}"
fi
# Check for required sections/content
pdftotext document.pdf - | grep -i "checklist" && echo "✓ Contains checklist section"
pdftotext document.pdf - | grep -i "references" && echo "✓ Contains references section"
# Count lines that mention a key term (grep -c counts matching lines, not total occurrences)
pdftotext document.pdf - | grep -ci "assessment"  # Case-insensitive line count
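Because `grep -c` reports matching lines, a term that appears twice on one line is counted once. The sketch below contrasts the two counts on already extracted text; both function names are hypothetical, and `sample` is made-up text rather than real pdftotext output.

```python
def count_matching_lines(text, term):
    """Number of lines containing term, ignoring case (what grep -ci reports)."""
    needle = term.lower()
    return sum(1 for line in text.splitlines() if needle in line.lower())


def count_occurrences(text, term):
    """Total non-overlapping occurrences of term, ignoring case."""
    return text.lower().count(term.lower())


# Made-up text: "assessment" appears three times across two lines.
sample = "Assessment overview\nThe assessment follows the prior assessment.\nReferences\n"
print(count_matching_lines(sample, "assessment"))
print(count_occurrences(sample, "assessment"))
```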
import subprocess


def get_pdf_page_count(pdf_path):
    """Get page count using pdfinfo"""
    result = subprocess.run(
        ['pdfinfo', pdf_path],
        capture_output=True,
        text=True
    )
    for line in result.stdout.split('\n'):
        if line.startswith('Pages:'):
            return int(line.split(':')[1].strip())
    return None


def extract_pdf_text(pdf_path):
    """Extract all text from PDF using pdftotext"""
    result = subprocess.run(
        ['pdftotext', pdf_path, '-'],
        capture_output=True,
        text=True
    )
    return result.stdout


def verify_pdf(pdf_path, expected_pages, required_terms):
    """Verify PDF has expected page count and contains required terms"""
    # Check page count
    pages = get_pdf_page_count(pdf_path)
    if pages != expected_pages:
        return False, f"Expected {expected_pages} pages, got {pages}"
    # Check content
    text = extract_pdf_text(pdf_path).lower()
    missing = [term for term in required_terms if term.lower() not in text]
    if missing:
        return False, f"Missing terms: {missing}"
    return True, "PDF verification passed"
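The helpers above do not separate a missing tool from an unreadable file: a missing binary surfaces as a raw `FileNotFoundError`, while a failed run just yields empty output. One possible way to make those failures explicit is a thin wrapper around `subprocess.run`. This is a sketch under assumptions, not part of the original skill: `run_tool` and its `timeout` default are hypothetical, and the demonstration calls the Python interpreter so that it runs without poppler installed.

```python
import subprocess
import sys


def run_tool(args, timeout=30):
    """Run a command and return its stdout, raising RuntimeError on any failure."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        raise RuntimeError(f"{args[0]} not found; is poppler-utils installed?") from None
    except subprocess.TimeoutExpired:
        raise RuntimeError(f"{args[0]} timed out after {timeout}s") from None
    if result.returncode != 0:
        raise RuntimeError(f"{args[0]} failed: {result.stderr.strip()}")
    return result.stdout


# With the real tool the call would be: run_tool(["pdfinfo", "document.pdf"])
# Demonstrated here with the Python interpreter so it runs without poppler.
print(run_tool([sys.executable, "-c", "print('Pages: 3')"]).strip())
```

A caller can then catch `RuntimeError` once and report the message, instead of treating a failed run as a page-count mismatch.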
| Task | Command |
|---|---|
| Count pages | `pdfinfo file.pdf \| grep Pages` |
| Check if PDF has text | `pdftotext file.pdf - \| head -5` |
| Search for keyword | `pdftotext file.pdf - \| grep -i "keyword"` |
| Extract first page | `pdftotext -f 1 -l 1 file.pdf out.txt` |
| Get PDF title | `pdfinfo file.pdf \| grep Title` |
pdfinfo: command not found
- Install poppler-utils with the command for your platform (apt-get, yum/dnf, or brew).
pdftotext returns empty output
- Try pdftotext -layout for better text extraction.
Page count seems wrong
- Use pdftotext to see actual content per page.