| name | freight-doc-processor |
| description | Automatically detects, OCR-parses, and verifies POD (Proof of Delivery) and BOL (Bill of Lading) documents received as email attachments. Matches documents to load records, flags discrepancies, and triggers invoicing. Use when a broker receives delivery documents, needs POD confirmation, wants to verify delivery details, or needs to process BOLs. Triggered by phrases like "missing PODs", "check documents", "verify delivery", "resend POD", "process BOL", or automatically when an email with a PDF or image attachment arrives. |
Freight Document Processor
Detects, OCR-parses, and verifies POD and BOL documents from email attachments. Matches each document to a load record, then confirms delivery or flags issues.
Setup
pip3 install pdfplumber pytesseract Pillow
Usage
Detect document type
cd skills/freight-doc-processor/scripts
python3 doc_detector.py --file /path/to/document.pdf
OCR extract text
python3 ocr_extractor.py --file document.pdf --type POD
python3 ocr_extractor.py --file scan.png --type BOL --json
Match to load record
python3 doc_matcher.py --bol-number BOL-12345
python3 doc_matcher.py --shipper "ABC Corp"
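OCR output often mangles identifiers (for example "bol 12345" instead of "BOL-12345"), so matching usually benefits from normalizing before comparing. The following is a minimal sketch of that idea, not the logic in doc_matcher.py: the `normalize_bol` and `match_load` functions and the load record keys (`bol_number`, `shipper`) are assumptions made for illustration.

```python
import re


def normalize_bol(value):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())


def match_load(loads, bol_number=None, shipper=None):
    """Return loads matching the BOL number exactly, else by shipper substring.

    `loads` is a list of dicts; the key names are hypothetical.
    """
    if bol_number:
        wanted = normalize_bol(bol_number)
        hits = [
            load for load in loads
            if normalize_bol(load.get("bol_number", "")) == wanted
        ]
        if hits:
            return hits
    if shipper:
        needle = shipper.lower()
        return [load for load in loads if needle in load.get("shipper", "").lower()]
    return []
```

Trying the BOL number first and falling back to the shipper name keeps the stricter identifier in charge whenever it is available.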
Document Types
| Type | Description | Typical Fields |
|---|---|---|
| POD | Proof of Delivery | Delivery date, receiver name, signature, exceptions |
| BOL | Bill of Lading | Shipper, consignee, commodity, weight, units |
| RATE_CON | Rate Confirmation | Agreed rate, terms, carrier MC, load details |
| INVOICE | Invoice | Amount due, invoice number, payment terms |
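One simple way to tell these types apart from extracted text is keyword scoring. The sketch below is illustrative only: the keyword lists and the `classify_document` function are assumptions, loosely based on the typical fields in the table, and are not the logic in doc_detector.py.

```python
# Hypothetical keyword lists; tune them against real documents.
KEYWORDS = {
    "POD": ["proof of delivery", "received by", "delivery date", "signature"],
    "BOL": ["bill of lading", "shipper", "consignee", "commodity"],
    "RATE_CON": ["rate confirmation", "agreed rate", "carrier mc"],
    "INVOICE": ["invoice number", "amount due", "payment terms"],
}


def classify_document(text):
    """Return (doc_type, hits) for the best-scoring type, or ("UNKNOWN", 0)."""
    lowered = text.lower()
    scores = {
        doc_type: sum(1 for keyword in keywords if keyword in lowered)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return "UNKNOWN", 0
    return best, scores[best]
```

The hit count can serve as a rough confidence signal: a single keyword hit might be reported as low confidence, several hits as high.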
Broker Text Commands
| Command | Action |
|---|---|
| missing PODs | List loads awaiting POD confirmation |
| resend POD [load ID] | Request POD from carrier |
| check documents | Show recently processed documents |
| verify delivery [load ID] | Confirm whether the load is marked delivered |
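A minimal sketch of how these text commands could be parsed, assuming case-insensitive matching and an optional leading `#` on the load ID. The action names and the `parse_command` function are hypothetical labels, not part of the shipped scripts.

```python
import re

# Patterns mirror the command table; the action names are hypothetical.
COMMANDS = [
    (re.compile(r"^missing pods$", re.I), "list_missing_pods"),
    (re.compile(r"^resend pod\s+#?(?P<load_id>\w+)$", re.I), "request_pod"),
    (re.compile(r"^check documents$", re.I), "show_recent_documents"),
    (re.compile(r"^verify delivery\s+#?(?P<load_id>\w+)$", re.I), "verify_delivery"),
]


def parse_command(message):
    """Return (action, load_id) for a recognized command, else (None, None)."""
    text = message.strip()
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            # Commands without a load ID have no named group, so this is None.
            return action, match.groupdict().get("load_id")
    return None, None
```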
OCR Accuracy
- Digital PDFs: 95%+ accuracy with pdfplumber
- Scanned documents: 70-90% depending on scan quality
- Poor-quality scans: may be flagged as unreadable, in which case a clearer copy is requested
Example SMS Outputs
POD Confirmed:
✅ POD RECEIVED - Load #12345
Delivered: Dec 15 at 3:45pm
Receiver: J. Smith | Chicago, IL
Units: 24 pallets ✓ | No exceptions
Ready for invoicing. Reply 'invoice' to generate.
POD with discrepancies:
⚠️ DISCREPANCY - Load #12345
BOL shows: 48 pallets | POD shows: 46 pallets
Shortage noted by receiver
Reply 'override' to invoice 46 units
Reply 'resolve' to contact carrier
Unmatched document:
DOCUMENT RECEIVED
Type: POD (confidence: medium)
Could not match to any load record
Reply 'match [load ID]' to link this document
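The confirmed and discrepancy messages above come down to comparing the unit count on the BOL with the count on the POD. The following sketch shows one way that comparison could produce the message text. The `compare_units` function and its parameters are assumptions for illustration, and it covers only unit counts, not the other fields a real check would compare.

```python
def compare_units(load_id, bol_units, pod_units, unit="pallets"):
    """Build a broker-facing message from BOL and POD unit counts."""
    if bol_units == pod_units:
        return (
            f"POD RECEIVED - Load #{load_id}\n"
            f"Units: {pod_units} {unit} | No exceptions\n"
            "Ready for invoicing. Reply 'invoice' to generate."
        )
    # Fewer units delivered than shipped is a shortage, more is an overage.
    kind = "Shortage" if pod_units < bol_units else "Overage"
    return (
        f"DISCREPANCY - Load #{load_id}\n"
        f"BOL shows: {bol_units} {unit} | POD shows: {pod_units} {unit}\n"
        f"{kind} of {abs(bol_units - pod_units)} {unit}\n"
        f"Reply 'override' to invoice {pod_units} units\n"
        "Reply 'resolve' to contact carrier"
    )
```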
Processing Chain
Email attachment arrives
↓
doc_pipeline.py (orchestrator)
↓
1. Docling (local, free, preferred)
↓ fails or low confidence
2. Mistral OCR (cloud API fallback)
↓ fails
3. Alert broker for manual review
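The chain above can be sketched as a loop over processors that moves to the next one on an exception or a low confidence score. This is an illustration under stated assumptions, not the code in doc_pipeline.py: `run_chain`, the 0.8 threshold, the `(fields, confidence)` return shape, and the two stub processors are all hypothetical. As a simplification, the confidence threshold is applied to every step.

```python
def run_chain(path, processors, min_confidence=0.8):
    """Try each (name, func) in order and return the first confident result.

    Each func takes a file path and returns (fields, confidence) or raises.
    Returns (name, fields), or (None, None) when manual review is needed.
    """
    for name, func in processors:
        try:
            fields, confidence = func(path)
        except Exception as exc:  # any failure falls through to the next step
            print(f"{name} failed: {exc}")
            continue
        if confidence >= min_confidence:
            return name, fields
        print(f"{name} confidence {confidence:.2f} is below {min_confidence}")
    return None, None


# Stand-in processors for demonstration only; they do not call Docling or Mistral.
def docling_stub(path):
    raise RuntimeError("simulated local failure")


def mistral_stub(path):
    return {"bol_number": "BOL-12345"}, 0.93
```

Calling `run_chain("document.pdf", [("docling", docling_stub), ("mistral", mistral_stub)])` falls through the simulated failure and returns the second processor's result. A `(None, None)` result is the cue to alert the broker.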
Scripts Reference
doc_pipeline.py - Main orchestrator (start here)
python3 doc_pipeline.py --demo
python3 doc_pipeline.py --file document.pdf
python3 doc_pipeline.py --scan
python3 doc_pipeline.py --scan --json
docling_processor.py - Primary OCR (local, no API key needed)
pip3 install docling
python3 docling_processor.py --file document.pdf
python3 docling_processor.py --file scan.jpg --type POD --json
mistral_ocr.py - Fallback OCR (cloud)
export MISTRAL_API_KEY=your_key_here
python3 mistral_ocr.py --file document.pdf --json
doc_detector.py - Classify document type
python3 doc_detector.py --file /path/to/document.pdf
ocr_extractor.py - Extract text fields (Tesseract-based)
python3 ocr_extractor.py --file document.pdf --type POD
python3 ocr_extractor.py --file scan.png --type BOL --json
doc_matcher.py - Match document to load record
python3 doc_matcher.py --bol-number BOL-12345
python3 doc_matcher.py --shipper "ABC Corp"
Install
pip3 install docling requests
pip3 install pdfplumber pytesseract Pillow
brew install tesseract  # macOS (Homebrew); on Debian/Ubuntu use: sudo apt install tesseract-ocr
Integrations
See INTEGRATIONS.md for full details.
| Service | Purpose | Status |
|---|---|---|
| Docling | Primary processor (local, free) | ✅ Built |
| Mistral OCR | Cloud fallback | ✅ Built |
| Azure Document Intelligence | Enterprise alternative | Documented only |
| AWS Textract | AWS cloud alternative | Documented only |
| Tesseract OCR | Legacy local fallback | ✅ Built (ocr_extractor.py) |
Cron Setup
*/5 * * * * cd /path/to/freight-doc-processor/scripts && python3 doc_pipeline.py --scan >> /tmp/doc-pipeline.log 2>&1
Storage
- Documents: ~/.freight-broker/attachments/
- OCR cache: ~/.freight-broker/ocr_cache/
- Processed log: ~/.freight-broker/processed_docs.json
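A processed log like this is typically what keeps a scheduled scan from handling the same attachment twice. The sketch below shows one possible approach, keyed by a SHA-256 digest of the file contents. The JSON layout and the `already_processed` and `mark_processed` helpers are assumptions; the actual format of processed_docs.json is not documented here.

```python
import hashlib
import json
from pathlib import Path

# Path from the Storage list; the JSON layout used below is an assumption.
LOG_PATH = Path.home() / ".freight-broker" / "processed_docs.json"


def file_digest(path):
    """SHA-256 of the file contents, so renamed duplicates are still caught."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def already_processed(path, log_path=LOG_PATH):
    """Return True if this file's digest is already recorded in the log."""
    log_path = Path(log_path)
    if not log_path.exists():
        return False
    return file_digest(path) in json.loads(log_path.read_text())


def mark_processed(path, doc_type, log_path=LOG_PATH):
    """Record the file in the log, creating the log and its folder if needed."""
    log_path = Path(log_path)
    log = json.loads(log_path.read_text()) if log_path.exists() else {}
    log[file_digest(path)] = {"file": Path(path).name, "type": doc_type}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(json.dumps(log, indent=2))
```

Hashing the contents rather than the filename means a carrier resending the same scan under a new name is still recognized as a duplicate.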