8600fbe23f
Add OCR PDF Service project documentation

New project: Online OCR service for PDFs with dual output modes
- Mode 1: Extract text from scanned PDFs
- Mode 2: Generate searchable PDFs with embedded OCR text layer

Key features:
- Multi-language support (CN/EN/FR) via PaddleOCR
- Two output formats: plain text or searchable PDF
- Reuses validated OCR pipeline from ClassGen (99.97% accuracy)
- Proposed architecture: Node.js API + Python OCR worker + job queue

Suggested stack:
- Backend: PaddleOCR (already validated), Node.js + Express
- PDF processing: pdf-lib, PyPDF2
- Queue: Redis + Bull for async processing

Timeline: 3-4 weeks for production-ready MVP
Status: Conception phase - awaiting prioritization decision
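
Illustrative sketch of the proposed worker flow (not part of this commit). It assumes an in-process `queue.Queue` standing in for Redis + Bull and a pluggable `engine` callable standing in for PaddleOCR. The names `OcrJob`, `OcrResult`, `process_job`, and `drain` are hypothetical, and embedding the text layer for Mode 2 is left out:

```python
import queue
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for the OCR backend: takes one page image as bytes
# and returns the recognized text. A real worker would wrap PaddleOCR here.
OcrEngine = Callable[[bytes], str]

VALID_MODES = ("text", "searchable_pdf")  # Mode 1 and Mode 2 from the proposal


@dataclass
class OcrJob:
    job_id: str
    pages: List[bytes]  # one entry per scanned page
    mode: str           # "text" or "searchable_pdf"


@dataclass
class OcrResult:
    job_id: str
    mode: str
    page_texts: List[str]  # recognized text, one entry per page


def process_job(job: OcrJob, engine: OcrEngine) -> OcrResult:
    """Run OCR on every page of a job; reject unknown output modes."""
    if job.mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {job.mode!r}")
    page_texts = [engine(page) for page in job.pages]
    return OcrResult(job.job_id, job.mode, page_texts)


def drain(jobs: queue.Queue, engine: OcrEngine) -> List[OcrResult]:
    """Process queued jobs in FIFO order until the queue is empty."""
    results: List[OcrResult] = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        results.append(process_job(job, engine))
        jobs.task_done()
    return results
```

In this sketch both modes share the same OCR pass. A Mode 1 consumer would join `page_texts` into plain text, while a Mode 2 consumer would hand them to a separate PDF step to build the searchable layer.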
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 11:37:06 +08:00 |