Commit Graph

1 Commit

Author SHA1 Message Date
8600fbe23f Add OCR PDF Service project documentation
New project: Online OCR service for PDFs with dual output modes
- Mode 1: Extract text from scanned PDFs
- Mode 2: Generate searchable PDFs with embedded OCR text layer

Key features:
- Multi-language support (CN/EN/FR) via PaddleOCR
- Two output formats: plain text or searchable PDF
- Reuses validated OCR pipeline from ClassGen (99.97% accuracy)
- Proposed architecture: Node.js API + Python OCR worker + job queue

Suggested stack:
- Backend: PaddleOCR (already validated), Node.js + Express
- PDF processing: pdf-lib, PyPDF2
- Queue: Redis + Bull for async processing
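
A minimal sketch of how the worker side of the proposed architecture might dispatch the two output modes. Everything here is an illustrative assumption, not the project's code: the `OcrJob` fields, the handler names, and the mode strings are hypothetical, the standard-library `queue.Queue` stands in for Redis + Bull, and the OCR calls are stubbed rather than using PaddleOCR.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Callable, Dict


@dataclass
class OcrJob:
    """Hypothetical job payload that the API would enqueue for the worker."""
    job_id: str
    pdf_path: str
    mode: str          # "text" (Mode 1) or "searchable_pdf" (Mode 2)
    lang: str = "en"   # e.g. "cn", "en", "fr"


def extract_text(job: OcrJob) -> str:
    # Stub for Mode 1: a real worker would run OCR and return the plain text.
    return f"{job.job_id}: plain text ({job.lang})"


def make_searchable_pdf(job: OcrJob) -> str:
    # Stub for Mode 2: a real worker would embed an OCR text layer in the PDF.
    return f"{job.job_id}: searchable PDF ({job.lang})"


# Map each output mode to its handler so adding a mode is a one-line change.
HANDLERS: Dict[str, Callable[[OcrJob], str]] = {
    "text": extract_text,
    "searchable_pdf": make_searchable_pdf,
}


def run_worker(jobs: Queue) -> Dict[str, str]:
    """Drain the queue and return a result (or error string) per job id."""
    results: Dict[str, str] = {}
    while not jobs.empty():
        job = jobs.get()
        handler = HANDLERS.get(job.mode)
        if handler is None:
            results[job.job_id] = f"error: unknown mode {job.mode!r}"
        else:
            results[job.job_id] = handler(job)
        jobs.task_done()
    return results
```

In this sketch the API process would only enqueue jobs and poll for results, which keeps long-running OCR work out of the request path.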

Timeline: 3-4 weeks for production-ready MVP
Status: Conception phase - awaiting prioritization decision

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 11:37:06 +08:00