StillHammer/couple-repo

Commit Graph

Author	SHA1	Message	Date

StillHammer

8600fbe23f

Add OCR PDF Service project documentation

New project: Online OCR service for PDFs with dual output modes
- Mode 1: Extract text from scanned PDFs
- Mode 2: Generate searchable PDFs with embedded OCR text layer

Key features:
- Multi-language support (CN/EN/FR) via PaddleOCR
- Two output formats: plain text or searchable PDF
- Reuses validated OCR pipeline from ClassGen (99.97% accuracy)
- Proposed architecture: Node.js API + Python OCR worker + job queue

Suggested stack:
- Backend: PaddleOCR (already validated), Node.js + Express
- PDF processing: pdf-lib, PyPDF2
- Queue: Redis + Bull for async processing

Timeline: 3-4 weeks for production-ready MVP
Status: Conception phase - awaiting prioritization decision

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-19 11:37:06 +08:00

1 Commit
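
The commit message proposes an API that enqueues jobs and a Python OCR worker that consumes them, with two output modes. The sketch below illustrates that flow only, under stated assumptions: the standard-library `queue.Queue` stands in for Redis + Bull, `fake_ocr` stands in for the PaddleOCR pipeline, and the names `OcrJob`, `process`, `run_worker`, the mode strings, and the language codes are all hypothetical and do not come from the repository.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Callable, Dict, List

# Hypothetical identifiers for the two output modes named in the commit message.
MODE_TEXT = "text"
MODE_SEARCHABLE_PDF = "searchable_pdf"

# Hypothetical codes for the three languages listed (CN/EN/FR).
SUPPORTED_LANGS = {"cn", "en", "fr"}


@dataclass
class OcrJob:
    job_id: str
    pages: List[bytes]  # one scanned page per entry
    mode: str
    lang: str = "en"


def fake_ocr(page: bytes, lang: str) -> str:
    """Stand-in for the real OCR call: it only decodes the bytes as UTF-8."""
    return page.decode("utf-8")


def process(job: OcrJob, ocr: Callable[[bytes, str], str]) -> dict:
    """Run OCR on every page and shape the result for the requested mode."""
    if job.lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {job.lang}")
    texts = [ocr(page, job.lang) for page in job.pages]
    if job.mode == MODE_TEXT:
        # Mode 1: plain extracted text, pages joined by newlines.
        return {"job_id": job.job_id, "mode": job.mode, "text": "\n".join(texts)}
    if job.mode == MODE_SEARCHABLE_PDF:
        # Mode 2: a real worker would embed each page's text as a hidden layer
        # in the PDF; this sketch only returns the per-page text for that step.
        return {"job_id": job.job_id, "mode": job.mode, "page_text_layers": texts}
    raise ValueError(f"unknown mode: {job.mode}")


def run_worker(jobs: Queue, ocr: Callable[[bytes, str], str]) -> Dict[str, dict]:
    """Drain the queue and return results keyed by job id."""
    results: Dict[str, dict] = {}
    while not jobs.empty():
        job = jobs.get()
        results[job.job_id] = process(job, ocr)
        jobs.task_done()
    return results


if __name__ == "__main__":
    q: Queue = Queue()
    q.put(OcrJob("a", [b"hello", b"world"], MODE_TEXT))
    q.put(OcrJob("b", [b"bonjour"], MODE_SEARCHABLE_PDF, lang="fr"))
    print(run_worker(q, fake_ocr))
```

Passing the OCR function in as a parameter keeps the worker logic testable without the OCR engine installed. In the proposed stack, the queue and the PDF text-layer step would be handled by the libraries the commit names rather than by this in-process stand-in.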