Complete Python implementation with strict type safety and best practices.
Features:
- RSS/Atom/HTML web scraping
- GPT-4 Vision image analysis
- Node.js API integration
- RSS/JSON feed publishing
Modules:
- src/config.py: Configuration with strict validation
- src/exceptions.py: Custom exception hierarchy
- src/scraper.py: Multi-format news scraping (RSS/Atom/HTML)
- src/image_analyzer.py: GPT-4 Vision integration with retry
- src/aggregator.py: Content aggregation and filtering
- src/article_client.py: Node.js API client with retry
- src/publisher.py: RSS/JSON feed generation
- scripts/run.py: Complete pipeline orchestrator
- scripts/validate.py: Code quality validation
Code Quality:
- 100% type hint coverage (mypy strict mode)
- Zero bare except clauses
- Logger throughout (no print statements)
- Comprehensive test suite (598 lines)
- Immutable dataclasses (frozen=True)
- Explicit error handling
- Structured logging
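A minimal sketch of how the conventions listed above (frozen dataclasses, a custom exception hierarchy, no bare except clauses, logger instead of print, retry) might fit together. The names PipelineError, TransientError, Article, and with_retry are hypothetical and are not taken from the repository's modules.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class PipelineError(Exception):
    """Hypothetical base class for all pipeline failures."""


class TransientError(PipelineError):
    """Hypothetical subclass for failures that are worth retrying."""


@dataclass(frozen=True)
class Article:
    """Immutable record; assigning to a field raises FrozenInstanceError."""

    title: str
    url: str


def with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientError as exc:  # explicit type, never a bare except
            if attempt == attempts:
                logger.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            sleep(delay)
    raise AssertionError("unreachable")  # keeps strict type checkers satisfied
```

Only TransientError is retried here; any other exception propagates immediately, so programming errors are not hidden behind retries.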
Stats:
- 1,431 lines of source code
- 598 lines of test code
- 15 Python files
- 8 core modules
- 4 test suites
All validation checks pass.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
# .env.example - Copy to .env and fill in your values

# ==============================================
# REQUIRED CONFIGURATION
# ==============================================

# OpenAI API Key (get from https://platform.openai.com/api-keys)
OPENAI_API_KEY=sk-proj-your-actual-key-here

# Node.js Article Generator API URL
NODE_API_URL=http://localhost:3000

# News sources (comma-separated URLs)
NEWS_SOURCES=https://techcrunch.com/feed,https://www.theverge.com/rss/index.xml

# ==============================================
# OPTIONAL CONFIGURATION
# ==============================================

# Logging level (DEBUG, INFO, WARNING, ERROR)
LOG_LEVEL=INFO

# Maximum articles to process per source
MAX_ARTICLES=10

# HTTP timeout for scraping (seconds)
SCRAPER_TIMEOUT=10

# HTTP timeout for API calls (seconds)
API_TIMEOUT=30

# Output directory (default: ./output)
OUTPUT_DIR=./output
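
The variables above could be loaded and validated along these lines. This is a sketch only, not the contents of src/config.py: Settings, ConfigError, and load_settings are hypothetical names, the defaults simply mirror the example values, and the code assumes the variables are already present in the process environment (it does not parse the .env file itself).

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional
from urllib.parse import urlparse

_LOG_LEVELS = frozenset({"DEBUG", "INFO", "WARNING", "ERROR"})


class ConfigError(ValueError):
    """Hypothetical error raised when a setting is missing or invalid."""


@dataclass(frozen=True)
class Settings:
    """Immutable, validated view of the environment (hypothetical shape)."""

    openai_api_key: str
    node_api_url: str
    news_sources: tuple[str, ...]
    log_level: str
    max_articles: int
    scraper_timeout: int
    api_timeout: int
    output_dir: Path


def _required(env: Mapping[str, str], name: str) -> str:
    value = env.get(name, "").strip()
    if not value:
        raise ConfigError(f"{name} is required")
    return value


def _http_url(name: str, value: str) -> str:
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ConfigError(f"{name} contains an invalid URL: {value!r}")
    return value


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError as exc:
        raise ConfigError(f"{name} must be an integer, got {raw!r}") from exc
    if value <= 0:
        raise ConfigError(f"{name} must be positive, got {value}")
    return value


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from env (defaults to os.environ), failing fast."""
    source = os.environ if env is None else env
    raw_sources = _required(source, "NEWS_SOURCES").split(",")
    sources = tuple(
        _http_url("NEWS_SOURCES", part.strip())
        for part in raw_sources
        if part.strip()
    )
    if not sources:
        raise ConfigError("NEWS_SOURCES must list at least one URL")
    log_level = (source.get("LOG_LEVEL", "").strip() or "INFO").upper()
    if log_level not in _LOG_LEVELS:
        raise ConfigError(f"LOG_LEVEL must be one of {sorted(_LOG_LEVELS)}")
    return Settings(
        openai_api_key=_required(source, "OPENAI_API_KEY"),
        node_api_url=_http_url("NODE_API_URL", _required(source, "NODE_API_URL")),
        news_sources=sources,
        log_level=log_level,
        max_articles=_positive_int(source, "MAX_ARTICLES", 10),
        scraper_timeout=_positive_int(source, "SCRAPER_TIMEOUT", 10),
        api_timeout=_positive_int(source, "API_TIMEOUT", 30),
        output_dir=Path(source.get("OUTPUT_DIR", "").strip() or "./output"),
    )
```

Passing a plain dict to load_settings instead of relying on os.environ keeps the validation easy to exercise in tests without touching the real environment.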