Complete Python implementation with strict type safety and best practices.
Features:
- RSS/Atom/HTML web scraping
- GPT-4 Vision image analysis
- Node.js API integration
- RSS/JSON feed publishing
Modules:
- src/config.py: Configuration with strict validation
- src/exceptions.py: Custom exception hierarchy
- src/scraper.py: Multi-format news scraping (RSS/Atom/HTML)
- src/image_analyzer.py: GPT-4 Vision integration with retry
- src/aggregator.py: Content aggregation and filtering
- src/article_client.py: Node.js API client with retry
- src/publisher.py: RSS/JSON feed generation
- scripts/run.py: Complete pipeline orchestrator
- scripts/validate.py: Code quality validation
Code Quality:
- 100% type hint coverage (mypy strict mode)
- Zero bare except clauses
- Logger throughout (no print statements)
- Comprehensive test suite (598 lines)
- Immutable dataclasses (frozen=True)
- Explicit error handling
- Structured logging
Stats:
- 1,431 lines of source code
- 598 lines of test code
- 15 Python files
- 8 core modules
- 4 test suites
All validation checks pass.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
127 lines
3.0 KiB
Markdown
127 lines
3.0 KiB
Markdown
# Feed Generator
|
|
|
|
AI-powered content aggregation system that scrapes news, analyzes images, and generates articles.
|
|
|
|
## Project Status
|
|
|
|
✅ **Structure Complete** - All modules implemented with strict type safety
|
|
✅ **Type Hints** - 100% coverage on all functions
|
|
✅ **Tests** - Comprehensive test suite for core modules
|
|
✅ **Documentation** - Full docstrings and inline documentation
|
|
|
|
## Architecture
|
|
|
|
```
|
|
Web Sources → Scraper → Image Analyzer → Aggregator → Node API Client → Publisher
|
|
↓ ↓ ↓ ↓ ↓ ↓
|
|
HTML NewsArticle AnalyzedArticle Prompt GeneratedArticle Feed/RSS
|
|
```
|
|
|
|
## Modules
|
|
|
|
- `src/config.py` - Configuration management with strict validation
|
|
- `src/exceptions.py` - Custom exception hierarchy
|
|
- `src/scraper.py` - Web scraping (RSS/Atom/HTML)
|
|
- `src/image_analyzer.py` - GPT-4 Vision image analysis
|
|
- `src/aggregator.py` - Content aggregation and prompt generation
|
|
- `src/article_client.py` - Node.js API client
|
|
- `src/publisher.py` - RSS/JSON publishing
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
# Create virtual environment
|
|
python3 -m venv venv
|
|
source venv/bin/activate # On Windows: venv\Scripts\activate
|
|
|
|
# Install dependencies
|
|
pip install -r requirements.txt
|
|
|
|
# Configure environment
|
|
cp .env.example .env
|
|
# Edit .env with your API keys
|
|
```
|
|
|
|
## Configuration
|
|
|
|
Required environment variables in `.env`:
|
|
|
|
```bash
|
|
OPENAI_API_KEY=sk-your-key-here
|
|
NODE_API_URL=http://localhost:3000
|
|
NEWS_SOURCES=https://techcrunch.com/feed,https://example.com/rss
|
|
```
|
|
|
|
See `.env.example` for all options.
|
|
|
|
## Usage
|
|
|
|
```bash
|
|
# Run the pipeline
|
|
python scripts/run.py
|
|
```
|
|
|
|
Output files:
|
|
- `output/feed.rss` - RSS 2.0 feed
|
|
- `output/articles.json` - JSON export
|
|
- `feed_generator.log` - Execution log
|
|
|
|
## Type Checking
|
|
|
|
```bash
|
|
# Run mypy to verify type safety
|
|
mypy src/
|
|
|
|
# Should pass with zero errors
|
|
```
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
# Run all tests
|
|
pytest tests/ -v
|
|
|
|
# With coverage
|
|
pytest tests/ --cov=src --cov-report=html
|
|
```
|
|
|
|
## Code Quality Checks
|
|
|
|
All code follows strict Python best practices:
|
|
|
|
- ✅ Type hints on ALL functions
|
|
- ✅ No bare `except:` clauses
|
|
- ✅ Logger instead of `print()`
|
|
- ✅ Explicit error handling
|
|
- ✅ Immutable dataclasses
|
|
- ✅ No global state
|
|
- ✅ No magic strings (use Enums)
|
|
|
|
## Documentation
|
|
|
|
- `ARCHITECTURE.md` - Technical design and data flow
|
|
- `CLAUDE.md` - Development guidelines and rules
|
|
- `SETUP.md` - Detailed installation guide
|
|
|
|
## Development
|
|
|
|
This is a V1 prototype built for speed while maintaining quality:
|
|
|
|
- **Type Safety**: Full mypy compliance
|
|
- **Testing**: Unit tests for all modules
|
|
- **Error Handling**: Explicit exceptions throughout
|
|
- **Logging**: Structured logging at all stages
|
|
- **Configuration**: Externalized, validated config
|
|
|
|
## Next Steps
|
|
|
|
1. Install dependencies: `pip install -r requirements.txt`
|
|
2. Configure `.env` file with API keys
|
|
3. Run type checking: `mypy src/`
|
|
4. Run tests: `pytest tests/`
|
|
5. Execute pipeline: `python scripts/run.py`
|
|
|
|
## License
|
|
|
|
Proprietary - Internal use only
|