Go to file

StillHammer 40138c2d45 Initial implementation: Feed Generator V1 Complete Python implementation with strict type safety and best practices. Features: - RSS/Atom/HTML web scraping - GPT-4 Vision image analysis - Node.js API integration - RSS/JSON feed publishing Modules: - src/config.py: Configuration with strict validation - src/exceptions.py: Custom exception hierarchy - src/scraper.py: Multi-format news scraping (RSS/Atom/HTML) - src/image_analyzer.py: GPT-4 Vision integration with retry - src/aggregator.py: Content aggregation and filtering - src/article_client.py: Node.js API client with retry - src/publisher.py: RSS/JSON feed generation - scripts/run.py: Complete pipeline orchestrator - scripts/validate.py: Code quality validation Code Quality: - 100% type hint coverage (mypy strict mode) - Zero bare except clauses - Logger throughout (no print statements) - Comprehensive test suite (598 lines) - Immutable dataclasses (frozen=True) - Explicit error handling - Structured logging Stats: - 1,431 lines of source code - 598 lines of test code - 15 Python files - 8 core modules - 4 test suites All validation checks pass. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>		2025-10-07 22:28:18 +08:00
scripts	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
src	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
tests	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
.env.example	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
.gitignore	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
ARCHITECTURE.md	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
CLAUDE.md	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
mypy.ini	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
pyproject.toml	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
QUICKSTART.md	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
README.md	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
requirements.txt	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
SETUP.md	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00
STATUS.md	Initial implementation: Feed Generator V1	2025-10-07 22:28:18 +08:00

README.md

Feed Generator

AI-powered content aggregation system that scrapes news, analyzes images, and generates articles.

Project Status

✅ Structure Complete - All modules implemented with strict type safety ✅ Type Hints - 100% coverage on all functions ✅ Tests - Comprehensive test suite for core modules ✅ Documentation - Full docstrings and inline documentation

Architecture

Web Sources → Scraper → Image Analyzer → Aggregator → Node API Client → Publisher
     ↓           ↓            ↓              ↓              ↓              ↓
   HTML      NewsArticle  AnalyzedArticle  Prompt    GeneratedArticle  Feed/RSS

Modules

src/config.py - Configuration management with strict validation
src/exceptions.py - Custom exception hierarchy
src/scraper.py - Web scraping (RSS/Atom/HTML)
src/image_analyzer.py - GPT-4 Vision image analysis
src/aggregator.py - Content aggregation and prompt generation
src/article_client.py - Node.js API client
src/publisher.py - RSS/JSON publishing

Installation

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys

Configuration

Required environment variables in .env:

OPENAI_API_KEY=sk-your-key-here
NODE_API_URL=http://localhost:3000
NEWS_SOURCES=https://techcrunch.com/feed,https://example.com/rss

See .env.example for all options.

Usage

# Run the pipeline
python scripts/run.py

Output files:

output/feed.rss - RSS 2.0 feed
output/articles.json - JSON export
feed_generator.log - Execution log

Type Checking

# Run mypy to verify type safety
mypy src/

# Should pass with zero errors

Testing

# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

Code Quality Checks

All code follows strict Python best practices:

✅ Type hints on ALL functions
✅ No bare except: clauses
✅ Logger instead of print()
✅ Explicit error handling
✅ Immutable dataclasses
✅ No global state
✅ No magic strings (use Enums)

Documentation

ARCHITECTURE.md - Technical design and data flow
CLAUDE.md - Development guidelines and rules
SETUP.md - Detailed installation guide

Development

This is a V1 prototype built for speed while maintaining quality:

Type Safety: Full mypy compliance
Testing: Unit tests for all modules
Error Handling: Explicit exceptions throughout
Logging: Structured logging at all stages
Configuration: Externalized, validated config

Next Steps

Install dependencies: pip install -r requirements.txt
Configure .env file with API keys
Run type checking: mypy src/
Run tests: pytest tests/
Execute pipeline: python scripts/run.py

License

Proprietary - Internal use only