Complete Python implementation with strict type safety and best practices.
Features:
- RSS/Atom/HTML web scraping
- GPT-4 Vision image analysis
- Node.js API integration
- RSS/JSON feed publishing
Modules:
- src/config.py: Configuration with strict validation
- src/exceptions.py: Custom exception hierarchy
- src/scraper.py: Multi-format news scraping (RSS/Atom/HTML)
- src/image_analyzer.py: GPT-4 Vision integration with retry
- src/aggregator.py: Content aggregation and filtering
- src/article_client.py: Node.js API client with retry
- src/publisher.py: RSS/JSON feed generation
- scripts/run.py: Complete pipeline orchestrator
- scripts/validate.py: Code quality validation
Code Quality:
- 100% type hint coverage (mypy strict mode)
- Zero bare except clauses
- Logger throughout (no print statements)
- Comprehensive test suite (598 lines)
- Immutable dataclasses (frozen=True)
- Explicit error handling
- Structured logging
Stats:
- 1,431 lines of source code
- 598 lines of test code
- 15 Python files
- 8 core modules
- 4 test suites
All validation checks pass.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
878 lines
21 KiB
Markdown
# CLAUDE.md - Feed Generator Project Instructions
```markdown
# CLAUDE.md - Feed Generator Development Instructions

> **CRITICAL**: This document contains mandatory rules for AI-assisted development with Claude Code.
> **NEVER** deviate from these rules without explicit human approval.

---
## PROJECT OVERVIEW

**Feed Generator** is a Python-based content aggregation system that:
1. Scrapes news from web sources
2. Analyzes images using GPT-4 Vision
3. Aggregates content into structured prompts
4. Calls existing Node.js article generation API
5. Publishes to feeds (RSS/WordPress)

**Philosophy**: Quick, functional prototype. NOT a production system yet.
**Timeline**: 3-5 days maximum for V1.
**Future**: May be rewritten in Node.js/TypeScript with strict architecture.

---
## CORE PRINCIPLES

### 1. Type Safety is MANDATORY

**NEVER write untyped Python code.**

```python
# ❌ FORBIDDEN - No type hints
def scrape_news(url):
    return requests.get(url)

# ✅ REQUIRED - Full type hints
from typing import List, Dict, Optional
import requests

def scrape_news(url: str) -> Optional[Dict[str, str]]:
    response: requests.Response = requests.get(url)
    return response.json() if response.ok else None
```

**Rules:**
- Every function MUST have type hints for parameters and return values
- Use `typing` module: `List`, `Dict`, `Optional`, `Union`, `Tuple`
- Use `from __future__ import annotations` for forward references
- Complex types should use `TypedDict` or `dataclasses`
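
As a minimal sketch of the last rule, the same scraped record can be modelled as a `TypedDict` (for raw, JSON-shaped input) or as a frozen dataclass (for validated internal data). The names `RawArticle`, `ParsedArticle`, and `parse_article` are hypothetical and are not part of the project:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, TypedDict


class RawArticle(TypedDict):
    """Shape of an untrusted, JSON-like record (hypothetical)."""

    title: str
    url: str


@dataclass(frozen=True)
class ParsedArticle:
    """Validated, immutable internal representation (hypothetical)."""

    title: str
    url: str
    image_url: Optional[str] = None


def parse_article(raw: RawArticle) -> ParsedArticle:
    """Convert a raw record into the internal dataclass."""
    return ParsedArticle(title=raw["title"].strip(), url=raw["url"])
```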
### 2. Explicit is Better Than Implicit

**NEVER use magic or implicit behavior.**

```python
# ❌ FORBIDDEN - Implicit dictionary keys
def process(data):
    return data['title']  # What if 'title' doesn't exist?

# ✅ REQUIRED - Explicit with error handling
def process(data: Dict[str, str]) -> str:
    if 'title' not in data:
        raise ValueError("Missing required key: 'title'")
    return data['title']
```

### 3. Fail Fast and Loud

**NEVER silently swallow errors.**

```python
# ❌ FORBIDDEN - Silent failure
try:
    result = dangerous_operation()
except:
    result = None

# ✅ REQUIRED - Explicit error handling
try:
    result = dangerous_operation()
except SpecificException as e:
    logger.error(f"Operation failed: {e}")
    raise
```

### 4. Single Responsibility Modules

**Each module has ONE clear purpose.**

- `scraper.py` - ONLY scraping logic
- `image_analyzer.py` - ONLY image analysis
- `article_client.py` - ONLY API communication
- `aggregator.py` - ONLY content aggregation
- `publisher.py` - ONLY feed publishing

**NEVER mix responsibilities.**

---
## FORBIDDEN PATTERNS

### ❌ NEVER Use These

```python
# 1. Bare except
try:
    something()
except:  # ❌ FORBIDDEN
    pass

# 2. Mutable default arguments
def func(items=[]):  # ❌ FORBIDDEN
    items.append(1)
    return items

# 3. Global state
CACHE = {}  # ❌ FORBIDDEN at module level

def use_cache():
    CACHE['key'] = 'value'

# 4. Star imports
from module import *  # ❌ FORBIDDEN

# 5. Untyped functions
def process(data):  # ❌ FORBIDDEN - no types
    return data

# 6. Magic strings
if mode == "production":  # ❌ FORBIDDEN
    do_something()

# 7. Implicit None returns
def maybe_returns():  # ❌ FORBIDDEN - unclear return
    if condition:
        return value

# 8. Nested functions for reuse
def outer():
    def inner():  # ❌ FORBIDDEN if used multiple times
        pass
    inner()
    inner()
```
### ✅ REQUIRED Patterns

```python
# 1. Specific exceptions
try:
    something()
except ValueError as e:  # ✅ REQUIRED
    logger.error(f"Value error: {e}")
    raise

# 2. Immutable defaults
def func(items: Optional[List[str]] = None) -> List[str]:  # ✅ REQUIRED
    if items is None:
        items = []
    items.append('new')
    return items

# 3. Explicit configuration objects
from dataclasses import dataclass

@dataclass
class CacheConfig:
    max_size: int
    ttl_seconds: int

cache = Cache(config=CacheConfig(max_size=100, ttl_seconds=60))

# 4. Explicit imports
from module import SpecificClass, specific_function  # ✅ REQUIRED

# 5. Typed functions
def process(data: Dict[str, Any]) -> Optional[str]:  # ✅ REQUIRED
    return data.get('value')

# 6. Enums for constants
from enum import Enum

class Mode(Enum):  # ✅ REQUIRED
    PRODUCTION = "production"
    DEVELOPMENT = "development"

if mode == Mode.PRODUCTION:
    do_something()

# 7. Explicit Optional returns
def maybe_returns() -> Optional[str]:  # ✅ REQUIRED
    if condition:
        return value
    return None

# 8. Extract functions to module level
def inner_logic() -> None:  # ✅ REQUIRED
    pass

def outer() -> None:
    inner_logic()
    inner_logic()
```
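Pattern 3 above constructs a `Cache` that is not defined anywhere in this document. One possible shape is sketched below, assuming a simple in-memory cache with a time-to-live and oldest-first eviction. The `Cache` class and its `get`/`set` methods are hypothetical; the point is only that the state lives on an instance built from an explicit `CacheConfig` instead of in a module-level dictionary:

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CacheConfig:
    max_size: int
    ttl_seconds: int


class Cache:
    """Hypothetical in-memory cache; state lives on the instance, not the module."""

    def __init__(self, config: CacheConfig) -> None:
        if config.max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._config = config
        # Maps key -> (time stored, value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        """Store a value, evicting the oldest entry when the cache is full."""
        if key not in self._entries and len(self._entries) >= self._config.max_size:
            oldest_key = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[oldest_key]
        self._entries[key] = (time.monotonic(), value)

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self._config.ttl_seconds:
            del self._entries[key]
            return None
        return value
```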
---

## MODULE STRUCTURE

### Standard Module Template

Every module MUST follow this structure:

```python
"""
Module: module_name.py
Purpose: [ONE sentence describing ONLY responsibility]
Dependencies: [List external dependencies]
"""

from __future__ import annotations

# Standard library imports
import logging
from typing import Dict, List, Optional

# Third-party imports
import requests
from bs4 import BeautifulSoup

# Local imports
from .config import Config

# Module-level logger
logger = logging.getLogger(__name__)


class ModuleName:
    """[Clear description of class responsibility]"""

    def __init__(self, config: Config) -> None:
        """Initialize with configuration.

        Args:
            config: Configuration object

        Raises:
            ValueError: If config is invalid
        """
        self._config = config
        self._validate_config()

    def _validate_config(self) -> None:
        """Validate configuration."""
        # Config (see config.py below) nests the key under `api.openai_key`
        if not self._config.api.openai_key:
            raise ValueError("API key is required")

    def public_method(self, param: str) -> Optional[Dict[str, str]]:
        """[Clear description]

        Args:
            param: [Description]

        Returns:
            [Description of return value]

        Raises:
            [Exceptions that can be raised]
        """
        try:
            result = self._internal_logic(param)
            return result
        except SpecificException as e:
            logger.error(f"Failed to process {param}: {e}")
            raise

    def _internal_logic(self, param: str) -> Dict[str, str]:
        """Private methods use underscore prefix."""
        return {"key": param}
```

---
## CONFIGURATION MANAGEMENT

**NEVER hardcode values. Use configuration objects.**

### config.py Structure

```python
"""Configuration management for Feed Generator."""

from __future__ import annotations

import os
from dataclasses import dataclass
from typing import List
from pathlib import Path


@dataclass(frozen=True)  # Immutable
class APIConfig:
    """Configuration for external APIs."""
    openai_key: str
    node_api_url: str
    timeout_seconds: int = 30


@dataclass(frozen=True)
class ScraperConfig:
    """Configuration for news scraping."""
    sources: List[str]
    max_articles: int = 10
    timeout_seconds: int = 10


@dataclass(frozen=True)
class Config:
    """Main configuration object."""
    api: APIConfig
    scraper: ScraperConfig
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> Config:
        """Load configuration from environment variables.

        Returns:
            Loaded configuration

        Raises:
            ValueError: If required environment variables are missing
        """
        openai_key = os.getenv("OPENAI_API_KEY")
        if not openai_key:
            raise ValueError("OPENAI_API_KEY environment variable required")

        node_api_url = os.getenv("NODE_API_URL", "http://localhost:3000")

        sources_str = os.getenv("NEWS_SOURCES", "")
        sources = [s.strip() for s in sources_str.split(",") if s.strip()]

        if not sources:
            raise ValueError("NEWS_SOURCES environment variable required")

        return cls(
            api=APIConfig(
                openai_key=openai_key,
                node_api_url=node_api_url
            ),
            scraper=ScraperConfig(
                sources=sources
            )
        )
```
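`Config.log_level` above is a plain string, which sits uneasily with the rule against magic strings. One way to reconcile the two is to parse the value into an `Enum` when the configuration is loaded, so an unsupported level fails immediately. The sketch below assumes the level comes from a `LOG_LEVEL` environment variable; that variable name, `LogLevel`, and `load_log_level` are all hypothetical and do not appear in the original `config.py`:

```python
from __future__ import annotations

import logging
import os
from enum import Enum
from typing import Mapping, Optional


class LogLevel(Enum):
    """Supported log levels (hypothetical replacement for the plain string)."""

    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"

    def to_logging(self) -> int:
        """Return the numeric level understood by the logging module."""
        return int(getattr(logging, self.value))


def load_log_level(env: Optional[Mapping[str, str]] = None) -> LogLevel:
    """Read LOG_LEVEL (assumed variable name) and fail fast on bad values."""
    source = os.environ if env is None else env
    raw = source.get("LOG_LEVEL", "INFO").strip().upper()
    try:
        return LogLevel(raw)
    except ValueError as e:
        raise ValueError(f"Unsupported LOG_LEVEL: {raw!r}") from e
```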
---

## ERROR HANDLING STRATEGY

### 1. Define Custom Exceptions

```python
"""Custom exceptions for Feed Generator."""

class FeedGeneratorError(Exception):
    """Base exception for all Feed Generator errors."""
    pass


class ScrapingError(FeedGeneratorError):
    """Raised when scraping fails."""
    pass


class ImageAnalysisError(FeedGeneratorError):
    """Raised when image analysis fails."""
    pass


class APIClientError(FeedGeneratorError):
    """Raised when API communication fails."""
    pass
```

### 2. Use Specific Error Handling

```python
def scrape_news(url: str) -> Dict[str, str]:
    """Scrape news from URL.

    Raises:
        ScrapingError: If scraping fails
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.Timeout as e:
        raise ScrapingError(f"Timeout scraping {url}") from e
    except requests.RequestException as e:
        raise ScrapingError(f"Failed to scrape {url}") from e

    try:
        return response.json()
    except ValueError as e:
        raise ScrapingError(f"Invalid JSON from {url}") from e
```

### 3. Log Before Raising

```python
def critical_operation() -> None:
    """Perform critical operation."""
    try:
        result = dangerous_call()
    except SpecificError as e:
        logger.error(f"Critical operation failed: {e}", exc_info=True)
        raise  # Re-raise after logging
```

---
## TESTING REQUIREMENTS

### Every Module MUST Have Tests

```python
"""Test module for scraper.py"""

import pytest
import requests  # needed for requests.Timeout below
from unittest.mock import Mock, patch

from src.scraper import NewsScraper
from src.config import ScraperConfig
from src.exceptions import ScrapingError


def test_scraper_success() -> None:
    """Test successful scraping."""
    config = ScraperConfig(sources=["https://example.com"])
    scraper = NewsScraper(config)

    with patch('requests.get') as mock_get:
        mock_response = Mock()
        mock_response.ok = True
        mock_response.json.return_value = {"title": "Test"}
        mock_get.return_value = mock_response

        result = scraper.scrape("https://example.com")

        assert result is not None
        assert result["title"] == "Test"


def test_scraper_timeout() -> None:
    """Test scraping timeout."""
    config = ScraperConfig(sources=["https://example.com"])
    scraper = NewsScraper(config)

    with patch('requests.get', side_effect=requests.Timeout):
        with pytest.raises(ScrapingError):
            scraper.scrape("https://example.com")
```

---
## LOGGING STRATEGY

### Standard Logger Setup

```python
import logging
import sys

def setup_logging(level: str = "INFO") -> None:
    """Setup logging configuration.

    Args:
        level: Logging level (DEBUG, INFO, WARNING, ERROR)
    """
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.StreamHandler(sys.stdout),
            logging.FileHandler('feed_generator.log')
        ]
    )

# In each module
logger = logging.getLogger(__name__)
```

### Logging Best Practices

```python
# ✅ REQUIRED - Structured logging
logger.info(f"Scraping {url}", extra={"url": url, "attempt": 1})

# ✅ REQUIRED - Log exceptions with context
try:
    result = operation()
except Exception as e:
    logger.error("Operation failed", exc_info=True, extra={"context": data})
    raise

# ❌ FORBIDDEN - Print statements
print("Debug info")  # Use logger.debug() instead
```
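With the plain-text format string used in `setup_logging`, fields passed through `extra` are attached to the log record but never appear in the output. If those fields should be visible, one option is a small JSON formatter that emits every non-standard record attribute. `JsonFormatter` below is a hypothetical helper sketched with the standard library only; it is not part of the project:

```python
import json
import logging
from typing import Any, Dict

# Attribute names present on every LogRecord; anything else came from `extra`.
_STANDARD_ATTRS = frozenset(vars(logging.LogRecord("", 0, "", 0, "", (), None)))


class JsonFormatter(logging.Formatter):
    """Render a log record, including `extra` fields, as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload: Dict[str, Any] = {
            "time": self.formatTime(record),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Copy over the fields supplied via `extra={...}`
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS and key not in ("message", "asctime"):
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising
        return json.dumps(payload, default=str)
```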
---

## DEPENDENCIES MANAGEMENT

### requirements.txt Structure

```txt
# Core dependencies
requests==2.31.0
beautifulsoup4==4.12.2
openai==1.3.0

# Utilities
python-dotenv==1.0.0

# Testing
pytest==7.4.3
pytest-cov==4.1.0

# Type checking
mypy==1.7.1
types-requests==2.31.0
```

### Installing Dependencies

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

---
## TYPE CHECKING WITH MYPY

### mypy.ini Configuration

```ini
[mypy]
python_version = 3.11
warn_return_any = True
warn_unused_configs = True
disallow_untyped_defs = True
disallow_any_unimported = True
no_implicit_optional = True
warn_redundant_casts = True
warn_unused_ignores = True
warn_no_return = True
check_untyped_defs = True
strict_equality = True
```

### Running Type Checks

```bash
# Type check all code
mypy src/

# MUST pass before committing
```

---
## COMMON PATTERNS

### 1. Retry Logic

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar('T')

logger = logging.getLogger(__name__)

def retry(
    func: Callable[..., T],
    max_attempts: int = 3,
    delay_seconds: float = 1.0
) -> T:
    """Retry a function with exponential backoff.

    Args:
        func: Function to retry (called with no arguments)
        max_attempts: Maximum number of attempts
        delay_seconds: Initial delay between retries

    Returns:
        Function result

    Raises:
        Exception: Last exception if all retries fail
    """
    last_exception: Optional[Exception] = None

    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as e:
            last_exception = e
            if attempt < max_attempts - 1:
                # Delay doubles after every failed attempt
                sleep_time = delay_seconds * (2 ** attempt)
                logger.warning(
                    f"Attempt {attempt + 1} failed, retrying in {sleep_time}s",
                    extra={"exception": str(e)}
                )
                time.sleep(sleep_time)

    raise last_exception  # type: ignore
```
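The `retry` helper above catches every `Exception`, so it also retries programming errors such as `TypeError`. A variant that retries only caller-specified exception types is sketched below. `retry_on` and its `sleep` parameter are hypothetical names; `sleep` is injectable only so the backoff can be exercised without real delays:

```python
from __future__ import annotations

import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_on(
    func: Callable[[], T],
    exceptions: Tuple[Type[Exception], ...],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry func with exponential backoff, but only for the listed exceptions."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except exceptions as e:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: propagate the last error unchanged
            sleep_time = delay_seconds * (2 ** attempt)
            logger.warning(
                "Attempt %d failed (%s), retrying in %ss", attempt + 1, e, sleep_time
            )
            sleep(sleep_time)
    raise AssertionError("unreachable")  # Keeps type checkers satisfied
```

Any exception type not listed in `exceptions` propagates on the first attempt, which keeps the helper in line with the "Fail Fast and Loud" principle.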
### 2. Data Validation

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Article:
    """Validated article data."""
    title: str
    url: str
    image_url: Optional[str] = None

    def __post_init__(self) -> None:
        """Validate data after initialization."""
        if not self.title:
            raise ValueError("Title cannot be empty")
        if not self.url.startswith(('http://', 'https://')):
            raise ValueError(f"Invalid URL: {self.url}")
```

### 3. Context Managers for Resources

```python
from contextlib import contextmanager
from typing import Generator

@contextmanager
def api_client(config: APIConfig) -> Generator[APIClient, None, None]:
    """Context manager for API client.

    Yields:
        Configured API client
    """
    client = APIClient(config)
    try:
        client.connect()
        yield client
    finally:
        client.disconnect()

# Usage
with api_client(config) as client:
    result = client.call()
```

---
## WORKING WITH EXTERNAL APIS

### OpenAI GPT-4 Vision

```python
import logging
from typing import Optional

from openai import OpenAI

from .exceptions import ImageAnalysisError

logger = logging.getLogger(__name__)

class ImageAnalyzer:
    """Analyze images using GPT-4 Vision."""

    def __init__(self, api_key: str) -> None:
        self._client = OpenAI(api_key=api_key)

    def analyze_image(self, image_url: str, prompt: str) -> Optional[str]:
        """Analyze image with custom prompt.

        Args:
            image_url: URL of image to analyze
            prompt: Analysis prompt

        Returns:
            Analysis result, or None if the response has no text content

        Raises:
            ImageAnalysisError: If analysis fails
        """
        try:
            response = self._client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": image_url}}
                    ]
                }],
                max_tokens=300
            )
            return response.choices[0].message.content
        except Exception as e:
            logger.error(f"Image analysis failed: {e}")
            raise ImageAnalysisError(f"Failed to analyze {image_url}") from e
```

### Calling Node.js API

```python
import logging
from typing import Any, Dict, Optional

import requests

from .exceptions import APIClientError

logger = logging.getLogger(__name__)

class ArticleAPIClient:
    """Client for Node.js article generation API."""

    def __init__(self, base_url: str, timeout: int = 30) -> None:
        self._base_url = base_url.rstrip('/')
        self._timeout = timeout

    def generate_article(
        self,
        topic: str,
        context: str,
        image_description: Optional[str] = None
    ) -> Dict[str, Any]:
        """Generate article via API.

        Args:
            topic: Article topic
            context: Context information
            image_description: Optional image description

        Returns:
            Generated article data

        Raises:
            APIClientError: If API call fails
        """
        payload = {
            "topic": topic,
            "context": context,
        }
        if image_description:
            payload["image_description"] = image_description

        try:
            response = requests.post(
                f"{self._base_url}/api/generate",
                json=payload,
                timeout=self._timeout
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            logger.error(f"API call failed: {e}")
            raise APIClientError("Article generation failed") from e
```

---
## WHEN TO ASK FOR HUMAN INPUT

Claude Code MUST ask before:

1. **Changing module structure** - Architecture changes
2. **Adding new dependencies** - New libraries
3. **Changing configuration format** - Breaking changes
4. **Implementing complex logic** - Business rules
5. **Error handling strategy** - Recovery approaches
6. **Performance optimizations** - Trade-offs

Claude Code CAN proceed without asking:

1. **Adding type hints** - Always required
2. **Adding logging** - Always beneficial
3. **Adding tests** - Always needed
4. **Fixing obvious bugs** - Clear errors
5. **Improving documentation** - Clarity improvements
6. **Refactoring for clarity** - Same behavior, better code

---
## DEVELOPMENT WORKFLOW

### 1. Start with Types and Interfaces

```python
# Define data structures FIRST
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NewsArticle:
    title: str
    url: str
    content: str
    image_url: Optional[str] = None

@dataclass
class AnalyzedArticle:
    news: NewsArticle
    image_description: Optional[str] = None
```

### 2. Implement Core Logic

```python
# Then implement with clear types
def scrape_news(url: str) -> List[NewsArticle]:
    """Implementation with clear contract."""
    pass
```

### 3. Add Tests

```python
def test_scrape_news() -> None:
    """Test before considering feature complete."""
    pass
```

### 4. Integrate

```python
def pipeline() -> None:
    """Combine modules with clear flow."""
    articles = scrape_news(url)
    analyzed = analyze_images(articles)
    generated = generate_articles(analyzed)
    publish_feed(generated)
```
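The `pipeline` function above relies on a `url` variable and stage functions that are not defined in this document. A self-contained sketch of the same four-stage flow follows, with each stage passed in as a callable so the wiring can be tested with stubs. `run_pipeline` and its parameter names are hypothetical; the real `scripts/run.py` may be organized differently:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class NewsArticle:
    title: str
    url: str
    content: str
    image_url: Optional[str] = None


@dataclass(frozen=True)
class AnalyzedArticle:
    news: NewsArticle
    image_description: Optional[str] = None


def run_pipeline(
    sources: List[str],
    scrape: Callable[[str], List[NewsArticle]],
    describe_image: Callable[[str], str],
    generate: Callable[[AnalyzedArticle], str],
    publish: Callable[[List[str]], None],
) -> int:
    """Run scrape -> analyze -> generate -> publish; return the article count."""
    articles: List[NewsArticle] = []
    for source in sources:
        articles.extend(scrape(source))

    # Only articles that carry an image are sent to the image stage
    analyzed = [
        AnalyzedArticle(
            news=article,
            image_description=(
                describe_image(article.image_url) if article.image_url else None
            ),
        )
        for article in articles
    ]
    generated = [generate(item) for item in analyzed]
    publish(generated)
    return len(generated)
```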
---

## CRITICAL REMINDERS

1. **Type hints are NOT optional** - Every function must be typed
2. **Error handling is NOT optional** - Every external call must have error handling
3. **Logging is NOT optional** - Every significant operation must be logged
4. **Tests are NOT optional** - Every module must have tests
5. **Configuration is NOT optional** - No hardcoded values

**If you find yourself thinking "I'll add types/tests/docs later"** - STOP. Do it now.

**If code works but isn't typed/tested/documented** - It's NOT done.

**This is NOT Node.js with its loose culture** - Python gives us the tools for rigor, USE THEM.

---

## SUCCESS CRITERIA

A module is complete when:

- ✅ All functions have type hints
- ✅ `mypy` passes with no errors
- ✅ All tests pass
- ✅ Test coverage > 80%
- ✅ No print statements (use logger)
- ✅ No bare excepts
- ✅ No magic strings (use Enums)
- ✅ Documentation is clear and complete
- ✅ Error handling is explicit
- ✅ Configuration is externalized

**If ANY of these is missing, the module is NOT complete.**
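
Two of the criteria above, no `print` statements and no bare `except` clauses, can be checked mechanically. A minimal sketch using the standard-library `ast` module follows. `find_violations` is a hypothetical helper and is not claimed to match what `scripts/validate.py` does:

```python
import ast
from typing import List


def find_violations(source: str, filename: str = "<string>") -> List[str]:
    """Report bare `except:` clauses and direct `print(...)` calls in source."""
    violations: List[str] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # A bare except is an ExceptHandler with no exception type
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations.append(f"{filename}:{node.lineno}: bare except")
        # Only direct calls to the name `print` are flagged
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            violations.append(f"{filename}:{node.lineno}: print() call")
    return violations
```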