# CLAUDE.md - Feed Generator Development Instructions

> **CRITICAL**: This document contains mandatory rules for AI-assisted development with Claude Code.
> **NEVER** deviate from these rules without explicit human approval.

---

## PROJECT OVERVIEW

**Feed Generator** is a Python-based content aggregation system that:

1. Scrapes news from web sources
2. Analyzes images using GPT-4 Vision
3. Aggregates content into structured prompts
4. Calls the existing Node.js article generation API
5. Publishes to feeds (RSS/WordPress)

**Philosophy**: Quick, functional prototype. NOT a production system yet.
**Timeline**: 3-5 days maximum for V1.
**Future**: May be rewritten in Node.js/TypeScript with strict architecture.

---

## CORE PRINCIPLES

### 1. Type Safety is MANDATORY

**NEVER write untyped Python code.**

```python
# ❌ FORBIDDEN - No type hints
def scrape_news(url):
    return requests.get(url)


# ✅ REQUIRED - Full type hints
from typing import List, Dict, Optional

import requests


def scrape_news(url: str) -> Optional[Dict[str, str]]:
    response: requests.Response = requests.get(url)
    return response.json() if response.ok else None
```

**Rules:**

- Every function MUST have type hints for parameters and return values
- Use the `typing` module: `List`, `Dict`, `Optional`, `Union`, `Tuple`
- Use `from __future__ import annotations` for forward references
- Complex types should use `TypedDict` or `dataclasses`

### 2. Explicit is Better Than Implicit

**NEVER use magic or implicit behavior.**

```python
# ❌ FORBIDDEN - Implicit dictionary keys
def process(data):
    return data['title']  # What if 'title' doesn't exist?


# ✅ REQUIRED - Explicit with error handling
def process(data: Dict[str, str]) -> str:
    if 'title' not in data:
        raise ValueError("Missing required key: 'title'")
    return data['title']
```

### 3. Fail Fast and Loud

**NEVER silently swallow errors.**

```python
# ❌ FORBIDDEN - Silent failure
try:
    result = dangerous_operation()
except:
    result = None


# ✅ REQUIRED - Explicit error handling
try:
    result = dangerous_operation()
except SpecificException as e:
    logger.error(f"Operation failed: {e}")
    raise
```

### 4. Single Responsibility Modules

**Each module has ONE clear purpose.**

- `scraper.py` - ONLY scraping logic
- `image_analyzer.py` - ONLY image analysis
- `article_client.py` - ONLY API communication
- `aggregator.py` - ONLY content aggregation
- `publisher.py` - ONLY feed publishing

**NEVER mix responsibilities.**

---

## FORBIDDEN PATTERNS

### ❌ NEVER Use These

```python
# 1. Bare except
try:
    something()
except:  # ❌ FORBIDDEN
    pass

# 2. Mutable default arguments
def func(items=[]):  # ❌ FORBIDDEN
    items.append(1)
    return items

# 3. Global state
CACHE = {}  # ❌ FORBIDDEN at module level

def use_cache():
    CACHE['key'] = 'value'

# 4. Star imports
from module import *  # ❌ FORBIDDEN

# 5. Untyped functions
def process(data):  # ❌ FORBIDDEN - no types
    return data

# 6. Magic strings
if mode == "production":  # ❌ FORBIDDEN
    do_something()

# 7. Implicit None returns
def maybe_returns():  # ❌ FORBIDDEN - unclear return
    if condition:
        return value

# 8. Nested functions for reuse
def outer():
    def inner():  # ❌ FORBIDDEN if used multiple times
        pass
    inner()
    inner()
```

### ✅ REQUIRED Patterns

```python
# 1. Specific exceptions
try:
    something()
except ValueError as e:  # ✅ REQUIRED
    logger.error(f"Value error: {e}")
    raise

# 2. Immutable defaults
def func(items: Optional[List[str]] = None) -> List[str]:  # ✅ REQUIRED
    if items is None:
        items = []
    items.append('new')
    return items
# 3. Explicit configuration objects
from dataclasses import dataclass

@dataclass
class CacheConfig:
    max_size: int
    ttl_seconds: int

cache = Cache(config=CacheConfig(max_size=100, ttl_seconds=60))

# 4. Explicit imports
from module import SpecificClass, specific_function  # ✅ REQUIRED

# 5. Typed functions
def process(data: Dict[str, Any]) -> Optional[str]:  # ✅ REQUIRED
    return data.get('value')

# 6. Enums for constants
from enum import Enum

class Mode(Enum):  # ✅ REQUIRED
    PRODUCTION = "production"
    DEVELOPMENT = "development"

if mode == Mode.PRODUCTION:
    do_something()

# 7. Explicit Optional returns
def maybe_returns() -> Optional[str]:  # ✅ REQUIRED
    if condition:
        return value
    return None

# 8. Extract functions to module level
def inner_logic() -> None:  # ✅ REQUIRED
    pass

def outer() -> None:
    inner_logic()
    inner_logic()
```

---

## MODULE STRUCTURE

### Standard Module Template

Every module MUST follow this structure:

```python
"""
Module: module_name.py
Purpose: [ONE sentence describing the module's ONLY responsibility]
Dependencies: [List external dependencies]
"""
from __future__ import annotations

# Standard library imports
import logging
from typing import Dict, List, Optional

# Third-party imports
import requests
from bs4 import BeautifulSoup

# Local imports
from .config import Config

# Module-level logger
logger = logging.getLogger(__name__)


class ModuleName:
    """[Clear description of class responsibility]"""

    def __init__(self, config: Config) -> None:
        """Initialize with configuration.

        Args:
            config: Configuration object

        Raises:
            ValueError: If config is invalid
        """
        self._config = config
        self._validate_config()

    def _validate_config(self) -> None:
        """Validate configuration."""
        if not self._config.api_key:
            raise ValueError("API key is required")

    def public_method(self, param: str) -> Optional[Dict[str, str]]:
        """[Clear description]

        Args:
            param: [Description]

        Returns:
            [Description of return value]

        Raises:
            [Exceptions that can be raised]
        """
        try:
            result = self._internal_logic(param)
            return result
        except SpecificException as e:
            logger.error(f"Failed to process {param}: {e}")
            raise

    def _internal_logic(self, param: str) -> Dict[str, str]:
        """Private methods use an underscore prefix."""
        return {"key": param}
```

---

## CONFIGURATION MANAGEMENT

**NEVER hardcode values. Use configuration objects.**

### config.py Structure

```python
"""Configuration management for Feed Generator."""
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import List
from pathlib import Path


@dataclass(frozen=True)  # Immutable
class APIConfig:
    """Configuration for external APIs."""
    openai_key: str
    node_api_url: str
    timeout_seconds: int = 30


@dataclass(frozen=True)
class ScraperConfig:
    """Configuration for news scraping."""
    sources: List[str]
    max_articles: int = 10
    timeout_seconds: int = 10


@dataclass(frozen=True)
class Config:
    """Main configuration object."""
    api: APIConfig
    scraper: ScraperConfig
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> Config:
        """Load configuration from environment variables.

        Returns:
            Loaded configuration

        Raises:
            ValueError: If required environment variables are missing
        """
        openai_key = os.getenv("OPENAI_API_KEY")
        if not openai_key:
            raise ValueError("OPENAI_API_KEY environment variable required")

        node_api_url = os.getenv("NODE_API_URL", "http://localhost:3000")

        sources_str = os.getenv("NEWS_SOURCES", "")
        sources = [s.strip() for s in sources_str.split(",") if s.strip()]
        if not sources:
            raise ValueError("NEWS_SOURCES environment variable required")

        return cls(
            api=APIConfig(
                openai_key=openai_key,
                node_api_url=node_api_url
            ),
            scraper=ScraperConfig(
                sources=sources
            )
        )
```
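Since all configuration comes from environment variables, local runs can load them from a `.env` file via `python-dotenv` (already pinned in `requirements.txt`). A minimal bootstrap sketch, assuming the config module lives at `src/config.py`:

```python
"""Example bootstrap: load .env, then build the immutable Config."""
from dotenv import load_dotenv

from src.config import Config  # import path is an assumption

# Populate os.environ from a local .env file (does nothing if the file is absent).
load_dotenv()

# Raises ValueError early if OPENAI_API_KEY or NEWS_SOURCES is missing.
config = Config.from_env()
```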
---

## ERROR HANDLING STRATEGY

### 1. Define Custom Exceptions

```python
"""Custom exceptions for Feed Generator."""


class FeedGeneratorError(Exception):
    """Base exception for all Feed Generator errors."""
    pass


class ScrapingError(FeedGeneratorError):
    """Raised when scraping fails."""
    pass


class ImageAnalysisError(FeedGeneratorError):
    """Raised when image analysis fails."""
    pass


class APIClientError(FeedGeneratorError):
    """Raised when API communication fails."""
    pass
```

### 2. Use Specific Error Handling

```python
def scrape_news(url: str) -> Dict[str, str]:
    """Scrape news from URL.

    Raises:
        ScrapingError: If scraping fails
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.Timeout as e:
        raise ScrapingError(f"Timeout scraping {url}") from e
    except requests.RequestException as e:
        raise ScrapingError(f"Failed to scrape {url}") from e

    try:
        return response.json()
    except ValueError as e:
        raise ScrapingError(f"Invalid JSON from {url}") from e
```

### 3. Log Before Raising

```python
def critical_operation() -> None:
    """Perform critical operation."""
    try:
        result = dangerous_call()
    except SpecificError as e:
        logger.error(f"Critical operation failed: {e}", exc_info=True)
        raise  # Re-raise after logging
```

---

## TESTING REQUIREMENTS

### Every Module MUST Have Tests

```python
"""Test module for scraper.py"""
import pytest
import requests
from unittest.mock import Mock, patch

from src.scraper import NewsScraper
from src.config import ScraperConfig
from src.exceptions import ScrapingError


def test_scraper_success() -> None:
    """Test successful scraping."""
    config = ScraperConfig(sources=["https://example.com"])
    scraper = NewsScraper(config)

    with patch('requests.get') as mock_get:
        mock_response = Mock()
        mock_response.ok = True
        mock_response.json.return_value = {"title": "Test"}
        mock_get.return_value = mock_response

        result = scraper.scrape("https://example.com")

        assert result is not None
        assert result["title"] == "Test"


def test_scraper_timeout() -> None:
    """Test scraping timeout."""
    config = ScraperConfig(sources=["https://example.com"])
    scraper = NewsScraper(config)

    with patch('requests.get', side_effect=requests.Timeout):
        with pytest.raises(ScrapingError):
            scraper.scrape("https://example.com")
```
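Both tests above repeat the same configuration and scraper setup. A `pytest` fixture keeps that setup in one typed place; a minimal sketch, reusing the same (assumed) `src.*` import paths as the tests above:

```python
import pytest
from unittest.mock import Mock, patch

from src.scraper import NewsScraper
from src.config import ScraperConfig


@pytest.fixture
def scraper() -> NewsScraper:
    """Provide a scraper wired to a minimal test configuration."""
    config = ScraperConfig(sources=["https://example.com"])
    return NewsScraper(config)


def test_scraper_success_with_fixture(scraper: NewsScraper) -> None:
    """Same assertion as test_scraper_success, without duplicated setup."""
    with patch('requests.get') as mock_get:
        mock_response = Mock()
        mock_response.ok = True
        mock_response.json.return_value = {"title": "Test"}
        mock_get.return_value = mock_response

        result = scraper.scrape("https://example.com")

        assert result is not None
```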
---

## LOGGING STRATEGY

### Standard Logger Setup

```python
import logging
import sys


def setup_logging(level: str = "INFO") -> None:
    """Setup logging configuration.

    Args:
        level: Logging level (DEBUG, INFO, WARNING, ERROR)
    """
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.StreamHandler(sys.stdout),
            logging.FileHandler('feed_generator.log')
        ]
    )


# In each module
logger = logging.getLogger(__name__)
```

### Logging Best Practices

```python
# ✅ REQUIRED - Structured logging
logger.info(f"Scraping {url}", extra={"url": url, "attempt": 1})

# ✅ REQUIRED - Log exceptions with context
try:
    result = operation()
except Exception as e:
    logger.error("Operation failed", exc_info=True, extra={"context": data})
    raise

# ❌ FORBIDDEN - Print statements
print("Debug info")  # Use logger.debug() instead
```

---

## DEPENDENCIES MANAGEMENT

### requirements.txt Structure

```txt
# Core dependencies
requests==2.31.0
beautifulsoup4==4.12.2
openai==1.3.0

# Utilities
python-dotenv==1.0.0

# Testing
pytest==7.4.3
pytest-cov==4.1.0

# Type checking
mypy==1.7.1
types-requests==2.31.0
```

### Installing Dependencies

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

---

## TYPE CHECKING WITH MYPY

### mypy.ini Configuration

```ini
[mypy]
python_version = 3.11
warn_return_any = True
warn_unused_configs = True
disallow_untyped_defs = True
disallow_any_unimported = True
no_implicit_optional = True
warn_redundant_casts = True
warn_unused_ignores = True
warn_no_return = True
check_untyped_defs = True
strict_equality = True
```

### Running Type Checks

```bash
# Type check all code
mypy src/

# MUST pass before committing
```

---

## COMMON PATTERNS

### 1. Retry Logic

```python
from typing import Callable, Optional, TypeVar
import time

T = TypeVar('T')


def retry(
    func: Callable[..., T],
    max_attempts: int = 3,
    delay_seconds: float = 1.0
) -> T:
    """Retry a function with exponential backoff.

    Args:
        func: Function to retry
        max_attempts: Maximum number of attempts
        delay_seconds: Initial delay between retries

    Returns:
        Function result

    Raises:
        Exception: Last exception if all retries fail
    """
    last_exception: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as e:
            last_exception = e
            if attempt < max_attempts - 1:
                sleep_time = delay_seconds * (2 ** attempt)
                logger.warning(
                    f"Attempt {attempt + 1} failed, retrying in {sleep_time}s",
                    extra={"exception": str(e)}
                )
                time.sleep(sleep_time)
    raise last_exception  # type: ignore
```

### 2. Data Validation

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Validated article data."""
    title: str
    url: str
    image_url: Optional[str] = None

    def __post_init__(self) -> None:
        """Validate data after initialization."""
        if not self.title:
            raise ValueError("Title cannot be empty")
        if not self.url.startswith(('http://', 'https://')):
            raise ValueError(f"Invalid URL: {self.url}")
```
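Because `__post_init__` rejects invalid data at construction time, callers only need to map raw scraped dictionaries onto the dataclass and let bad records fail loudly. A sketch of that hand-off; the `to_article` helper and the raw-dict keys are illustrative, not part of the project:

```python
from typing import Dict


def to_article(raw: Dict[str, str]) -> Article:
    """Build a validated Article from a raw scraped mapping.

    Raises:
        ValueError: If required keys are missing or validation fails.
    """
    if 'title' not in raw or 'url' not in raw:
        raise ValueError("Scraped data must contain 'title' and 'url'")
    return Article(
        title=raw['title'],
        url=raw['url'],
        image_url=raw.get('image_url'),
    )
```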
### 3. Context Managers for Resources

```python
from contextlib import contextmanager
from typing import Generator


@contextmanager
def api_client(config: APIConfig) -> Generator[APIClient, None, None]:
    """Context manager for API client.

    Yields:
        Configured API client
    """
    client = APIClient(config)
    try:
        client.connect()
        yield client
    finally:
        client.disconnect()


# Usage
with api_client(config) as client:
    result = client.call()
```

---

## WORKING WITH EXTERNAL APIS

### OpenAI GPT-4 Vision

```python
from openai import OpenAI
from typing import Optional


class ImageAnalyzer:
    """Analyze images using GPT-4 Vision."""

    def __init__(self, api_key: str) -> None:
        self._client = OpenAI(api_key=api_key)

    def analyze_image(self, image_url: str, prompt: str) -> Optional[str]:
        """Analyze image with custom prompt.

        Args:
            image_url: URL of image to analyze
            prompt: Analysis prompt

        Returns:
            Analysis result or None if failed

        Raises:
            ImageAnalysisError: If analysis fails
        """
        try:
            response = self._client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": image_url}}
                    ]
                }],
                max_tokens=300
            )
            return response.choices[0].message.content
        except Exception as e:
            logger.error(f"Image analysis failed: {e}")
            raise ImageAnalysisError(f"Failed to analyze {image_url}") from e
```

### Calling Node.js API

```python
import requests
from typing import Any, Dict, Optional


class ArticleAPIClient:
    """Client for the Node.js article generation API."""

    def __init__(self, base_url: str, timeout: int = 30) -> None:
        self._base_url = base_url.rstrip('/')
        self._timeout = timeout

    def generate_article(
        self,
        topic: str,
        context: str,
        image_description: Optional[str] = None
    ) -> Dict[str, Any]:
        """Generate article via API.

        Args:
            topic: Article topic
            context: Context information
            image_description: Optional image description

        Returns:
            Generated article data

        Raises:
            APIClientError: If API call fails
        """
        payload = {
            "topic": topic,
            "context": context,
        }
        if image_description:
            payload["image_description"] = image_description

        try:
            response = requests.post(
                f"{self._base_url}/api/generate",
                json=payload,
                timeout=self._timeout
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            logger.error(f"API call failed: {e}")
            raise APIClientError("Article generation failed") from e
```
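The two clients above compose naturally in the aggregation step (steps 2-4 of the project overview). A hedged sketch of that flow; `config` is the `Config` object from earlier, and `article` stands for a scraped item with `title`, `content`, and a non-empty `image_url`:

```python
analyzer = ImageAnalyzer(api_key=config.api.openai_key)
client = ArticleAPIClient(
    base_url=config.api.node_api_url,
    timeout=config.api.timeout_seconds,
)

# Describe the article image, then feed the description into generation.
description = analyzer.analyze_image(
    image_url=article.image_url,
    prompt="Describe this news image in two sentences.",
)
generated = client.generate_article(
    topic=article.title,
    context=article.content,
    image_description=description,
)
```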
---

## WHEN TO ASK FOR HUMAN INPUT

Claude Code MUST ask before:

1. **Changing module structure** - Architecture changes
2. **Adding new dependencies** - New libraries
3. **Changing configuration format** - Breaking changes
4. **Implementing complex logic** - Business rules
5. **Error handling strategy** - Recovery approaches
6. **Performance optimizations** - Trade-offs

Claude Code CAN proceed without asking:

1. **Adding type hints** - Always required
2. **Adding logging** - Always beneficial
3. **Adding tests** - Always needed
4. **Fixing obvious bugs** - Clear errors
5. **Improving documentation** - Clarity improvements
6. **Refactoring for clarity** - Same behavior, better code

---

## DEVELOPMENT WORKFLOW

### 1. Start with Types and Interfaces

```python
# Define data structures FIRST
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NewsArticle:
    title: str
    url: str
    content: str
    image_url: Optional[str] = None


@dataclass
class AnalyzedArticle:
    news: NewsArticle
    image_description: Optional[str] = None
```

### 2. Implement Core Logic

```python
# Then implement with clear types
def scrape_news(url: str) -> List[NewsArticle]:
    """Implementation with clear contract."""
    pass
```

### 3. Add Tests

```python
def test_scrape_news() -> None:
    """Test before considering feature complete."""
    pass
```

### 4. Integrate

```python
def pipeline() -> None:
    """Combine modules with clear flow."""
    articles = scrape_news(url)
    analyzed = analyze_images(articles)
    generated = generate_articles(analyzed)
    publish_feed(generated)
```

---

## CRITICAL REMINDERS

1. **Type hints are NOT optional** - Every function must be typed
2. **Error handling is NOT optional** - Every external call must have error handling
3. **Logging is NOT optional** - Every significant operation must be logged
4. **Tests are NOT optional** - Every module must have tests
5. **Configuration is NOT optional** - No hardcoded values

**If you find yourself thinking "I'll add types/tests/docs later"** - STOP. Do it now.

**If code works but isn't typed/tested/documented** - It's NOT done.

**This is NOT Node.js with its loose culture** - Python gives us the tools for rigor, USE THEM.

---

## SUCCESS CRITERIA

A module is complete when:

- ✅ All functions have type hints
- ✅ `mypy` passes with no errors
- ✅ All tests pass
- ✅ Test coverage > 80%
- ✅ No print statements (use logger)
- ✅ No bare excepts
- ✅ No magic strings (use Enums)
- ✅ Documentation is clear and complete
- ✅ Error handling is explicit
- ✅ Configuration is externalized

**If ANY of these is missing, the module is NOT complete.**