feedgenerator/CLAUDE.md

# CLAUDE.md - Feed Generator Project Instructions

```markdown
# CLAUDE.md - Feed Generator Development Instructions

> **CRITICAL**: This document contains mandatory rules for AI-assisted development with Claude Code.
> **NEVER** deviate from these rules without explicit human approval.

---

## PROJECT OVERVIEW

**Feed Generator** is a Python-based content aggregation system that:
1. Scrapes news from web sources
2. Analyzes images using GPT-4 Vision
3. Aggregates content into structured prompts
4. Calls the existing Node.js article generation API
5. Publishes to feeds (RSS/WordPress)

**Philosophy**: Quick, functional prototype. NOT a production system yet.
**Timeline**: 3-5 days maximum for V1.
**Future**: May be rewritten in Node.js/TypeScript with strict architecture.

---

## CORE PRINCIPLES

### 1. Type Safety is MANDATORY

**NEVER write untyped Python code.**

```python
# ❌ FORBIDDEN - No type hints
def scrape_news(url):
    return requests.get(url)

# ✅ REQUIRED - Full type hints
from typing import List, Dict, Optional
import requests

def scrape_news(url: str) -> Optional[Dict[str, str]]:
    response: requests.Response = requests.get(url)
    return response.json() if response.ok else None
```

**Rules:**
- Every function MUST have type hints for parameters and return values
- Use `typing` module: `List`, `Dict`, `Optional`, `Union`, `Tuple`
- Use `from __future__ import annotations` for forward references
- Complex types should use `TypedDict` or `dataclasses`
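
A minimal sketch of the last rule, assuming a scraped record with `title`, `url`, and `image_url` fields. `RawArticle`, `ArticleSummary`, and `summarize` are hypothetical names used only for illustration; they are not part of the project.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, TypedDict


class RawArticle(TypedDict):
    """Shape of a scraped record before validation (hypothetical fields)."""
    title: str
    url: str
    image_url: Optional[str]


@dataclass(frozen=True)
class ArticleSummary:
    """Typed, immutable value passed between modules (hypothetical)."""
    title: str
    url: str
    has_image: bool


def summarize(raw_articles: List[RawArticle]) -> List[ArticleSummary]:
    """Convert raw dictionaries into typed summaries."""
    return [
        ArticleSummary(
            title=raw["title"],
            url=raw["url"],
            has_image=raw["image_url"] is not None,
        )
        for raw in raw_articles
    ]
```

A `TypedDict` fits data that arrives as a plain dictionary (for example parsed JSON), while a dataclass fits values the code constructs itself.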

### 2. Explicit is Better Than Implicit

**NEVER use magic or implicit behavior.**

```python
# ❌ FORBIDDEN - Implicit dictionary keys
def process(data):
    return data['title']  # What if 'title' doesn't exist?

# ✅ REQUIRED - Explicit with error handling
def process(data: Dict[str, str]) -> str:
    if 'title' not in data:
        raise ValueError("Missing required key: 'title'")
    return data['title']
```

### 3. Fail Fast and Loud

**NEVER silently swallow errors.**

```python
# ❌ FORBIDDEN - Silent failure
try:
    result = dangerous_operation()
except:
    result = None

# ✅ REQUIRED - Explicit error handling
try:
    result = dangerous_operation()
except SpecificException as e:
    logger.error(f"Operation failed: {e}")
    raise
```

### 4. Single Responsibility Modules

**Each module has ONE clear purpose.**

- `scraper.py` - ONLY scraping logic
- `image_analyzer.py` - ONLY image analysis
- `article_client.py` - ONLY API communication
- `aggregator.py` - ONLY content aggregation
- `publisher.py` - ONLY feed publishing

**NEVER mix responsibilities.**

---

## FORBIDDEN PATTERNS

### ❌ NEVER Use These

```python
# 1. Bare except
try:
    something()
except:  # ❌ FORBIDDEN
    pass

# 2. Mutable default arguments
def func(items=[]):  # ❌ FORBIDDEN
    items.append(1)
    return items

# 3. Global state
CACHE = {}  # ❌ FORBIDDEN at module level

def use_cache():
    CACHE['key'] = 'value'

# 4. Star imports
from module import *  # ❌ FORBIDDEN

# 5. Untyped functions
def process(data):  # ❌ FORBIDDEN - no types
    return data

# 6. Magic strings
if mode == "production":  # ❌ FORBIDDEN
    do_something()

# 7. Implicit None returns
def maybe_returns():  # ❌ FORBIDDEN - unclear return
    if condition:
        return value

# 8. Nested functions for reuse
def outer():
    def inner():  # ❌ FORBIDDEN if used multiple times
        pass
    inner()
    inner()
```

### ✅ REQUIRED Patterns

```python
# 1. Specific exceptions
try:
    something()
except ValueError as e:  # ✅ REQUIRED
    logger.error(f"Value error: {e}")
    raise

# 2. Immutable defaults
def func(items: Optional[List[str]] = None) -> List[str]:  # ✅ REQUIRED
    if items is None:
        items = []
    items.append('new')
    return items

# 3. Explicit configuration objects
from dataclasses import dataclass

@dataclass
class CacheConfig:
    max_size: int
    ttl_seconds: int

cache = Cache(config=CacheConfig(max_size=100, ttl_seconds=60))

# 4. Explicit imports
from module import SpecificClass, specific_function  # ✅ REQUIRED

# 5. Typed functions
def process(data: Dict[str, Any]) -> Optional[str]:  # ✅ REQUIRED
    return data.get('value')

# 6. Enums for constants
from enum import Enum

class Mode(Enum):  # ✅ REQUIRED
    PRODUCTION = "production"
    DEVELOPMENT = "development"

if mode == Mode.PRODUCTION:
    do_something()

# 7. Explicit Optional returns
def maybe_returns() -> Optional[str]:  # ✅ REQUIRED
    if condition:
        return value
    return None

# 8. Extract functions to module level
def inner_logic() -> None:  # ✅ REQUIRED
    pass

def outer() -> None:
    inner_logic()
    inner_logic()
```

---

## MODULE STRUCTURE

### Standard Module Template

Every module MUST follow this structure:

```python
"""
Module: module_name.py
Purpose: [ONE sentence describing the module's ONLY responsibility]
Dependencies: [List external dependencies]
"""

from __future__ import annotations

# Standard library imports
import logging
from typing import Dict, List, Optional

# Third-party imports
import requests
from bs4 import BeautifulSoup

# Local imports
from .config import Config

# Module-level logger
logger = logging.getLogger(__name__)


class ModuleName:
    """[Clear description of class responsibility]"""

    def __init__(self, config: Config) -> None:
        """Initialize with configuration.

        Args:
            config: Configuration object

        Raises:
            ValueError: If config is invalid
        """
        self._config = config
        self._validate_config()

    def _validate_config(self) -> None:
        """Validate configuration."""
        if not self._config.api_key:
            raise ValueError("API key is required")

    def public_method(self, param: str) -> Optional[Dict[str, str]]:
        """[Clear description]

        Args:
            param: [Description]

        Returns:
            [Description of return value]

        Raises:
            [Exceptions that can be raised]
        """
        try:
            result = self._internal_logic(param)
            return result
        except SpecificException as e:
            logger.error(f"Failed to process {param}: {e}")
            raise

    def _internal_logic(self, param: str) -> Dict[str, str]:
        """Private methods use underscore prefix."""
        return {"key": param}
```

---

## CONFIGURATION MANAGEMENT

**NEVER hardcode values. Use configuration objects.**

### config.py Structure

```python
"""Configuration management for Feed Generator."""

from __future__ import annotations

import os
from dataclasses import dataclass
from typing import List
from pathlib import Path


@dataclass(frozen=True)  # Immutable
class APIConfig:
    """Configuration for external APIs."""
    openai_key: str
    node_api_url: str
    timeout_seconds: int = 30


@dataclass(frozen=True)
class ScraperConfig:
    """Configuration for news scraping."""
    sources: List[str]
    max_articles: int = 10
    timeout_seconds: int = 10


@dataclass(frozen=True)
class Config:
    """Main configuration object."""
    api: APIConfig
    scraper: ScraperConfig
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> Config:
        """Load configuration from environment variables.

        Returns:
            Loaded configuration

        Raises:
            ValueError: If required environment variables are missing
        """
        openai_key = os.getenv("OPENAI_API_KEY")
        if not openai_key:
            raise ValueError("OPENAI_API_KEY environment variable required")

        node_api_url = os.getenv("NODE_API_URL", "http://localhost:3000")

        sources_str = os.getenv("NEWS_SOURCES", "")
        sources = [s.strip() for s in sources_str.split(",") if s.strip()]

        if not sources:
            raise ValueError("NEWS_SOURCES environment variable required")

        return cls(
            api=APIConfig(
                openai_key=openai_key,
                node_api_url=node_api_url
            ),
            scraper=ScraperConfig(
                sources=sources
            )
        )
```
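
The `timeout_seconds` and `max_articles` fields above keep their defaults because `from_env` never reads them. If they should become configurable, one possible approach is a small helper that parses an integer variable and fails loudly on bad input. This is a sketch under assumptions: `read_int_env` and the variable name `SCRAPER_TIMEOUT_SECONDS` are hypothetical, and the optional `env` mapping exists only so tests can avoid mutating `os.environ`.

```python
import os
from typing import Mapping, Optional


def read_int_env(
    name: str,
    default: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Read a positive integer setting, falling back to a default when unset.

    Raises:
        ValueError: If the variable is set but is not a positive integer
    """
    source: Mapping[str, str] = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError as e:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from e
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


# Hypothetical usage: the variable name is an assumption, not a project setting.
timeout = read_int_env("SCRAPER_TIMEOUT_SECONDS", 10, env={"SCRAPER_TIMEOUT_SECONDS": "25"})
```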

---

## ERROR HANDLING STRATEGY

### 1. Define Custom Exceptions

```python
"""Custom exceptions for Feed Generator."""

class FeedGeneratorError(Exception):
    """Base exception for all Feed Generator errors."""
    pass


class ScrapingError(FeedGeneratorError):
    """Raised when scraping fails."""
    pass


class ImageAnalysisError(FeedGeneratorError):
    """Raised when image analysis fails."""
    pass


class APIClientError(FeedGeneratorError):
    """Raised when API communication fails."""
    pass
```
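
A shared base class lets the entry point catch every known failure in one place while unexpected exceptions still propagate. The sketch below repeats two of the classes so it runs on its own; `run_step` and the exit-code convention are assumptions made for illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class FeedGeneratorError(Exception):
    """Base exception for all Feed Generator errors."""


class ScrapingError(FeedGeneratorError):
    """Raised when scraping fails."""


def run_step(step: Callable[[], None]) -> int:
    """Run one pipeline step and map known failures to an exit code.

    Only FeedGeneratorError subclasses are handled here; any other
    exception is treated as a bug and is allowed to propagate.
    """
    try:
        step()
    except FeedGeneratorError as e:
        logger.error("Pipeline step failed: %s", e, exc_info=True)
        return 1
    return 0
```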

### 2. Use Specific Error Handling

```python
def scrape_news(url: str) -> Dict[str, str]:
    """Scrape news from URL.

    Raises:
        ScrapingError: If scraping fails
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.Timeout as e:
        raise ScrapingError(f"Timeout scraping {url}") from e
    except requests.RequestException as e:
        raise ScrapingError(f"Failed to scrape {url}") from e

    try:
        return response.json()
    except ValueError as e:
        raise ScrapingError(f"Invalid JSON from {url}") from e
```

### 3. Log Before Raising

```python
def critical_operation() -> None:
    """Perform critical operation."""
    try:
        result = dangerous_call()
    except SpecificError as e:
        logger.error(f"Critical operation failed: {e}", exc_info=True)
        raise  # Re-raise after logging
```

---

## TESTING REQUIREMENTS

### Every Module MUST Have Tests

```python
"""Test module for scraper.py"""

from unittest.mock import Mock, patch

import pytest
import requests

from src.scraper import NewsScraper
from src.config import ScraperConfig
from src.exceptions import ScrapingError


def test_scraper_success() -> None:
    """Test successful scraping."""
    config = ScraperConfig(sources=["https://example.com"])
    scraper = NewsScraper(config)

    with patch('requests.get') as mock_get:
        mock_response = Mock()
        mock_response.ok = True
        mock_response.json.return_value = {"title": "Test"}
        mock_get.return_value = mock_response

        result = scraper.scrape("https://example.com")

        assert result is not None
        assert result["title"] == "Test"


def test_scraper_timeout() -> None:
    """Test scraping timeout."""
    config = ScraperConfig(sources=["https://example.com"])
    scraper = NewsScraper(config)

    with patch('requests.get', side_effect=requests.Timeout):
        with pytest.raises(ScrapingError):
            scraper.scrape("https://example.com")
```

---

## LOGGING STRATEGY

### Standard Logger Setup

```python
import logging
import sys

def setup_logging(level: str = "INFO") -> None:
    """Setup logging configuration.

    Args:
        level: Logging level (DEBUG, INFO, WARNING, ERROR)
    """
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.StreamHandler(sys.stdout),
            logging.FileHandler('feed_generator.log')
        ]
    )

# In each module
logger = logging.getLogger(__name__)
```

### Logging Best Practices

```python
# ✅ REQUIRED - Structured logging
logger.info(f"Scraping {url}", extra={"url": url, "attempt": 1})

# ✅ REQUIRED - Log exceptions with context
try:
    result = operation()
except Exception:
    logger.error("Operation failed", exc_info=True, extra={"context": data})
    raise

# ❌ FORBIDDEN - Print statements
print("Debug info")  # Use logger.debug() instead
```
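
Keys passed through `extra` become attributes on the log record, but they only appear in the output if the formatter references them. The format string in `setup_logging` does not, so those fields would be dropped silently. A minimal sketch of a formatter that does include them, assuming the hypothetical logger name `feed_generator.example` and an in-memory stream in place of a real handler target:

```python
import io
import logging


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Create an isolated logger whose format includes the extra fields."""
    logger = logging.getLogger("feed_generator.example")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep this example independent of the root logger
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(message)s url=%(url)s attempt=%(attempt)s")
    )
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
logger = build_logger(stream)
logger.info("Scraping started", extra={"url": "https://example.com", "attempt": 1})
# stream.getvalue() == "INFO Scraping started url=https://example.com attempt=1\n"
```

With a format string like this one, every call routed through the handler has to supply both `url` and `attempt`; a call that omits them produces a logging error instead of a log line.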

---

## DEPENDENCIES MANAGEMENT

### requirements.txt Structure

```txt
# Core dependencies
requests==2.31.0
beautifulsoup4==4.12.2
openai==1.3.0

# Utilities
python-dotenv==1.0.0

# Testing
pytest==7.4.3
pytest-cov==4.1.0

# Type checking
mypy==1.7.1
types-requests==2.31.0
```

### Installing Dependencies

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

---

## TYPE CHECKING WITH MYPY

### mypy.ini Configuration

```ini
[mypy]
python_version = 3.11
warn_return_any = True
warn_unused_configs = True
disallow_untyped_defs = True
disallow_any_unimported = True
no_implicit_optional = True
warn_redundant_casts = True
warn_unused_ignores = True
warn_no_return = True
check_untyped_defs = True
strict_equality = True
```

### Running Type Checks

```bash
# Type check all code
mypy src/

# MUST pass before committing
```
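
To make two of the flags above concrete, the sketch below shows signatures that this configuration is expected to reject (left as comments) next to one that should pass. `pick_title` is a hypothetical function used only for illustration.

```python
from typing import Optional

# Expected to be rejected by disallow_untyped_defs: no annotations at all.
# def pick_title(scraped, fallback=None):
#     ...

# Expected to be rejected by no_implicit_optional: the default is None,
# but the parameter is annotated as plain str.
# def pick_title(scraped: str, fallback: str = None) -> str:
#     ...


def pick_title(scraped: Optional[str], fallback: Optional[str] = None) -> str:
    """Return the scraped title, then the fallback, then a fixed placeholder."""
    if scraped:
        return scraped
    if fallback is not None:
        return fallback
    return "untitled"
```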

---

## COMMON PATTERNS

### 1. Retry Logic

```python
import logging
import time
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar('T')

def retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 1.0
) -> T:
    """Retry a function with exponential backoff.

    Args:
        func: Function to retry
        max_attempts: Maximum number of attempts
        delay_seconds: Initial delay between retries

    Returns:
        Function result

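    Raises:
        ValueError: If max_attempts is less than 1
        Exception: Last exception if all retries fail
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_exception: Optional[Exception] = None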

    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as e:
            last_exception = e
            if attempt < max_attempts - 1:
                sleep_time = delay_seconds * (2 ** attempt)
                logger.warning(
                    f"Attempt {attempt + 1} failed, retrying in {sleep_time}s",
                    extra={"exception": str(e)}
                )
                time.sleep(sleep_time)

    raise last_exception  # type: ignore
```

### 2. Data Validation

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Article:
    """Validated article data."""
    title: str
    url: str
    image_url: Optional[str] = None

    def __post_init__(self) -> None:
        """Validate data after initialization."""
        if not self.title:
            raise ValueError("Title cannot be empty")
        if not self.url.startswith(('http://', 'https://')):
            raise ValueError(f"Invalid URL: {self.url}")
```

### 3. Context Managers for Resources

```python
from contextlib import contextmanager
from typing import Generator

@contextmanager
def api_client(config: APIConfig) -> Generator[APIClient, None, None]:
    """Context manager for API client.

    Yields:
        Configured API client
    """
    client = APIClient(config)
    try:
        client.connect()
        yield client
    finally:
        client.disconnect()

# Usage
with api_client(config) as client:
    result = client.call()
```
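
The point of the pattern is that cleanup runs even when the body of the `with` block raises. The sketch below demonstrates this with a hypothetical `FakeClient` that records its lifecycle calls; it stands in for `APIClient`, whose real interface is not shown here.

```python
from contextlib import contextmanager
from typing import Generator, List


class FakeClient:
    """Hypothetical stand-in that records lifecycle calls."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def connect(self) -> None:
        self.events.append("connect")

    def disconnect(self) -> None:
        self.events.append("disconnect")


@contextmanager
def managed(client: FakeClient) -> Generator[FakeClient, None, None]:
    """Connect on entry and always disconnect on exit."""
    try:
        client.connect()
        yield client
    finally:
        client.disconnect()


client = FakeClient()
try:
    with managed(client):
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass  # the error still propagates out of the with block

# client.events == ["connect", "disconnect"]
```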

---

## WORKING WITH EXTERNAL APIS

### OpenAI GPT-4 Vision

```python
import logging
from typing import Optional

from openai import OpenAI

from .exceptions import ImageAnalysisError

logger = logging.getLogger(__name__)

class ImageAnalyzer:
    """Analyze images using GPT-4 Vision."""

    def __init__(self, api_key: str) -> None:
        self._client = OpenAI(api_key=api_key)

    def analyze_image(self, image_url: str, prompt: str) -> Optional[str]:
        """Analyze image with custom prompt.

        Args:
            image_url: URL of image to analyze
            prompt: Analysis prompt

        Returns:
            Analysis text, or None if the response has no text content

        Raises:
            ImageAnalysisError: If analysis fails
        """
        try:
            response = self._client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": image_url}}
                    ]
                }],
                max_tokens=300
            )
            return response.choices[0].message.content
        except Exception as e:
            logger.error(f"Image analysis failed: {e}")
            raise ImageAnalysisError(f"Failed to analyze {image_url}") from e
```

### Calling Node.js API

```python
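import logging
from typing import Any, Dict, Optional

import requests

from .exceptions import APIClientError

logger = logging.getLogger(__name__)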

class ArticleAPIClient:
    """Client for Node.js article generation API."""

    def __init__(self, base_url: str, timeout: int = 30) -> None:
        self._base_url = base_url.rstrip('/')
        self._timeout = timeout

    def generate_article(
        self,
        topic: str,
        context: str,
        image_description: Optional[str] = None
    ) -> Dict[str, Any]:
        """Generate article via API.

        Args:
            topic: Article topic
            context: Context information
            image_description: Optional image description

        Returns:
            Generated article data

        Raises:
            APIClientError: If API call fails
        """
        payload = {
            "topic": topic,
            "context": context,
        }
        if image_description:
            payload["image_description"] = image_description

        try:
            response = requests.post(
                f"{self._base_url}/api/generate",
                json=payload,
                timeout=self._timeout
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            logger.error(f"API call failed: {e}")
            raise APIClientError("Article generation failed") from e
```

---

## WHEN TO ASK FOR HUMAN INPUT

Claude Code MUST ask before:

1. **Changing module structure** - Architecture changes
2. **Adding new dependencies** - New libraries
3. **Changing configuration format** - Breaking changes
4. **Implementing complex logic** - Business rules
5. **Error handling strategy** - Recovery approaches
6. **Performance optimizations** - Trade-offs

Claude Code CAN proceed without asking:

1. **Adding type hints** - Always required
2. **Adding logging** - Always beneficial
3. **Adding tests** - Always needed
4. **Fixing obvious bugs** - Clear errors
5. **Improving documentation** - Clarity improvements
6. **Refactoring for clarity** - Same behavior, better code

---

## DEVELOPMENT WORKFLOW

### 1. Start with Types and Interfaces

```python
# Define data structures FIRST
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NewsArticle:
    title: str
    url: str
    content: str
    image_url: Optional[str] = None

@dataclass
class AnalyzedArticle:
    news: NewsArticle
    image_description: Optional[str] = None
```

### 2. Implement Core Logic

```python
# Then implement with clear types
def scrape_news(url: str) -> List[NewsArticle]:
    """Implementation with clear contract."""
    pass
```

### 3. Add Tests

```python
def test_scrape_news() -> None:
    """Test before considering feature complete."""
    pass
```

### 4. Integrate

```python
def pipeline(url: str) -> None:
    """Combine modules with clear flow."""
    articles = scrape_news(url)
    analyzed = analyze_images(articles)
    generated = generate_articles(analyzed)
    publish_feed(generated)
```

---

## CRITICAL REMINDERS

1. **Type hints are NOT optional** - Every function must be typed
2. **Error handling is NOT optional** - Every external call must have error handling
3. **Logging is NOT optional** - Every significant operation must be logged
4. **Tests are NOT optional** - Every module must have tests
5. **Configuration is NOT optional** - No hardcoded values

**If you find yourself thinking "I'll add types/tests/docs later"** - STOP. Do it now.

**If code works but isn't typed/tested/documented** - It's NOT done.

**This is NOT Node.js with its loose culture** - Python gives us the tools for rigor, USE THEM.

---

## SUCCESS CRITERIA

A module is complete when:

- ✅ All functions have type hints
- ✅ `mypy` passes with no errors
- ✅ All tests pass
- ✅ Test coverage > 80%
- ✅ No print statements (use logger)
- ✅ No bare excepts
- ✅ No magic strings (use Enums)
- ✅ Documentation is clear and complete
- ✅ Error handling is explicit
- ✅ Configuration is externalized

**If ANY of these is missing, the module is NOT complete.**
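
Two of the criteria above (no print statements, no bare excepts) can be checked mechanically. The following is a rough sketch using the standard `ast` module, not a replacement for `mypy` or a real linter; `find_violations` is a hypothetical helper, and it only detects direct `print(...)` calls and `except:` clauses with no exception type.

```python
import ast
from typing import List, Tuple


def find_violations(source: str) -> List[str]:
    """Report print() calls and bare except clauses in Python source text."""
    found: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            found.append((node.lineno, "bare except"))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            found.append((node.lineno, "print() call"))
    # Sort numerically by line so the report reads top to bottom.
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]


SAMPLE = "try:\n    value = int('x')\nexcept:\n    print('failed')\n"
violations = find_violations(SAMPLE)
# violations == ["line 3: bare except", "line 4: print() call"]
```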