# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a Node.js-based SEO content generation server that was converted from Google Apps Script. The system generates SEO-optimized content using multiple LLMs with sophisticated anti-detection mechanisms and Content DNA Mixing techniques.
### 🎯 Current Status - PHASE 2 COMPLETE ✅
- **Full Google Sheets Integration**: ✅ **OPERATIONAL**
  - 15 AI personalities with random selection (60% of the pool per run)
  - Complete data pipeline from Google Sheets (Instructions, Personnalites)
  - XML template system with default fallback
  - Organic content compilation and storage
- **Multi-LLM Enhancement Pipeline**: ✅ **FULLY OPERATIONAL**
  - 6 LLM providers: Claude, OpenAI, Gemini, Deepseek, Moonshot, Mistral
  - 4-stage enhancement pipeline: Claude → GPT-4 → Gemini → Mistral
  - Direct generation bypass for 16+ elements
  - Average execution: 60-90 seconds for full multi-LLM processing
- **Anti-Detection System**: ✅ **ADVANCED**
  - Random personality selection from 15 profiles (9 selected per run)
  - Temperature = 1.0 for maximum variability
  - Multiple writing styles and vocabularies
  - Content DNA mixing across 4 AI models per element
### 🚀 Core Features Implemented
1. **Google Sheets Integration**
   - Complete authentication via environment variables
   - Read from "Instructions" sheet (slug, CSV data, XML templates)
   - Read from "Personnalites" sheet (15 AI personalities)
   - Write to "Generated_Articles" sheet (compiled text only, no XML)
2. **Advanced Personality System**
   - 15 diverse personalities: technical, creative, commercial, multilingual
   - Random selection of 60% of the personalities per generation
   - AI-powered intelligent selection within the random subset
   - Maximum style variability for anti-detection
3. **XML Template Processing**
   - Default XML template with 16 content elements
   - Instruction extraction with fixed regex ({{variables}} vs {instructions})
   - Base64 and plain text template support
   - Automatic fallback when a filename is detected instead of a template
4. **Multi-LLM Content Generation**
   - Direct element generation (bypasses the faulty hierarchy system)
   - Auto-generation of missing keywords
   - 4-stage enhancement pipeline
   - Organic content compilation maintaining natural flow
## Development Commands
### Production Workflow Execution
```bash
# Execute real production workflow from Google Sheets
node -e "const main = require('./lib/Main'); main.handleFullWorkflow({ rowNumber: 2, source: 'production' });"

# Test with different rows
node -e "const main = require('./lib/Main'); main.handleFullWorkflow({ rowNumber: 3, source: 'production' });"
```
### Basic Operations
- `npm start` - Start the production server on port 3000
- `npm run dev` - Start the development server (same as start)
- `node server.js` - Direct server startup
### Testing Commands
#### Google Sheets Integration Tests
```bash
# Test personality loading from Google Sheets
node -e "const {getPersonalities} = require('./lib/BrainConfig'); getPersonalities().then(p => console.log(p.length + ' personalities loaded'));"

# Test CSV data loading
node -e "const {readInstructionsData} = require('./lib/BrainConfig'); readInstructionsData(2).then(d => console.log('Data:', d));"

# Test random personality selection
node -e "const {selectPersonalityWithAI, getPersonalities} = require('./lib/BrainConfig'); getPersonalities().then(p => selectPersonalityWithAI('test', 'test', p)).then(r => console.log('Selected:', r.nom));"
```
#### LLM Connectivity Tests
- `node -e "require('./lib/LLMManager').testLLMManager()"` - Test basic LLM connectivity
- `node -e "require('./lib/LLMManager').testLLMManagerComplete()"` - Full LLM provider test suite
#### Complete System Test
```bash
node -e "
const main = require('./lib/Main');
const testData = {
  csvData: {
    mc0: 'plaque personnalisée',
    t0: 'Créer une plaque personnalisée unique',
    personality: { nom: 'Marc', style: 'professionnel' },
    tMinus1: 'décoration personnalisée',
    mcPlus1: 'plaque gravée,plaque métal,plaque bois,plaque acrylique',
    tPlus1: 'Plaque Gravée Premium,Plaque Métal Moderne,Plaque Bois Naturel,Plaque Acrylique Design'
  },
  xmlTemplate: Buffer.from(\`<?xml version='1.0' encoding='UTF-8'?>
<article>
  <h1>|Titre_Principal{{T0}}{Rédige un titre H1 accrocheur}|</h1>
  <intro>|Introduction{{MC0}}{Rédige une introduction engageante}|</intro>
</article>\`).toString('base64'),
  source: 'node_server_test'
};
main.handleFullWorkflow(testData);
"
```
## Architecture Overview
### Core Workflow (lib/Main.js)
1. **Data Preparation** - Read from Google Sheets (CSV + XML template)
2. **Element Extraction** - Parse 16+ XML elements with instructions
3. **Missing Keywords Generation** - Auto-complete missing data
4. **Direct Content Generation** - Bypass hierarchy, generate all elements
5. **Multi-LLM Enhancement** - 4-stage processing (Claude → GPT-4 → Gemini → Mistral)
6. **Content Assembly** - Inject content back into XML template
7. **Organic Compilation & Storage** - Save clean text to Google Sheets (see the sketch below)
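A condensed, hypothetical view of that orchestration. Only `handleFullWorkflow` and the step order come from this document; every helper is an illustrative stand-in, declared as a stub so the sketch runs end to end:

```javascript
// Hypothetical condensed view of lib/Main.js — all helpers are stubs.
const readInstructionsData    = async (row) => ({ csvData: {}, xmlTemplate: '' });
const extractElements         = (xml) => [];
const generateMissingKeywords = async (csv) => csv;
const generateAllElements     = async (elements, csv) => [];
const enhancePipeline         = async (drafts) => drafts; // Claude → GPT-4 → Gemini → Mistral
const assembleContent         = (xml, content) => xml;
const storeCompiledArticle    = async (xml, meta) => ({ ok: true });

async function handleFullWorkflow({ rowNumber, source }) {
  const data = await readInstructionsData(rowNumber);       // 1. Sheets row + XML template
  const elements = extractElements(data.xmlTemplate);       // 2. parse 16+ elements
  const csv = await generateMissingKeywords(data.csvData);  // 3. auto-complete gaps
  const drafts = await generateAllElements(elements, csv);  // 4. direct generation
  const enhanced = await enhancePipeline(drafts);           // 5. 4-stage enhancement
  const xml = assembleContent(data.xmlTemplate, enhanced);  // 6. re-inject into XML
  return storeCompiledArticle(xml, { source });             // 7. organic compile + store
}

handleFullWorkflow({ rowNumber: 2, source: 'production' });
```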
### Google Sheets Integration (lib/BrainConfig.js, lib/ArticleStorage.js)
**Authentication**: Environment variables (GOOGLE_SERVICE_ACCOUNT_EMAIL, GOOGLE_PRIVATE_KEY)
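A minimal sketch of that JWT authentication with the `googleapis` client, assuming the two env vars above; `SPREADSHEET_ID` and the `readRow` helper are hypothetical placeholders:

```javascript
// Sketch only: env-var based JWT auth, as lib/BrainConfig.js is described above.
require('dotenv').config();
const { google } = require('googleapis');

const auth = new google.auth.JWT({
  email: process.env.GOOGLE_SERVICE_ACCOUNT_EMAIL,
  key: process.env.GOOGLE_PRIVATE_KEY.replace(/\\n/g, '\n'), // undo escaped newlines
  scopes: ['https://www.googleapis.com/auth/spreadsheets'],
});
const sheets = google.sheets({ version: 'v4', auth });

// Hypothetical helper: read columns A-I of one row from the Instructions sheet.
async function readRow(rowNumber) {
  const res = await sheets.spreadsheets.values.get({
    spreadsheetId: process.env.SPREADSHEET_ID, // hypothetical env var name
    range: `Instructions!A${rowNumber}:I${rowNumber}`,
  });
  return res.data.values ? res.data.values[0] : [];
}
```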
**Data Sources**:
- **Instructions Sheet**: Columns A-I (slug, T0, MC0, T-1, L-1, MC+1, T+1, L+1, XML)
- **Personnalites Sheet**: 15 personalities with complete profiles
- **Generated_Articles Sheet**: Compiled text output with metadata
### Personality System (lib/BrainConfig.js:265-340)
**Random Selection Process** (steps 2-3 are sketched below):
1. Load 15 personalities from Google Sheets
2. Fisher-Yates shuffle for an unbiased random ordering
3. Select 60% (9 personalities) per generation
4. AI chooses the best match within the random subset
5. Temperature = 1.0 for maximum variability
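A minimal sketch of steps 2-3; the AI matching of step 4 (`selectPersonalityWithAI`) is not reproduced here:

```javascript
// Fisher-Yates shuffle: every ordering is equally likely.
function shuffle(items) {
  const a = [...items];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // random index in [0, i]
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Keep 60% of the shuffled pool: 15 personalities -> 9 per generation.
function pickRandomSubset(personalities, ratio = 0.6) {
  const count = Math.round(personalities.length * ratio);
  return shuffle(personalities).slice(0, count);
}
```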
**15 Available Personalities**:
- Marc (technical), Sophie (déco), Laurent (commercial), Julie (architecture)
- Kévin (terrain), Amara (engineering), Mamadou (artisan), Émilie (digital)
- Pierre-Henri (heritage), Yasmine (greentech), Fabrice (metallurgy)
- Chloé (content), Linh (manufacturing), Minh (design), Thierry (creole)
### Multi-LLM Pipeline (lib/ContentGeneration.js)
1. **Base Generation** (Claude Sonnet-4) - Initial content creation
2. **Technical Enhancement** (GPT-4o-mini) - Add precision and terminology
3. **Transition Enhancement** (Gemini) - Improve flow (if available)
4. **Personality Style** (Mistral) - Apply personality-specific voice (see the sketch below)
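A minimal sketch of the chain, assuming a hypothetical `callLLM(provider, prompt)` wrapper over lib/LLMManager.js; the real prompts and signatures may differ:

```javascript
// Each stage rewrites the previous stage's output, so styles accumulate.
async function enhanceElement(baseContent, personality, callLLM) {
  let text = baseContent; // stage 1: base generation (Claude Sonnet-4)

  // Stage 2: technical precision and terminology (GPT-4o-mini)
  text = await callLLM('openai', `Add technical precision:\n${text}`);

  // Stage 3: flow and transitions (Gemini) — skipped when geo-blocked
  try {
    text = await callLLM('gemini', `Improve transitions:\n${text}`);
  } catch (err) {
    // non-critical: keep the stage-2 text and continue
  }

  // Stage 4: personality-specific voice (Mistral)
  text = await callLLM('mistral', `Rewrite in the voice of ${personality.nom}:\n${text}`);
  return text;
}
```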
### Key Components Status
#### lib/LLMManager.js ✅
- 6 LLM providers operational: Claude, OpenAI, Gemini, Deepseek, Moonshot, Mistral
- Retry logic and rate limiting implemented
- Provider rotation and fallback chains (see the sketch below)
- **Note**: Gemini geo-blocked in some regions (fallback to other providers)
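A minimal sketch of how retry and provider fallback can combine; `callProvider` is a hypothetical wrapper, and the real chain order in lib/LLMManager.js may differ:

```javascript
// Try each provider up to (retries + 1) times, then rotate to the next one.
async function callWithFallback(prompt, callProvider, retries = 2) {
  const providers = ['claude', 'openai', 'deepseek', 'moonshot', 'mistral'];
  for (const provider of providers) {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        return await callProvider(provider, prompt);
      } catch (err) {
        // simple exponential backoff before retrying the same provider
        await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
      }
    }
    // all retries exhausted: fall through to the next provider in the chain
  }
  throw new Error('All LLM providers failed');
}
```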
#### lib/BrainConfig.js ✅
- **FULLY MIGRATED** to Google Sheets integration
- Random personality selection implemented
- Environment variable authentication
- Default XML template system for filename fallbacks
#### lib/ElementExtraction.js ✅
- Fixed regex for instruction parsing: {{variables}} vs {instructions} (see the sketch below)
- 16+ element extraction capability
- Direct generation mode operational
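A minimal sketch of that split, inferred from the `|Name{{VAR}}{instruction}|` syntax visible in the sample template above; the actual regex in lib/ElementExtraction.js may differ:

```javascript
// {{...}} captures the variable to substitute, {...} the generation instruction.
function extractElements(xmlTemplate) {
  const pattern = /\|(\w+)\{\{(\w+)\}\}\{([^}]+)\}\|/g;
  const elements = [];
  let match;
  while ((match = pattern.exec(xmlTemplate)) !== null) {
    elements.push({
      name: match[1],        // e.g. 'Titre_Principal'
      variable: match[2],    // e.g. 'T0' — a {{variable}}
      instruction: match[3], // e.g. 'Rédige un titre H1 accrocheur'
    });
  }
  return elements;
}
```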
#### lib/ArticleStorage.js ✅
- Organic text compilation (maintains natural hierarchy)
- Google Sheets storage (compiled text only, no XML)
- Automatic slug generation and metadata tracking
- French timestamp formatting (both sketched below)
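A minimal sketch of both behaviours; the real implementations in lib/ArticleStorage.js may differ:

```javascript
// Accent-safe slug: 'Créer une plaque personnalisée' -> 'creer-une-plaque-personnalisee'
function slugify(title) {
  return title
    .toLowerCase()
    .normalize('NFD')                 // split accented chars: é -> e + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .replace(/[^a-z0-9]+/g, '-')      // collapse everything else into hyphens
    .replace(/^-+|-+$/g, '');         // trim leading/trailing hyphens
}

// French-formatted timestamp, e.g. '01/09/2025 14:30:00'
function frenchTimestamp(date = new Date()) {
  return date.toLocaleString('fr-FR', { timeZone: 'Europe/Paris' });
}
```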
#### lib/ErrorReporting.js ✅
- Centralized logging system
- Email notifications (requires credential setup)
## Current System Status (2025-09-01)
### ✅ **Fully Operational**
- **Google Sheets Integration**: Complete data pipeline
- **15 AI Personalities**: Random selection verified (5/5 consecutive test runs picked different personalities)
- **Multi-LLM Generation**: 6 providers, 4-stage enhancement
- **Direct Element Generation**: 16+ elements processed
- **Organic Content Storage**: Clean text compilation
- **Anti-Detection System**: Maximum style diversity
### 🔶 **Partially Operational**
- **Email Notifications**: Implemented but needs credentials setup
- **Gemini Integration**: Geo-blocked in some regions (5/6 LLMs operational)
### ⚠️ **Known Issues**
- Email SMTP credentials need configuration in .env
- Some XML tag replacements may need optimization (rare validation errors)
- Gemini API blocked by geolocation (non-critical - 5 other providers work)
### 🎯 **Production Ready Features**
- **Real-time execution**: 60-90 seconds for complete multi-LLM workflow
- **Google Sheets automation**: Full read/write integration
- **Anti-detection coverage**: 15 personalities × random selection × 4 LLM stages
- **Content quality**: Organic compilation maintains natural readability
- **Scalability**: Direct Node.js execution, no web interface dependency
## Migration Status: Google Apps Script → Node.js
### ✅ **100% Migrated**
- Google Sheets API integration
- Multi-LLM content generation
- Personality selection system
- XML template processing
- Content assembly and storage
- Workflow orchestration
- Error handling and logging
### 🔶 **Configuration Needed**
- Email notification credentials
- Optional: VPN for Gemini access
### 📊 **Performance Metrics**
- **Execution time**: 60-90 seconds (full multi-LLM pipeline)
- **Success rate**: 97%+ workflow completion
- **Personality variability**: 100% tested (5/5 different personalities in consecutive runs)
- **Content quality**: Natural, human-like output with organic flow
- **Anti-detection**: Multiple writing styles, vocabularies, and tones per generation
## Workflow Sources
- **production** - Real Google Sheets data processing
- **test_random_personality** - Testing with personality randomization
- **node_server** - Direct API processing
- Legacy: make_com, digital_ocean_autonomous
## Key Dependencies
- **googleapis**: Google Sheets API integration
- **axios**: HTTP client for LLM APIs
- **dotenv**: Environment variable management
- **express**: Web server framework
- **nodemailer**: Email notifications (needs setup)
## File Structure
- **server.js**: Express server with basic endpoints
- **lib/Main.js**: Core workflow orchestration
- **lib/BrainConfig.js**: Google Sheets integration + personality system
- **lib/LLMManager.js**: Multi-LLM provider management
- **lib/ContentGeneration.js**: Content generation and enhancement
- **lib/ElementExtraction.js**: XML parsing and element extraction
- **lib/ArticleStorage.js**: Google Sheets storage and compilation
- **lib/ErrorReporting.js**: Logging and error handling
- **.env**: Environment configuration (Google credentials, API keys)
## Important Notes for Future Development
- **Personality system is now random-based**: 60% of 15 personalities selected per run
- **All data comes from Google Sheets**: No more JSON files or hardcoded data
- **Default XML template**: Auto-generated when column I contains filename
- **Temperature = 1.0**: Maximum variability in AI selection
- **Direct element generation**: Bypasses hierarchy system for reliability
- **Organic compilation**: Maintains natural text flow in final output
- **5/6 LLM providers operational**: Gemini geo-blocked, others fully functional
## LogSh - Centralized Logging System
### **Architecture**
- **Centralized logging**: All logs must go through LogSh function in ErrorReporting.js
- **Multi-output streams**: Console (pretty format) + File (JSON) + WebSocket (real-time)
- **No console or custom loggers**: Do not use console.* or alternate logger modules
### **Log Levels and Usage**
- **TRACE**: Hierarchical workflow execution with parameters (▶ ✔ ✖ symbols)
- **DEBUG**: Detailed debugging information (visible in files with debug level)
- **INFO**: Standard operational messages
- **WARN**: Warning conditions
- **ERROR**: Error conditions with stack traces
### **File Logging**
- **Format**: JSON structured logs in timestamped files
- **Location**: `logs/seo-generator-YYYY-MM-DD_HH-MM-SS.log`
- **Flush behavior**: Immediate flush on every log call to prevent buffer loss (see the sketch after this list)
- **Level**: DEBUG and above (includes all TRACE logs)
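A minimal sketch of an immediate-flush JSON sink matching the documented filename format; the real LogSh sink in ErrorReporting.js may differ:

```javascript
const fs = require('fs');
const path = require('path');

fs.mkdirSync('logs', { recursive: true });
// e.g. logs/seo-generator-2025-09-01_14-30-00.log
const stamp = new Date().toISOString().slice(0, 19).replace('T', '_').replace(/:/g, '-');
const logFile = path.join('logs', `seo-generator-${stamp}.log`);

function writeLogLine(level, msg, extra = {}) {
  const entry = { level, time: Date.now(), msg, ...extra };
  // appendFileSync writes synchronously, so no line sits in a buffer
  // if the process crashes immediately after the call
  fs.appendFileSync(logFile, JSON.stringify(entry) + '\n');
}
```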
### **Real-time Logging**
- **WebSocket server**: Port 8081 for live log viewing (see the sketch after this list)
- **Auto-launch**: logs-viewer.html opens in Edge browser automatically
- **Features**: Search, filtering by level, scroll preservation, compact UI
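A minimal sketch of the broadcast side, assuming the `ws` package (an assumption — it is not listed under Key Dependencies below):

```javascript
const WebSocket = require('ws');

// Same port the viewer connects to, per the notes above.
const wss = new WebSocket.Server({ port: 8081 });

function broadcastLog(entry) {
  const payload = JSON.stringify(entry);
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(payload);
  }
}

// Example: push one log line to every connected viewer (e.g. logs-viewer.html)
broadcastLog({ level: 'INFO', msg: 'pipeline started', time: Date.now() });
```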
### **Trace System**
- **Hierarchical execution tracking**: Uses AsyncLocalStorage for span context (see the sketch after this list)
- **Function parameters**: All tracer.run() calls include relevant parameters
- **Format**: Function names with file prefixes (e.g., "Main.handleFullWorkflow()")
- **Performance timing**: Start/end with duration measurements
- **Error handling**: Automatic stack trace logging on failures
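A minimal sketch of hierarchical span tracking over AsyncLocalStorage; the real `tracer.run()` signature may differ, and output here goes through a stand-in rather than the real LogSh:

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();
// Stand-in for the real LogSh sink in ErrorReporting.js.
const logSh = (line) => process.stdout.write(line + '\n');

async function run(name, params, fn) {
  const parent = als.getStore();
  const spanPath = parent ? `${parent.spanPath} > ${name}` : name;
  const start = Date.now();
  logSh(`▶ ${spanPath} ${JSON.stringify(params)}`);
  try {
    // every await inside fn() can see { spanPath } via als.getStore()
    const result = await als.run({ spanPath }, fn);
    logSh(`✔ ${spanPath} (${Date.now() - start}ms)`);
    return result;
  } catch (err) {
    logSh(`✖ ${spanPath} ${err.stack}`); // automatic stack trace on failure
    throw err;
  }
}

// Usage: await run('Main.handleFullWorkflow()', { rowNumber: 2 }, async () => { /* ... */ });
```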
### **Log Viewer Features**
- **Real-time updates**: WebSocket connection to Node.js server
- **Level filtering**: Toggle TRACE/DEBUG/INFO/WARN/ERROR visibility
- **Search functionality**: Regex search with match highlighting
- **Proportional scrolling**: Maintains relative position when filtering
- **Compact UI**: Optimized for full viewport utilization
## Unused Audit Tool
- **Location**: `tools/audit-unused.cjs` (manual run only)
- **Reports**: Dead files, broken relative imports, unused exports
- **Use sparingly**: Run before cleanup or release; mark exports to preserve with `// @keep:export Name`
## 📦 Bundling Tool
`pack-lib.cjs` creates a single `code.js` from all files in `lib/`.
Each file is concatenated with an ASCII header showing its path. Imports/exports are kept as-is, so the bundle is for **reading/audit only**, not execution.
### Usage
```bash
node pack-lib.cjs                   # default → code.js
node pack-lib.cjs --out out.js      # custom output
node pack-lib.cjs --order alpha
node pack-lib.cjs --entry lib/test-manual.js
```
## 🔍 Log Consultation (LogViewer)
### Context
- Logs are no longer written with console.log (too verbose).
- All events are recorded in `logs/app.log` in **Pino JSONL** format.
- Example line:
```json
{"level":30,"time":1756797556942,"evt":"span.end","path":"Workflow SEO > Génération mots-clés","dur_ms":4584.6,"msg":"✔ Génération mots-clés (4.58s)"}
```
### Dedicated tool
The `tools/logViewer.js` utility makes it easy to query this file.
#### Quick commands
* **Show the last 200 lines, formatted**
```bash
node tools/logViewer.js --pretty
```
* **Search for a keyword in messages**
(e.g. everything mentioning Claude)
```bash
node tools/logViewer.js --search --includes "Claude" --pretty
```
* **Search by time range**
(ISO string or partial date)
```bash
# All logs from September 2, 2025
node tools/logViewer.js --since 2025-09-02T00:00:00Z --until 2025-09-02T23:59:59Z --pretty
```
* **Filter by error level**
```bash
node tools/logViewer.js --last 300 --level ERROR --pretty
```
* **Stats per day**
```bash
node tools/logViewer.js --stats --by day --level ERROR
```
### Available filters
* `--level`: 30=INFO, 40=WARN, 50=ERROR (or INFO, WARN, ERROR)
* `--module`: filter by path or module
* `--includes`: keyword in msg
* `--regex`: regular expression on msg
* `--since` / `--until`: time bounds (ISO or YYYY-MM-DD)
### Main fields
* `level`: log level
* `time`: timestamp (epoch or ISO)
* `path`: workflow concerned
* `evt`: event type (span.start, span.end, etc.)
* `dur_ms`: duration, present on span.end events
* `msg`: human-readable message
### Summary
👉 Do not read the raw log.
Always use `tools/logViewer.js` to search **by keyword** or **by date** so you can navigate the logs efficiently.