CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Node.js-based SEO content generation server that was converted from Google Apps Script. The system generates SEO-optimized content using multiple LLMs with sophisticated anti-detection mechanisms and Content DNA Mixing techniques.

🎯 Current Status - PHASE 2 COMPLETE

  • Full Google Sheets Integration: OPERATIONAL

    • 15 AI personalities with random selection (60% subset chosen per run)
    • Complete data pipeline from Google Sheets (Instructions, Personnalites)
    • XML template system with default fallback
    • Organic content compilation and storage
  • Multi-LLM Enhancement Pipeline: FULLY OPERATIONAL

    • 6 LLM providers: Claude, OpenAI, Gemini, Deepseek, Moonshot, Mistral
    • 4-stage enhancement pipeline: Claude → GPT-4 → Gemini → Mistral
    • Direct generation bypass for 16+ elements
    • Average execution: 60-90 seconds for full multi-LLM processing
  • Anti-Detection System: ADVANCED

    • Random personality selection from 15 profiles (9 selected per run)
    • Temperature = 1.0 for maximum variability
    • Multiple writing styles and vocabularies
    • Content DNA mixing across 4 AI models per element

🚀 Core Features Implemented

  1. Google Sheets Integration

    • Complete authentication via environment variables
    • Read from "Instructions" sheet (slug, CSV data, XML templates)
    • Read from "Personnalites" sheet (15 AI personalities)
    • Write to "Generated_Articles" sheet (compiled text only, no XML)
  2. Advanced Personality System

    • 15 diverse personalities: technical, creative, commercial, multilingual
    • Random selection of 60% of the personalities per generation
    • AI-powered intelligent selection within random subset
    • Maximum style variability for anti-detection
  3. XML Template Processing

    • Default XML template with 16 content elements
    • Instruction extraction with fixed regex ({{variables}} vs {instructions})
    • Base64 and plain text template support
    • Automatic fallback to the default template when a filename is detected instead of template content
  4. Multi-LLM Content Generation

    • Direct element generation (bypasses faulty hierarchy)
    • Missing keywords auto-generation
    • 4-stage enhancement pipeline
    • Organic content compilation maintaining natural flow

Development Commands

Production Workflow Execution

```bash
# Execute the real production workflow from Google Sheets
node -e "const main = require('./lib/Main'); main.handleFullWorkflow({ rowNumber: 2, source: 'production' });"

# Test with a different row
node -e "const main = require('./lib/Main'); main.handleFullWorkflow({ rowNumber: 3, source: 'production' });"
```

Basic Operations

  • npm start - Start the production server on port 3000
  • npm run dev - Start the development server (same as start)
  • node server.js - Direct server startup

Testing Commands

Google Sheets Integration Tests

```bash
# Test personality loading from Google Sheets
node -e "const {getPersonalities} = require('./lib/BrainConfig'); getPersonalities().then(p => console.log(\`\${p.length} personalities loaded\`));"

# Test CSV data loading
node -e "const {readInstructionsData} = require('./lib/BrainConfig'); readInstructionsData(2).then(d => console.log('Data:', d));"

# Test random personality selection
node -e "const {selectPersonalityWithAI, getPersonalities} = require('./lib/BrainConfig'); getPersonalities().then(p => selectPersonalityWithAI('test', 'test', p)).then(r => console.log('Selected:', r.nom));"
```

LLM Connectivity Tests

  • node -e "require('./lib/LLMManager').testLLMManager()" - Test basic LLM connectivity
  • node -e "require('./lib/LLMManager').testLLMManagerComplete()" - Full LLM provider test suite

Complete System Test

```bash
node -e "
const main = require('./lib/Main');
const testData = {
  csvData: {
    mc0: 'plaque personnalisée',
    t0: 'Créer une plaque personnalisée unique',
    personality: { nom: 'Marc', style: 'professionnel' },
    tMinus1: 'décoration personnalisée',
    mcPlus1: 'plaque gravée,plaque métal,plaque bois,plaque acrylique',
    tPlus1: 'Plaque Gravée Premium,Plaque Métal Moderne,Plaque Bois Naturel,Plaque Acrylique Design'
  },
  xmlTemplate: Buffer.from(\`<?xml version='1.0' encoding='UTF-8'?>
|Titre_Principal{{T0}}{Rédige un titre H1 accrocheur}|
|Introduction{{MC0}}{Rédige une introduction engageante}|\`).toString('base64'),
  source: 'node_server_test'
};
main.handleFullWorkflow(testData);
"
```

Architecture Overview

Core Workflow (lib/Main.js)

  1. Data Preparation - Read from Google Sheets (CSV + XML template)
  2. Element Extraction - Parse 16+ XML elements with instructions
  3. Missing Keywords Generation - Auto-complete missing data
  4. Direct Content Generation - Bypass hierarchy, generate all elements
  5. Multi-LLM Enhancement - 4-stage processing (Claude → GPT-4 → Gemini → Mistral)
  6. Content Assembly - Inject content back into XML template
  7. Organic Compilation & Storage - Save clean text to Google Sheets

Google Sheets Integration (lib/BrainConfig.js, lib/ArticleStorage.js)

Authentication: Environment variables (GOOGLE_SERVICE_ACCOUNT_EMAIL, GOOGLE_PRIVATE_KEY)

Data Sources:

  • Instructions Sheet: Columns A-I (slug, T0, MC0, T-1, L-1, MC+1, T+1, L+1, XML)
  • Personnalites Sheet: 15 personalities with complete profiles
  • Generated_Articles Sheet: Compiled text output with metadata

Personality System (lib/BrainConfig.js:265-340)

Random Selection Process:

  1. Load 15 personalities from Google Sheets
  2. Fisher-Yates shuffle for true randomness
  3. Select 60% (9 personalities) per generation
  4. AI chooses best match within random subset
  5. Temperature = 1.0 for maximum variability
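
The five steps above can be sketched in a few lines (a minimal illustration with hypothetical helper names; the real logic lives in lib/BrainConfig.js):

```javascript
// Minimal sketch of the random-subset step (hypothetical helper names;
// the real implementation lives in lib/BrainConfig.js).
function fisherYatesShuffle(items) {
  const a = items.slice(); // copy so the source list is untouched
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // unbiased pick in [0, i]
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

function pickRandomSubset(personalities, ratio = 0.6) {
  const count = Math.round(personalities.length * ratio); // 15 → 9
  return fisherYatesShuffle(personalities).slice(0, count);
}
```

The AI-powered match (step 4) then only ever sees this shuffled 9-personality subset, so repeated runs with identical inputs still draw from different candidate pools.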

15 Available Personalities:

  • Marc (technical), Sophie (déco), Laurent (commercial), Julie (architecture)
  • Kévin (terrain), Amara (engineering), Mamadou (artisan), Émilie (digital)
  • Pierre-Henri (heritage), Yasmine (greentech), Fabrice (metallurgy)
  • Chloé (content), Linh (manufacturing), Minh (design), Thierry (creole)

Multi-LLM Pipeline (lib/ContentGeneration.js)

  1. Base Generation (Claude Sonnet-4) - Initial content creation
  2. Technical Enhancement (GPT-4o-mini) - Add precision and terminology
  3. Transition Enhancement (Gemini) - Improve flow (if available)
  4. Personality Style (Mistral) - Apply personality-specific voice
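
The chaining of these four stages can be sketched as follows (the stage functions here are stubs standing in for the real provider calls; the actual pipeline lives in lib/ContentGeneration.js and lib/LLMManager.js):

```javascript
// Sketch of the sequential enhancement chain. The stage functions below are
// stubs; real calls go through the providers in lib/LLMManager.js.
async function enhanceElement(initialContent, stages) {
  let content = initialContent;
  for (const stage of stages) {
    try {
      content = await stage(content); // each stage rewrites the previous output
    } catch (err) {
      // A failing provider (e.g. geo-blocked Gemini) is skipped and the
      // last good version of the content is kept.
    }
  }
  return content;
}

// Stub stage order mirroring the pipeline above.
const stages = [
  async (c) => c + ' [claude-base]',
  async (c) => c + ' [gpt4o-mini-technical]',
  async () => { throw new Error('geo-blocked'); }, // Gemini unavailable
  async (c) => c + ' [mistral-style]',
];
```

Because each stage feeds on the previous stage's output, a skipped provider degrades gracefully instead of breaking the chain.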

Key Components Status

lib/LLMManager.js

  • 6 LLM providers operational: Claude, OpenAI, Gemini, Deepseek, Moonshot, Mistral
  • Retry logic and rate limiting implemented
  • Provider rotation and fallback chains
  • Note: Gemini geo-blocked in some regions (fallback to other providers)

lib/BrainConfig.js

  • FULLY MIGRATED to Google Sheets integration
  • Random personality selection implemented
  • Environment variable authentication
  • Default XML template system for filename fallbacks

lib/ElementExtraction.js

  • Fixed regex for instruction parsing: {{variables}} vs {instructions}
  • 16+ element extraction capability
  • Direct generation mode operational
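
As an illustration of the {{variables}} vs {instructions} distinction, a parser for one element of the |Name{{VARIABLE}}{instruction}| form used in the templates might look like this (parseElement is a hypothetical name; the production regex in lib/ElementExtraction.js may differ):

```javascript
// Hypothetical parser for one template element of the form
// |Name{{VARIABLE}}{instruction}| — double braces mark a data variable,
// single braces a writing instruction.
function parseElement(line) {
  const m = line.match(/^\|(\w+)\{\{(\w+)\}\}\{([^{}]+)\}\|$/);
  if (!m) return null;
  return { name: m[1], variable: m[2], instruction: m[3] };
}
```

For example, parseElement('|Titre_Principal{{T0}}{Rédige un titre H1 accrocheur}|') yields { name: 'Titre_Principal', variable: 'T0', instruction: 'Rédige un titre H1 accrocheur' }.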

lib/ArticleStorage.js

  • Organic text compilation (maintains natural hierarchy)
  • Google Sheets storage (compiled text only, no XML)
  • Automatic slug generation and metadata tracking
  • French timestamp formatting

lib/ErrorReporting.js

  • Centralized logging system
  • Email notifications (requires credential setup)

Current System Status (2025-09-01)

Fully Operational

  • Google Sheets Integration: Complete data pipeline
  • 15 AI Personalities: Random selection with 100% variability tested
  • Multi-LLM Generation: 6 providers, 4-stage enhancement
  • Direct Element Generation: 16+ elements processed
  • Organic Content Storage: Clean text compilation
  • Anti-Detection System: Maximum style diversity

🔶 Partially Operational

  • Email Notifications: Implemented but needs credentials setup
  • Gemini Integration: Geo-blocked in some regions (5/6 LLMs operational)

⚠️ Known Issues

  • Email SMTP credentials need configuration in .env
  • Some XML tag replacements may need optimization (rare validation errors)
  • Gemini API blocked by geolocation (non-critical - 5 other providers work)

🎯 Production Ready Features

  • Real-time execution: 60-90 seconds for complete multi-LLM workflow
  • Google Sheets automation: Full read/write integration
  • Anti-detection guarantee: 15 personalities × random selection × 4 LLM stages
  • Content quality: Organic compilation maintains natural readability
  • Scalability: Direct Node.js execution, no web interface dependency

Migration Status: Google Apps Script → Node.js

100% Migrated

  • Google Sheets API integration
  • Multi-LLM content generation
  • Personality selection system
  • XML template processing
  • Content assembly and storage
  • Workflow orchestration
  • Error handling and logging

🔶 Configuration Needed

  • Email notification credentials
  • Optional: VPN for Gemini access

📊 Performance Metrics

  • Execution time: 60-90 seconds (full multi-LLM pipeline)
  • Success rate: 97%+ workflow completion
  • Personality variability: 100% tested (5/5 different personalities in consecutive runs)
  • Content quality: Natural, human-like output with organic flow
  • Anti-detection: Multiple writing styles, vocabularies, and tones per generation

Workflow Sources

  • production - Real Google Sheets data processing
  • test_random_personality - Testing with personality randomization
  • node_server - Direct API processing
  • Legacy: make_com, digital_ocean_autonomous

Key Dependencies

  • googleapis: Google Sheets API integration
  • axios: HTTP client for LLM APIs
  • dotenv: Environment variable management
  • express: Web server framework
  • nodemailer: Email notifications (needs setup)

File Structure

  • server.js: Express server with basic endpoints
  • lib/Main.js: Core workflow orchestration
  • lib/BrainConfig.js: Google Sheets integration + personality system
  • lib/LLMManager.js: Multi-LLM provider management
  • lib/ContentGeneration.js: Content generation and enhancement
  • lib/ElementExtraction.js: XML parsing and element extraction
  • lib/ArticleStorage.js: Google Sheets storage and compilation
  • lib/ErrorReporting.js: Logging and error handling
  • .env: Environment configuration (Google credentials, API keys)

Important Notes for Future Development

  • Personality system is now random-based: 60% of 15 personalities selected per run
  • All data comes from Google Sheets: No more JSON files or hardcoded data
  • Default XML template: Auto-generated when column I contains filename
  • Temperature = 1.0: Maximum variability in AI selection
  • Direct element generation: Bypasses hierarchy system for reliability
  • Organic compilation: Maintains natural text flow in final output
  • 5/6 LLM providers operational: Gemini geo-blocked, others fully functional

LogSh - Centralized Logging System

Architecture

  • Centralized logging: All logs must go through LogSh function in ErrorReporting.js
  • Multi-output streams: Console (pretty format) + File (JSON) + WebSocket (real-time)
  • No console or custom loggers: Do not use console.* or alternate logger modules

Log Levels and Usage

  • TRACE: Hierarchical workflow execution with parameters (▶ ✔ ✖ symbols)
  • DEBUG: Detailed debugging information (visible in files with debug level)
  • INFO: Standard operational messages
  • WARN: Warning conditions
  • ERROR: Error conditions with stack traces

File Logging

  • Format: JSON structured logs in timestamped files
  • Location: logs/seo-generator-YYYY-MM-DD_HH-MM-SS.log
  • Flush behavior: Immediate flush on every log call to prevent buffer loss
  • Level: DEBUG and above (includes all TRACE logs)

Real-time Logging

  • WebSocket server: Port 8081 for live log viewing
  • Auto-launch: logs-viewer.html opens in Edge browser automatically
  • Features: Search, filtering by level, scroll preservation, compact UI

Trace System

  • Hierarchical execution tracking: Using AsyncLocalStorage for span context
  • Function parameters: All tracer.run() calls include relevant parameters
  • Format: Function names with file prefixes (e.g., "Main.handleFullWorkflow()")
  • Performance timing: Start/end with duration measurements
  • Error handling: Automatic stack trace logging on failures

Log Viewer Features

  • Real-time updates: WebSocket connection to Node.js server
  • Level filtering: Toggle TRACE/DEBUG/INFO/WARN/ERROR visibility
  • Search functionality: Regex search with match highlighting
  • Proportional scrolling: Maintains relative position when filtering
  • Compact UI: Optimized for full viewport utilization

Unused Audit Tool

  • Location: tools/audit-unused.cjs (manual run only)
  • Reports: Dead files, broken relative imports, unused exports
  • Use sparingly: Run before cleanup or release; keep with // @keep:export Name

📦 Bundling Tool

pack-lib.cjs creates a single code.js from all files in lib/.
Each file is concatenated with an ASCII header showing its path. Imports/exports are kept, so the bundle is for reading/audit only, not execution.

Usage

```bash
node pack-lib.cjs                             # default → code.js
node pack-lib.cjs --out out.js                # custom output
node pack-lib.cjs --order alpha
node pack-lib.cjs --entry lib/test-manual.js
```

🔍 Log Consultation (LogViewer)

Context

  • Logs are no longer written via console.log (too verbose).
  • Every event is recorded in logs/app.log as Pino-style JSONL.
  • Example line:

```json
{"level":30,"time":1756797556942,"evt":"span.end","path":"Workflow SEO > Génération mots-clés","dur_ms":4584.6,"msg":"✔ Génération mots-clés (4.58s)"}
```

Dedicated tool

The script tools/logViewer.js makes it easy to query this file.

Quick commands

  • View the last 200 lines, pretty-printed

    ```bash
    node tools/logViewer.js --pretty
    ```

  • Search for a keyword in messages (example: everything mentioning Claude)

    ```bash
    node tools/logViewer.js --search --includes "Claude" --pretty
    ```

  • Search by time range (ISO string or partial date)

    ```bash
    # All logs from September 2, 2025
    node tools/logViewer.js --since 2025-09-02T00:00:00Z --until 2025-09-02T23:59:59Z --pretty
    ```

  • Filter by error level

    ```bash
    node tools/logViewer.js --last 300 --level ERROR --pretty
    ```

  • Stats per day

    ```bash
    node tools/logViewer.js --stats --by day --level ERROR
    ```

Available filters

  • --level: 30=INFO, 40=WARN, 50=ERROR (or INFO, WARN, ERROR)
  • --module: filter by path or module
  • --includes: keyword contained in msg
  • --regex: regular expression applied to msg
  • --since / --until: time bounds (ISO or YYYY-MM-DD)

Main fields

  • level: log level
  • time: timestamp (epoch or ISO)
  • path: the workflow span concerned
  • evt: event type (span.start, span.end, etc.)
  • dur_ms: duration, present on span.end events
  • msg: human-readable message

Summary

👉 Do not read the raw log file directly. Always use tools/logViewer.js to search by keyword or date, so you can navigate the logs efficiently.