sourcefinder/docs/MODULAR_ARCHITECTURE.md
Alexis Trouvé a7bd6115b7
2025-09-15 23:06:10 +08:00


🏗️ ULTRA-MODULAR ARCHITECTURE - SourceFinder

Modular, free, fully LLM-based version with interchangeable components


🎯 Architectural principle

Golden rule: every component adheres to a strict interface and can be replaced without affecting the others.

// ❌ Tight coupling (bad)
const mongodb = require('mongodb');
const puppeteer = require('puppeteer');
class NewsService {
  async search() {
    const db = mongodb.connect(); // Coupled to MongoDB
    const browser = puppeteer.launch(); // Coupled to Puppeteer
  }
}

// ✅ Modular architecture (good)
class NewsService {
  constructor(stockRepo, newsProvider, scorer) {
    this.stock = stockRepo; // Interface IStockRepository
    this.provider = newsProvider; // Interface INewsProvider
    this.scorer = scorer; // Interface IScoringEngine
  }
}

🔌 Core interfaces

INewsProvider - News provider

// src/interfaces/INewsProvider.js
class INewsProvider {
  /**
   * Search for news by criteria
   * @param {SearchQuery} query - Search criteria
   * @returns {Promise<NewsItem[]>} - Articles found
   */
  async searchNews(query) {
    throw new Error('Must implement searchNews()');
  }

  /**
   * Validate results
   * @param {NewsItem[]} results - Articles to validate
   * @returns {Promise<NewsItem[]>} - Validated articles
   */
  async validateResults(results) {
    throw new Error('Must implement validateResults()');
  }

  /**
   * Provider metadata
   * @returns {ProviderMetadata} - Provider information
   */
  getMetadata() {
    throw new Error('Must implement getMetadata()');
  }
}

// Named export, matching the destructured require() used by implementations
module.exports = { INewsProvider };

// Types
const SearchQuery = {
  raceCode: String,      // "352-1"
  keywords: [String],    // ["santé", "comportement"]
  maxAge: Number,        // Maximum age in days
  sources: [String],     // ["premium", "standard"]
  limit: Number          // Maximum number of results
};

const NewsItem = {
  id: String,
  title: String,
  content: String,
  url: String,
  publishDate: Date,
  sourceType: String,    // "premium", "standard", "fallback"
  sourceDomain: String,
  metadata: Object
};
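
The type objects above document the expected shapes but do not enforce them at runtime. As a minimal sketch, a hypothetical `validateSearchQuery` helper (not part of the original code) could reject malformed queries before they reach a provider. The specific rules, such as requiring positive integers, are assumptions.

```javascript
// Hypothetical runtime check for the SearchQuery shape described above.
// Returns a list of problems; an empty list means the query is usable.
// The exact constraints (positive integers, non-empty code) are assumptions.
function validateSearchQuery(query) {
  const errors = [];
  if (typeof query.raceCode !== 'string' || query.raceCode.length === 0) {
    errors.push('raceCode must be a non-empty string');
  }
  if (!Array.isArray(query.keywords) ||
      !query.keywords.every((k) => typeof k === 'string')) {
    errors.push('keywords must be an array of strings');
  }
  if (!Number.isInteger(query.maxAge) || query.maxAge <= 0) {
    errors.push('maxAge must be a positive integer (days)');
  }
  if (!Array.isArray(query.sources)) {
    errors.push('sources must be an array');
  }
  if (!Number.isInteger(query.limit) || query.limit <= 0) {
    errors.push('limit must be a positive integer');
  }
  return errors;
}

const validQueryErrors = validateSearchQuery({
  raceCode: '352-1',
  keywords: ['santé'],
  maxAge: 30,
  sources: ['premium'],
  limit: 5
});

const invalidQueryErrors = validateSearchQuery({
  raceCode: '',
  keywords: 'santé',
  maxAge: -1,
  sources: [],
  limit: 0
});
```

Returning a list of errors rather than throwing lets an API layer report every problem in a single response.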

IStockRepository - Article storage

// src/interfaces/IStockRepository.js
class IStockRepository {
  async save(newsItem) {
    throw new Error('Must implement save()');
  }

  async findByRaceCode(raceCode, options = {}) {
    throw new Error('Must implement findByRaceCode()');
  }

  async findByScore(minScore, options = {}) {
    throw new Error('Must implement findByScore()');
  }

  async updateUsage(id, usageData) {
    throw new Error('Must implement updateUsage()');
  }

  async cleanup(criteria) {
    throw new Error('Must implement cleanup()');
  }

  async getStats() {
    throw new Error('Must implement getStats()');
  }
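}

// Named export, matching the destructured require() used by implementations
module.exports = { IStockRepository };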
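
JavaScript has no native interfaces, so the base-class pattern above only fails when an unimplemented method is first called. As a minimal sketch, a hypothetical `assertImplements` helper (not part of the original code) could detect missing overrides at wiring time instead. The interface and implementations below are trimmed down for illustration.

```javascript
// Hypothetical helper: verifies that an instance overrides every method
// declared on its interface base class, and returns the instance if so.
function assertImplements(instance, InterfaceClass) {
  const required = Object.getOwnPropertyNames(InterfaceClass.prototype)
    .filter((name) => name !== 'constructor');

  // A method still identical to the base-class one was never overridden.
  const missing = required.filter(
    (name) => instance[name] === InterfaceClass.prototype[name]
  );

  if (missing.length > 0) {
    throw new Error(
      `${instance.constructor.name} does not implement: ${missing.join(', ')}`
    );
  }
  return instance;
}

// Trimmed-down interface with two of the methods shown above.
class IStockRepository {
  async save() { throw new Error('Must implement save()'); }
  async getStats() { throw new Error('Must implement getStats()'); }
}

class CompleteRepository extends IStockRepository {
  async save(item) { return item; }
  async getStats() { return { totalArticles: 0 }; }
}

// Forgets getStats(), so the check should reject it.
class PartialRepository extends IStockRepository {
  async save(item) { return item; }
}
```

Calling `assertImplements(new PartialRepository(), IStockRepository)` throws an error that names `getStats`, whereas the complete implementation is returned unchanged.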

IScoringEngine - Scoring engine

// src/interfaces/IScoringEngine.js
class IScoringEngine {
  async scoreArticle(article, context) {
    throw new Error('Must implement scoreArticle()');
  }

  async batchScore(articles, context) {
    throw new Error('Must implement batchScore()');
  }

  getWeights() {
    throw new Error('Must implement getWeights()');
  }
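}

// Named export, matching the destructured require() used by implementations
module.exports = { IScoringEngine };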
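
The `BasicScoringEngine` referenced later in the container is not shown in this document. The following is only a sketch of how a weighted multi-factor score could satisfy `IScoringEngine`, assuming each sub-score lies in [0, 1] and using the four weights from the container configuration. The individual sub-score rules and the `context` fields (`now`, `maxAge`, `raceCode`) are illustrative assumptions, not the project's actual logic.

```javascript
// Sketch only: not the project's BasicScoringEngine.
// Weights mirror the container configuration; sub-score rules are assumed.
const DEFAULT_WEIGHTS = {
  freshness: 0.3,
  specificity: 0.4,
  quality: 0.2,
  reusability: 0.1
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

class SketchScoringEngine {
  constructor(weights = DEFAULT_WEIGHTS) {
    this.weights = weights;
  }

  // Linear decay from 1 (published now) to 0 (maxAge days old or older).
  freshness(article, context) {
    const ageDays = (context.now - new Date(article.publishDate)) / MS_PER_DAY;
    return Math.min(1, Math.max(0, 1 - ageDays / context.maxAge));
  }

  async scoreArticle(article, context) {
    const parts = {
      freshness: this.freshness(article, context),
      // Assumed rule: exact breed match scores 1, anything else 0.5
      specificity: article.raceCode === context.raceCode ? 1 : 0.5,
      // Assumed scale per source type
      quality: { premium: 1, standard: 0.7, fallback: 0.4 }[article.sourceType] ?? 0,
      // Articles used less often in the past score higher
      reusability: 1 / (1 + (article.usageCount || 0))
    };

    const finalScore = Object.entries(this.weights)
      .reduce((sum, [name, weight]) => sum + weight * parts[name], 0);

    return { ...article, finalScore };
  }

  async batchScore(articles, context) {
    return Promise.all(articles.map((a) => this.scoreArticle(a, context)));
  }

  getWeights() {
    return { ...this.weights };
  }
}
```

With these assumptions the score stays in [0, 1]. The real engine may use a different scale, which matters for thresholds such as `minScore`.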

🧠 LLM implementation (default)

LLMNewsProvider - Search via LLM

// src/implementations/providers/LLMNewsProvider.js
const { INewsProvider } = require('../../interfaces/INewsProvider');
const OpenAI = require('openai');

class LLMNewsProvider extends INewsProvider {
  constructor(config) {
    super();
    this.openai = new OpenAI({ apiKey: config.apiKey });
    this.model = config.model || 'gpt-4o-mini';
    this.maxTokens = config.maxTokens || 2000;
  }

  async searchNews(query) {
    const prompt = this.buildSearchPrompt(query);

    const response = await this.openai.chat.completions.create({
      model: this.model,
      messages: [{ role: 'user', content: prompt }],
      max_tokens: this.maxTokens,
      temperature: 0.3
    });

    return this.parseResults(response.choices[0].message.content);
  }

  buildSearchPrompt(query) {
    return `
Specialized canine news search:

Target breed: ${query.raceCode} (FCI code)
Keywords: ${query.keywords.join(', ')}
Period: last ${query.maxAge} days
Preferred sources: ${query.sources.join(', ')}

Find ${query.limit} recent, relevant articles.

Return ONLY valid JSON:
[
  {
    "title": "Article title",
    "content": "200-word summary",
    "url": "https://source.com/article",
    "publishDate": "2025-09-15",
    "sourceType": "premium|standard|fallback",
    "sourceDomain": "example.com",
    "metadata": {
      "relevanceScore": 0.9,
      "specialization": "health|behavior|legislation|general"
    }
  }
]
    `;
  }

  async parseResults(response) {
    try {
      const results = JSON.parse(response);
      // The model may return valid JSON that is not an array; treat it as empty
      if (!Array.isArray(results)) return [];
      return results.map(item => ({
        ...item,
        id: require('uuid').v4(),
        publishDate: new Date(item.publishDate),
        extractedAt: new Date()
      }));
    } catch (error) {
      console.error('Failed to parse LLM response:', error);
      return [];
    }
  }

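  async validateResults(results) {
    // Anti-prompt-injection filtering of LLM results.
    // isValidContent(), isValidUrl() and isRecentEnough() are helper methods
    // that are not shown in this document.
    return results.filter(result => {
      return this.isValidContent(result.content) &&
             this.isValidUrl(result.url) &&
             this.isRecentEnough(result.publishDate);
    });
  }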

  getMetadata() {
    return {
      type: 'llm',
      provider: 'openai',
      model: this.model,
      capabilities: ['search', 'summarize', 'validate'],
      costPerRequest: 0.02,
      avgResponseTime: 3000
    };
  }
}

module.exports = LLMNewsProvider;
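
`validateResults()` relies on three helpers, `isValidContent`, `isValidUrl` and `isRecentEnough`, whose bodies are not shown. As a hedged sketch, they could look like the standalone functions below. The length limits, the 30-day default and the suspicious patterns are illustrative assumptions, not the project's actual rules.

```javascript
// Hypothetical standalone versions of the helpers used by validateResults().
// Thresholds and patterns are illustrative assumptions.
const SUSPICIOUS_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /system prompt/i,
  /<script\b/i
];

function isValidContent(content, { minLength = 50, maxLength = 5000 } = {}) {
  if (typeof content !== 'string') return false;
  if (content.length < minLength || content.length > maxLength) return false;
  return !SUSPICIOUS_PATTERNS.some((pattern) => pattern.test(content));
}

function isValidUrl(url) {
  try {
    const { protocol } = new URL(url);
    // Only web links are accepted; javascript:, data:, file: etc. are rejected
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // new URL() throws on malformed input
  }
}

function isRecentEnough(publishDate, maxAgeDays = 30, now = new Date()) {
  const time = new Date(publishDate).getTime();
  if (Number.isNaN(time)) return false; // rejects unparseable dates
  const ageDays = (now.getTime() - time) / (24 * 60 * 60 * 1000);
  // Dates in the future are treated as invalid as well
  return ageDays >= 0 && ageDays <= maxAgeDays;
}
```

Pattern matching of this kind only catches the crudest injection attempts, so it would complement rather than replace a layered defence.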

💾 JSON implementation (default)

JSONStockRepository - JSON file storage

// src/implementations/storage/JSONStockRepository.js
const { IStockRepository } = require('../../interfaces/IStockRepository');
const fs = require('fs').promises;
const path = require('path');

class JSONStockRepository extends IStockRepository {
  constructor(config) {
    super();
    this.dataPath = config.dataPath || './data/stock';
    this.indexPath = path.join(this.dataPath, 'index.json');
    this.memoryIndex = new Map(); // Performance cache
    this.initialized = false;
  }

  async init() {
    if (this.initialized) return;

    await fs.mkdir(this.dataPath, { recursive: true });

    try {
      const indexData = await fs.readFile(this.indexPath, 'utf8');
      const index = JSON.parse(indexData);

      // Load the index into memory
      for (const [key, value] of Object.entries(index)) {
        this.memoryIndex.set(key, value);
      }
    } catch (error) {
      // Only a missing file means "first run"; rethrow anything else
      // (e.g. corrupted JSON) rather than overwriting the existing index
      if (error.code !== 'ENOENT') throw error;
      await this.saveIndex();
    }

    this.initialized = true;
  }

  async save(newsItem) {
    await this.init();

    const id = newsItem.id || require('uuid').v4();
    const stored = { ...newsItem, id };
    const filePath = path.join(this.dataPath, `${id}.json`);

    // Save the article, including its id when one was generated here
    await fs.writeFile(filePath, JSON.stringify(stored, null, 2));

    // Update the index
    this.memoryIndex.set(id, {
      id,
      raceCode: newsItem.raceCode,
      sourceType: newsItem.sourceType,
      finalScore: newsItem.finalScore,
      publishDate: newsItem.publishDate,
      usageCount: newsItem.usageCount || 0,
      lastUsed: newsItem.lastUsed,
      filePath
    });

    await this.saveIndex();
    return stored;
  }

  async findByRaceCode(raceCode, options = {}) {
    await this.init();

    const results = [];
    for (const [id, indexEntry] of this.memoryIndex.entries()) {
      if (indexEntry.raceCode === raceCode) {
        if (options.minScore && indexEntry.finalScore < options.minScore) {
          continue;
        }

        const article = await this.loadArticle(id);
        results.push(article);
      }
    }

    return this.sortAndLimit(results, options);
  }

  async findByScore(minScore, options = {}) {
    await this.init();

    const results = [];
    for (const [id, indexEntry] of this.memoryIndex.entries()) {
      if (indexEntry.finalScore >= minScore) {
        const article = await this.loadArticle(id);
        results.push(article);
      }
    }

    return this.sortAndLimit(results, options);
  }

  async loadArticle(id) {
    const indexEntry = this.memoryIndex.get(id);
    if (!indexEntry) return null;

    const data = await fs.readFile(indexEntry.filePath, 'utf8');
    return JSON.parse(data);
  }

  async saveIndex() {
    const indexObj = Object.fromEntries(this.memoryIndex);
    await fs.writeFile(this.indexPath, JSON.stringify(indexObj, null, 2));
  }

  sortAndLimit(results, options) {
    // Treat a missing score as 0 so the comparator never returns NaN
    let sorted = results.sort((a, b) => (b.finalScore || 0) - (a.finalScore || 0));

    if (options.limit) {
      sorted = sorted.slice(0, options.limit);
    }

    return sorted;
  }

  async getStats() {
    await this.init();

    const stats = {
      totalArticles: this.memoryIndex.size,
      bySourceType: {},
      byRaceCode: {},
      avgScore: 0
    };

    let totalScore = 0;
    for (const entry of this.memoryIndex.values()) {
      // Count by source type
      stats.bySourceType[entry.sourceType] =
        (stats.bySourceType[entry.sourceType] || 0) + 1;

      // Count by breed code
      stats.byRaceCode[entry.raceCode] =
        (stats.byRaceCode[entry.raceCode] || 0) + 1;

      totalScore += entry.finalScore || 0;
    }

    stats.avgScore = stats.totalArticles > 0 ?
      totalScore / stats.totalArticles : 0;

    return stats;
  }
}

module.exports = JSONStockRepository;
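
The repository above does not show `updateUsage()` or `cleanup()`, although `IStockRepository` declares them and `NewsSearchService.trackUsage()` calls `updateUsage()`. As a sketch, their index-level logic could be written as pure functions over the same `Map` shape as `memoryIndex`. The function names and the `criteria` fields (`maxAgeDays`, `minScore`) are assumptions. Persistence is left out.

```javascript
// Sketch: index-level logic for updateUsage() and cleanup(), written as
// functions over a Map shaped like JSONStockRepository's memoryIndex.

// Merges usage data into an index entry; returns false for unknown ids.
function applyUsage(index, id, usageData) {
  const entry = index.get(id);
  if (!entry) return false;
  index.set(id, { ...entry, ...usageData });
  return true;
}

// Returns the ids to delete. `criteria` shape is an assumption:
// { maxAgeDays?: number, minScore?: number }
function selectForCleanup(index, criteria, now = new Date()) {
  const toRemove = [];
  for (const [id, entry] of index.entries()) {
    const ageDays = (now - new Date(entry.publishDate)) / (24 * 60 * 60 * 1000);
    const tooOld = criteria.maxAgeDays !== undefined && ageDays > criteria.maxAgeDays;
    const tooWeak = criteria.minScore !== undefined &&
      (entry.finalScore || 0) < criteria.minScore;
    if (tooOld || tooWeak) toRemove.push(id);
  }
  return toRemove;
}

const sampleIndex = new Map([
  ['a', { id: 'a', publishDate: '2025-09-10', finalScore: 80, usageCount: 0 }],
  ['b', { id: 'b', publishDate: '2025-01-01', finalScore: 90, usageCount: 2 }],
  ['c', { id: 'c', publishDate: '2025-09-12', finalScore: 20, usageCount: 1 }]
]);

const usageApplied = applyUsage(sampleIndex, 'a', {
  usageCount: 1,
  lastUsed: new Date('2025-09-15')
});

const staleIds = selectForCleanup(
  sampleIndex,
  { maxAgeDays: 90, minScore: 50 },
  new Date('2025-09-15')
);
```

Inside the repository, `updateUsage()` would presumably call `applyUsage()` and then `saveIndex()`, while `cleanup()` would remove each selected file (for example with `fs.unlink`) before dropping its index entry.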

🎯 Dependency injection container

Dependency Injection Container

// src/container.js
const LLMNewsProvider = require('./implementations/providers/LLMNewsProvider');
const JSONStockRepository = require('./implementations/storage/JSONStockRepository');
const BasicScoringEngine = require('./implementations/scoring/BasicScoringEngine');

class Container {
  constructor() {
    this.services = new Map();
    this.config = this.loadConfig();
  }

  loadConfig() {
    return {
      newsProvider: {
        type: 'llm',
        llm: {
          apiKey: process.env.OPENAI_API_KEY,
          model: 'gpt-4o-mini',
          maxTokens: 2000
        }
      },
      stockRepository: {
        type: 'json',
        json: {
          dataPath: './data/stock'
        }
      },
      scoringEngine: {
        type: 'basic',
        weights: {
          freshness: 0.3,
          specificity: 0.4,
          quality: 0.2,
          reusability: 0.1
        }
      }
    };
  }

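  register(name, factory) {
    // Memoize the factory so each service is built once and then shared;
    // otherwise every get() would create a repository with its own index
    let instance;
    this.services.set(name, () => {
      if (instance === undefined) instance = factory();
      return instance;
    });
  }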

  get(name) {
    const factory = this.services.get(name);
    if (!factory) {
      throw new Error(`Service ${name} not registered`);
    }
    return factory();
  }

  init() {
    // News Provider
    this.register('newsProvider', () => {
      switch (this.config.newsProvider.type) {
        case 'llm':
          return new LLMNewsProvider(this.config.newsProvider.llm);
        // Future providers
        // case 'scraping':
        //   return new ScrapingNewsProvider(this.config.newsProvider.scraping);
        // case 'hybrid':
        //   return new HybridNewsProvider(this.config.newsProvider.hybrid);
        default:
          throw new Error(`Unknown news provider: ${this.config.newsProvider.type}`);
      }
    });

    // Stock Repository
    this.register('stockRepository', () => {
      switch (this.config.stockRepository.type) {
        case 'json':
          return new JSONStockRepository(this.config.stockRepository.json);
        // Future storage backends
        // case 'mongodb':
        //   return new MongoStockRepository(this.config.stockRepository.mongodb);
        // case 'postgresql':
        //   return new PostgreSQLStockRepository(this.config.stockRepository.postgresql);
        default:
          throw new Error(`Unknown stock repository: ${this.config.stockRepository.type}`);
      }
    });

    // Scoring Engine
    this.register('scoringEngine', () => {
      return new BasicScoringEngine(this.config.scoringEngine);
    });
  }
}

// Singleton
const container = new Container();
container.init();

module.exports = container;
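
`loadConfig()` above returns hard-coded values, whereas the configuration section describes per-environment files. One way to reconcile the two, sketched here with a hypothetical `deepMerge` helper, is to overlay environment overrides on the defaults. How the project actually loads its environment files is not shown, so this is an assumption.

```javascript
// Hypothetical helper: recursively overlays `overrides` onto `defaults`
// without mutating either. Arrays and primitives are replaced, not merged.
function deepMerge(defaults, overrides) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(overrides || {})) {
    const base = defaults[key];
    const bothObjects =
      value && typeof value === 'object' && !Array.isArray(value) &&
      base && typeof base === 'object' && !Array.isArray(base);
    result[key] = bothObjects ? deepMerge(base, value) : value;
  }
  return result;
}

const defaultConfig = {
  newsProvider: { type: 'llm', llm: { model: 'gpt-4o-mini', maxTokens: 2000 } },
  stockRepository: { type: 'json', json: { dataPath: './data/stock' } }
};

// What an environment file such as config/environments/development.js
// might export (only the values that differ from the defaults).
const developmentOverrides = {
  stockRepository: { json: { dataPath: './data' } }
};

const mergedConfig = deepMerge(defaultConfig, developmentOverrides);
```

With this approach an environment file only lists what differs, and untouched settings such as `maxTokens` keep their default values.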

🏢 Business services (stable)

NewsSearchService - Main service

// src/services/NewsSearchService.js
class NewsSearchService {
  constructor(newsProvider, stockRepository, scoringEngine) {
    this.newsProvider = newsProvider;
    this.stockRepository = stockRepository;
    this.scoringEngine = scoringEngine;
  }

  async search(query) {
    // Fall back to "now" when the caller did not set query.startTime,
    // so that searchTime is never NaN
    const startTime = query.startTime || Date.now();

    // 1. Search the stock first
    const stockResults = await this.searchInStock(query);

    // 2. If the stock is insufficient, run a live search
    let liveResults = [];
    if (stockResults.length < query.limit) {
      const remaining = query.limit - stockResults.length;
      liveResults = await this.searchLive({
        ...query,
        limit: remaining
      });
    }

    // 3. Combined scoring
    const allResults = [...stockResults, ...liveResults];
    const scoredResults = await this.scoringEngine.batchScore(allResults, query);

    // 4. Sort and limit
    const finalResults = scoredResults
      .sort((a, b) => b.finalScore - a.finalScore)
      .slice(0, query.limit);

    // 5. Usage tracking
    await this.trackUsage(finalResults);

    return {
      results: finalResults,
      metadata: {
        fromStock: stockResults.length,
        fromLive: liveResults.length,
        totalFound: allResults.length,
        searchTime: Date.now() - startTime // elapsed milliseconds
      }
    };
  }

  async searchInStock(query) {
    return await this.stockRepository.findByRaceCode(query.raceCode, {
      minScore: query.minScore || 100,
      limit: query.limit
    });
  }

  async searchLive(query) {
    const results = await this.newsProvider.searchNews(query);
    const validated = await this.newsProvider.validateResults(results);

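    // Tag each article with the queried breed code: the LLM response does not
    // include it, and findByRaceCode() needs it to retrieve the article later
    const tagged = validated.map(result => ({ ...result, raceCode: query.raceCode }));

    // Save to stock for reuse
    for (const result of tagged) {
      await this.stockRepository.save(result);
    }

    return tagged;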
  }

  async trackUsage(results) {
    for (const result of results) {
      await this.stockRepository.updateUsage(result.id, {
        lastUsed: new Date(),
        usageCount: (result.usageCount || 0) + 1
      });
    }
  }
}

module.exports = NewsSearchService;
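
Because the service only depends on injected interfaces, its stock-first flow can be exercised without an LLM or a file system. The sketch below restates that flow in a condensed function (scoring and usage tracking omitted) and runs it against in-memory fakes. `FakeStock`, `FakeProvider` and `stockFirstSearch` are hypothetical test doubles, not project code.

```javascript
// Condensed restatement of the stock-first flow, exercised with in-memory
// fakes. All names here are hypothetical and exist for illustration only.
class FakeStock {
  constructor(items = []) {
    this.items = [...items];
  }
  async findByRaceCode(raceCode, { limit } = {}) {
    return this.items.filter((i) => i.raceCode === raceCode).slice(0, limit);
  }
  async save(item) {
    this.items.push(item);
    return item;
  }
}

class FakeProvider {
  constructor() {
    this.calls = 0; // lets a test verify whether a live search happened
  }
  async searchNews(query) {
    this.calls += 1;
    return Array.from({ length: query.limit }, (_, n) => ({
      id: `live-${n}`,
      raceCode: query.raceCode,
      title: `Live article ${n}`
    }));
  }
  async validateResults(results) {
    return results;
  }
}

async function stockFirstSearch(stock, provider, query) {
  const fromStock = await stock.findByRaceCode(query.raceCode, { limit: query.limit });
  let fromLive = [];
  if (fromStock.length < query.limit) {
    const remaining = query.limit - fromStock.length;
    const found = await provider.searchNews({ ...query, limit: remaining });
    fromLive = await provider.validateResults(found);
    for (const item of fromLive) await stock.save(item); // keep for reuse
  }
  return {
    results: [...fromStock, ...fromLive],
    fromStock: fromStock.length,
    fromLive: fromLive.length
  };
}

async function demo() {
  const stock = new FakeStock([{ id: 's-1', raceCode: '352-1', title: 'Stored article' }]);
  const provider = new FakeProvider();
  const query = { raceCode: '352-1', limit: 3 };
  const first = await stockFirstSearch(stock, provider, query);
  const second = await stockFirstSearch(stock, provider, query);
  return { first, second, providerCalls: provider.calls };
}
```

In `demo()`, the first search tops up one stored article with two live ones, and the second search is served entirely from stock, so the provider is called only once.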

🔧 Modular configuration

Swapping a component in one line

// config/environments/development.js
module.exports = {
  // Current version: full LLM + JSON
  newsProvider: { type: 'llm', llm: { model: 'gpt-4o-mini' }},
  stockRepository: { type: 'json', json: { dataPath: './data' }},

  // Easy migration to other components:

  // To try scraping:
  // newsProvider: { type: 'scraping', scraping: { antiBot: true }},

  // To use MongoDB:
  // stockRepository: { type: 'mongodb', mongodb: { uri: '...' }},

  // To go hybrid:
  // newsProvider: {
  //   type: 'hybrid',
  //   hybrid: {
  //     primary: { type: 'llm' },
  //     fallback: { type: 'scraping' }
  //   }
  // }
};
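
The commented-out `hybrid` option above implies a provider that wraps two others. `HybridNewsProvider` is not implemented in this document, so the following is only a sketch of one plausible shape: try the primary provider and use the fallback when it fails or returns nothing. For brevity it takes already-built provider instances and does not extend `INewsProvider`.

```javascript
// Sketch of the 'hybrid' option. This class is an assumption about how a
// primary/fallback provider could behave, not the project's implementation.
class HybridNewsProvider {
  constructor(primary, fallback) {
    this.primary = primary;
    this.fallback = fallback;
  }

  async searchNews(query) {
    try {
      const results = await this.primary.searchNews(query);
      if (results.length > 0) return results;
    } catch (error) {
      // Ignore the primary failure and let the fallback answer instead
    }
    return this.fallback.searchNews(query);
  }

  async validateResults(results) {
    return this.primary.validateResults(results);
  }

  getMetadata() {
    return {
      type: 'hybrid',
      primary: this.primary.getMetadata(),
      fallback: this.fallback.getMetadata()
    };
  }
}

// Minimal stand-ins to exercise the fallback path.
const failingProvider = {
  searchNews: async () => { throw new Error('quota exceeded'); },
  validateResults: async (results) => results,
  getMetadata: () => ({ type: 'llm' })
};

const backupProvider = {
  searchNews: async () => [{ id: '1', title: 'From fallback' }],
  validateResults: async (results) => results,
  getMetadata: () => ({ type: 'scraping' })
};

const hybridProvider = new HybridNewsProvider(failingProvider, backupProvider);
```

Because the wrapper exposes the same three methods as any other provider, `NewsSearchService` would not need to change to use it.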

Advantages of the modular architecture

  1. Full flexibility: swapping a component means changing one line of configuration
  2. Isolated tests: each interface can be mocked independently
  3. Low-risk evolution: a new component does not affect the others
  4. Parallel development: team members can work on different interfaces
  5. Gradual migration: no big bang, one component at a time
  6. Simpler maintenance: a bug stays isolated in its component
  7. Targeted performance work: optimize one component without breaking the others

This architecture makes it possible to start simple (LLM + JSON) and evolve component by component as needs arise.


Architecture finalized for the modular, free, fully LLM-based version