Add ProjectOrganizer module specification

New GroveEngine module for project analysis and organization:
- Recursive folder scanning and classification
- Relation detection (markdown links, references)
- Multiple output formats (JSON, Markdown, HTML, SVG graphs)
- 4-phase roadmap: MVP (txt/md) → Office/PDF → LLM → Code analysis
- IIO-based communication with GroveEngine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 22:24:18 +08:00
commit 43669548c0
parent a141f01744
1 changed files with 659 additions and 0 deletions
--- a/Projects/WIP/ProjectOrganizer.md
+++ b/Projects/WIP/ProjectOrganizer.md
@@ -0,0 +1,659 @@
+# ProjectOrganizer - GroveEngine Module
+
+## Overview
+
+C++ GroveEngine module that automatically analyzes, classifies, and organizes project folders, and generates visualizations and reports.
+
+### Goal
+
+Build a GroveEngine module that:
+- Recursively analyzes a project folder
+- Classifies and organizes every document
+- Detects and maps relations between files
+- Generates visualizations (dependency graphs)
+- Produces structured outputs (JSON, Markdown, HTML)
+
+---
+
+## Technical Specifications
+
+### GroveEngine Integration
+
+**Type**: IModule (hot-reloadable DLL/SO)
+
+**Communication**: via IIO (IntraIO)
+
+**Architecture**:
+```
+Application
+    ↓
+GroveEngine (IEngine)
+    ↓
+ProjectOrganizerModule (IModule)
+    ├─ Receives requests via IIO
+    ├─ Analyzes the filesystem
+    ├─ Classifies/organizes
+    ├─ Generates outputs
+    └─ Returns results via IIO
+```
+
+### IIO API
+
+**Input messages**:
+```json
+{
+  "action": "analyze",
+  "source_path": "/path/to/project",
+  "output_path": "/path/to/results",
+  "options": {
+    "use_llm": false,
+    "file_types": ["txt", "md", "pdf", "doc", "docx", "ppt", "pptx"],
+    "max_depth": -1,
+    "generate_graphs": true,
+    "copy_source": true
+  }
+}
+```
+
+**Output messages**:
+```json
+{
+  "status": "success",
+  "output_path": "/path/to/results",
+  "stats": {
+    "files_processed": 142,
+    "files_classified": 138,
+    "relations_found": 67,
+    "duration_ms": 1243
+  },
+  "summary": {
+    "categories": {
+      "documentation": 45,
+      "code": 23,
+      "config": 12,
+      "data": 58
+    }
+  }
+}
+```
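The `options` object of the input message could be mirrored by a plain struct on the C++ side. The sketch below is only an illustration: `AnalyzeOptions`, `isWithinDepth`, and `isAcceptedType` are hypothetical names, the default values are assumptions, and reading `max_depth: -1` as "no depth limit" is an interpretation of the example message rather than something the spec states.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of the "options" object from the input message.
// Default values are assumptions made for this sketch.
struct AnalyzeOptions {
    bool use_llm = false;
    std::vector<std::string> file_types = {"txt", "md"};
    int max_depth = -1;  // assumed: a negative value means "no depth limit"
    bool generate_graphs = true;
    bool copy_source = true;
};

// True if a folder at `depth` (0 = source root) should still be scanned.
bool isWithinDepth(const AnalyzeOptions& opts, int depth) {
    return opts.max_depth < 0 || depth <= opts.max_depth;
}

// True if `extension` (without the leading dot) is among the requested file types.
bool isAcceptedType(const AnalyzeOptions& opts, const std::string& extension) {
    return std::find(opts.file_types.begin(), opts.file_types.end(), extension)
           != opts.file_types.end();
}
```

Keeping the depth and type checks as small free functions makes them easy to unit-test independently of the IIO layer.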
+
+---
+
+## Features
+
+### Phase 1 - MVP Without LLM (Short term)
+
+**Supported formats**: `.txt`, `.md`
+
+**Capabilities**:
+- [x] Recursive copy of the source folder
+- [x] Scan of all txt/md files
+- [x] Basic classification by:
+  - File extension
+  - Location (parent folder)
+  - Patterns in the file name
+- [x] Relation detection:
+  - Markdown links `[text](file.md)`
+  - Explicit file path references
+  - Mentions of file names
+- [x] Output generation:
+  - JSON: full structure + metadata
+  - Markdown: readable report + index
+  - HTML: interactive visualization
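As an illustration of the markdown-link part of relation detection, here is a minimal sketch built on `std::regex`. The function name `extractMarkdownLinks` is hypothetical, and the pattern is a deliberate simplification: it handles inline links of the form `[text](target)` only, and ignores link titles and reference-style links.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the targets of inline markdown links `[text](target)` found in `content`.
// Image links (`![alt](src)`) match as well; titles and reference-style links
// are not handled in this sketch.
std::vector<std::string> extractMarkdownLinks(const std::string& content) {
    static const std::regex linkPattern(R"(\[([^\]]*)\]\(([^)\s]+)\))");
    std::vector<std::string> targets;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(content.begin(), content.end(), linkPattern);
         it != end; ++it) {
        targets.push_back((*it)[2].str());  // capture group 2 is the link target
    }
    return targets;
}
```

Each returned target would still need to be resolved against the folder of the file that contains the link before it can become a relation between two files.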
+
+**Generated graphs**:
+- Dependency graph (who references whom)
+- Category graph (thematic grouping)
+- Structure graph (folder tree)
+
+**Deliverables**:
+```
+output/
+├── organized/           # Organized copy of the project
+│   ├── documentation/
+│   ├── code/
+│   ├── config/
+│   └── data/
+├── analysis/
+│   ├── structure.json   # Raw data
+│   ├── relations.json   # Relation graph
+│   ├── report.md        # Markdown report
+│   └── index.html       # Interactive visualization
+└── graphs/
+    ├── dependencies.svg
+    ├── categories.svg
+    └── structure.svg
+```
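The folder skeleton above could be created up front with `std::filesystem`. This is a sketch under the assumption that the layout is fixed; `createOutputLayout` is a hypothetical helper name.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Creates the deliverables folder skeleton under `outputRoot`.
// Returns false if any of the directories is missing afterwards.
bool createOutputLayout(const fs::path& outputRoot) {
    const char* folders[] = {
        "organized/documentation", "organized/code", "organized/config",
        "organized/data", "analysis", "graphs"};
    std::error_code ec;
    for (const char* folder : folders) {
        // create_directories also creates missing parents and
        // does nothing if the directory already exists.
        fs::create_directories(outputRoot / folder, ec);
        if (ec || !fs::is_directory(outputRoot / folder)) return false;
    }
    return true;
}
```

Using the `std::error_code` overloads keeps the helper exception-free, which may be preferable inside a hot-reloadable module.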
+
+**Estimated duration**: 1-2 weeks
+
+### Phase 2 - Advanced Formats (Medium term)
+
+**New formats**: `.pdf` (OCR), `.doc`, `.docx`, `.ppt`, `.pptx`
+
+**Additional capabilities**:
+- [x] PDF text extraction with OCR (Tesseract/PaddleOCR)
+- [x] Office document parsing (LibreOffice SDK / Apache POI via JNI)
+- [x] Automatic language detection
+- [x] Metadata extraction (author, date, tags)
+
+**Estimated duration**: 2-3 weeks
+
+### Phase 3 - LLM Analysis (Long term)
+
+**LLM integration**: via IIO to an external LLM module
+
+**LLM capabilities**:
+- Deep semantic classification
+- Implicit relation detection
+- Key concept extraction
+- Summary generation
+- Reorganization suggestions
+
+**Architecture**:
+```
+ProjectOrganizerModule
+    ↓ (IIO message)
+LLMModule (Claude/GPT via API)
+    ↓ (response)
+ProjectOrganizerModule
+```
+
+**Estimated duration**: 3-4 weeks
+
+### Phase 4 - Code Analysis (Very long term)
+
+**Supported languages**: C++, Python, JavaScript, Java, etc.
+
+**Capabilities**:
+- AST (Abstract Syntax Tree) parsing
+- Import dependency detection
+- Call graph generation
+- Unused function detection
+- Complexity analysis
+
+**Tools**:
+- Clang LibTooling (C++)
+- Tree-sitter (multi-language)
+- Possible LSP integration
+
+**Note**: Low priority; code is not the primary target
+
+**Estimated duration**: 4-6 weeks
+
+---
+
+## Typical Workflows
+
+### Workflow 1: Analyzing an Existing Project
+
+```cpp
+// Sketch only: the exact IIO send/receive signatures are assumed here.
+// 1. The user starts the analysis via IIO
+io->send("ProjectOrganizer", nlohmann::json{
+  {"action", "analyze"},
+  {"source_path", "E:/Projets/MyProject"},
+  {"output_path", "E:/Projets/MyProject_Analysis"}
+});
+
+// 2. The module processes the request:
+// - Recursive copy
+// - File scan
+// - Classification
+// - Relation detection
+// - Output generation
+
+// 3. The module replies
+io->receive([](const Message& msg) {
+  if (msg.status == "success") {
+    // Open index.html (Windows `start` command)
+    system("start E:/Projets/MyProject_Analysis/analysis/index.html");
+  }
+});
+```
+
+### Workflow 2: Watch Mode (Future)
+
+```cpp
+// Continuous watch mode (sketch; the IIO send signature is assumed)
+io->send("ProjectOrganizer", nlohmann::json{
+  {"action", "watch"},
+  {"source_path", "E:/Projets/MyProject"},
+  {"update_interval_ms", 5000}
+});
+
+// The module refreshes the analysis automatically every 5 s
+```
+
+---
+
+## Module Internal Architecture
+
+### Main Classes
+
+```cpp
+// Main module
+class ProjectOrganizerModule : public IModule {
+public:
+    void initialize(const IDataNode& config, IIO* io) override;
+    void update(float deltaTime) override;
+    void shutdown() override;
+
+private:
+    FileScanner scanner_;
+    Classifier classifier_;
+    RelationDetector relationDetector_;
+    OutputGenerator outputGenerator_;
+};
+
+// Filesystem scanner
+class FileScanner {
+public:
+    std::vector<FileInfo> scanRecursive(const std::string& path);
+};
+
+// File classification
+class Classifier {
+public:
+    Category classify(const FileInfo& file);
+};
+
+// Relation detection
+class RelationDetector {
+public:
+    std::vector<Relation> detect(const std::vector<FileInfo>& files);
+};
+
+// Output generation
+class OutputGenerator {
+public:
+    void generateJSON(const Analysis& analysis, const std::string& path);
+    void generateMarkdown(const Analysis& analysis, const std::string& path);
+    void generateHTML(const Analysis& analysis, const std::string& path);
+    void generateGraphs(const Analysis& analysis, const std::string& path);
+};
+```
+
+### Data Structures
+
+```cpp
+#include <chrono>
+#include <cstddef>
+#include <ctime>
+#include <map>
+#include <string>
+#include <vector>
+
+// Declared first: FileInfo stores a Category by value
+enum class Category {
+    Documentation,
+    Code,
+    Config,
+    Data,
+    Media,
+    Archive,
+    Unknown
+};
+
+// Relation kinds used by Relation::type
+enum class RelationType { Link, Import, Reference };
+
+struct FileInfo {
+    std::string path;
+    std::string name;
+    std::string extension;
+    std::size_t size;
+    std::time_t modified;
+    Category category;
+    std::string content;  // Only for text-based files
+};
+
+struct Relation {
+    std::string from;     // Source file
+    std::string to;       // Target file
+    RelationType type;    // Link, Import, Reference
+    std::string context;  // Line where the relation was found
+};
+
+struct Analysis {
+    std::vector<FileInfo> files;
+    std::vector<Relation> relations;
+    std::map<Category, int> stats;  // Number of files per category
+    std::chrono::milliseconds duration;
+};
+```
+
+---
+
+## Dependencies
+
+### Required (Phase 1)
+
+- **C++17**: Filesystem API (`<filesystem>`)
+- **nlohmann_json**: JSON generation (already in GroveEngine)
+- **Graphviz**: SVG graph generation (external process call)
+
+### Optional (Phases 2+)
+
+- **Tesseract/PaddleOCR**: OCR for PDFs
+- **LibreOffice SDK**: Office document parsing
+- **Tree-sitter**: Code parsing (Phase 4)
+
+### Note
+
+All heavy dependencies (LLM, OCR) are reached via IIO through external modules rather than integrated directly.
+
+---
+
+## Visualization Generation
+
+### Graphviz DOT Format
+
+**Example dependency graph**:
+```dot
+digraph dependencies {
+    rankdir=LR;
+    node [shape=box];
+
+    "README.md" -> "docs/architecture.md";
+    "README.md" -> "docs/setup.md";
+    "src/main.cpp" -> "include/engine.h";
+    "include/engine.h" -> "include/module.h";
+}
+```
+
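+**Generation**:
+```cpp
+#include <cstdlib>
+#include <fstream>
+#include <string>
+
+void OutputGenerator::generateGraphs(const Analysis& analysis, const std::string& path) {
+    const std::string dotFile = path + "/dependencies.dot";
+    const std::string svgFile = path + "/dependencies.svg";
+
+    // 1. Write the DOT file
+    std::ofstream dot(dotFile);
+    dot << "digraph dependencies {\n";
+    for (const auto& rel : analysis.relations) {
+        dot << "  \"" << rel.from << "\" -> \"" << rel.to << "\";\n";
+    }
+    dot << "}\n";
+    dot.close();
+
+    // 2. Call Graphviz on the file just written, in the same output directory
+    const std::string cmd = "dot -Tsvg \"" + dotFile + "\" -o \"" + svgFile + "\"";
+    std::system(cmd.c_str());
+}
+```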
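File names that contain a double quote would break the quoted node IDs written to the DOT file. The helper below is a hedged sketch of one way to guard against that; `quoteDotId` and `dotEdge` are hypothetical names, and turning backslashes into forward slashes is a design choice made here so that Windows-style paths map to the same node as their generic form.

```cpp
#include <string>

// Wraps `path` in double quotes for use as a DOT node ID.
// In a DOT quoted string, \" is the escape for an embedded double quote.
std::string quoteDotId(const std::string& path) {
    std::string out = "\"";
    for (char c : path) {
        if (c == '\\') c = '/';     // normalize Windows separators (sketch choice)
        if (c == '"') out += '\\';  // escape embedded double quotes
        out += c;
    }
    out += '"';
    return out;
}

// Formats one edge line of a digraph body.
std::string dotEdge(const std::string& from, const std::string& to) {
    return "  " + quoteDotId(from) + " -> " + quoteDotId(to) + ";\n";
}
```

With such a helper, the edge-writing loop would reduce to one `dotEdge(rel.from, rel.to)` call per relation.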
+
+### Interactive HTML Format
+
+**Technologies**:
+- **vis.js**: Interactive graphs
+- **D3.js**: Alternative for visualizations
+- **Bootstrap**: Clean UI
+
+**Features**:
+- Zoom/pan on the graph
+- Click a file → show its details
+- Category filters
+- Search bar
+
+---
+
+## Performance
+
+### Targets
+
+- **Scanning 1000 files**: < 1 s
+- **Classification**: < 100 ms (without LLM)
+- **Relation detection**: < 500 ms
+- **Output generation**: < 2 s
+
+### Optimizations
+
+- Multi-threaded filesystem scan
+- Result caching (inotify/FileSystemWatcher to detect changes)
+- Lazy loading of file content (only when needed)
+- Streamed HTML generation (not everything held in RAM)
+
+---
+
+## Module Configuration
+
+**Config file**: `config/project_organizer.json`
+
+```json
+{
+  "default_output_path": "~/.groveengine/project_analysis",
+  "file_types": {
+    "phase1": ["txt", "md"],
+    "phase2": ["pdf", "doc", "docx", "ppt", "pptx"],
+    "phase4": ["cpp", "h", "py", "js", "java"]
+  },
+  "classification_rules": {
+    "documentation": ["README", "docs/", ".md"],
+    "code": ["src/", "include/", ".cpp", ".h"],
+    "config": ["config/", ".json", ".yaml", ".toml"]
+  },
+  "graph_engine": "graphviz",
+  "max_file_size_mb": 10,
+  "use_cache": true
+}
+```
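The `classification_rules` entries could drive a small first-match classifier. The config does not define how patterns are interpreted, so the semantics below are assumptions: a pattern ending in `/` matches a folder in the path, a pattern starting with `.` matches the file extension, and anything else matches the start of the file name. `Rule`, `matches`, and `classify` are hypothetical names, and paths are expected to be relative with `/` separators.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Rule = std::pair<std::string, std::vector<std::string>>;  // category, patterns

bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Assumed pattern semantics (not defined by the config itself):
//   "docs/"  -> path starts with "docs/" or contains "/docs/"
//   ".md"    -> path ends with that extension
//   "README" -> file name starts with that text
bool matches(const std::string& relPath, const std::string& pattern) {
    if (pattern.empty()) return false;
    if (pattern.back() == '/') {
        return relPath.compare(0, pattern.size(), pattern) == 0 ||
               relPath.find("/" + pattern) != std::string::npos;
    }
    if (pattern.front() == '.') return endsWith(relPath, pattern);
    const std::size_t slash = relPath.find_last_of('/');
    const std::string name =
        slash == std::string::npos ? relPath : relPath.substr(slash + 1);
    return name.compare(0, pattern.size(), pattern) == 0;
}

// Returns the first category whose patterns match, or "unknown".
std::string classify(const std::string& relPath, const std::vector<Rule>& rules) {
    for (const auto& [category, patterns] : rules)
        for (const auto& pattern : patterns)
            if (matches(relPath, pattern)) return category;
    return "unknown";
}
```

Because the first matching category wins, rule order matters: with the rules listed in the config order, `docs/config.json` would be classified as documentation rather than config.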
+
+---
+
+## Tests
+
+### Unit Tests
+
+```cpp
+#include <gtest/gtest.h>
+
+TEST(FileScanner, ScanRecursive) {
+    FileScanner scanner;
+    auto files = scanner.scanRecursive("test_data/sample_project");
+    EXPECT_FALSE(files.empty());
+}
+
+TEST(Classifier, ClassifyMarkdown) {
+    Classifier classifier;
+    FileInfo file{"README.md", "README.md", "md", 1024, 0, Category::Unknown, ""};
+    EXPECT_EQ(classifier.classify(file), Category::Documentation);
+}
+
+TEST(RelationDetector, DetectMarkdownLinks) {
+    RelationDetector detector;
+    FileInfo file{"test.md", "test.md", "md", 0, 0, Category::Unknown,
+                  "See [other](other.md) for details"};
+    auto relations = detector.detect({file});
+    // ASSERT stops the test here, so relations[0] is never read on an empty vector
+    ASSERT_EQ(relations.size(), 1u);
+    EXPECT_EQ(relations[0].to, "other.md");
+}
+```
+
+### Integration Tests
+
+- Test on a real project (couple_matters repo)
+- Validation of generated outputs
+- Performance benchmarks
+
+---
+
+## Development Roadmap
+
+### Phase 1 - MVP (1-2 weeks)
+- [ ] Set up GroveEngine module boilerplate
+- [ ] FileScanner implementation
+- [ ] Basic classifier (extension + path)
+- [ ] RelationDetector for markdown links
+- [ ] OutputGenerator JSON + MD
+- [ ] Basic unit tests
+- [ ] Working example
+
+### Phase 2 - Advanced Formats (2-3 weeks)
+- [ ] OCR integration (PDF)
+- [ ] Office document parser
+- [ ] Interactive HTML OutputGenerator
+- [ ] Graph generation (Graphviz)
+- [ ] Performance optimizations
+- [ ] Integration tests
+
+### Phase 3 - LLM (3-4 weeks)
+- [ ] IIO protocol to the LLM module
+- [ ] Semantic classification
+- [ ] Implicit relation detection
+- [ ] Summary generation
+- [ ] A/B testing LLM vs non-LLM
+
+### Phase 4 - Code Analysis (4-6 weeks)
+- [ ] Tree-sitter integration
+- [ ] AST parsing multi-langage
+- [ ] Call graph generation
+- [ ] Unused code detection
+- [ ] Complexity metrics
+
+---
+
+## Anticipated Problems
+
+### 1. Performance on Large Projects
+
+**Problem**: A project with 10,000+ files makes the scan slow
+
+**Solutions**:
+- Multi-threading (std::async)
+- Smart filtering (ignore node_modules, .git)
+- Incremental analysis (cache + watch mode)
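The filtering idea can be sketched as a pure path check, with no disk access. `isIgnored` is a hypothetical helper name, and the default ignore list only contains the two folders named above.

```cpp
#include <filesystem>
#include <set>
#include <string>

// True if any component of `relPath` is an ignored directory name.
bool isIgnored(const std::filesystem::path& relPath,
               const std::set<std::string>& ignoredDirs = {"node_modules", ".git"}) {
    for (const auto& part : relPath) {
        if (ignoredDirs.count(part.string()) > 0) return true;
    }
    return false;
}
```

During a scan based on `std::filesystem::recursive_directory_iterator`, the same check could be applied to each directory entry, followed by `disable_recursion_pending()`, so that ignored trees are pruned instead of being walked and discarded.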
+
+### 2. File Encodings
+
+**Problem**: Files in UTF-8, UTF-16, ISO-8859-1, etc.
+
+**Solutions**:
+- Encoding auto-detection (libiconv/ICU)
+- Fall back to UTF-8 and ignore errors
+- Log files that cannot be parsed
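Before reaching for libiconv or ICU, a cheap first pass can sniff the byte-order mark and check whether the bytes are structurally valid UTF-8. The sketch below uses only the standard library; the `Encoding` enum and both function names are hypothetical, and anything that is neither BOM-tagged nor valid UTF-8 is simply reported as `Unknown` (a candidate for the log of unparsable files).

```cpp
#include <cstddef>
#include <string>

enum class Encoding { Utf8, Utf8Bom, Utf16LE, Utf16BE, Unknown };

// Structural UTF-8 check: each lead byte must be followed by the right number
// of continuation bytes. It does not reject every overlong or surrogate sequence.
bool isValidUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte
        if (i + extra >= bytes.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// BOM sniffing first, then the UTF-8 structural check as a fallback.
Encoding detectEncoding(const std::string& bytes) {
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return Encoding::Utf8Bom;
    if (bytes.compare(0, 2, "\xFF\xFE") == 0) return Encoding::Utf16LE;
    if (bytes.compare(0, 2, "\xFE\xFF") == 0) return Encoding::Utf16BE;
    return isValidUtf8(bytes) ? Encoding::Utf8 : Encoding::Unknown;
}
```

Pure ASCII content is reported as `Utf8`, since ASCII is a subset of UTF-8; UTF-16 files without a BOM are not recognized by this sketch.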
+
+### 3. Overly Complex Graphs
+
+**Problem**: 1000+ nodes make the graph unreadable
+
+**Solutions**:
+- Clustering by category
+- Zoom levels (overview → detail)
+- Interactive HTML filters
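Clustering by category maps naturally onto Graphviz subgraphs: a subgraph whose name starts with `cluster` is drawn as a boxed group. The following sketch shows the idea; `clusteredDot` is a hypothetical function name, and it assumes category and file names contain no double quotes.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Emits one `subgraph cluster_<n>` per category so that Graphviz
// draws the files of each category inside their own box.
std::string clusteredDot(
    const std::map<std::string, std::vector<std::string>>& filesByCategory) {
    std::ostringstream dot;
    dot << "digraph categories {\n";
    int index = 0;
    for (const auto& [category, files] : filesByCategory) {
        dot << "  subgraph cluster_" << index++ << " {\n";
        dot << "    label=\"" << category << "\";\n";
        for (const auto& file : files) dot << "    \"" << file << "\";\n";
        dot << "  }\n";
    }
    dot << "}\n";
    return dot.str();
}
```

Edges between files can be appended after the clusters; Graphviz keeps each node in the cluster where it was first declared.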
+
+### 4. False-Positive Relations
+
+**Problem**: A "test.md" mention inside a comment is detected as a link
+
+**Solutions**:
+- Strict heuristics (markdown syntax only)
+- LLM validation (Phase 3)
+- User whitelist/blacklist
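One way to make the heuristics stricter is to remove code from a markdown document before looking for links, so that file names mentioned in code samples are never treated as relations. This is a simplified sketch: `stripCode` is a hypothetical name, only backtick fences are recognized (not tilde fences), and inline code spans are assumed not to cross line boundaries.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns `markdown` with fenced code blocks and inline code spans removed,
// so that link detection only sees prose.
std::string stripCode(const std::string& markdown) {
    const std::string fence(3, '`');  // three backticks open or close a fenced block
    std::istringstream in(markdown);
    std::string line, out;
    bool inFence = false;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t");
        if (first != std::string::npos && line.compare(first, 3, fence) == 0) {
            inFence = !inFence;  // fence line itself is dropped
            continue;
        }
        if (inFence) continue;   // drop everything inside the fenced block
        bool inSpan = false;
        for (char c : line) {
            if (c == '`') { inSpan = !inSpan; continue; }
            if (!inSpan) out += c;
        }
        out += '\n';
    }
    return out;
}
```

The stripped text can then be passed to the link detector, while the original content stays available for reporting the context line.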
+
+---
+
+## Usage Examples
+
+### Example 1: Analyzing the couple_matters Repo
+
+**Input**:
+```json
+{
+  "action": "analyze",
+  "source_path": "E:/Users/Alexis Trouvé/Documents/Projets/couple_matters",
+  "output_path": "E:/couple_matters_analysis"
+}
+```
+
+**Expected output**:
+- 200+ markdown files classified
+- Relations detected:
+  - `CLAUDE.md` → `ToRemember/schema.md`
+  - `personnalités/Tingting.md` → `personnalités/TingtingWork.md`
+  - `Projects/Status_Projets.md` → every `Projects/*/projet.md`
+- Category graph:
+  - Couple (50%)
+  - Projects (30%)
+  - Planning (10%)
+  - Other (10%)
+
+### Example 2: Analyzing a C++ Codebase
+
+**Input**:
+```json
+{
+  "action": "analyze",
+  "source_path": "E:/Projets/GroveEngine",
+  "options": {
+    "file_types": ["cpp", "h", "md"]
+  }
+}
+```
+
+**Expected output**:
+- Headers classified by module
+- Include graph
+- Documentation linked to the code
+
+---
+
+## Risks
+
+| Risk | Probability | Impact | Mitigation |
+|------|-------------|--------|------------|
+| **Over-engineering** | High | Medium | Strict Phase 1 MVP, features afterwards |
+| **Unacceptable performance on large projects** | Medium | High | Early benchmarks, optimizations prioritized |
+| **Complex OCR/Office parsing** | Medium | Medium | Phase 2 is optional; focus on txt/md first |
+| **Graphviz external dependency** | Low | Low | Text-based graph fallback if absent |
+| **Scope creep toward an IDE** | High | High | Focus on analysis/organization, not editing |
+
+---
+
+## Current Status
+
+### Code
+- ❌ No code written
+- ❌ Module not yet created in GroveEngine
+
+### Design
+- ✅ Complete specifications
+- ✅ IIO API defined
+- ✅ Internal architecture defined
+
+### Tests
+- ❌ No tests
+
+### Documentation
+- ✅ This file (project sheet)
+- ⏳ User guide to be written
+
+---
+
+## Next Steps
+
+### Immediate
+1. [ ] Create the `GroveEngine/modules/ProjectOrganizer/` folder structure
+2. [ ] Set up CMake for the module
+3. [ ] Create the `ProjectOrganizerModule.cpp` skeleton
+4. [ ] Implement a basic FileScanner
+5. [ ] Test the scan on a small project
+
+### Short term
+1. [ ] Classifier implementation
+2. [ ] RelationDetector for markdown
+3. [ ] OutputGenerator JSON
+4. [ ] Test on the couple_matters repo
+
+### Medium term
+1. [ ] OutputGenerator Markdown + HTML
+2. [ ] Graphviz integration
+3. [ ] Performance optimization
+4. [ ] User documentation
+
+---
+
+## Resources
+
+### Repos
+- **GroveEngine**: `E:/Projets/GroveEngine/`
+- **Module location**: `E:/Projets/GroveEngine/modules/ProjectOrganizer/`
+- **Test data**: `E:/Users/Alexis Trouvé/Documents/Projets/couple_matters/`
+
+### Documentation
+- `GroveEngine/docs/architecture-modulaire.md`: IModule system
+- `GroveEngine/docs/CLAUDE-HOT-RELOAD-GUIDE.md`: Hot-reload workflow
+
+### External Tools
+- **Graphviz**: https://graphviz.org/
+- **vis.js**: https://visjs.org/
+- **PaddleOCR**: https://github.com/PaddlePaddle/PaddleOCR (Phase 2)
+
+---
+
+*Created: November 28, 2025*
+*Status: CONCEPT → WIP (Phase 1)*
+*Stack: C++17, GroveEngine IModule, nlohmann_json, Graphviz*
+*Target: Analysis/organization of documentation-heavy projects*