From 89bcd9be2c725ea6eaa4258b1c17771306ae44d9 Mon Sep 17 00:00:00 2001
From: StillHammer <alexistrouve.pro@gmail.com>
Date: Sun, 30 Nov 2025 18:05:02 +0800
Subject: [PATCH] Add Reddit_Save_Scraper concept - Knowledge extraction from
 saved posts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Vision
Scraper for saved Reddit posts, with smart extraction of their value.
Turns passive saving into active knowledge management.

## Use Cases
1. Knowledge Base: Markdown export organized by topic
2. AI Digest: weekly summary + insights + action items (Claude API)
3. Search UI: full-text search interface with filters
4. Anki Generator: turns learning content into flashcards
5. Archive: local backup in case posts get deleted
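This is a minimal sketch of the Knowledge Base export (use case 1). It assumes posts were already scraped into dicts with the fields planned for the `posts` table (`id`, `title`, `url`, `subreddit`, `created_utc`, `content`). The one-folder-per-subreddit layout and the helper names `slugify` and `export_markdown` are illustrative assumptions, not a settled design:

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(text: str, max_len: int = 60) -> str:
    """Hypothetical helper: lowercase ASCII words joined with '_'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "_".join(words)[:max_len] or "untitled"


def export_markdown(posts: list[dict], out_dir: Path) -> list[Path]:
    """Write one Markdown file per post, in one folder per subreddit."""
    written = []
    for post in posts:
        folder = out_dir / post["subreddit"].lower()
        folder.mkdir(parents=True, exist_ok=True)
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        # YAML front matter keeps metadata queryable (e.g. from Obsidian).
        # The title goes in the body, so no YAML quoting is needed here.
        front_matter = "\n".join([
            "---",
            f"id: {post['id']}",
            f"subreddit: {post['subreddit']}",
            f"url: {post['url']}",
            f"created: {created.date().isoformat()}",
            "---",
        ])
        body = f"# {post['title']}\n\n{post.get('content', '')}\n"
        # Appending the post id avoids collisions between identical titles.
        path = folder / f"{slugify(post['title'])}_{post['id']}.md"
        path.write_text(front_matter + "\n\n" + body, encoding="utf-8")
        written.append(path)
    return written
```

For example, a saved r/cpp post titled "CMake best practices?" would be written to `<out_dir>/cpp/cmake_best_practices_<id>.md`.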

## Stack
- Python + PRAW (Reddit API) - Recommended
- Alternative: Node.js + snoowrap
- Storage: SQLite (local-first)
- Optional: Claude API (analysis), Flask (web UI)
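This is a minimal local-first storage sketch for the SQLite option, reusing the `posts` schema planned in the concept file. On re-runs, the upsert refreshes the scraped fields but never touches `category` or `tags`, so manual tagging survives repeated scrapes. A few assumptions apply:

- The function names are hypothetical.
- `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.
- The naive `LIKE` search stands in for the full-text search a real Search UI would likely use (e.g. SQLite FTS5, where compiled in).

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id TEXT PRIMARY KEY,
    title TEXT,
    url TEXT,
    subreddit TEXT,
    author TEXT,
    created_utc INTEGER,
    content TEXT,
    saved_at INTEGER,
    category TEXT,  -- AI-generated or manual
    tags TEXT       -- comma-separated
)
"""

UPSERT = """
INSERT INTO posts (id, title, url, subreddit, author, created_utc, content, saved_at)
VALUES (:id, :title, :url, :subreddit, :author, :created_utc, :content, :saved_at)
ON CONFLICT(id) DO UPDATE SET
    title = excluded.title,
    url = excluded.url,
    content = excluded.content
"""


def open_db(path: str = "reddit_saved.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_posts(conn: sqlite3.Connection, posts: list[dict]) -> None:
    """Insert new posts; refresh scraped fields of posts already stored."""
    # category/tags are not in the UPDATE clause, so manual edits are kept.
    conn.executemany(UPSERT, posts)
    conn.commit()


def search(conn: sqlite3.Connection, keyword: str,
           subreddit: Optional[str] = None) -> list[tuple]:
    """Naive keyword search over title and content, newest posts first."""
    pattern = f"%{keyword}%"
    sql = "SELECT id, title FROM posts WHERE (title LIKE ? OR content LIKE ?)"
    params = [pattern, pattern]
    if subreddit:
        sql += " AND subreddit = ?"
        params.append(subreddit)
    sql += " ORDER BY created_utc DESC"
    return conn.execute(sql, params).fetchall()
```

In SQLite, `LIKE` is case-insensitive for ASCII text, so `search(conn, "cmake")` also matches "CMake".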

## MVP Timeline
- Phase 1 (Scraper): 1 day
- Phase 2 (Storage): +1 day
- Phase 3 (feature of choice): +2-5 days
Total: 2-7 days depending on scope

## Potential
- Quick win (low-hanging fruit)
- Immediately useful (existing saved posts)
- Scalable (value grows with usage)
- Perfect test case for AI_Team_System (later)
- SaaS potential if validated

## Open Questions
- How many saved posts are there currently?
- Main subreddits?
- Priority use case (archive, digest, search)?

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
---
 Projects/CONCEPT/Reddit_Save_Scraper.md | 620 ++++++++++++++++++++++++
 Projects/Status_Projets.md              |  16 +-
 2 files changed, 633 insertions(+), 3 deletions(-)
 create mode 100644 Projects/CONCEPT/Reddit_Save_Scraper.md

diff --git a/Projects/CONCEPT/Reddit_Save_Scraper.md b/Projects/CONCEPT/Reddit_Save_Scraper.md
new file mode 100644
index 0000000..01e0c9c
--- /dev/null
+++ b/Projects/CONCEPT/Reddit_Save_Scraper.md
@@ -0,0 +1,620 @@
+# Reddit Save Scraper - Personal Content Aggregator
+
+**Status**: CONCEPT
+**Created**: 30 November 2025
+**Type**: Productivity / Knowledge Management
+**Stack**: TBD (Python + Reddit API or Node.js)
+
+---
+
+## Concept
+
+A scraper that retrieves all your saved Reddit posts and turns them into something useful.
+
+**Problem**: You save posts on Reddit but never look at them again; they get lost in the void.
+
+**Solution**: Extract, organize, and make smart use of that content.
+
+---
+
+## Potential Use Cases
+
+### Option 1: Personalized Knowledge Base
+
+**Flow**:
+```
+Reddit Saved Posts
+  ↓ Scrape
+Extract (title, content, comments, subreddit, timestamp)
+  ↓ Categorize (AI)
+Store in a structured DB
+  ↓ Output
+Obsidian vault / Notion database / Markdown files
+```
+
+**Benefits**:
+- Searchable knowledge base
+- Organized by topic (dev, gaming, lifestyle, etc.)
+- Available offline
+
+---
+
+### Option 2: AI-Powered Digest
+
+**Flow**:
+```
+Reddit Saved Posts (last 30 days)
+  ↓ Scrape + Extract
+Claude API analysis
+  ↓ Generates
+Weekly digest (summary + insights + action items)
+  ↓ Output
+Email, Markdown, or Notion page
+```
+
+**Benefits**:
+- Smart summary of what you find interesting
+- Identified patterns (recurring topics)
+- Extracted action items ("Try X", "Read Y", etc.)
+
+---
+
+### Option 3: Content Recommender
+
+**Flow**:
+```
+Reddit Saved Posts (full history)
+  ↓ Scrape
+Embeddings (e.g. OpenAI or Voyage AI; Anthropic offers no embeddings API)
+  ↓ Vector search
+Similar recommendations (new Reddit posts or web content)
+  ↓ Output
+Daily recommendations feed
+```
+
+**Benefits**:
+- Discover content similar to what you like
+- Anticipate your interests
+- Boosted serendipity
+
+---
+
+### Option 4: Personal Archive + Search
+
+**Flow**:
+```
+Reddit Saved Posts
+  ↓ Scrape periodically
+Store locally (SQLite + full-text)
+  ↓ Web UI
+Search interface (keyword, subreddit, date range)
+  ↓ Features
+- Full-text search
+- Tag system
+- Export to PDF/Markdown
+- Link preservation (if the post is deleted)
+```
+
+**Benefits**:
+- Content ownership (backup if a post is deleted)
+- Powerful search
+- Custom organization (tags)
+
+---
+
+### Option 5: Anki Cards Generator
+
+**Flow**:
+```
+Reddit Saved Posts (dev/learning content)
+  ↓ Scrape
+Extract tips, tricks, code snippets
+  ↓ Claude API
+Generate Anki cards (Q&A format)
+  ↓ Output
+Importable Anki deck
+```
+
+**Benefits**:
+- Active learning instead of passive saving
+- Spaced repetition on Reddit content
+- Better retention
+
+---
+
+## Technical Architecture
+
+### Stack Option 1: Python (Recommended)
+
+**Why Python**:
+- PRAW (Python Reddit API Wrapper) - mature, well documented
+- Easy data processing (pandas, json)
+- AI/ML libs (OpenAI, embeddings, etc.)
+
+**Stack**:
+```
+PRAW (Reddit API)
+  ↓
+Python script (scraping + processing)
+  ↓
+SQLite / PostgreSQL (storage)
+  ↓
+Optional: Flask/FastAPI (web UI)
+  ↓
+Optional: OpenAI/Claude API (analysis/digest)
+```
+
+---
+
+### Stack Option 2: Node.js
+
+**Why Node.js**:
+- Alexis is familiar with it
+- snoowrap (Reddit API wrapper for Node.js)
+- Express for the web UI
+- Easy integration with other JS tools
+
+**Stack**:
+```
+snoowrap (Reddit API)
+  ↓
+Node.js script (scraping + processing)
+  ↓
+SQLite / MongoDB (storage)
+  ↓
+Optional: Express (web UI)
+  ↓
+Optional: OpenAI/Claude API (analysis/digest)
+```
+
+---
+
+## MVP Scope
+
+### Phase 1: Basic Scraper (1-2 days)
+
+**Features**:
+- ✅ Authenticate with the Reddit API (OAuth2)
+- ✅ Fetch all saved posts (pagination)
+- ✅ Extract data:
+  - Post title
+  - Post URL
+  - Subreddit
+  - Author
+  - Timestamp
+  - Content (self-post text if applicable)
+  - Top comments (optional)
+- ✅ Save to JSON file
+- ✅ Log progress (number of posts scraped)
+
+**Output**: `reddit_saved_posts.json`
+
+---
+
+### Phase 2: Storage + Organization (1-2 days)
+
+**Features**:
+- ✅ SQLite database setup
+- ✅ Schema:
+  ```sql
+  CREATE TABLE posts (
+    id TEXT PRIMARY KEY,
+    title TEXT,
+    url TEXT,
+    subreddit TEXT,
+    author TEXT,
+    created_utc INTEGER,
+    content TEXT,
+    saved_at INTEGER,
+    category TEXT,  -- AI-generated or manual
+    tags TEXT       -- Comma-separated
+  );
+  ```
+- ✅ Import JSON → SQLite
+- ✅ Basic categorization (manual or rule-based first)
+
+**Output**: `reddit_saved.db`
+
+---
+
+### Phase 3: Choose Your Adventure (Variable)
+
+**Option A - Knowledge Base** (2-3 days):
+- Export to Markdown files (1 file per post)
+- Folder structure by subreddit or category
+- YAML front matter (metadata)
+
+**Option B - AI Digest** (2-3 days):
+- Claude API integration
+- Weekly digest generator
+- Email or Markdown output
+
+**Option C - Search UI** (3-5 days):
+- Flask/FastAPI web app
+- Full-text search
+- Filters (subreddit, date, tags)
+- Tag management
+
+**Option D - Anki Generator** (2-3 days):
+- Parse learning content
+- Generate Q&A with the Claude API
+- Export in Anki deck format
+
+---
+
+## Reddit API Setup
+
+### Prerequisites
+
+1. **Reddit Account** (already done)
+2. **Reddit App**:
+   - Go to https://www.reddit.com/prefs/apps
+   - Create an app (script type)
+   - Get `client_id` + `client_secret`
+3. **OAuth2 Flow**:
+   - User agent: "RedditSaveScraper/1.0"
+   - Scopes: `history`, `read`
+
+### Rate Limits
+
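+- **100 requests/minute** per OAuth client (current Reddit Data API limit; PRAW throttles automatically)
+- Saved posts API endpoint: `/user/{username}/saved`
+- Pagination: 100 posts max per request
+- **Caution**: many saved posts means several requests, and Reddit listings stop at roughly the 1,000 most recent items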
+
+---
+
+## Example Code (Python + PRAW)
+
+```python
+import praw
+import json
+from datetime import datetime
+
+# Setup Reddit API
+reddit = praw.Reddit(
+    client_id="YOUR_CLIENT_ID",
+    client_secret="YOUR_CLIENT_SECRET",
+    user_agent="RedditSaveScraper/1.0",
+    username="YOUR_USERNAME",
+    password="YOUR_PASSWORD"
+)
+
+# Fetch saved posts
+saved_posts = []
+for post in reddit.user.me().saved(limit=None):
+    if isinstance(post, praw.models.Submission):  # Only posts, not comments
+        saved_posts.append({
+            "id": post.id,
+            "title": post.title,
+            "url": post.url,
+            "subreddit": str(post.subreddit),
+            "author": str(post.author) if post.author else "[deleted]",
+            "created_utc": int(post.created_utc),
+            "content": post.selftext if post.is_self else "",
+            "saved_at": int(datetime.now().timestamp())  # scrape time: the API doesn't expose when a post was saved
+        })
+
+# Save to JSON
+with open("reddit_saved_posts.json", "w", encoding="utf-8") as f:
+    json.dump(saved_posts, f, indent=2, ensure_ascii=False)
+
+print(f"Scraped {len(saved_posts)} saved posts")
+```
+
+---
+
+## Use Cases - Deep Dive
+
+### Use Case 1: Dev Knowledge Base
+
+**Alexis saves a lot of dev posts** (probably).
+
+**Pipeline**:
+1. Scrape saved posts
+2. Filter subreddits: r/programming, r/Python, r/cpp, r/gamedev, etc.
+3. Categorize by topic:
+   - C++ tips
+   - Python tricks
+   - Game engine design
+   - Architecture patterns
+4. Export Markdown:
+   ```
+   dev_knowledge/
+   ├── cpp/
+   │   ├── hot_reload_techniques.md
+   │   └── cmake_best_practices.md
+   ├── python/
+   │   └── async_patterns.md
+   └── gamedev/
+       └── ecs_architecture.md
+   ```
+5. Searchable via Obsidian or VS Code
+
+**Benefits**:
+- Personal reference base
+- No more re-googling the same things
+- Knowledge compound effect
+
+---
+
+### Use Case 2: Learning Digest
+
+**Weekly flow**:
+1. Scrape new saved posts (last week)
+2. Claude API analysis:
+   ```
+   Prompt:
+   "Here are 15 Reddit posts I saved this week.
+   Generate a structured digest:
+   - Main themes
+   - 3 key insights
+   - 3 concrete action items
+   - Resources to dig into"
+   ```
+3. Output Markdown:
+   ```markdown
+   # Weekly Reddit Digest - 30 Nov 2025
+
+   ## Main Themes
+   - Hot-reload techniques (3 posts)
+   - Multi-agent AI systems (2 posts)
+   - Game asset pipelines (2 posts)
+
+   ## Key Insights
+   1. Sub-1ms hot-reload is possible with mmap + a symbol table cache
+   2. Multi-agent debate improves decision quality (research papers)
+   3. Procedural generation + AI = sweet spot for game assets
+
+   ## Action Items
+   - [ ] Test the mmap approach for GroveEngine hot-reload
+   - [ ] Read paper "Constitutional AI via Debate"
+   - [ ] Prototype MCP asset pipeline POC
+
+   ## Resources
+   - [Article] Advanced Hot-Reload Techniques (link)
+   - [Repo] Multi-Agent Framework Example (link)
+   ```
+
+**Benefits**:
+- Turns passive saving into active learning
+- Accountability (tracked action items)
+- Emerging patterns (recurring themes)
+
+---
+
+### Use Case 3: Content Archive (Backup)
+
+**The Reddit problem**: posts can be deleted or removed.
+
+**Solution**:
+1. Scrape and save the full content locally
+2. Screenshots of images (if applicable)
+3. Archive comments (top 10 comments)
+4. Link preservation
+
+**Benefits**:
+- Content ownership
+- Accessible even if the original is deleted
+- Offline access
+
+---
+
+## Monetization / Business Potential?
+
+### SaaS Potential
+
+**Reddit Save Manager**:
+- Freemium service
+- Features:
+  - Auto-sync saved posts
+  - Weekly AI digest
+  - Search interface
+  - Export to Notion/Obsidian
+  - Mobile app
+
+**Market**:
+- Reddit power users (millions)
+- Knowledge workers who save a lot
+- Students, researchers, devs
+
+**Competitors**:
+- Nothing really solid at the moment (empty niche)
+
+**Monetization**:
+- Free: 100 saved posts max, basic export
+- Pro ($5/month): Unlimited, AI digest, advanced search
+- Teams ($20/month): Shared knowledge base, collaboration
+
+**Viability**: Medium (niche, but recurring SaaS revenue potential)
+
+---
+
+## Risks & Challenges
+
+| Risk | Impact | Mitigation |
+|--------|--------|------------|
+| **Reddit API changes** | Medium | Use PRAW (community-maintained, widely used), monitor API updates |
+| **Strict rate limiting** | Low | Respect the rate limit (~100 req/min), implement backoff |
+| **Saved posts = private data** | Medium | Local-first, optional cloud sync |
+| **Deleted posts** | Low | Archive content locally (backup) |
+| **Not enough saved posts** | Low | Tool still works; value grows with usage |
+
+---
+
+## Estimated Timeline
+
+### MVP Basic (Phase 1-2)
+
+**Scope**: Scraper + JSON export + SQLite storage
+
+**Timeline**:
+- Set up Reddit API: 1h
+- Scraper code: 2-3h
+- SQLite schema + import: 2h
+- Testing: 1h
+- **Total**: 1 day
+
+---
+
+### MVP + Feature (Phase 3)
+
+**Option A - Knowledge Base Export**: +2 days
+**Option B - AI Digest**: +2 days
+**Option C - Search UI**: +3-5 days
+**Option D - Anki Generator**: +2 days
+
+**Full MVP total**: 3-6 days depending on the chosen option (1 day basic MVP + 2-5 days feature)
+
+---
+
+## Links to Existing Projects
+
+### Database Cours Chinois (Chinese course database)
+
+**Potential synergy**:
+- Scrape saved posts from r/ChineseLanguage, r/Hanzi
+- Export to Anki deck
+- Integration with the learning pipeline
+
+---
+
+### AI_Team_System
+
+**Perfect test case**:
+- Alexis's brief: "Reddit Save Scraper with AI digest"
+- AI Team debates + implements
+- Delivered in 24-48h
+- **First test project for AI Team System** (after POC)
+
+---
+
+### AISSIA
+
+**Potential**:
+- AISSIA could integrate Reddit monitoring
+- "Tell me when someone mentions GroveEngine on Reddit"
+- Auto-save interesting posts
+
+---
+
+## Open Questions
+
+### Usage
+
+1. ⚠️ **How many saved posts currently?** (10? 100? 1000?)
+2. ⚠️ **Main subreddits?** (dev, gaming, lifestyle, other?)
+3. ⚠️ **How often do you save?** (daily, weekly?)
+4. ⚠️ **Main goal?** (archive, learning, search, other?)
+
+### Technical
+
+1. ⚠️ **Preferred stack?** (Python PRAW or Node.js snoowrap?)
+2. ⚠️ **Desired output?** (Markdown files, SQLite, web UI?)
+3. ⚠️ **AI integration?** (digest, categorization, or not needed?)
+
+### Priority
+
+1. ⚠️ **When to do this project?** (now, after WeChat Bot, or backlog?)
+2. ⚠️ **MVP scope?** (scraper only, or scraper + feature?)
+3. ⚠️ **Acceptable time investment?** (1 day, 1 week?)
+
+---
+
+## Next Steps
+
+### If GO Now
+
+**Phase 0 - Exploration** (1-2h):
+1. Check how many saved posts you currently have
+2. Look at the main subreddits
+3. Identify the main use case (knowledge base, digest, search?)
+4. **Decision**: Python or Node.js?
+
+**Phase 1 - MVP Scraper** (1 day):
+1. Set up Reddit API credentials
+2. Code the scraper (PRAW or snoowrap)
+3. Test with your real saved posts
+4. Validated JSON output
+
+**Phase 2 - Feature** (1-5 days depending on choice):
+1. Choose an option (A/B/C/D)
+2. Implement
+3. Test + iterate
+4. **DONE**
+
+---
+
+### If PAUSE / Concept Only
+
+**Keep as a concept**:
+- Wait until there are more saved posts (if few currently)
+- Or wait for AI_Team_System (perfect test case)
+- Or wait until a real need is identified
+
+---
+
+## Existing Alternatives
+
+### Tools to Check Before Building
+
+1. **Reddit Enhancement Suite (RES)** - Browser extension
+   - Saved posts management?
+   - Export features?
+
+2. **IFTTT / Zapier** - Automation
+   - Reddit saved → Notion/Google Sheets?
+
+3. **Pushshift.io** - Reddit archive
+   - API for post history (public access cut off in 2023; now restricted to moderators)
+   - Complement to the official Reddit API
+
+**Action**: Try these tools first; build a custom tool if they aren't satisfactory
+
+---
+
+## Decision Tree
+
+```
+How many saved posts do you have?
+├─ < 50 → Maybe too early, unless you want to set up the system in advance
+├─ 50-200 → Sweet spot for an MVP test
+└─ > 200 → Definitely worth it, lots of value to extract
+
+What is your main use case?
+├─ Archive / Backup → Basic scraper + SQLite + Markdown export
+├─ Learning / Digest → Scraper + Claude API analysis
+├─ Search / Discovery → Scraper + Web UI + Full-text search
+└─ Not sure → Start with a basic scraper, decide later
+
+When do you want this project?
+├─ Now → GO Phase 0 (1-2h exploration)
+├─ After WeChat Bot → PAUSE, add to backlog
+└─ When AI_Team_System is ready → Perfect test case
+```
+
+---
+
+## Conclusion
+
+**Reddit Save Scraper = low-hanging fruit with high potential**
+
+**Why it's interesting**:
+- ✅ Quick win (1-2 day MVP)
+- ✅ Immediately useful (your current saved posts)
+- ✅ Scalable (the more you save, the more value)
+- ✅ Learning opportunity (Reddit API, data processing)
+- ✅ SaaS potential (if you want it, later)
+- ✅ Perfect test case for AI_Team_System (later)
+
+**Decision needed**:
+1. Exploration (1-2h) to clarify the use case?
+2. GO for the MVP (1 day)?
+3. Or PAUSE as a concept until the need is clear?
+
+---
+
+*Created: 30 November 2025*
+*Status: CONCEPT - Exploration required*
+*Estimated MVP time: 1-2 days*
+*Preferred stack: Python + PRAW (recommended) or Node.js + snoowrap*
diff --git a/Projects/Status_Projets.md b/Projects/Status_Projets.md
index f75ccfa..218c30f 100644
--- a/Projects/Status_Projets.md
+++ b/Projects/Status_Projets.md
@@ -132,7 +132,7 @@
 
 ---
 
-## 💡 CONCEPT - 7 projets
+## 💡 CONCEPT - 8 projects
 
 ### 1. AI_Team_System
 **Fiche** : `CONCEPT/AI_Team_System.md` (created 30 nov 2025)
@@ -178,7 +178,16 @@
 **Idées** : Hooks, slash commands, coordination multi-instances
 **Status** : Idée initiale, besoins à préciser
 
-### 7. LeBonCoup (dossier)
+### 7. Reddit_Save_Scraper
+**Spec**: `CONCEPT/Reddit_Save_Scraper.md` (created 30 Nov 2025)
+**Description**: Scraper for saved Reddit posts + value extraction (knowledge base, digest, search)
+**Stack**: Python + PRAW (Reddit API) or Node.js + snoowrap
+**Use cases**: Archive/backup, weekly AI digest, search UI, Anki card generator
+**Status**: Concept - Exploration required (how many saved posts?)
+**MVP timeline**: 1-2 days
+**Potential**: Perfect test case for AI_Team_System
+
+### 8. LeBonCoup (dossier)
 **Status** : À examiner
 
 ---
@@ -211,7 +220,7 @@
 ## 📊 Statistiques
 
 **Total projets actifs** : 16 (5 WIP + 6 PAUSE + 4 CONSTANT)
-**Projets concepts** : 7 (dont AI_Team_System - meta-projet)
+**Concept projects**: 8 (including AI_Team_System - meta-project, Reddit_Save_Scraper)
 **Projets DONE** : 1 (videotoMP3Transcriptor) 🎉
 **Archivés** : 2 docs + 2 candidats
 
@@ -224,6 +233,7 @@
 - Services : 3 (videotoMP3 DONE, OCR PDF, VPS Tunnel)
 - Workflow : 2 (Social Network, Claude Workflow)
 - Education : 1 (WeChat Homework Bot)
+- Productivity : 1 (Reddit Save Scraper)
 - Meta : 1 (AI_Team_System - multiplicateur de force)
 
 ---