Add Reddit_Save_Scraper concept - Knowledge extraction from saved posts
## Vision
Scraper for saved Reddit posts with intelligent value extraction. Turns passive saving into active knowledge management.

## Use Cases
1. Knowledge Base: structured Markdown export by theme
2. AI Digest: weekly summary + insights + action items (Claude API)
3. Search UI: full-text search interface with filters
4. Anki Generator: convert learning content into flashcards
5. Archive: local backup in case posts get deleted

## Stack
- Python + PRAW (Reddit API) - recommended
- Alternative: Node.js + snoowrap
- Storage: SQLite (local-first)
- Optional: Claude API (analysis), Flask (web UI)

## MVP Timeline
- Phase 1 (Scraper): 1 day
- Phase 2 (Storage): +1 day
- Phase 3 (feature of choice): +2-5 days
Total: 2-7 days depending on scope

## Potential
- Quick win (low-hanging fruit)
- Immediately useful (existing saved posts)
- Scalable (value grows with usage)
- Perfect test case for AI_Team_System (later)
- SaaS potential if validated

## Open Questions
- How many saved posts currently?
- Main subreddits?
- Priority use case (archive, digest, search)?

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---
# Reddit Save Scraper - Personal Content Aggregator
**Status**: CONCEPT
**Created**: November 30, 2025
**Type**: Productivity / Knowledge Management
**Stack**: TBD (Python + Reddit API or Node.js)

---
## Concept

A scraper that retrieves every post you have saved on Reddit and turns that content into something useful.

**Problem**: You save posts on Reddit but never look at them again; they get lost in the void.

**Solution**: Extract, organize, and exploit that content intelligently.

---
## Potential Use Cases

### Option 1: Personalized Knowledge Base

**Flow**:
```
Reddit Saved Posts
  ↓ Scrape
Extract (title, content, comments, subreddit, timestamp)
  ↓ Categorize (AI)
Store in a structured DB
  ↓ Output
Obsidian vault / Notion database / Markdown files
```

**Benefits**:
- Searchable knowledge base
- Organized by theme (dev, gaming, lifestyle, etc.)
- Available offline

---
### Option 2: AI-Powered Digest

**Flow**:
```
Reddit Saved Posts (last 30 days)
  ↓ Scrape + Extract
Claude API analysis
  ↓ Generate
Weekly digest (summary + insights + action items)
  ↓ Output
Email, Markdown, or Notion page
```

**Benefits**:
- Smart summary of what you find interesting
- Patterns identified (recurring topics)
- Action items extracted ("Try X", "Read Y", etc.)

---

### Option 3: Content Recommender

**Flow**:
```
Reddit Saved Posts (full history)
  ↓ Scrape
Embeddings (OpenAI/Claude)
  ↓ Vector search
Similar recommendations (new Reddit posts or web)
  ↓ Output
Daily recommendations feed
```

**Benefits** (embedding sketch below):
- Discover content similar to what you love
- Anticipates your interests
- Boosted serendipity
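
A minimal sketch of the embedding + ranking step, assuming the OpenAI Python SDK with an `OPENAI_API_KEY` in the environment; the model name and sample posts are illustrative assumptions, not part of this concept's spec:

```python
# Sketch: embed saved posts and rank them by similarity to a query.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    # Model name is an assumption; swap for whichever embedding model you use.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

posts = ["Hot-reload techniques in C++", "ECS architecture for game engines"]
post_vecs = embed(posts)
query_vec = embed(["fast iteration workflows for gamedev"])[0]

# Cosine similarity: higher = more related to the query.
scores = post_vecs @ query_vec / (
    np.linalg.norm(post_vecs, axis=1) * np.linalg.norm(query_vec)
)
for post, score in sorted(zip(posts, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {post}")
```

For a real recommendations feed, the same scoring would run against embeddings of new Reddit posts rather than a hand-typed query.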

---

### Option 4: Personal Archive + Search

**Flow**:
```
Reddit Saved Posts
  ↓ Scrape periodically
Store locally (SQLite + full-text)
  ↓ Web UI
Search interface (keyword, subreddit, date range)
  ↓ Features
- Full-text search
- Tag system
- Export to PDF/Markdown
- Link preservation (if a post is deleted)
```

**Benefits**:
- You own the content (backup if a post is deleted)
- Powerful search (see the FTS5 sketch below)
- Custom organization (tags)
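
A minimal full-text search sketch using SQLite's built-in FTS5 extension, layered on the Phase 2 schema described later; the index is rebuilt naively here (triggers could keep it in sync incrementally):

```python
import sqlite3

conn = sqlite3.connect("reddit_saved.db")
# External FTS5 table over id/title/content; id is stored but not indexed.
conn.executescript("""
    CREATE VIRTUAL TABLE IF NOT EXISTS posts_fts
        USING fts5(id UNINDEXED, title, content);
    DELETE FROM posts_fts;
    INSERT INTO posts_fts (id, title, content)
        SELECT id, title, content FROM posts;
""")

# Ranked full-text query across titles and bodies.
for id_, title in conn.execute(
    "SELECT id, title FROM posts_fts WHERE posts_fts MATCH ? ORDER BY rank",
    ("hot AND reload",),
):
    print(id_, title)
```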

---

### Option 5: Anki Cards Generator

**Flow**:
```
Reddit Saved Posts (dev/learning content)
  ↓ Scrape
Extract tips, tricks, code snippets
  ↓ Claude API
Generate Anki cards (Q&A format)
  ↓ Output
Importable Anki deck
```

**Benefits**:
- Active learning instead of passive saving
- Spaced repetition on Reddit content
- Better retention

---
## Technical Architecture

### Stack Option 1: Python (Recommended)

**Why Python**:
- PRAW (Python Reddit API Wrapper): mature, well documented
- Easy data processing (pandas, json)
- AI/ML libs (OpenAI, embeddings, etc.)

**Stack**:
```
PRAW (Reddit API)
  ↓
Python script (scraping + processing)
  ↓
SQLite / PostgreSQL (storage)
  ↓
Optional: Flask/FastAPI (web UI)
  ↓
Optional: OpenAI/Claude API (analysis/digest)
```

---
### Stack Option 2: Node.js

**Why Node.js**:
- Alexis is already familiar with it
- snoowrap (Node.js Reddit API wrapper)
- Express for the web UI
- Easy integration with other JS tools

**Stack**:
```
snoowrap (Reddit API)
  ↓
Node.js script (scraping + processing)
  ↓
SQLite / MongoDB (storage)
  ↓
Optional: Express (web UI)
  ↓
Optional: OpenAI/Claude API (analysis/digest)
```

---
## MVP Scope

### Phase 1: Basic Scraper (1-2 days)

**Features**:
- ✅ Authenticate with the Reddit API (OAuth2)
- ✅ Fetch all saved posts (pagination)
- ✅ Extract data:
  - Post title
  - Post URL
  - Subreddit
  - Author
  - Timestamp
  - Content (self-post text if applicable)
  - Top comments (optional)
- ✅ Save to JSON file
- ✅ Log progress (number of posts scraped)

**Output**: `reddit_saved_posts.json`

---
### Phase 2: Storage + Organization (1-2 days)

**Features**:
- ✅ SQLite database setup
- ✅ Schema:
```sql
CREATE TABLE posts (
    id TEXT PRIMARY KEY,
    title TEXT,
    url TEXT,
    subreddit TEXT,
    author TEXT,
    created_utc INTEGER,
    content TEXT,
    saved_at INTEGER,
    category TEXT, -- AI-generated or manual
    tags TEXT      -- Comma-separated
);
```
- ✅ Import JSON → SQLite (sketch after this phase)
- ✅ Basic categorization (manual or rule-based first)

**Output**: `reddit_saved.db`
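
A sketch of the JSON → SQLite import with a naive rule-based categorizer; the file names follow the Phase 1/2 outputs above, and the subreddit→category map is purely an illustrative assumption:

```python
import json
import sqlite3

# Hypothetical mapping for the rule-based first pass; extend as needed.
CATEGORY_BY_SUBREDDIT = {
    "programming": "dev", "Python": "dev", "cpp": "dev",
    "gamedev": "gamedev", "ChineseLanguage": "learning",
}

conn = sqlite3.connect("reddit_saved.db")
with open("reddit_saved_posts.json", encoding="utf-8") as f:
    posts = json.load(f)

# Column order matches the schema: id, title, url, subreddit, author,
# created_utc, content, saved_at, category, tags.
rows = [
    (p["id"], p["title"], p["url"], p["subreddit"], p["author"],
     p["created_utc"], p["content"], p["saved_at"],
     CATEGORY_BY_SUBREDDIT.get(p["subreddit"], "uncategorized"), "")
    for p in posts
]
# INSERT OR REPLACE keeps re-runs idempotent (same post id → overwrite).
conn.executemany(
    "INSERT OR REPLACE INTO posts VALUES (?,?,?,?,?,?,?,?,?,?)", rows
)
conn.commit()
print(f"Imported {len(rows)} posts")
```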

---

### Phase 3: Choose Your Adventure (Variable)

**Option A - Knowledge Base** (2-3 days), sketched below:
- Export to Markdown files (1 file per post)
- Folder structure by subreddit or category
- YAML front matter (metadata)
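
A sketch of Option A: one Markdown file per post with YAML front matter, grouped into folders by category. It reads the Phase 2 database; the `slug` helper is a simplistic stand-in, and titles containing quotes would need real YAML escaping:

```python
import re
import sqlite3
from pathlib import Path

def slug(text: str) -> str:
    # Crude filename slug: lowercase, non-alphanumerics to underscores.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")[:60]

conn = sqlite3.connect("reddit_saved.db")
for id_, title, url, subreddit, category, content in conn.execute(
    "SELECT id, title, url, subreddit, category, content FROM posts"
):
    folder = Path("dev_knowledge") / (category or subreddit)
    folder.mkdir(parents=True, exist_ok=True)
    front_matter = (
        f"---\ntitle: \"{title}\"\nurl: {url}\n"
        f"subreddit: {subreddit}\nreddit_id: {id_}\n---\n\n"
    )
    # Link posts have no body, so fall back to a pointer at the URL.
    (folder / f"{slug(title)}.md").write_text(
        front_matter + (content or f"[Link post]({url})"), encoding="utf-8"
    )
```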

**Option B - AI Digest** (2-3 days):
- Claude API integration
- Weekly digest generator
- Email or Markdown output

**Option C - Search UI** (3-5 days):
- Flask/FastAPI web app
- Full-text search
- Filters (subreddit, date, tags)
- Tag management

**Option D - Anki Generator** (2-3 days), sketched below:
- Parse learning content
- Claude API generates Q&A
- Export to Anki deck format
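
A sketch of Option D's export step using the third-party `genanki` library. It assumes the Claude step already produced (question, answer) pairs; the model/deck IDs are arbitrary fixed integers, as genanki requires:

```python
import genanki

qa_model = genanki.Model(
    1607392319,  # arbitrary but stable model ID
    "Reddit Q&A",
    fields=[{"name": "Question"}, {"name": "Answer"}],
    templates=[{
        "name": "Card 1",
        "qfmt": "{{Question}}",
        "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
    }],
)
deck = genanki.Deck(2059400110, "Reddit Saved Posts")

# Placeholder pairs standing in for the Claude-generated Q&A.
qa_pairs = [("What does mmap give a hot-reload loop?",
             "Shared memory mapping, avoiding a full copy on reload.")]
for question, answer in qa_pairs:
    deck.add_note(genanki.Note(model=qa_model, fields=[question, answer]))

genanki.Package(deck).write_to_file("reddit_saved.apkg")
```

The resulting `.apkg` file imports directly into Anki via File → Import.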

---

## Reddit API Setup

### Prerequisites

1. **Reddit account** (already done)
2. **Reddit app**:
   - Go to https://www.reddit.com/prefs/apps
   - Create App (script type)
   - Get `client_id` + `client_secret`
3. **OAuth2 flow** (see the praw.ini sketch below):
   - User agent: "RedditSaveScraper/1.0"
   - Scopes: `history`, `read`
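
With PRAW, the credentials can live in a `praw.ini` file instead of the script; a minimal sketch, where the site name "scraper" is an arbitrary choice:

```python
# PRAW can load credentials from a praw.ini site section, keeping secrets
# out of the code. A section named "scraper" would look like this:
#
#   [scraper]
#   client_id=YOUR_CLIENT_ID
#   client_secret=YOUR_CLIENT_SECRET
#   username=YOUR_USERNAME
#   password=YOUR_PASSWORD
#   user_agent=RedditSaveScraper/1.0
#
# PRAW searches the working directory and the OS config directory for it.
import praw

reddit = praw.Reddit("scraper")  # loads the [scraper] section from praw.ini
print(reddit.user.me())          # sanity check: prints your username
```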
### Rate Limits

- **60 requests/minute** (standard)
- Saved posts endpoint: `/user/{username}/saved`
- Pagination: max 100 posts per request
- **Note**: with many saved posts the fetch spans several requests; when iterating with `limit=None`, PRAW paginates and throttles automatically

---
## Example Code (Python + PRAW)

```python
import praw
import json
from datetime import datetime

# Set up the Reddit API client (script-type app, password flow)
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="RedditSaveScraper/1.0",
    username="YOUR_USERNAME",
    password="YOUR_PASSWORD",
)

# Fetch saved posts; limit=None iterates through every page
saved_posts = []
for post in reddit.user.me().saved(limit=None):
    if isinstance(post, praw.models.Submission):  # only posts, not comments
        saved_posts.append({
            "id": post.id,
            "title": post.title,
            "url": post.url,
            "subreddit": str(post.subreddit),
            "author": str(post.author),
            "created_utc": int(post.created_utc),
            "content": post.selftext if post.is_self else "",
            "saved_at": int(datetime.now().timestamp()),
        })

# Save to JSON
with open("reddit_saved_posts.json", "w", encoding="utf-8") as f:
    json.dump(saved_posts, f, indent=2, ensure_ascii=False)

print(f"Scraped {len(saved_posts)} saved posts")
```

---
## Use Cases - Deep Dive

### Use Case 1: Dev Knowledge Base

**Alexis probably saves a lot of dev posts.**

**Pipeline**:
1. Scrape saved posts
2. Filter subreddits: r/programming, r/Python, r/cpp, r/gamedev, etc.
3. Categorize by topic:
   - C++ tips
   - Python tricks
   - Game engine design
   - Architecture patterns
4. Export Markdown:
```
dev_knowledge/
├── cpp/
│   ├── hot_reload_techniques.md
│   └── cmake_best_practices.md
├── python/
│   └── async_patterns.md
└── gamedev/
    └── ecs_architecture.md
```
5. Searchable via Obsidian or VSCode

**Benefits**:
- Personal reference base
- No more re-googling the same things
- Compounding knowledge effect

---
### Use Case 2: Learning Digest

**Weekly flow** (a Claude API sketch follows this example):
1. Scrape new saved posts (past week)
2. Claude API analysis:
```
Prompt:
"Here are 15 Reddit posts I saved this week.
Generate a structured digest:
- Main themes
- 3 key insights
- 3 concrete action items
- Resources to dig into"
```
3. Markdown output:
```markdown
# Weekly Reddit Digest - 30 Nov 2025

## Main Themes
- Hot-reload techniques (3 posts)
- Multi-agent AI systems (2 posts)
- Game asset pipelines (2 posts)

## Key Insights
1. Sub-1ms hot-reload is possible with mmap + a symbol table cache
2. Multi-agent debate improves decision quality (research papers)
3. Procedural generation + AI is a sweet spot for game assets

## Action Items
- [ ] Try the mmap approach for GroveEngine hot-reload
- [ ] Read the paper "Constitutional AI via Debate"
- [ ] Prototype an MCP asset pipeline POC

## Resources
- [Article] Advanced Hot-Reload Techniques (link)
- [Repo] Multi-Agent Framework Example (link)
```

**Benefits**:
- Turns passive saving into active learning
- Accountability (tracked action items)
- Patterns emerge (recurring themes)
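
A sketch of the digest step with the Anthropic Python SDK, assuming an `ANTHROPIC_API_KEY` in the environment and the Phase 2 database; the model name is an assumption:

```python
import sqlite3
import anthropic

# Pull the 15 most recently saved posts from the Phase 2 database.
conn = sqlite3.connect("reddit_saved.db")
recent = conn.execute(
    "SELECT title, subreddit, content FROM posts "
    "ORDER BY saved_at DESC LIMIT 15"
).fetchall()
posts_text = "\n\n".join(
    f"[{sub}] {title}\n{(content or '')[:500]}"  # truncate long bodies
    for title, sub, content in recent
)

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=1500,
    messages=[{
        "role": "user",
        "content": "Here are Reddit posts I saved this week. Generate a "
                   "structured digest: main themes, 3 key insights, 3 "
                   "concrete action items, resources to dig into.\n\n"
                   + posts_text,
    }],
)
with open("weekly_digest.md", "w", encoding="utf-8") as f:
    f.write(message.content[0].text)
```

Scheduling this weekly (cron, Task Scheduler) turns it into the digest pipeline described above.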

---

### Use Case 3: Content Archive (Backup)

**The Reddit problem**: posts can be deleted or removed at any time.

**Solution**:
1. Scrape + save the full content locally
2. Screenshot images (if applicable)
3. Archive comments (top 10)
4. Preserve links

**Benefits**:
- You own the content
- Still accessible even if the original is deleted
- Offline access

---
## Monetization / Business Potential?

### SaaS Potential

**Reddit Save Manager**:
- Freemium service
- Features:
  - Auto-sync saved posts
  - Weekly AI digest
  - Search interface
  - Export to Notion/Obsidian
  - Mobile app

**Market**:
- Reddit power users (millions)
- Knowledge workers who save a lot
- Students, researchers, devs

**Competitors**:
- Nothing really solid right now (empty niche)

**Monetization**:
- Free: 100 saved posts max, basic export
- Pro ($5/month): unlimited, AI digest, advanced search
- Teams ($20/month): shared knowledge base, collaboration

**Viability**: medium (niche, but recurring SaaS potential)

---
## Risks & Challenges

| Risk | Impact | Mitigation |
|------|--------|------------|
| **Reddit API changes** | Medium | Use official PRAW, monitor API updates |
| **Strict rate limiting** | Low | Respect 60 req/min, implement backoff |
| **Saved posts = private data** | Medium | Local-first, optional cloud sync |
| **Posts deleted** | Low | Archive content locally (backup) |
| **Not enough saved posts** | Low | The tool still works; value grows with usage |

---

## Estimated Timeline

### MVP Basic (Phases 1-2)

**Scope**: scraper + JSON export + SQLite storage

**Timeline**:
- Reddit API setup: 1h
- Scraper code: 2-3h
- SQLite schema + import: 2h
- Testing: 1h
- **Total**: 1 day

---

### MVP + Feature (Phase 3)

**Option A - Knowledge Base Export**: +2 days
**Option B - AI Digest**: +2 days
**Option C - Search UI**: +3-5 days
**Option D - Anki Generator**: +2 days

**Full MVP total**: 2-6 days depending on the chosen option

---
## Links to Existing Projects

### Database Cours Chinois

**Potential synergy**:
- Scrape saved posts from r/ChineseLanguage, r/Hanzi
- Export to an Anki deck
- Integrate with the learning pipeline

---

### AI_Team_System

**Perfect test case**:
- Alexis briefs: "Reddit Save Scraper with AI digest"
- The AI team debates + implements
- Delivered in 24-48h
- **First test project for AI_Team_System** (after the POC)

---

### AISSIA

**Potential** (a monitoring sketch follows):
- AISSIA could integrate Reddit monitoring
- "Tell me when someone mentions GroveEngine on Reddit"
- Auto-save interesting posts
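
A sketch of that monitoring idea using PRAW's comment stream on r/all; the `notify()` hook is a placeholder for whatever channel AISSIA would actually use:

```python
import praw

reddit = praw.Reddit("scraper")  # credentials from praw.ini, see above
KEYWORD = "GroveEngine"

def notify(comment):
    # Placeholder: AISSIA would push this to its own notification channel.
    print(f"Mention in r/{comment.subreddit}: "
          f"https://reddit.com{comment.permalink}")

# skip_existing=True starts from "now" instead of replaying recent history.
for comment in reddit.subreddit("all").stream.comments(skip_existing=True):
    if KEYWORD.lower() in comment.body.lower():
        notify(comment)
```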

---

## Open Questions

### Usage

1. ⚠️ **How many saved posts right now?** (10? 100? 1000?)
2. ⚠️ **Main subreddits?** (dev, gaming, lifestyle, other?)
3. ⚠️ **Saving frequency?** (daily, weekly?)
4. ⚠️ **Main goal?** (archive, learning, search, other?)

### Technical

1. ⚠️ **Preferred stack?** (Python PRAW or Node.js snoowrap?)
2. ⚠️ **Desired output?** (Markdown files, SQLite, web UI?)
3. ⚠️ **AI integration?** (digest, categorization, or not needed?)

### Priority

1. ⚠️ **When?** (now, after the WeChat Bot, or backlog?)
2. ⚠️ **MVP scope?** (just the scraper, or scraper + one feature?)
3. ⚠️ **Acceptable time investment?** (1 day, 1 week?)

---
## Next Steps

### If GO Now

**Phase 0 - Exploration** (1-2h):
1. Check how many saved posts you currently have
2. Look at the main subreddits
3. Identify the primary use case (knowledge base, digest, search?)
4. **Decision**: Python or Node.js?

**Phase 1 - MVP Scraper** (1 day):
1. Set up Reddit API credentials
2. Write the scraper (PRAW or snoowrap)
3. Test against your real saved posts
4. Validate the JSON output

**Phase 2 - Feature** (1-5 days depending on choice):
1. Pick an option (A/B/C/D)
2. Implement
3. Test + iterate
4. **DONE**

---

### If PAUSE / Concept Only

**Keep as a concept**:
- Wait until there are more saved posts (if few right now)
- Or wait for AI_Team_System (perfect test case)
- Or wait for a real, identified need

---
## Existing Alternatives

### Tools to Check Before Building

1. **Reddit Enhancement Suite (RES)** - browser extension
   - Saved posts management?
   - Export features?

2. **IFTTT / Zapier** - automation
   - Reddit saved → Notion/Google Sheets?

3. **Pushshift.io** - Reddit archive
   - API for historical posts
   - Complements the official Reddit API

**Action**: test these tools first; build custom only if they fall short.

---
## Decision Tree

```
How many saved posts do you have?
├─ < 50   → Maybe too early, unless you want to set up the system now
├─ 50-200 → Sweet spot for an MVP test
└─ > 200  → Definitely worth it; lots of value to extract

What is your main use case?
├─ Archive / Backup   → Basic scraper + SQLite + Markdown export
├─ Learning / Digest  → Scraper + Claude API analysis
├─ Search / Discovery → Scraper + web UI + full-text search
└─ Not sure           → Start with the basic scraper, decide later

When do you want this project?
├─ Now                          → GO Phase 0 (1-2h exploration)
├─ After the WeChat Bot         → PAUSE, add to the backlog
└─ When AI_Team_System is ready → Perfect test case
```

---
## Conclusion

**Reddit Save Scraper = low-hanging fruit with high potential**

**Why it's interesting**:
- ✅ Quick win (1-2 day MVP)
- ✅ Immediately useful (your current saved posts)
- ✅ Scalable (the more you save, the more value)
- ✅ Learning opportunity (Reddit API, data processing)
- ✅ SaaS potential (if you want, later)
- ✅ Perfect test case for AI_Team_System (later)

**Decision required**:
1. Exploration (1-2h) to clarify the use case?
2. GO for the MVP (1 day)?
3. Or PAUSE as a concept until the need is clear?

---

*Created: November 30, 2025*
*Status: CONCEPT - exploration required*
*Estimated MVP time: 1-2 days*
*Preferred stack: Python + PRAW (recommended) or Node.js + snoowrap*

---

The same commit updates the CONCEPT project index:

```diff
@@ -132,7 +132,7 @@
 ---
 
-## 💡 CONCEPT - 7 projects
+## 💡 CONCEPT - 8 projects
 
 ### 1. AI_Team_System
 **File**: `CONCEPT/AI_Team_System.md` (created 30 Nov 2025)
@@ -178,7 +178,16 @@
 **Ideas**: hooks, slash commands, multi-instance coordination
 **Status**: initial idea, needs to be refined
 
-### 7. LeBonCoup (folder)
+### 7. Reddit_Save_Scraper
+**File**: `CONCEPT/Reddit_Save_Scraper.md` (created 30 Nov 2025)
+**Description**: scrape saved Reddit posts + extract value (knowledge base, digest, search)
+**Stack**: Python + PRAW (Reddit API) or Node.js + snoowrap
+**Use cases**: archive/backup, weekly AI digest, search UI, Anki cards generator
+**Status**: concept - exploration required (how many saved posts?)
+**MVP timeline**: 1-2 days
+**Potential**: perfect test case for AI_Team_System
+
+### 8. LeBonCoup (folder)
 **Status**: to review
 
 ---
@@ -211,7 +220,7 @@
 ## 📊 Statistics
 
 **Total active projects**: 16 (5 WIP + 6 PAUSE + 4 CONSTANT)
-**Concept projects**: 7 (incl. AI_Team_System - meta-project)
+**Concept projects**: 8 (incl. AI_Team_System - meta-project, Reddit_Save_Scraper)
 **Projects DONE**: 1 (videotoMP3Transcriptor) 🎉
 **Archived**: 2 docs + 2 candidates
@@ -224,6 +233,7 @@
 - Services: 3 (videotoMP3 DONE, OCR PDF, VPS Tunnel)
 - Workflow: 2 (Social Network, Claude Workflow)
 - Education: 1 (WeChat Homework Bot)
+- Productivity: 1 (Reddit Save Scraper)
 - Meta: 1 (AI_Team_System - force multiplier)
 
 ---
```