Document channel feeds
README-FEED-MASTER.md
# TLC Search + Feed Master Integration

This directory contains an integrated setup combining:
- **TLC Search**: Flask app for searching YouTube transcripts (Elasticsearch/Qdrant)
- **Feed Master**: RSS aggregator for YouTube channels
- **RSS Bridge**: Converts YouTube channels to RSS feeds

All services share one source of truth for the YouTube channel list: `channels.yml` and the adjacent
`urls.txt` in this repository.

## Architecture

```
┌─────────────────────┐
│  channels.yml       │  Source of truth (this repo)
│  (python_app repo)  │
└──────────┬──────────┘
           │
           ├─────────────────────────────┬────────────────────────┐
           │                             │                        │
           v                             v                        v
┌──────────────┐                  ┌──────────────┐        ┌─────────────────┐
│  TLC Search  │                  │  RSS Bridge  │        │   Feed Master   │
│  (Flask App) │                  │  (Port 3001) │───────>│   (Port 8097)   │
│  Port 8080   │                  └──────────────┘        └─────────────────┘
│              │                                                  │
│ Elasticsearch│                                                  │
│ Qdrant       │                                                  │
└──────────────┘                                                  │
                                                                  v
                                              http://localhost:8097/rss/youtube-unified
```

## Services

### 1. TLC Search (Port 8080)
- Indexes and searches YouTube transcripts
- Uses Elasticsearch for metadata and Qdrant for vector search
- Connects to remote Elasticsearch/Qdrant instances

### 2. RSS Bridge (Port 3001)
- Converts YouTube channels to RSS feeds
- Supports both channel IDs and @handles
- Used by Feed Master to aggregate feeds

### 3. Feed Master (Port 8097)
- Aggregates all YouTube channel RSS feeds into one unified feed
- Updates every 5 minutes
- Keeps the most recent 200 items from all channels

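The aggregation rule described for Feed Master (merge every channel's items, keep the 200 newest) can be illustrated with a minimal Python sketch. This is not Feed Master's actual implementation; `FeedItem` and `unify` are hypothetical names used only to show the idea.

```python
from datetime import datetime
from heapq import nlargest
from typing import NamedTuple


class FeedItem(NamedTuple):
    """Hypothetical, simplified view of one entry in a channel feed."""
    channel: str
    title: str
    published: datetime


def unify(feeds: dict[str, list[FeedItem]], limit: int = 200) -> list[FeedItem]:
    """Merge per-channel items and return the `limit` newest, newest first."""
    all_items = (item for items in feeds.values() for item in items)
    return nlargest(limit, all_items, key=lambda item: item.published)
```

Calling `unify(feeds)` with a mapping of channel name to items returns at most 200 items ordered by publication time, regardless of how many channels contributed.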
## Setup

### Prerequisites
- Docker and Docker Compose
- Python 3.x

### Configuration

1. **Environment Variables**: Create a `.env` file with:
   ```bash
   # Elasticsearch
   ELASTIC_URL=https://your-elasticsearch-url
   ELASTIC_INDEX=this_little_corner_py
   ELASTIC_USERNAME=your_username
   ELASTIC_PASSWORD=your_password

   # Qdrant
   QDRANT_URL=https://your-qdrant-url
   QDRANT_COLLECTION=tlc-captions-full

   # Optional UI links
   RSS_FEED_URL=/rss/youtube-unified
   CHANNELS_PATH=/app/python_app/channels.yml
   RSS_FEED_UPSTREAM=http://feed-master:8080
   ```

2. **Generate Feed Configuration**:
   ```bash
   # Regenerate feed-master config from the channels list
   python3 -m python_app.generate_feed_config_simple
   ```

   This reads `channels.yml` and generates `feed-master-config/fm.yml`.

### Starting Services

```bash
# Start all services
docker compose up -d

# View logs
docker compose logs -f

# View specific service logs
docker compose logs -f feed-master
docker compose logs -f rss-bridge
docker compose logs -f app
```

### Stopping Services

```bash
# Stop all services
docker compose down

# Stop specific service
docker compose stop feed-master
```

## Usage

### Unified RSS Feed
Access the aggregated feed through the TLC app (recommended):
- **URL**: http://localhost:8080/rss
- **Format**: RSS/Atom XML
- **Behavior**: Filters out RSS-Bridge error items and prefixes each title with the channel name
- **Updates**: Every 5 minutes (feed-master schedule)
- **Items**: Most recent 200 items across all channels

Direct feed-master access still works:
- **URL**: http://localhost:8097/rss/youtube-unified

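The filtering and title-prefixing behavior of the `/rss` endpoint described above can be sketched roughly as follows. This is an illustration, not the TLC app's code. It assumes that RSS-Bridge error items can be recognized by a title starting with `Bridge returned error`, and that each item carries the channel name in an `<author>` element; both are assumptions, and `clean_feed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumption: RSS-Bridge error items have a title beginning with this text.
ERROR_PREFIX = "Bridge returned error"


def clean_feed(rss_xml: str) -> str:
    """Drop error items and prefix each remaining title with its channel name."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return rss_xml  # not an RSS 2.0 document; leave it untouched
    for item in list(channel.findall("item")):
        title = item.find("title")
        text = (title.text or "") if title is not None else ""
        if text.startswith(ERROR_PREFIX):
            channel.remove(item)
            continue
        # Assumption: the channel name is available as the item's <author>.
        author = item.findtext("author")
        if title is not None and author and not text.startswith(f"{author}: "):
            title.text = f"{author}: {text}"
    return ET.tostring(root, encoding="unicode")
```

The prefix check makes the function idempotent, so running it twice on the same feed does not double the channel name.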
### TLC Search
Access the search interface at:
- **URL**: http://localhost:8080

### Channel List Endpoints
- **Plain text list**: http://localhost:8080/channels.txt
- **JSON metadata**: http://localhost:8080/api/channel-list

### RSS Bridge
Access individual channel feeds or the web interface at:
- **URL**: http://localhost:3001

## Updating Channel List

When channels are added to or removed from `channels.yml`:

```bash
# 1. Regenerate feed configuration
cd /var/core/this-little-corner/src/python_app
python3 -m python_app.generate_feed_config_simple

# 2. Restart feed-master to pick up changes
docker compose restart feed-master
```

## File Structure

```
python_app/
├── docker-compose.yml              # All services configuration
├── channels.yml                    # Canonical YouTube channel list
├── urls.txt                        # URL list kept in sync with channels.yml
├── generate_feed_config_simple.py  # Config generator script (run via python -m)
├── feed-master-config/
│   ├── fm.yml                      # Feed Master configuration (auto-generated)
│   ├── var/                        # Feed Master database
│   └── images/                     # Cached images
├── data/                           # TLC Search data (read-only)
└── README-FEED-MASTER.md           # This file
```

## Troubleshooting

### Feed Master not updating
```bash
# Check if RSS Bridge is accessible
curl http://localhost:3001

# Restart both services in order
docker compose restart rss-bridge
sleep 10
docker compose restart feed-master
```

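The `curl` reachability check above can be extended to all three services with a small Python helper. This is a sketch only: the URLs are the ones listed in this README, while `SERVICES`, `is_up`, and the `opener` parameter (which exists only to make the function testable) are hypothetical names.

```python
from urllib.request import urlopen

# Host-side URLs as documented in this README.
SERVICES = {
    "TLC Search": "http://localhost:8080",
    "RSS Bridge": "http://localhost:3001",
    "Feed Master": "http://localhost:8097/rss/youtube-unified",
}


def is_up(url: str, timeout: float = 5.0, opener=urlopen) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this also covers
        # refused connections, timeouts, and 4xx/5xx responses.
        return False
```

A loop such as `for name, url in SERVICES.items(): print(name, is_up(url))` then shows at a glance which service to restart.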
### Configuration issues
```bash
# Regenerate configuration
python3 -m python_app.generate_feed_config_simple

# Inspect the generated YAML
cat feed-master-config/fm.yml

# Restart feed-master
docker compose restart feed-master
```

### View feed-master logs
```bash
docker compose logs -f feed-master | grep -E "(ERROR|WARN|youtube)"
```

## Integration Notes

- **Single Source of Truth**: All channel URLs come from `channels.yml` and `urls.txt` in this repo
- **Regeneration**: Run `python3 -m python_app.generate_feed_config_simple` whenever `channels.yml` changes (this is currently a manual step)
- **No Manual Editing**: Don't edit `fm.yml` directly; regenerate it with the script
- **Handle Support**: Supports both `/channel/ID` and `/@handle` URL formats
- **Shared Channels**: The same channels are used for transcript indexing (TLC Search) and RSS aggregation (Feed Master)
- **Skip Broken RSS**: Set `rss: false` in `channels.yml` to exclude a channel from RSS aggregation

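The handle-support and `rss: false` notes above can be illustrated with a short Python sketch. It is not the code of `generate_feed_config_simple`; it assumes each `channels.yml` entry has been loaded into a dict with a `url` key (an assumed schema) and an optional `rss` flag, and `classify` and `rss_sources` are hypothetical names.

```python
import re
from urllib.parse import urlparse

_CHANNEL_RE = re.compile(r"^/channel/([A-Za-z0-9_-]+)/?$")
_HANDLE_RE = re.compile(r"^/@([^/]+)/?$")


def classify(url: str) -> tuple[str, str]:
    """Return ("channel_id", ID) or ("handle", NAME) for a YouTube channel URL."""
    path = urlparse(url).path
    if m := _CHANNEL_RE.match(path):
        return ("channel_id", m.group(1))
    if m := _HANDLE_RE.match(path):
        return ("handle", m.group(1))
    raise ValueError(f"unsupported channel URL: {url}")


def rss_sources(channels: list[dict]) -> list[tuple[str, str]]:
    """Classify every channel except those marked `rss: false`."""
    return [classify(c["url"]) for c in channels if c.get("rss", True)]
```

Channels without an `rss` key are included by default, so only an explicit `rss: false` removes a channel from aggregation while leaving it available for transcript indexing.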
## Future Enhancements

- [ ] Automated config regeneration on git pull
- [ ] Channel name lookup from YouTube API
- [ ] Integration with TLC Search for unified UI
- [ ] Webhook notifications for new videos
- [ ] OPML export for other RSS readers