TLC-Search/README-FEED-MASTER.md
2026-01-08 22:46:30 -05:00

# TLC Search + Feed Master Integration
This directory contains an integrated setup combining:
- **TLC Search**: Flask app for searching YouTube transcripts (Elasticsearch/Qdrant)
- **Feed Master**: RSS aggregator for YouTube channels
- **RSS Bridge**: Converts YouTube channels to RSS feeds

All services read their YouTube channel list from a single source of truth: `channels.yml` and the adjacent
`urls.txt` in this repository.
## Architecture
```
┌─────────────────────┐
│    channels.yml     │  Source of truth (this repo)
│  (python_app repo)  │
└──────────┬──────────┘
           ├─────────────────────────────┬─────────────────────────┐
           │                             │                         │
           v                             v                         v
    ┌──────────────┐              ┌──────────────┐        ┌─────────────────┐
    │  TLC Search  │              │  RSS Bridge  │        │   Feed Master   │
    │  (Flask App) │              │  (Port 3001) │───────>│   (Port 8097)   │
    │  Port 8080   │              └──────────────┘        └─────────────────┘
    │              │                                               │
    │ Elasticsearch│                                               │
    │ Qdrant       │                                               │
    └──────────────┘                                               │
                                                                   v
                                               http://localhost:8097/rss/youtube-unified
```
## Services
### 1. TLC Search (Port 8080)
- Indexes and searches YouTube transcripts
- Uses Elasticsearch for metadata and Qdrant for vector search
- Connects to remote Elasticsearch/Qdrant instances
### 2. RSS Bridge (Port 3001)
- Converts YouTube channels to RSS feeds
- Supports both channel IDs and @handles
- Used by Feed Master to aggregate feeds
### 3. Feed Master (Port 8097)
- Aggregates all YouTube channel RSS feeds into one unified feed
- Updates every 5 minutes
- Keeps the most recent 200 items from all channels
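
The aggregation behaviour listed for Feed Master (merge every channel feed, keep only the newest 200 items) can be illustrated with a short sketch. This is not Feed Master's actual implementation; `FeedItem` and `merge_feeds` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from heapq import nlargest
from itertools import chain
from typing import Iterable, List


@dataclass(frozen=True)
class FeedItem:
    """Hypothetical minimal item: just enough to merge and sort."""
    channel: str
    title: str
    published: datetime


def merge_feeds(feeds: Iterable[Iterable[FeedItem]], max_total: int = 200) -> List[FeedItem]:
    """Merge per-channel feeds and keep the newest `max_total` items, newest first."""
    all_items = chain.from_iterable(feeds)
    return nlargest(max_total, all_items, key=lambda item: item.published)


if __name__ == "__main__":
    def ts(day: int) -> datetime:
        return datetime(2026, 1, day, tzinfo=timezone.utc)

    channel_a = [FeedItem("A", "a-old", ts(1)), FeedItem("A", "a-new", ts(5))]
    channel_b = [FeedItem("B", "b-mid", ts(3))]
    for item in merge_feeds([channel_a, channel_b], max_total=2):
        print(item.published.date(), item.channel, item.title)
```

Because the cap applies to the merged list rather than to each channel, a very active channel can crowd older items from quieter channels out of the unified feed.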
## Setup
### Prerequisites
- Docker and Docker Compose
- Python 3.x
### Configuration
1. **Environment Variables**: Create a `.env` file with:
```bash
# Elasticsearch
ELASTIC_URL=https://your-elasticsearch-url
ELASTIC_INDEX=this_little_corner_py
ELASTIC_USERNAME=your_username
ELASTIC_PASSWORD=your_password
# Qdrant
QDRANT_URL=https://your-qdrant-url
QDRANT_COLLECTION=tlc-captions-full
# Optional: UI feed link, channel list path, feed-master upstream
RSS_FEED_URL=/rss/youtube-unified
CHANNELS_PATH=/app/python_app/channels.yml
RSS_FEED_UPSTREAM=http://feed-master:8080
```
2. **Generate Feed Configuration**:
```bash
# Regenerate feed-master config from the channels list
python3 -m python_app.generate_feed_config_simple
```
This reads `channels.yml` and generates `feed-master-config/fm.yml`.
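
To make the generated file less of a black box, here is a rough sketch of the kind of structure such a generator might produce. The real script and the exact `fm.yml` schema are not shown in this README: the key names (`feeds`, `sources`, `system`, `update`, `max_total`), the feed title, and the `build_feed_config` function are assumptions, and the placeholder source URL is not a real RSS Bridge address. Only the feed name `youtube-unified`, the 5-minute interval, and the 200-item cap come from this document.

```python
import json
from typing import Dict, List


def build_feed_config(sources: List[Dict[str, str]],
                      feed_name: str = "youtube-unified",
                      update: str = "5m",
                      max_total: int = 200) -> dict:
    """Return a dict shaped like a feed-master config (assumed layout)."""
    return {
        "feeds": {
            feed_name: {
                "title": "YouTube Unified",  # hypothetical title
                "sources": [{"name": s["name"], "url": s["url"]} for s in sources],
            }
        },
        "system": {"update": update, "max_total": max_total},
    }


if __name__ == "__main__":
    example = build_feed_config([
        # Placeholder URL; the real generator would point at an RSS Bridge feed.
        {"name": "Example Channel", "url": "http://example.invalid/feed"},
    ])
    # Printed as JSON for inspection; a YAML library would be needed to write fm.yml.
    print(json.dumps(example, indent=2))
```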
### Starting Services
```bash
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# View specific service logs
docker compose logs -f feed-master
docker compose logs -f rss-bridge
docker compose logs -f app
```
### Stopping Services
```bash
# Stop all services
docker compose down
# Stop specific service
docker compose stop feed-master
```
## Usage
### Unified RSS Feed
Access the aggregated feed through the TLC app (recommended):
- **URL**: http://localhost:8080/rss
- **Format**: RSS/Atom XML
- **Behavior**: Filters out RSS-Bridge error items and prefixes each title with its channel name
- **Updates**: Every 5 minutes (feed-master schedule)
- **Items**: Most recent 200 items across all channels

Direct feed-master access still works:
- **URL**: http://localhost:8097/rss/youtube-unified
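
The difference between the two URLs is the clean-up the TLC app applies to the feed. The sketch below shows one way such a filter could work; it is not the app's actual code. It assumes an RSS 2.0 document, that RSS-Bridge error items have titles starting with "Bridge returned error", and that each item's `<author>` element holds the channel name. The `clean_feed` function and the `[Channel] Title` prefix style are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

ERROR_PREFIX = "Bridge returned error"  # assumed marker for RSS-Bridge error items


def clean_feed(rss_xml: str) -> str:
    """Drop error items and prefix each remaining title with its channel name."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return rss_xml  # not an RSS 2.0 document; leave it untouched
    for item in list(channel.findall("item")):
        title = item.find("title")
        text = (title.text or "") if title is not None else ""
        if text.startswith(ERROR_PREFIX):
            channel.remove(item)
            continue
        author = item.findtext("author")  # assumed to hold the channel name
        if title is not None and author and not text.startswith(f"[{author}]"):
            title.text = f"[{author}] {text}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel><title>unified</title>'
        "<item><title>Bridge returned error 404! (1)</title></item>"
        "<item><title>New video</title><author>Example Channel</author></item>"
        "</channel></rss>"
    )
    print(clean_feed(sample))
```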
### TLC Search
Access the search interface at:
- **URL**: http://localhost:8080
### Channel List Endpoints
- **Plain text list**: http://localhost:8080/channels.txt
- **JSON metadata**: http://localhost:8080/api/channel-list
### RSS Bridge
Access individual channel feeds or the web interface at:
- **URL**: http://localhost:3001
## Updating Channel List
When channels are added to or removed from `channels.yml`:
```bash
# 1. Regenerate feed configuration
cd /var/core/this-little-corner/src/python_app
python3 -m python_app.generate_feed_config_simple
# 2. Restart feed-master to pick up changes
docker compose restart feed-master
```
## File Structure
```
python_app/
├── docker-compose.yml # All services configuration
├── channels.yml # Canonical YouTube channel list
├── urls.txt # URL list kept in sync with channels.yml
├── generate_feed_config_simple.py # Config generator script (run via python -m)
├── feed-master-config/
│ ├── fm.yml # Feed Master configuration (auto-generated)
│ ├── var/ # Feed Master database
│ └── images/ # Cached images
├── data/ # TLC Search data (read-only)
└── README-FEED-MASTER.md # This file
```
## Troubleshooting
### Feed Master not updating
```bash
# Check if RSS Bridge is accessible
curl http://localhost:3001
# Restart both services in order
docker compose restart rss-bridge
sleep 10
docker compose restart feed-master
```
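
The fixed `sleep 10` above is a guess at how long RSS Bridge takes to come back. As an alternative, the sketch below polls until the service answers. It is only an illustration: `http_ok` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:  # covers URLError/HTTPError, refused connections, timeouts
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 10, delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:3001"))` could be called after restarting `rss-bridge` and before restarting `feed-master`.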
### Configuration issues
```bash
# Regenerate configuration
python3 -m python_app.generate_feed_config_simple
# Inspect the generated YAML (cat only prints it; it does not validate syntax)
cat feed-master-config/fm.yml
# Restart feed-master
docker compose restart feed-master
```
### View feed-master logs
```bash
docker compose logs -f feed-master | grep -E "(ERROR|WARN|youtube)"
```
## Integration Notes
- **Single Source of Truth**: All channel URLs come from `channels.yml` and `urls.txt` in this repo
- **Regenerate on Change**: Run `python3 -m python_app.generate_feed_config_simple` whenever `channels.yml` changes (this step is manual)
- **No Manual Editing**: Don't edit `fm.yml` directly; regenerate it with the script
- **Handle Support**: Channel URLs may use either the `/channel/ID` or the `/@handle` format
- **Shared Channels**: Same channels used for transcript indexing (TLC Search) and RSS aggregation (Feed Master)
- **Skip Broken RSS**: Set `rss: false` in `channels.yml` to exclude a channel from RSS aggregation
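
The two URL formats and the `rss: false` switch noted above can be sketched as follows. This is an illustration, not the generator's actual code. It assumes each `channels.yml` entry is loaded as a dict with an optional `rss` key; the function names and the `"channel_id"` / `"handle"` labels are hypothetical.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import urlparse

_CHANNEL_RE = re.compile(r"^/channel/(?P<id>[A-Za-z0-9_-]+)")
_HANDLE_RE = re.compile(r"^/@(?P<handle>[^/]+)")


def classify_channel_url(url: str) -> Tuple[str, str]:
    """Return ("channel_id", ID) or ("handle", name) for a YouTube channel URL."""
    path = urlparse(url).path
    match = _CHANNEL_RE.match(path)
    if match:
        return "channel_id", match.group("id")
    match = _HANDLE_RE.match(path)
    if match:
        return "handle", match.group("handle")
    raise ValueError(f"unrecognised channel URL: {url}")


def rss_channels(entries: List[Dict]) -> List[Dict]:
    """Keep every entry unless it explicitly sets `rss: false`."""
    return [entry for entry in entries if entry.get("rss", True) is not False]


if __name__ == "__main__":
    entries = [
        {"name": "by-id", "url": "https://www.youtube.com/channel/UC1234567890abcdefghijkl"},
        {"name": "by-handle", "url": "https://www.youtube.com/@example"},
        {"name": "broken", "url": "https://www.youtube.com/@broken", "rss": False},
    ]
    for entry in rss_channels(entries):
        print(entry["name"], classify_channel_url(entry["url"]))
```

Treating a missing `rss` key as enabled keeps existing entries working unchanged; only channels that opt out are skipped.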
## Future Enhancements
- [ ] Automated config regeneration on git pull
- [ ] Channel name lookup from YouTube API
- [ ] Integration with TLC Search for unified UI
- [ ] Webhook notifications for new videos
- [ ] OPML export for other RSS readers