# TLC Search + Feed Master Integration

This directory contains an integrated setup combining:
- **TLC Search**: Flask app for searching YouTube transcripts (Elasticsearch/Qdrant)
- **Feed Master**: RSS aggregator for YouTube channels
- **RSS Bridge**: Converts YouTube channels to RSS feeds

All services read their YouTube channel list from a single source of truth: `channels.yml` and the adjacent `urls.txt` in this repository.

## Architecture

```
┌─────────────────────┐
│  channels.yml       │  Source of truth (this repo)
│  (python_app repo)  │
└──────────┬──────────┘
           │
           ├─────────────────────────────┬─────────────────────────┐
           │                             │                         │
           v                             v                         v
    ┌──────────────┐              ┌──────────────┐        ┌─────────────────┐
    │  TLC Search  │              │  RSS Bridge  │        │   Feed Master   │
    │  (Flask App) │              │  (Port 3001) │───────>│   (Port 8097)   │
    │  Port 8080   │              └──────────────┘        └─────────────────┘
    │              │                                               │
    │ Elasticsearch│                                               │
    │ Qdrant       │                                               │
    └──────────────┘                                               │
                                                                   v
                                               http://localhost:8097/rss/youtube-unified
```

## Services

### 1. TLC Search (Port 8080)
- Indexes and searches YouTube transcripts
- Uses Elasticsearch for metadata and Qdrant for vector search
- Connects to remote Elasticsearch/Qdrant instances

### 2. RSS Bridge (Port 3001)
- Converts YouTube channels to RSS feeds
- Supports both channel IDs and @handles
- Used by Feed Master to aggregate feeds

### 3. Feed Master (Port 8097)
- Aggregates all YouTube channel RSS feeds into one unified feed
- Updates every 5 minutes
- Keeps the most recent 200 items from all channels

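The aggregation behaviour listed for Feed Master (merge every channel's items, keep the newest 200) can be pictured with a minimal sketch. This is not Feed Master's actual implementation; `FeedItem` and `merge_feeds` are hypothetical names used only for illustration, and the sample data is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import chain

MAX_ITEMS = 200  # cap taken from the service description above


@dataclass(frozen=True)
class FeedItem:
    """Hypothetical, simplified view of one feed entry."""
    channel: str
    title: str
    published: datetime


def merge_feeds(per_channel, limit=MAX_ITEMS):
    """Flatten per-channel item lists and keep the newest `limit` items."""
    all_items = chain.from_iterable(per_channel.values())
    return sorted(all_items, key=lambda item: item.published, reverse=True)[:limit]


if __name__ == "__main__":
    def ts(day):
        return datetime(2024, 1, day, tzinfo=timezone.utc)

    feeds = {
        "Channel A": [FeedItem("Channel A", "a1", ts(1)), FeedItem("Channel A", "a2", ts(3))],
        "Channel B": [FeedItem("Channel B", "b1", ts(2))],
    }
    # Newest first, truncated to two items for the demo
    for item in merge_feeds(feeds, limit=2):
        print(item.published.date(), item.channel, item.title)
```

Sorting the flattened list keeps the sketch short; with many channels a heap-based merge would avoid sorting everything, but for a few hundred items the difference is unlikely to matter.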
## Setup

### Prerequisites
- Docker and Docker Compose
- Python 3.x

### Configuration

1. **Environment Variables**: Create a `.env` file with:
   ```bash
   # Elasticsearch
   ELASTIC_URL=https://your-elasticsearch-url
   ELASTIC_INDEX=this_little_corner_py
   ELASTIC_USERNAME=your_username
   ELASTIC_PASSWORD=your_password

   # Qdrant
   QDRANT_URL=https://your-qdrant-url
   QDRANT_COLLECTION=tlc-captions-full

   # Optional UI links
   RSS_FEED_URL=/rss/youtube-unified
   CHANNELS_PATH=/app/python_app/channels.yml
   RSS_FEED_UPSTREAM=http://feed-master:8080
   ```

2. **Generate Feed Configuration**:
   ```bash
   # Regenerate feed-master config from the channels list
   python3 -m python_app.generate_feed_config_simple
   ```

   This reads `channels.yml` and generates `feed-master-config/fm.yml`.

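To make the generation step more concrete, here is a rough sketch of how channel entries might be turned into Feed Master sources. It is not the code in `generate_feed_config_simple`. The entry keys `name` and `url`, the `http://rss-bridge/` base URL, and the YoutubeBridge query parameters are all assumptions for illustration; only the `rss: false` flag and the two URL formats come from this README. Verify the query string against your own RSS Bridge instance before relying on it.

```python
from urllib.parse import urlencode, urlparse

# Assumption: RSS Bridge is reachable under its compose service name.
BRIDGE_BASE = "http://rss-bridge/"


def channel_ref(url):
    """Classify a channel URL as ("id", value) or ("handle", value)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return "id", parts[1]
    if parts and parts[0].startswith("@"):
        return "handle", parts[0][1:]
    raise ValueError(f"unrecognised channel URL: {url}")


def bridge_url(url):
    """Build an RSS Bridge feed URL (query parameters are assumed, not verified)."""
    kind, value = channel_ref(url)
    query = {"action": "display", "bridge": "YoutubeBridge", "format": "Atom"}
    if kind == "id":
        query.update({"context": "By channel id", "c": value})
    else:
        query.update({"context": "By custom name", "custom": value})
    return BRIDGE_BASE + "?" + urlencode(query)


def build_sources(channels):
    """Return one source per channel, skipping entries marked `rss: false`."""
    return [
        {"name": ch["name"], "url": bridge_url(ch["url"])}
        for ch in channels
        if ch.get("rss", True)
    ]


if __name__ == "__main__":
    # Made-up entries standing in for the parsed contents of channels.yml
    demo = [
        {"name": "Example A", "url": "https://www.youtube.com/channel/UC123"},
        {"name": "Example B", "url": "https://www.youtube.com/@example"},
        {"name": "Broken", "url": "https://www.youtube.com/@broken", "rss": False},
    ]
    for source in build_sources(demo):
        print(source["name"], source["url"])
```

Treating a missing `rss` key as enabled means existing entries need no changes; only channels with known-broken feeds carry the flag.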
### Starting Services

```bash
# Start all services
docker compose up -d

# View logs
docker compose logs -f

# View specific service logs
docker compose logs -f feed-master
docker compose logs -f rss-bridge
docker compose logs -f app
```

### Stopping Services

```bash
# Stop all services
docker compose down

# Stop specific service
docker compose stop feed-master
```

## Usage

### Unified RSS Feed
Access the aggregated feed through the TLC app (recommended):
- **URL**: http://localhost:8080/rss
- **Format**: RSS/Atom XML
- **Behavior**: Filters out RSS Bridge error items and prefixes each title with its channel name
- **Updates**: Every 5 minutes (feed-master schedule)
- **Items**: Most recent 200 items across all channels

Direct feed-master access still works:
- **URL**: http://localhost:8097/rss/youtube-unified

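The filtering and title prefixing applied by the `/rss` endpoint could look roughly like the sketch below. This is an illustration, not the TLC app's code: it assumes an RSS 2.0 document, assumes error items can be recognised by a title starting with `Bridge returned error`, and assumes the channel name is available in each item's `<author>` element. `clean_feed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed marker for RSS Bridge error entries; adjust to what your feed contains.
ERROR_PREFIX = "Bridge returned error"


def clean_feed(rss_xml):
    """Drop error items and prefix remaining titles with the channel name."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return rss_xml  # not RSS 2.0; leave untouched

    for item in list(channel.findall("item")):
        title = item.find("title")
        text = (title.text or "") if title is not None else ""
        if text.startswith(ERROR_PREFIX):
            channel.remove(item)
            continue
        author = item.findtext("author")  # assumed to hold the channel name
        if title is not None and author and not text.startswith(f"{author}: "):
            title.text = f"{author}: {text}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel><title>Unified</title>'
        "<item><title>Bridge returned error 404</title></item>"
        "<item><title>New video</title><author>Channel A</author></item>"
        "</channel></rss>"
    )
    print(clean_feed(sample))
```

The prefix check makes the function safe to apply twice to the same feed, which helps if responses are cached and re-processed.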
### TLC Search
Access the search interface at:
- **URL**: http://localhost:8080

### Channel List Endpoints
- **Plain text list**: http://localhost:8080/channels.txt
- **JSON metadata**: http://localhost:8080/api/channel-list

### RSS Bridge
Access individual channel feeds or the web interface at:
- **URL**: http://localhost:3001

## Updating Channel List

When channels are added/removed from `channels.yml`:

```bash
# 1. Regenerate feed configuration
cd /var/core/this-little-corner/src/python_app
python3 -m python_app.generate_feed_config_simple

# 2. Restart feed-master to pick up changes
docker compose restart feed-master
```

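Because regeneration is a manual step, it is easy to forget after editing `channels.yml`. One way to notice a stale config is to compare file modification times, as in the sketch below. `needs_regeneration` is a hypothetical helper, not part of this repository, and the demo works on temporary files rather than the real ones.

```python
import os
import tempfile
from pathlib import Path


def needs_regeneration(channels_path, config_path):
    """Return True if the config is missing or older than the channel list."""
    channels, config = Path(channels_path), Path(config_path)
    if not config.exists():
        return True
    return channels.stat().st_mtime > config.stat().st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        channels = Path(tmp, "channels.yml")
        config = Path(tmp, "fm.yml")
        channels.write_text("placeholder\n")
        print("config missing:", needs_regeneration(channels, config))

        config.write_text("placeholder\n")
        # Force explicit timestamps so the demo does not depend on clock resolution
        os.utime(config, (1000, 1000))
        os.utime(channels, (2000, 2000))
        print("channels newer:", needs_regeneration(channels, config))
```

Modification times are only a heuristic: a fresh checkout or a copied file can change them without any content change, so a false positive just means one unnecessary regeneration.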
## File Structure

```
python_app/
├── docker-compose.yml              # All services configuration
├── channels.yml                    # Canonical YouTube channel list
├── urls.txt                        # URL list kept in sync with channels.yml
├── generate_feed_config_simple.py  # Config generator script (run via python -m)
├── feed-master-config/
│   ├── fm.yml                      # Feed Master configuration (auto-generated)
│   ├── var/                        # Feed Master database
│   └── images/                     # Cached images
├── data/                           # TLC Search data (read-only)
└── README-FEED-MASTER.md           # This file
```

## Troubleshooting

### Feed Master not updating
```bash
# Check if RSS Bridge is accessible
curl http://localhost:3001

# Restart both services in order
docker compose restart rss-bridge
sleep 10
docker compose restart feed-master
```

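The same reachability check can be run against all three services at once. The following is a small sketch using only the standard library; the URLs are the host ports documented in this README, while `http_status` and `check_services` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Host-side URLs as documented in this README
SERVICES = {
    "TLC Search": "http://localhost:8080",
    "RSS Bridge": "http://localhost:3001",
    "Feed Master": "http://localhost:8097/rss/youtube-unified",
}


def http_status(url, timeout=5):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def check_services(services=SERVICES, probe=http_status):
    """Probe every service and return a name -> status mapping."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, status in check_services().items():
        print(f"{name}: {'unreachable' if status is None else status}")
```

A `None` result points at a container that is down or a port that is not published, while an HTTP error status means the service is running but unhappy, which is usually the cue to look at its logs.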
### Configuration issues
```bash
# Regenerate configuration
python3 -m python_app.generate_feed_config_simple

# Inspect the generated YAML
cat feed-master-config/fm.yml

# Restart feed-master
docker compose restart feed-master
```

### View feed-master logs
```bash
docker compose logs -f feed-master | grep -E "(ERROR|WARN|youtube)"
```

## Integration Notes

- **Single Source of Truth**: All channel URLs come from `channels.yml` and `urls.txt` in this repo
- **Regenerate on Change**: Run `python3 -m python_app.generate_feed_config_simple` whenever `channels.yml` changes
- **No Manual Editing**: Don't edit `fm.yml` directly; regenerate it with the script
- **Handle Support**: Both `/channel/ID` and `/@handle` URL formats are supported
- **Shared Channels**: The same channels are used for transcript indexing (TLC Search) and RSS aggregation (Feed Master)
- **Skip Broken RSS**: Set `rss: false` in `channels.yml` to exclude a channel from RSS aggregation

## Future Enhancements

- [ ] Automated config regeneration on git pull
- [ ] Channel name lookup from YouTube API
- [ ] Integration with TLC Search for unified UI
- [ ] Webhook notifications for new videos
- [ ] OPML export for other RSS readers