TLC-Search/README-FEED-MASTER.md
2026-01-08 22:46:30 -05:00

TLC Search + Feed Master Integration

This directory contains an integrated setup combining:

  • TLC Search: Flask app for searching YouTube transcripts (Elasticsearch/Qdrant)
  • Feed Master: RSS aggregator for YouTube channels
  • RSS Bridge: Converts YouTube channels to RSS feeds

All services take their YouTube channel list from a single source of truth: channels.yml and the adjacent urls.txt in this repository.

Architecture

┌─────────────────────┐
│   channels.yml      │  Source of truth (this repo)
│  (python_app repo)  │
└──────────┬──────────┘
           │
           ├─────────────────────────────┬────────────────────────┐
           │                             │                        │
           v                             v                        v
    ┌──────────────┐            ┌──────────────┐        ┌─────────────────┐
    │ TLC Search   │            │ RSS Bridge   │        │  Feed Master    │
    │ (Flask App)  │            │ (Port 3001)  │───────>│  (Port 8097)    │
    │ Port 8080    │            └──────────────┘        └─────────────────┘
    │              │                                              │
    │ Elasticsearch│                                              │
    │ Qdrant       │                                              │
    └──────────────┘                                              │
                                                                  v
                                                    http://localhost:8097/rss/youtube-unified

Services

1. TLC Search (Port 8080)

  • Indexes and searches YouTube transcripts
  • Uses Elasticsearch for metadata and Qdrant for vector search
  • Connects to remote Elasticsearch/Qdrant instances

2. RSS Bridge (Port 3001)

  • Converts YouTube channels to RSS feeds
  • Supports both channel IDs and @handles
  • Used by Feed Master to aggregate feeds
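
As a rough illustration of the two URL shapes mentioned above, the sketch below classifies a YouTube channel URL as either a channel ID or an @handle. The function name `classify_channel_url` is hypothetical and is not taken from this repository; how the real code maps these onto RSS Bridge requests is not shown here.

```python
from urllib.parse import urlparse


def classify_channel_url(url: str) -> tuple[str, str]:
    """Return (kind, value), where kind is "channel_id" or "handle".

    Hypothetical helper for illustration only.
    """
    # Split the path into non-empty segments, e.g. ["channel", "UC..."]
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return ("channel_id", parts[1])
    if parts and parts[0].startswith("@"):
        # Strip the leading "@" from the handle
        return ("handle", parts[0][1:])
    raise ValueError(f"unrecognized YouTube channel URL: {url}")
```

Anything that matches neither form raises `ValueError`, so a malformed entry is noticed early instead of silently producing a broken feed.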

3. Feed Master (Port 8097)

  • Aggregates all YouTube channel RSS feeds into one unified feed
  • Updates every 5 minutes
  • Keeps the most recent 200 items from all channels
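
The "most recent 200 items from all channels" behavior can be pictured with a small sketch. This is not Feed Master's implementation; it only shows the idea of merging several per-channel item lists and keeping the newest N. The item shape (a `(published, title)` tuple) and the name `merge_recent` are assumptions made for the example.

```python
import heapq
from datetime import datetime, timezone


def merge_recent(feeds, limit=200):
    """Merge per-channel item lists and keep the newest `limit` items.

    `feeds` is an iterable of lists of (published: datetime, title: str)
    tuples. This tuple shape is an assumption for illustration.
    """
    all_items = (item for feed in feeds for item in feed)
    # nlargest returns the items sorted newest first
    return heapq.nlargest(limit, all_items, key=lambda item: item[0])


if __name__ == "__main__":
    utc = timezone.utc
    channel_a = [(datetime(2026, 1, 1, tzinfo=utc), "a1"),
                 (datetime(2026, 1, 3, tzinfo=utc), "a2")]
    channel_b = [(datetime(2026, 1, 2, tzinfo=utc), "b1")]
    for published, title in merge_recent([channel_a, channel_b], limit=2):
        print(published.isoformat(), title)
```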

Setup

Prerequisites

  • Docker and Docker Compose
  • Python 3.x

Configuration

  1. Environment Variables: Create a .env file with:
# Elasticsearch
ELASTIC_URL=https://your-elasticsearch-url
ELASTIC_INDEX=this_little_corner_py
ELASTIC_USERNAME=your_username
ELASTIC_PASSWORD=your_password

# Qdrant
QDRANT_URL=https://your-qdrant-url
QDRANT_COLLECTION=tlc-captions-full

# Optional UI links
RSS_FEED_URL=/rss/youtube-unified
CHANNELS_PATH=/app/python_app/channels.yml
RSS_FEED_UPSTREAM=http://feed-master:8080
  2. Generate Feed Configuration:
# Regenerate feed-master config from the channels list
python3 -m python_app.generate_feed_config_simple

This reads channels.yml and generates feed-master-config/fm.yml.
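
To give a feel for what such a generator does, here is a minimal sketch, not the actual generate_feed_config_simple script. It assumes each channel entry has already been loaded into a dict with `name`, `url`, and an optional `rss` flag, and it builds a nested structure with a single feed named youtube-unified. The output keys (`feeds`, `sources`) and the function name `build_feed_config` are assumptions; check the generated fm.yml for the real layout.

```python
def build_feed_config(channels, feed_name="youtube-unified"):
    """Build a feed configuration dict from channel entries.

    Hypothetical sketch: the field names and the output layout are
    assumptions, not the schema used by the real generator.
    """
    sources = [
        # A real generator would presumably point `url` at the RSS Bridge
        # feed for the channel; the channel URL is used here as a stand-in.
        {"name": ch["name"], "url": ch["url"]}
        for ch in channels
        if ch.get("rss", True)  # entries marked rss: false are left out
    ]
    return {"feeds": {feed_name: {"sources": sources}}}


if __name__ == "__main__":
    example = [
        {"name": "Channel A", "url": "https://www.youtube.com/@channel-a"},
        {"name": "Channel B", "url": "https://www.youtube.com/channel/UC123", "rss": False},
    ]
    print(build_feed_config(example))
```

Keeping the generator a pure function of the channel list is what makes regenerating fm.yml safe to repeat after every change to channels.yml.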

Starting Services

# Start all services
docker compose up -d

# View logs
docker compose logs -f

# View specific service logs
docker compose logs -f feed-master
docker compose logs -f rss-bridge
docker compose logs -f app

Stopping Services

# Stop all services
docker compose down

# Stop specific service
docker compose stop feed-master

Usage

Unified RSS Feed

Access the aggregated feed through the TLC app (recommended):

  • URL: http://localhost:8080/rss
  • Format: RSS/Atom XML
  • Behavior: Filters out RSS-Bridge error items and prefixes each title with its channel name
  • Updates: Every 5 minutes (feed-master schedule)
  • Items: Most recent 200 items across all channels
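
The filtering and title prefixing described above could look roughly like the sketch below. It is not the TLC app's code. It assumes an RSS 2.0 document, that RSS-Bridge error items can be recognized by a title starting with "Bridge returned error", and that the channel name is available in each item's `author` element; all three are assumptions, and `clean_feed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed marker for RSS-Bridge error items; adjust to what the feed contains.
ERROR_PREFIX = "Bridge returned error"


def clean_feed(xml_text: str) -> str:
    """Drop error items and prefix each remaining title with its author."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        return xml_text  # not an RSS 2.0 document; leave it untouched
    # Iterate over a copy because items are removed during the loop
    for item in list(channel.findall("item")):
        title = item.find("title")
        text = (title.text or "") if title is not None else ""
        if text.startswith(ERROR_PREFIX):
            channel.remove(item)
            continue
        author = item.findtext("author")
        if author and title is not None and not text.startswith(f"{author}: "):
            title.text = f"{author}: {text}"
    return ET.tostring(root, encoding="unicode")
```

Checking for an existing prefix keeps the function safe to apply twice to the same feed.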

Direct feed-master access still works:

  • URL: http://localhost:8097/rss/youtube-unified

Access the search interface at:

  • URL: http://localhost:8080

Channel List Endpoints

RSS Bridge

Access individual channel feeds or the web interface at:

  • URL: http://localhost:3001

Updating Channel List

When channels are added/removed from channels.yml:

# 1. Regenerate feed configuration
cd /var/core/this-little-corner/src/python_app
python3 -m python_app.generate_feed_config_simple

# 2. Restart feed-master to pick up changes
docker compose restart feed-master

File Structure

python_app/
├── docker-compose.yml              # All services configuration
├── channels.yml                    # Canonical YouTube channel list
├── urls.txt                        # URL list kept in sync with channels.yml
├── generate_feed_config_simple.py  # Config generator script (run via python -m)
├── feed-master-config/
│   ├── fm.yml                      # Feed Master configuration (auto-generated)
│   ├── var/                        # Feed Master database
│   └── images/                     # Cached images
├── data/                           # TLC Search data (read-only)
└── README-FEED-MASTER.md          # This file

Troubleshooting

Feed Master not updating

# Check if RSS Bridge is accessible
curl http://localhost:3001

# Restart both services in order
docker compose restart rss-bridge
sleep 10
docker compose restart feed-master
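
The fixed `sleep 10` above is a simple way to give RSS Bridge time to start. As an alternative sketch, the helper below polls a URL until it responds before moving on. The names `http_ok` and `wait_until_ready`, the attempt count, and the delay are illustrative assumptions, not part of this repository.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=3.0):
    """Return True if a GET to `url` succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(url, probe=http_ok, attempts=10, delay=2.0, sleep=time.sleep):
    """Call `probe(url)` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("http://localhost:3001")` could be called after restarting rss-bridge and before restarting feed-master. The `probe` and `sleep` parameters exist so the loop can be exercised without a running service.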

Configuration issues

# Regenerate configuration
python3 -m python_app.generate_feed_config_simple

# Inspect the generated YAML (cat only prints the file; it does not validate it)
cat feed-master-config/fm.yml

# Restart feed-master
docker compose restart feed-master

View feed-master logs

docker compose logs -f feed-master | grep -E "(ERROR|WARN|youtube)"

Integration Notes

  • Single Source of Truth: All channel URLs come from channels.yml and urls.txt in this repo
  • Config Regeneration: Run python3 -m python_app.generate_feed_config_simple whenever channels.yml changes
  • No Manual Editing: Don't edit fm.yml directly; regenerate it with the script
  • Handle Support: Supports both /channel/ID and /@handle URL formats
  • Shared Channels: Same channels used for transcript indexing (TLC Search) and RSS aggregation (Feed Master)
  • Skip Broken RSS: Set rss: false in channels.yml to exclude a channel from RSS aggregation

Future Enhancements

  • Automated config regeneration on git pull
  • Channel name lookup from YouTube API
  • Integration with TLC Search for unified UI
  • Webhook notifications for new videos
  • OPML export for other RSS readers