Compare commits


34 Commits

Author SHA1 Message Date
d23888c68d Add last_posted date to /api/channel-list from Elasticsearch
Some checks failed
docker-build / build (push) Has been cancelled
Queries the latest video date per channel and includes it in the
channel-list JSON response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:14:53 -04:00
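A per-channel "latest video date" is typically computed in Elasticsearch with a `terms` aggregation on the channel field and a nested `max` aggregation on the date field. The sketch below builds such a request body and reads the result back. It assumes the fields are called `channel_id` and `date`; the commit does not name them, so treat both as hypothetical.

```python
from typing import Any, Dict


def build_last_posted_query(date_field: str = "date",
                            channel_field: str = "channel_id",
                            max_channels: int = 500) -> Dict[str, Any]:
    """Search body that returns the newest date per channel (hypothetical field names)."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "channels": {
                "terms": {"field": channel_field, "size": max_channels},
                "aggs": {"last_posted": {"max": {"field": date_field}}},
            }
        },
    }


def extract_last_posted(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map channel key -> last_posted from the aggregation part of a search response."""
    result: Dict[str, Any] = {}
    for bucket in response["aggregations"]["channels"]["buckets"]:
        agg = bucket["last_posted"]
        # A max aggregation reports value=None when no document has the field;
        # on date fields it also includes a formatted value_as_string.
        if agg.get("value") is not None:
            result[bucket["key"]] = agg.get("value_as_string", agg["value"])
    return result
```

The resulting mapping could then be merged into each entry of the `/api/channel-list` JSON under a `last_posted` key.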
c019730666 Fix remaining placeholder channel names
Some checks failed
docker-build / build (push) Has been cancelled
- UCCebR16tXbv5Ykk9_WtCCug -> Christian T. Golden
- UC4YwC5zA9S_2EwthE27Xlew -> CMA

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:04:50 -04:00
bb2850ef98 Add /channels HTML page and fix placeholder channel names
Some checks failed
docker-build / build (push) Has been cancelled
- Add /channels route serving a simple HTML page with channel names
  linked to their YouTube pages
- Fix names for UCehAungJpAeC (Wholly Unfocused) and UCiJmdXTb76i
  (Bridges of Meaning Hub) from Elasticsearch data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:01:45 -04:00
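A "simple HTML page with channel names linked to their YouTube pages" mainly needs one thing to be right: escaping names and URLs before they go into markup. This is a minimal sketch, assuming entries shaped like the `name`/`url` dicts that `load_channel_entries` returns; `render_channels_page` is a hypothetical helper, not the route's actual code.

```python
import html
from typing import Dict, List


def render_channels_page(entries: List[Dict[str, str]]) -> str:
    """Render a bare HTML list of channel names linked to their YouTube pages."""
    items = "\n".join(
        # html.escape with quote=True (the default) also escapes ' and " for attributes
        f'  <li><a href="{html.escape(e["url"])}">{html.escape(e["name"])}</a></li>'
        for e in entries
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>Channels</title></head>\n'
        f"<body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )
```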
7fdb31bf18 Add 3 missing channels from jet-alone to channels.yml source of truth
Some checks failed
docker-build / build (push) Has been cancelled
Syncs channels.yml (canonical) and urls.txt with channels that existed
only on jet-alone: LeviathanForPlay, UCehAungJpAeC-F3R5FwvvCQ,
UC4YwC5zA9S_2EwthE27Xlew.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:39:06 -04:00
Ubuntu
090f5943c3 Add notes page
Some checks failed
docker-build / build (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 20:40:53 +00:00
d168287636 Add Rigel Windsong Thurston
Some checks failed
docker-build / build (push) Has been cancelled
2026-01-10 13:36:10 -05:00
6534db6f64 Ignore .gemini artifacts
Some checks failed
docker-build / build (push) Has been cancelled
2026-01-08 22:55:33 -05:00
30503628b5 Add unified channel feed 2026-01-08 22:53:30 -05:00
63fe922860 Document channel feeds 2026-01-08 22:46:30 -05:00
1ac076e5f2 Harden search responses 2026-01-08 15:42:21 -05:00
1c95f47766 Add API rate limits 2026-01-08 15:24:05 -05:00
6a3d1ee491 Disable vector search 2026-01-08 15:20:06 -05:00
8e4c57a93a Security: add security headers, CSP, request size limits 2026-01-08 14:53:44 -05:00
1565c8db38 Security: disable debug mode, sanitize query input, validate Qdrant filters, add size/offset bounds 2026-01-08 14:41:42 -05:00
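"Sanitize query input" and "add size/offset bounds" usually come down to coercing user-supplied paging parameters to integers, keeping `from + size` inside Elasticsearch's result window (10,000 by default via `index.max_result_window`), and trimming the query string. The page-size and query-length caps below are hypothetical values chosen for this sketch, not the ones in the commit.

```python
import re
from typing import Any, Optional, Tuple

MAX_RESULT_WINDOW = 10_000  # Elasticsearch default limit on from + size
MAX_PAGE_SIZE = 100         # hypothetical cap for this sketch
MAX_QUERY_LENGTH = 500      # hypothetical cap for this sketch
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def _to_int(value: Any, default: int) -> int:
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def clamp_pagination(size: Any, offset: Any, default_size: int = 10) -> Tuple[int, int]:
    """Coerce size/offset to ints and keep offset + size within the result window."""
    size_i = min(max(_to_int(size, default_size), 1), MAX_PAGE_SIZE)
    offset_i = min(max(_to_int(offset, 0), 0), MAX_RESULT_WINDOW - size_i)
    return size_i, offset_i


def sanitize_query(raw: Optional[str]) -> str:
    """Replace control characters, collapse whitespace, and cap the length."""
    text = _CONTROL_CHARS.sub(" ", raw or "")
    return " ".join(text.split())[:MAX_QUERY_LENGTH]
```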
d26edda029 Add graph traversal endpoints and sort metrics by channel name 2026-01-08 14:22:01 -05:00
9dd74111e7 Change default sort to newer first 2026-01-08 14:12:15 -05:00
93774c025f Respect external filter in metrics and graph
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-20 09:54:41 -05:00
b0c9d319ef Remove full graph node cap
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-20 09:42:14 -05:00
82c334b131 Add full reference graph mode
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-19 15:23:21 -05:00
7f74aaced8 Persist search settings locally
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-19 10:20:00 -05:00
c88d1886c9 Fix backlink badge query to target referencing videos
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 23:47:07 -05:00
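Targeting "referencing videos" means searching for the documents whose outbound `internal_references` list contains the current video, rather than reading the current video's own fields. In Elasticsearch, a `term` query on an array field matches when any element equals the value. This sketch assumes `internal_references` is mapped as a keyword field holding video IDs; that mapping is an assumption.

```python
from typing import Any, Dict


def backlink_query(video_id: str, size: int = 20) -> Dict[str, Any]:
    """Search body for videos whose internal_references include video_id."""
    return {
        "size": size,
        "query": {
            "bool": {
                # filter context: no scoring, just "does any array element equal video_id"
                "filter": [{"term": {"internal_references": video_id}}]
            }
        },
    }
```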
c6b46edacc Default external off and filter channels/backlink queries
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 23:42:49 -05:00
4c20329f36 Add external reference toggle and badges
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 23:07:13 -05:00
b267a0ecc6 Add Gitea workflow for Docker image builds
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 19:14:20 -05:00
f299126ab2 Point compose to remote Elasticsearch and Qdrant 2025-11-18 13:25:41 -05:00
86fd017f3c Add Docker and compose setup 2025-11-18 13:21:14 -05:00
40d4f41f6e Add graph and vector search features 2025-11-09 14:24:50 -05:00
14d37f23e4 Add clickable reference badges and improve UI layout
- Add clickable badges for backlinks and references that trigger query string searches
- Improve toggle checkbox layout with better styling
- Add description block styling with scrollable container
- Update results styling with bordered cards and shadows
- Add favicon support across pages
- Enhance .env loading with logging for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:56:43 -05:00
d8d2c5e34c Fix results overflow and add debug logging for reference badges
CSS Changes:
- Added max-width and overflow handling to .badge-row
- Added word-wrap and overflow protection to .item
- Added overflow-x: hidden to .window-body
- Badges now use white-space: nowrap to prevent text wrapping
- Item titles now break words properly with word-break

JavaScript Changes:
- Added console.log debugging for reference counts
- Logs show whether fields are present and their values
- Helps diagnose why badges aren't appearing

This should fix the overflow issue and help debug badge visibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 11:18:17 -05:00
595b19f7c7 Fix sorting by referenced_by_count with unmapped_type handling
- Added unmapped_type parameter to referenced_by_count sort
- This handles documents that don't have the field yet
- Updated ingest.py to include reference fields when indexing:
  * internal_references
  * internal_references_count
  * referenced_by
  * referenced_by_count
- Updated index mapping to include reference fields
- Documents without the field will sort as 0 (appear last)

Fixes BadRequestError: No mapping found for [referenced_by_count]

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 11:10:56 -05:00
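The fix relies on two options of an Elasticsearch field sort: `unmapped_type` stops the "No mapping found" error on indices where the field was never mapped, and `missing` decides where documents without a value land. This sketch uses `"missing": 0` to match the commit's "sort as 0" behaviour; the sort-mode names and the secondary `date` field are hypothetical.

```python
from typing import Any, Dict, List


def build_sort(sort_mode: str) -> List[Dict[str, Any]]:
    """Return an Elasticsearch sort clause for a UI sort option (hypothetical mode names)."""
    if sort_mode == "referenced":
        return [
            # unmapped_type: treat the field as a long where it is not mapped yet;
            # missing: documents lacking the field sort as if the value were 0
            {"referenced_by_count": {"order": "desc", "unmapped_type": "long", "missing": 0}},
            {"date": {"order": "desc", "unmapped_type": "date"}},  # tie-breaker
        ]
    return [{"date": {"order": "desc", "unmapped_type": "date"}}]
```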
d616b87701 Add python-dotenv support for automatic .env loading
- Added python-dotenv to requirements.txt
- Config now automatically loads .env file if present
- Allows local development without manually exporting env vars
- Gracefully falls back if python-dotenv not installed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 11:03:42 -05:00
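The "load `.env` if present, fall back gracefully" pattern can be sketched as below, using python-dotenv's `load_dotenv(dotenv_path, override=False)`. The helper name and the `ELASTIC_URL` default are hypothetical; the repo's `config.py` may structure this differently.

```python
import logging
import os
from pathlib import Path

log = logging.getLogger(__name__)


def load_env_file(path: str = ".env") -> bool:
    """Load KEY=VALUE pairs from path when python-dotenv is installed; never raise."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        log.debug("python-dotenv not installed; using the process environment only")
        return False
    if not Path(path).is_file():
        return False
    # override=False keeps values already exported in the shell
    return load_dotenv(path, override=False)


load_env_file()
ELASTIC_URL = os.environ.get("ELASTIC_URL", "http://localhost:9200")  # hypothetical default
```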
7988e2751a Add video reference tracking and display
- Add "Most referenced" sort option to sort by backlink count
- Backend now supports sorting by referenced_by_count field
- Search results now display reference counts as badges:
  - Shows number of backlinks (videos linking to this one)
  - Shows number of internal references (outbound links)
- Reference badges appear alongside transcript source badges

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 10:52:00 -05:00
2846e13a81 Fix timestamp parsing for string format timestamps
Both primary and secondary transcripts use 'timestamp' field
with string format "HH:MM:SS.mmm" instead of numeric seconds.

Changes:
- Add parseTimestampToSeconds() to handle string timestamps
- Parse "HH:MM:SS.mmm" format (e.g., "00:00:39.480")
- Also handle "MM:SS" format
- Still support numeric timestamps (seconds or milliseconds)
- Check 'timestamp' field first (primary format in data)

This fixes the NaN issue and displays correct timestamps
for both primary and secondary transcripts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 01:16:26 -05:00
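The frontend helper `parseTimestampToSeconds()` is JavaScript; the following is a hypothetical Python port that shows the same folding of `HH:MM:SS.mmm` and `MM:SS` into seconds. It assumes numeric input is already in seconds, because the commit's rule for detecting milliseconds is not shown, and it returns 0.0 for anything unparseable or non-finite so NaN never leaks out.

```python
import math
from typing import Optional, Union


def parse_timestamp_to_seconds(value: Optional[Union[str, int, float]]) -> float:
    """Convert "HH:MM:SS.mmm", "MM:SS", or a number of seconds to float seconds."""
    if value is None:
        return 0.0
    if isinstance(value, (int, float)):
        seconds = float(value)
    else:
        parts = str(value).strip().split(":")
        if len(parts) > 3:
            return 0.0
        try:
            numbers = [float(part) for part in parts]
        except ValueError:
            return 0.0
        seconds = 0.0
        for number in numbers:  # fold hours -> minutes -> seconds
            seconds = seconds * 60 + number
    return seconds if math.isfinite(seconds) else 0.0
```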
e241d206c5 Fix NaN timestamps with proper type checking
Previous || chain could pass through invalid values causing NaN.
Now explicitly checks each possible timestamp field with:
- null check (field != null)
- NaN check (!isNaN(parseFloat(field)))
- Takes first valid numeric value found

This ensures timestamps always have a valid number, defaulting
to 0 if no valid timestamp field is found.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 01:09:21 -05:00
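The "first valid numeric value, else 0" rule can be written as an explicit loop over candidate fields. This Python sketch mirrors the idea; the field names in the usage are hypothetical. An explicit `None` check also keeps a legitimate `0`, which a JavaScript `a || b` chain would skip as falsy. Note that Python's `float("12abc")` fails where JavaScript's `parseFloat` would return 12.

```python
import math
from typing import Any, Mapping, Sequence


def first_valid_number(record: Mapping[str, Any], fields: Sequence[str],
                       default: float = 0.0) -> float:
    """Return the first field value that parses to a finite float, else default."""
    for field in fields:
        value = record.get(field)
        if value is None:  # null check
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue
        if math.isfinite(number):  # rejects NaN and +/-inf
            return number
    return default
```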
28 changed files with 5411 additions and 391 deletions

13
.dockerignore Normal file

@@ -0,0 +1,13 @@
.git
.gitignore
.venv
__pycache__
*.pyc
*.pyo
.DS_Store
node_modules
data
videos
*.log
feed-master-config/var
feed-master-config/images

.gitea/workflows/docker-build.yml Normal file

@@ -0,0 +1,37 @@
# Build and push the TLC Search Docker image whenever changes land on master.
name: docker-build

on:
  push:
    branches:
      - master

env:
  IMAGE_NAME: gitea.ghost.tel/knight/tlc-search

jobs:
  build:
    runs-on: docker
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to registry
        uses: docker/login-action@v2
        with:
          registry: gitea.ghost.tel
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile
          push: true
          tags: |
            ${{ env.IMAGE_NAME }}:latest
            ${{ env.IMAGE_NAME }}:${{ github.sha }}

5
.gitignore vendored

@@ -33,6 +33,7 @@ env/
# IDE
.vscode/
.idea/
.gemini/
*.swp
*.swo
*~
@@ -51,6 +52,10 @@ Thumbs.db
# Logs
*.log
# Feed Master runtime cache
feed-master-config/var/
feed-master-config/images/
# Testing
.pytest_cache/
.coverage

31
AGENTS.md Normal file

@@ -0,0 +1,31 @@
# Repository Guidelines
## Project Structure & Module Organization
- Core modules live under `python_app/`: `config.py` centralizes settings, `transcript_collector.py` gathers transcripts, `ingest.py` handles Elasticsearch bulk loads, and `search_app.py` exposes the Flask UI.
- Static assets belong in `static/` (`index.html`, `frequency.html`, companion JS/CSS). Keep HTML here and wire it up through Flask routes.
- Runtime artifacts land in `data/` (`raw/` for downloads, `video_metadata/` for cleaned payloads). Preserve the JSON schema emitted by the collector.
- When adding utilities, place them in `python_app/` and use package-relative imports so scripts continue to run via `python -m`.
## Build, Test, and Development Commands
- `python -m venv .venv && source .venv/bin/activate`: bootstrap the virtualenv used by all scripts.
- `pip install -r requirements.txt`: install Flask, Elasticsearch tooling, Google API clients, and dotenv support.
- `python -m python_app.transcript_collector --channel UC... --output data/raw`: fetch transcript JSON for a channel; rerun to refresh cached data.
- `python -m python_app.ingest --source data/video_metadata --index this_little_corner_py`: index prepared metadata and auto-create mappings when needed.
- `python -m python_app.search_app`: launch the Flask server on port 8080 for UI smoke tests.
## Coding Style & Naming Conventions
- Follow PEP 8 with 4-space indentation, `snake_case` for functions/modules, and `CamelCase` for classes; reserve UPPER_SNAKE_CASE for configuration constants.
- Keep Elasticsearch payload keys lower-case with underscores, and centralize shared values in `config.py` rather than scattering literals.
## Testing Guidelines
- No automated suite is committed yet; when adding coverage, create `tests/` modules using `pytest` with files named `test_*.py`.
- Focus tests on collector pagination, ingest transformations, and Flask route helpers, and run `python -m pytest` locally before opening a PR.
- Manually verify by ingesting a small sample into a local Elasticsearch node and checking facets, highlights, and transcript retrieval via the UI.
## Commit & Pull Request Guidelines
- Mirror the existing history: short, imperative commit subjects (e.g. “Fix results overflow”, “Add video reference tracking”).
- PRs should describe scope, list environment variables or indices touched, link issues, and attach before/after screenshots whenever UI output changes. Highlight Elasticsearch mapping or data migration impacts for both search and frontend reviewers.
## Configuration & Security Tips
- Load credentials through environment variables (`ELASTIC_URL`, `ELASTIC_USERNAME`, `ELASTIC_PASSWORD`, `ELASTIC_API_KEY`, `YOUTUBE_API_KEY`) or a `.env` file, and keep secrets out of version control.
- Adjust `ELASTIC_VERIFY_CERTS`, `ELASTIC_CA_CERT`, and `ELASTIC_DEBUG` only while debugging, and prefer branch-specific indices (`this_little_corner_py_<initials>`) to avoid clobbering shared data.

32
Dockerfile Normal file

@@ -0,0 +1,32 @@
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# System deps kept lean to support torch/sentence-transformers wheels.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential git curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# Copy the package into /app/python_app so `python -m python_app.search_app` works.
COPY . /app/python_app

ENV ELASTIC_URL=http://elasticsearch:9200 \
    ELASTIC_INDEX=this_little_corner_py \
    ELASTIC_VERIFY_CERTS=0 \
    QDRANT_URL=http://qdrant:6333 \
    QDRANT_COLLECTION=tlc-captions-full \
    QDRANT_VECTOR_NAME= \
    QDRANT_VECTOR_SIZE=1024 \
    QDRANT_EMBED_MODEL=BAAI/bge-large-en-v1.5 \
    LOCAL_DATA_DIR=/app/data/video_metadata

EXPOSE 8080

WORKDIR /app
CMD ["python", "-m", "python_app.search_app"]

87
Makefile Normal file

@@ -0,0 +1,87 @@
# Makefile for TLC Search + Feed Master

.PHONY: help config up down restart logs status update-channels

help:
	@echo "TLC Search + Feed Master - Management Commands"
	@echo ""
	@echo "Configuration:"
	@echo " make config - Regenerate feed-master configuration from channels.yml"
	@echo ""
	@echo "Service Management:"
	@echo " make up - Start all services"
	@echo " make down - Stop all services"
	@echo " make restart - Restart all services"
	@echo " make logs - View all service logs"
	@echo " make status - Check service status"
	@echo ""
	@echo "Updates:"
	@echo " make update-channels - Regenerate config and restart feed-master"
	@echo ""
	@echo "Individual Services:"
	@echo " make logs-feed - View feed-master logs"
	@echo " make logs-bridge - View rss-bridge logs"
	@echo " make logs-app - View TLC Search logs"
	@echo " make restart-feed - Restart feed-master only"

# Generate feed-master configuration from channels.yml
config:
	@echo "Generating feed-master configuration..."
	python3 -m python_app.generate_feed_config_simple
	@echo "Configuration updated!"

# Start all services
up:
	docker compose up -d
	@echo ""
	@echo "Services started!"
	@echo " - RSS Bridge: http://localhost:3001"
	@echo " - Feed Master: http://localhost:8097/rss/youtube-unified"
	@echo " - TLC Search: http://localhost:8080"

# Stop all services
down:
	docker compose down

# Restart all services
restart:
	docker compose restart

# View all logs
logs:
	docker compose logs -f

# View feed-master logs
logs-feed:
	docker compose logs -f feed-master

# View rss-bridge logs
logs-bridge:
	docker compose logs -f rss-bridge

# View TLC Search logs
logs-app:
	docker compose logs -f app

# Check service status
status:
	@docker compose ps
	@echo ""
	@echo "Endpoints:"
	@echo " - RSS Bridge: http://localhost:3001"
	@echo " - Feed Master: http://localhost:8097/rss/youtube-unified"
	@echo " - TLC Search: http://localhost:8080"

# Restart only feed-master
restart-feed:
	docker compose restart feed-master

# Pull latest channel URLs and regenerate configuration
update-channels:
	@echo "Regenerating feed-master configuration..."
	python3 -m python_app.generate_feed_config_simple
	@echo ""
	@echo "Restarting feed-master..."
	docker compose restart feed-master
	@echo ""
	@echo "Update complete!"

209
README-FEED-MASTER.md Normal file

@@ -0,0 +1,209 @@
# TLC Search + Feed Master Integration
This directory contains an integrated setup combining:
- **TLC Search**: Flask app for searching YouTube transcripts (Elasticsearch/Qdrant)
- **Feed Master**: RSS aggregator for YouTube channels
- **RSS Bridge**: Converts YouTube channels to RSS feeds
All services share the same source of truth for YouTube channels from `channels.yml` and the adjacent
`urls.txt` in this repository.
## Architecture
```
┌─────────────────────┐
│    channels.yml     │  Source of truth (this repo)
│  (python_app repo)  │
└──────────┬──────────┘
           ├─────────────────────────────┬─────────────────────────┐
           │                             │                         │
           v                             v                         v
    ┌──────────────┐              ┌──────────────┐        ┌─────────────────┐
    │  TLC Search  │              │  RSS Bridge  │        │   Feed Master   │
    │  (Flask App) │              │  (Port 3001) │───────>│   (Port 8097)   │
    │  Port 8080   │              └──────────────┘        └─────────────────┘
    │              │                                               │
    │ Elasticsearch│                                               │
    │    Qdrant    │                                               │
    └──────────────┘                                               │
                                                                   v
                                               http://localhost:8097/rss/youtube-unified
```
## Services
### 1. TLC Search (Port 8080)
- Indexes and searches YouTube transcripts
- Uses Elasticsearch for metadata and Qdrant for vector search
- Connects to remote Elasticsearch/Qdrant instances
### 2. RSS Bridge (Port 3001)
- Converts YouTube channels to RSS feeds
- Supports both channel IDs and @handles
- Used by Feed Master to aggregate feeds
### 3. Feed Master (Port 8097)
- Aggregates all YouTube channel RSS feeds into one unified feed
- Updates every 5 minutes
- Keeps the most recent 200 items from all channels
## Setup
### Prerequisites
- Docker and Docker Compose
- Python 3.x
### Configuration
1. **Environment Variables**: Create `.env` file with:
```bash
# Elasticsearch
ELASTIC_URL=https://your-elasticsearch-url
ELASTIC_INDEX=this_little_corner_py
ELASTIC_USERNAME=your_username
ELASTIC_PASSWORD=your_password
# Qdrant
QDRANT_URL=https://your-qdrant-url
QDRANT_COLLECTION=tlc-captions-full
# Optional UI links
RSS_FEED_URL=/rss/youtube-unified
CHANNELS_PATH=/app/python_app/channels.yml
RSS_FEED_UPSTREAM=http://feed-master:8080
```
2. **Generate Feed Configuration**:
```bash
# Regenerate feed-master config from the channels list
python3 -m python_app.generate_feed_config_simple
```
This reads `channels.yml` and generates `feed-master-config/fm.yml`.
### Starting Services
```bash
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# View specific service logs
docker compose logs -f feed-master
docker compose logs -f rss-bridge
docker compose logs -f app
```
### Stopping Services
```bash
# Stop all services
docker compose down
# Stop specific service
docker compose stop feed-master
```
## Usage
### Unified RSS Feed
Access the aggregated feed through the TLC app (recommended):
- **URL**: http://localhost:8080/rss
- **Format**: RSS/Atom XML
- **Behavior**: Filters RSS-Bridge error items and prefixes titles with channel name
- **Updates**: Every 5 minutes (feed-master schedule)
- **Items**: Most recent 200 items across all channels
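The behaviour listed above (drop RSS-Bridge error items, prefix titles with the channel name, newest first, cap at 200) can be sketched in a few lines. In the documented setup Feed Master does the merge and the 200-item cap while the TLC `/rss` route does the filtering and prefixing; this sketch folds both into one function for illustration. The `FeedItem` type, the `"Channel: Title"` prefix format, and the error-title heuristic are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class FeedItem:
    channel: str
    title: str
    link: str
    published: datetime


def looks_like_bridge_error(item: FeedItem) -> bool:
    # Hypothetical heuristic: treat items whose title mentions a bridge error as noise.
    return "bridge returned error" in item.title.lower()


def build_unified_feed(items: Iterable[FeedItem], limit: int = 200) -> List[FeedItem]:
    """Filter error items, prefix titles with the channel, sort newest first, cap."""
    kept = [
        replace(item, title=f"{item.channel}: {item.title}")
        for item in items
        if not looks_like_bridge_error(item)
    ]
    kept.sort(key=lambda item: item.published, reverse=True)
    return kept[:limit]
```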
Direct feed-master access still works:
- **URL**: http://localhost:8097/rss/youtube-unified
### TLC Search
Access the search interface at:
- **URL**: http://localhost:8080
### Channel List Endpoints
- **Plain text list**: http://localhost:8080/channels.txt
- **JSON metadata**: http://localhost:8080/api/channel-list
### RSS Bridge
Access individual channel feeds or the web interface at:
- **URL**: http://localhost:3001
## Updating Channel List
When channels are added/removed from `channels.yml`:
```bash
# 1. Regenerate feed configuration
cd /var/core/this-little-corner/src/python_app
python3 -m python_app.generate_feed_config_simple
# 2. Restart feed-master to pick up changes
docker compose restart feed-master
```
## File Structure
```
python_app/
├── docker-compose.yml # All services configuration
├── channels.yml # Canonical YouTube channel list
├── urls.txt # URL list kept in sync with channels.yml
├── generate_feed_config_simple.py # Config generator script (run via python -m)
├── feed-master-config/
│ ├── fm.yml # Feed Master configuration (auto-generated)
│ ├── var/ # Feed Master database
│ └── images/ # Cached images
├── data/ # TLC Search data (read-only)
└── README-FEED-MASTER.md # This file
```
## Troubleshooting
### Feed Master not updating
```bash
# Check if RSS Bridge is accessible
curl http://localhost:3001
# Restart both services in order
docker compose restart rss-bridge
sleep 10
docker compose restart feed-master
```
### Configuration issues
```bash
# Regenerate configuration
python -m python_app.generate_feed_config_simple
# Validate the YAML
cat feed-master-config/fm.yml
# Restart feed-master
docker compose restart feed-master
```
### View feed-master logs
```bash
docker compose logs -f feed-master | grep -E "(ERROR|WARN|youtube)"
```
## Integration Notes
- **Single Source of Truth**: All channel URLs come from `channels.yml` and `urls.txt` in this repo
- **Automatic Regeneration**: Run `python3 -m python_app.generate_feed_config_simple` when `channels.yml` changes
- **No Manual Editing**: Don't edit `fm.yml` directly - regenerate it from the script
- **Handle Support**: Supports both `/channel/ID` and `/@handle` URL formats
- **Shared Channels**: Same channels used for transcript indexing (TLC Search) and RSS aggregation (Feed Master)
- **Skip Broken RSS**: Set `rss: false` in `channels.yml` to exclude a channel from RSS aggregation
## Future Enhancements
- [ ] Automated config regeneration on git pull
- [ ] Channel name lookup from YouTube API
- [ ] Integration with TLC Search for unified UI
- [ ] Webhook notifications for new videos
- [ ] OPML export for other RSS readers

View File

@@ -85,3 +85,34 @@ Visit <http://localhost:8080/> and you'll see a barebones UI that:
Feel free to expand on this scaffold—add proper logging, schedule transcript
updates, or flesh out the UI—once you're happy with the baseline behaviour.
## Run with Docker Compose (App Only; Remote ES/Qdrant)
The provided compose file builds/runs only the Flask app and expects **remote** Elasticsearch/Qdrant endpoints. Supply them via environment variables (directly or a `.env` alongside `docker-compose.yml`):
```bash
ELASTIC_URL=https://your-es-host:9200 \
QDRANT_URL=https://your-qdrant-host:6333 \
docker compose up --build
```
Other tunables (defaults shown in compose):
- `ELASTIC_INDEX` (default `this_little_corner_py`)
- `ELASTIC_USERNAME` / `ELASTIC_PASSWORD` or `ELASTIC_API_KEY`
- `ELASTIC_VERIFY_CERTS` (set to `1` for real TLS verification)
- `QDRANT_COLLECTION` (default `tlc-captions-full`)
- `QDRANT_VECTOR_NAME` / `QDRANT_VECTOR_SIZE` / `QDRANT_EMBED_MODEL`
- `RATE_LIMIT_ENABLED` (default `1`)
- `RATE_LIMIT_REQUESTS` (default `60`)
- `RATE_LIMIT_WINDOW_SECONDS` (default `60`)
Port 8080 on the host is forwarded to the app. Mount `./data` (read-only) if you want local fallbacks for metrics (`LOCAL_DATA_DIR=/app/data/video_metadata`); otherwise the app will rely purely on the remote backends. Stop the container with `docker compose down`.
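`RATE_LIMIT_REQUESTS` and `RATE_LIMIT_WINDOW_SECONDS` describe "at most N requests per window". One way to enforce that is a per-client sliding-window log, sketched below with the same 60/60 defaults. This is an in-memory, single-process illustration; the app's actual limiter may use a different algorithm, and a Flask hook would typically key on the client address and answer HTTP 429 when `allow()` returns `False`.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client key."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for tests
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```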
## CI (Docker build)
A Gitea Actions workflow (`.gitea/workflows/docker-build.yml`) builds and pushes the Docker image on every push to `master`. Configure the following repository secrets in Gitea:
- `DOCKER_USERNAME`
- `DOCKER_PASSWORD`
The image is tagged as `gitea.ghost.tel/knight/tlc-search:latest` and with the commit SHA. Adjust `IMAGE_NAME` in the workflow if you need a different registry/repo.

162
channel_config.py Normal file

@@ -0,0 +1,162 @@
from __future__ import annotations

import json
import re
from pathlib import Path
from typing import Any, Dict, List, Optional

_CHANNEL_ID_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/channel/([^/?#]+)")
_HANDLE_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/@([^/?#]+)")


def _strip_quotes(value: str) -> str:
    if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
        return value[1:-1]
    return value


def _parse_yaml_channels(text: str) -> List[Dict[str, str]]:
    channels: List[Dict[str, str]] = []
    current: Dict[str, str] = {}

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "channels:":
            continue
        if line.startswith("- "):
            if current:
                channels.append(current)
                current = {}
            line = line[2:].strip()
            if not line:
                continue
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        current[key.strip()] = _strip_quotes(value.strip())

    if current:
        channels.append(current)
    return channels


def _extract_from_url(url: str) -> Dict[str, Optional[str]]:
    channel_id = None
    handle = None

    channel_match = _CHANNEL_ID_PATTERN.search(url)
    if channel_match:
        channel_id = channel_match.group(1)

    handle_match = _HANDLE_PATTERN.search(url)
    if handle_match:
        handle = handle_match.group(1)

    return {"id": channel_id, "handle": handle}


def _normalize_handle(handle: Optional[str]) -> Optional[str]:
    if not handle:
        return None
    return handle.lstrip("@").strip() or None


def _parse_bool(value: Optional[object]) -> Optional[bool]:
    if isinstance(value, bool):
        return value
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "y"}:
        return True
    if text in {"0", "false", "no", "n"}:
        return False
    return None


def _normalize_entry(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    channel_id = entry.get("id") or entry.get("channel_id")
    handle = _normalize_handle(entry.get("handle") or entry.get("username"))
    url = entry.get("url")
    name = entry.get("name")
    rss_flag = _parse_bool(
        entry.get("rss_enabled") or entry.get("rss") or entry.get("include_in_feed")
    )

    if url:
        extracted = _extract_from_url(url)
        channel_id = channel_id or extracted.get("id")
        handle = handle or extracted.get("handle")

    if not url:
        if channel_id:
            url = f"https://www.youtube.com/channel/{channel_id}"
        elif handle:
            url = f"https://www.youtube.com/@{handle}"

    if not name:
        name = handle or channel_id

    if not name or not url:
        return None

    normalized = {
        "id": channel_id or "",
        "handle": handle or "",
        "name": name,
        "url": url,
        "rss_enabled": True if rss_flag is None else rss_flag,
    }
    return normalized


def load_channel_entries(path: Path) -> List[Dict[str, str]]:
    if not path.exists():
        raise FileNotFoundError(path)

    if path.suffix.lower() == ".json":
        payload = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(payload, dict):
            raw_entries = payload.get("channels", [])
        else:
            raw_entries = payload
    else:
        raw_entries = _parse_yaml_channels(path.read_text(encoding="utf-8"))

    entries: List[Dict[str, str]] = []
    for raw in raw_entries:
        if not isinstance(raw, dict):
            continue
        raw_payload: Dict[str, Any] = {}
        for key, value in raw.items():
            if value is None:
                continue
            if isinstance(value, bool):
                raw_payload[str(key).strip()] = value
            else:
                raw_payload[str(key).strip()] = str(value).strip()
        normalized = _normalize_entry(raw_payload)
        if normalized:
            entries.append(normalized)

    entries.sort(key=lambda item: item["name"].lower())
    return entries


def build_rss_bridge_url(entry: Dict[str, str], rss_bridge_host: str = "rss-bridge") -> Optional[str]:
    channel_id = entry.get("id") or ""
    handle = _normalize_handle(entry.get("handle"))

    if channel_id:
        return (
            f"http://{rss_bridge_host}/?action=display&bridge=YoutubeBridge"
            f"&context=By+channel+id&c={channel_id}&format=Mrss"
        )
    if handle:
        return (
            f"http://{rss_bridge_host}/?action=display&bridge=YoutubeBridge"
            f"&context=By+username&u={handle}&format=Mrss"
        )
    return None
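`build_rss_bridge_url` concatenates the channel ID or handle straight into the query string. The same RSS-Bridge URL can be produced with `urllib.parse.urlencode`, which percent-encodes values (spaces become `+`, matching the literal `By+channel+id`). This is an alternative sketch for comparison, not the module's code; `rss_bridge_url` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlencode


def rss_bridge_url(channel_id: str = "", handle: str = "",
                   host: str = "rss-bridge") -> Optional[str]:
    """Build a YoutubeBridge Mrss URL from a channel id (preferred) or a handle."""
    handle = handle.lstrip("@").strip()
    if channel_id:
        params = {"context": "By channel id", "c": channel_id}
    elif handle:
        params = {"context": "By username", "u": handle}
    else:
        return None
    # Dict order is preserved, so the parameters come out in the same order
    # as the hand-written f-string version.
    query = urlencode({"action": "display", "bridge": "YoutubeBridge", **params, "format": "Mrss"})
    return f"http://{host}/?{query}"
```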

271
channels.yml Normal file

@@ -0,0 +1,271 @@
# Shared YouTube Channel Configuration
# Used by both TLC Search (transcript collection) and Feed Master (RSS aggregation)
channels:
- id: UCCebR16tXbv5Ykk9_WtCCug
name: Christian T. Golden
url: https://www.youtube.com/channel/UCCebR16tXbv5Ykk9_WtCCug/videos
- id: UC6vg0HkKKlgsWk-3HfV-vnw
name: A Quality Existence
url: https://www.youtube.com/channel/UC6vg0HkKKlgsWk-3HfV-vnw/videos
- id: UCeWWxwzgLYUbfjWowXhVdYw
name: Andrea with the Bangs
url: https://www.youtube.com/channel/UCeWWxwzgLYUbfjWowXhVdYw/videos
- id: UC952hDf_C4nYJdqwK7VzTxA
name: Charlie's Little Corner
url: https://www.youtube.com/channel/UC952hDf_C4nYJdqwK7VzTxA/videos
- id: UCU5SNBfTo4umhjYz6M0Jsmg
name: Christian Baxter
url: https://www.youtube.com/channel/UCU5SNBfTo4umhjYz6M0Jsmg/videos
- id: UC6Tvr9mBXNaAxLGRA_sUSRA
name: Finding Ideas
url: https://www.youtube.com/channel/UC6Tvr9mBXNaAxLGRA_sUSRA/videos
- id: UC4Rmxg7saTfwIpvq3QEzylQ
name: Ein Sof - Infinite Reflections
url: https://www.youtube.com/channel/UC4Rmxg7saTfwIpvq3QEzylQ/videos
- id: UCTdH4nh6JTcfKUAWvmnPoIQ
name: Eric Seitz
url: https://www.youtube.com/channel/UCTdH4nh6JTcfKUAWvmnPoIQ/videos
- id: UCsi_x8c12NW9FR7LL01QXKA
name: Grail Country
url: https://www.youtube.com/channel/UCsi_x8c12NW9FR7LL01QXKA/videos
- id: UCAqTQ5yLHHH44XWwWXLkvHQ
name: Grizwald Grim
url: https://www.youtube.com/channel/UCAqTQ5yLHHH44XWwWXLkvHQ/videos
- id: UCprytROeCztMOMe8plyJRMg
name: faturechi
url: https://www.youtube.com/channel/UCprytROeCztMOMe8plyJRMg/videos
- id: UCpqDUjTsof-kTNpnyWper_Q
name: John Vervaeke
url: https://www.youtube.com/channel/UCpqDUjTsof-kTNpnyWper_Q/videos
- id: UCL_f53ZEJxp8TtlOkHwMV9Q
name: Jordan B Peterson
url: https://www.youtube.com/channel/UCL_f53ZEJxp8TtlOkHwMV9Q/videos
- id: UCez1fzMRGctojfis2lfRYug
name: Lucas Vos
url: https://www.youtube.com/channel/UCez1fzMRGctojfis2lfRYug/videos
- id: UC2leFZRD0ZlQDQxpR2Zd8oA
name: Mary Kochan
url: https://www.youtube.com/channel/UC2leFZRD0ZlQDQxpR2Zd8oA/videos
- id: UC8SErJkYnDsYGh1HxoZkl-g
name: Sartori Studios
url: https://www.youtube.com/channel/UC8SErJkYnDsYGh1HxoZkl-g/videos
- id: UCEPOn4cgvrrerg_-q_Ygw1A
name: More Christ
url: https://www.youtube.com/channel/UCEPOn4cgvrrerg_-q_Ygw1A/videos
- id: UC2yCyOMUeem-cYwliC-tLJg
name: Paul Anleitner
url: https://www.youtube.com/channel/UC2yCyOMUeem-cYwliC-tLJg/videos
- id: UCGsDIP_K6J6VSTqlq-9IPlg
name: Paul VanderKlay
url: https://www.youtube.com/channel/UCGsDIP_K6J6VSTqlq-9IPlg/videos
- id: UCEzWTLDYmL8soRdQec9Fsjw
name: Randos United
url: https://www.youtube.com/channel/UCEzWTLDYmL8soRdQec9Fsjw/videos
- id: UC1KgNsMdRoIA_njVmaDdHgA
name: Randos United 2
url: https://www.youtube.com/channel/UC1KgNsMdRoIA_njVmaDdHgA/videos
- id: UCFQ6Gptuq-sLflbJ4YY3Umw
name: Rebel Wisdom
url: https://www.youtube.com/channel/UCFQ6Gptuq-sLflbJ4YY3Umw/videos
- id: UCEY1vGNBPsC3dCatZyK3Jkw
name: Strange Theology
  url: https://www.youtube.com/channel/UCEY1vGNBPsC3dCatZyK3Jkw/videos
- id: UCIAtCuzdvgNJvSYILnHtdWA
  name: The Anadromist
  url: https://www.youtube.com/channel/UCIAtCuzdvgNJvSYILnHtdWA/videos
- id: UClIDP7_Kzv_7tDQjTv9EhrA
  name: The Chris Show
  url: https://www.youtube.com/channel/UClIDP7_Kzv_7tDQjTv9EhrA/videos
- id: UC-QiBn6GsM3JZJAeAQpaGAA
  name: TheCommonToad
  url: https://www.youtube.com/channel/UC-QiBn6GsM3JZJAeAQpaGAA/videos
- id: UCiJmdXTb76i8eIPXdJyf8ZQ
  name: Bridges of Meaning Hub
  url: https://www.youtube.com/channel/UCiJmdXTb76i8eIPXdJyf8ZQ/videos
- id: UCM9Z05vuQhMEwsV03u6DrLA
  name: Cassidy van der Kamp
  url: https://www.youtube.com/channel/UCM9Z05vuQhMEwsV03u6DrLA/videos
- id: UCgp_r6WlBwDSJrP43Mz07GQ
  name: The Meaning Code
  url: https://www.youtube.com/channel/UCgp_r6WlBwDSJrP43Mz07GQ/videos
- id: UC5uv-BxzCrN93B_5qbOdRWw
  name: TheScrollersPodcast
  url: https://www.youtube.com/channel/UC5uv-BxzCrN93B_5qbOdRWw/videos
- id: UCtCTSf3UwRU14nYWr_xm-dQ
  name: Jonathan Pageau
  url: https://www.youtube.com/channel/UCtCTSf3UwRU14nYWr_xm-dQ/videos
- id: UC1a4VtU_SMSfdRiwMJR33YQ
  name: The Young Levite
  url: https://www.youtube.com/channel/UC1a4VtU_SMSfdRiwMJR33YQ/videos
- id: UCg7Ed0lecvko58ibuX1XHng
  name: Transfigured
  url: https://www.youtube.com/channel/UCg7Ed0lecvko58ibuX1XHng/videos
- id: UCMVG5eqpYFVEB-a9IqAOuHA
  name: President Foxman
  url: https://www.youtube.com/channel/UCMVG5eqpYFVEB-a9IqAOuHA/videos
- id: UC8mJqpS_EBbMcyuzZDF0TEw
  name: Neal Daedalus
  url: https://www.youtube.com/channel/UC8mJqpS_EBbMcyuzZDF0TEw/videos
- id: UCGHuURJ1XFHzPSeokf6510A
  name: Aphrael Pilotson
  url: https://www.youtube.com/channel/UCGHuURJ1XFHzPSeokf6510A/videos
- id: UC704NVL2DyzYg3rMU9r1f7A
  handle: chrishoward8473
  name: Chris Howard
  url: https://www.youtube.com/@chrishoward8473/videos
- id: UChptV-kf8lnncGh7DA2m8Pw
  name: Shoulder Serf
  url: https://www.youtube.com/channel/UChptV-kf8lnncGh7DA2m8Pw/videos
- id: UCzX6R3ZLQh5Zma_5AsPcqPA
  name: Restoring Meaning
  url: https://www.youtube.com/channel/UCzX6R3ZLQh5Zma_5AsPcqPA/videos
- id: UCiukuaNd_qzRDTW9qe2OC1w
  name: Kale Zelden
  url: https://www.youtube.com/channel/UCiukuaNd_qzRDTW9qe2OC1w/videos
- id: UC5yLuFQCms4nb9K2bGQLqIw
  name: Ron Copperman
  url: https://www.youtube.com/channel/UC5yLuFQCms4nb9K2bGQLqIw/videos
- id: UCVdSgEf9bLXFMBGSMhn7x4Q
  name: Mark D Parker
  url: https://www.youtube.com/channel/UCVdSgEf9bLXFMBGSMhn7x4Q/videos
- id: UC_dnk5D4tFCRYCrKIcQlcfw
  name: Luke Thompson
  url: https://www.youtube.com/channel/UC_dnk5D4tFCRYCrKIcQlcfw/videos
- id: UCT8Lq3ufaGEnCSS8WpFatqw
  handle: Freerilian
  name: Free Rilian
  url: https://www.youtube.com/@Freerilian/videos
- id: UC977g6oGYIJDQnsZOGjQBBA
  handle: marks.-ry7bm
  name: Mark S
  url: https://www.youtube.com/@marks.-ry7bm/videos
- id: UCbD1Pm0TOcRK2zaCrwgcTTg
  handle: Adams-Fall
  name: Adams Fall
  url: https://www.youtube.com/@Adams-Fall/videos
- id: UCnojyPW0IgLWTQ0SaDQ1KBA
  handle: mcmosav
  name: mcmosav
  url: https://www.youtube.com/@mcmosav/videos
- id: UCiOZYvBGHw1Y6wyzffwEp9g
  handle: Landbeorht
  name: Joseph Lambrecht
  url: https://www.youtube.com/@Landbeorht/videos
- id: UCAXyF_HFeMgwS8nkGVeroAA
  handle: Corner_Citizen
  name: Corner Citizen
  url: https://www.youtube.com/@Corner_Citizen/videos
- id: UCv2Qft5mZrmA9XAwnl9PU-g
  handle: ethan.caughey
  name: Ethan Caughey
  url: https://www.youtube.com/@ethan.caughey/videos
- id: UCMJCtS8jKouJ2d8UIYzW3vg
  handle: MarcInTbilisi
  name: Marc Jackson
  url: https://www.youtube.com/@MarcInTbilisi/videos
- id: UCk9O91WwruXmgu1NQrKZZEw
  handle: climbingmt.sophia
  name: Climbing Mt Sophia
  url: https://www.youtube.com/@climbingmt.sophia/videos
- id: UCUSyTPWW4JaG1YfUPddw47Q
  handle: Skankenstein
  name: Skankenstein
  url: https://www.youtube.com/@Skankenstein/videos
- id: UCzw2FNI3IRphcAoVcUENOgQ
  handle: UpCycleClub
  name: UpCycleClub
  url: https://www.youtube.com/@UpCycleClub/videos
- id: UCQ7rVoApmYIpcmU7fB9RPyw
  handle: JessPurviance
  name: Jesspurviance
  url: https://www.youtube.com/@JessPurviance/videos
- id: UCrZyTWGMdRM9_P26RKPvh3A
  handle: greyhamilton52
  name: Grey Hamilton
  url: https://www.youtube.com/@greyhamilton52/videos
- id: UCDCfI162vhPvwdxW6X4nmiw
  handle: paulrenenichols
  name: Paul Rene Nichols
  url: https://www.youtube.com/@paulrenenichols/videos
- id: UCFLovlJ8RFApfjrf2y157xg
  handle: OfficialSecularKoranism
  name: Secular Koranism
  url: https://www.youtube.com/@OfficialSecularKoranism/videos
- id: UC_-YQbnPfBbIezMr1adZZiQ
  handle: FromWhomAllBlessingsFlow
  name: From Whom All Blessings Flow
  url: https://www.youtube.com/@FromWhomAllBlessingsFlow/videos
- id: UCn5mf-fcpBmkepIpZ8eFRng
  handle: FoodTruckEmily
  name: Emily Rajeh
  url: https://www.youtube.com/@FoodTruckEmily/videos
- id: UC6zHDj4D323xJkblnPTvY3Q
  handle: O.G.Rose.Michelle.and.Daniel
  name: OG Rose
  url: https://www.youtube.com/@O.G.Rose.Michelle.and.Daniel/videos
- id: UC4GiA5Hnwy415uVRymxPK-w
  handle: JonathanDumeer
  name: Jonathan Dumeer
  url: https://www.youtube.com/@JonathanDumeer/videos
- id: UCMzT-mdCqoyEv_-YZVtE7MQ
  handle: JordanGreenhall
  name: Jordan Hall
  url: https://www.youtube.com/@JordanGreenhall/videos
- id: UC5goUoFM4LPim4eY4pwRXYw
  handle: NechamaGluck
  name: Nechama Gluck
  url: https://www.youtube.com/@NechamaGluck/videos
- id: UCPUVeoQYyq8cndWwyczX6RA
  handle: justinsmorningcoffee
  name: Justinsmorningcoffee
  url: https://www.youtube.com/@justinsmorningcoffee/videos
- id: UCB0C8DEIQlQzvSGuGriBxtA
  handle: grahampardun
  name: Grahampardun
  url: https://www.youtube.com/@grahampardun/videos
- id: UCpLJJLVB_7v4Igq-9arja1A
  handle: michaelmartin8681
  name: Michaelmartin8681
  url: https://www.youtube.com/@michaelmartin8681/videos
- id: UCxV18lwwh29DiWuooz7UCvg
  handle: davidbusuttil9086
  name: Davidbusuttil9086
  url: https://www.youtube.com/@davidbusuttil9086/videos
- id: UCosBhpwwGh_ueYq4ZSi5dGw
  handle: matthewparlato5626
  name: Matthewparlato5626
  url: https://www.youtube.com/@matthewparlato5626/videos
- id: UCwF5LWNOFou_50bT65bq4Bg
  handle: lancecleaver227
  name: Lancecleaver227
  url: https://www.youtube.com/@lancecleaver227/videos
- id: UCaJ0CqiiMSTq4X0rycUOIjw
  handle: theplebistocrat
  name: the plebistocrat
  url: https://www.youtube.com/@theplebistocrat/videos
- id: UCWehDXDEdUpB58P7-Bg1cHg
  handle: rigelwindsongthurston
  name: Rigel Windsong Thurston
  url: https://www.youtube.com/@rigelwindsongthurston/videos
- id: UCZA5mUAyYcCL1kYgxbeMNrA
  handle: RightInChrist
  name: Rightinchrist
  url: https://www.youtube.com/@RightInChrist/videos
- id: UCDIPXp88qjAV3TiaR5Uo3iQ
  handle: RafeKelley
  name: Rafekelley
  url: https://www.youtube.com/@RafeKelley/videos
- id: UCedgru6YCto3zyXjlbuQuqA
  handle: WavesOfObsession
  name: Wavesofobsession
  url: https://www.youtube.com/@WavesOfObsession/videos
- handle: LeviathanForPlay
  name: LeviathanForPlay
  url: https://www.youtube.com/@LeviathanForPlay/videos
- id: UCehAungJpAeC-F3R5FwvvCQ
  name: Wholly Unfocused
  url: https://www.youtube.com/channel/UCehAungJpAeC-F3R5FwvvCQ/videos
- id: UC4YwC5zA9S_2EwthE27Xlew
  name: CMA
  url: https://www.youtube.com/channel/UC4YwC5zA9S_2EwthE27Xlew/videos

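Each entry above takes one of two shapes. Entries with a `handle` link to `https://www.youtube.com/@<handle>/videos`; the rest link to `https://www.youtube.com/channel/<id>/videos`. Because this list is edited by hand, a small consistency check can catch drift before the feed config is regenerated. The following is a minimal sketch that assumes the entries have already been parsed into dicts; the project's own `load_channel_entries` is not shown in this diff. `expected_url` and `check_entries` are hypothetical helpers and are not part of the repository.

```python
from typing import Optional


def expected_url(entry: dict) -> Optional[str]:
    """Derive the /videos URL an entry should carry (hypothetical helper).

    Mirrors the pattern visible in channels.yml: prefer the @handle form
    when a handle is present, otherwise fall back to the channel-ID form.
    """
    handle = entry.get("handle")
    if handle:
        return f"https://www.youtube.com/@{handle}/videos"
    channel_id = entry.get("id")
    if channel_id:
        return f"https://www.youtube.com/channel/{channel_id}/videos"
    return None  # neither a handle nor an id: nothing to derive


def check_entries(entries: list[dict]) -> list[str]:
    """Return readable problems: missing keys, URL drift, duplicate IDs."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for index, entry in enumerate(entries):
        label = entry.get("name") or entry.get("handle") or f"entry #{index}"
        url = expected_url(entry)
        if url is None:
            problems.append(f"{label}: needs an 'id' or a 'handle'")
        elif entry.get("url") != url:
            problems.append(f"{label}: url {entry.get('url')!r} != {url!r}")
        channel_id = entry.get("id")
        if channel_id:
            if channel_id in seen_ids:
                problems.append(f"{label}: duplicate id {channel_id}")
            seen_ids.add(channel_id)
    return problems
```

An empty list means every entry is internally consistent. The check does not verify that a handle and an ID actually belong to the same YouTube channel.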
@@ -6,7 +6,13 @@ Environment Variables:
ELASTIC_USERNAME / ELASTIC_PASSWORD: Optional basic auth credentials.
ELASTIC_INDEX: Target index name (default: this_little_corner_py).
LOCAL_DATA_DIR: Root folder containing JSON metadata (default: ../data/video_metadata).
CHANNELS_PATH: Path to the canonical channel list (default: ./channels.yml).
RSS_FEED_URL: Public URL/path for the unified RSS feed (default: /rss/youtube-unified).
RSS_FEED_UPSTREAM: Base URL to proxy feed requests (default: http://localhost:8097).
YOUTUBE_API_KEY: Optional API key for pulling metadata directly from YouTube.
RATE_LIMIT_ENABLED: Toggle API rate limiting (default: 1).
RATE_LIMIT_REQUESTS: Max requests per window per client (default: 60).
RATE_LIMIT_WINDOW_SECONDS: Window size in seconds (default: 60).
"""
from __future__ import annotations
@@ -16,6 +22,20 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Optional
# Load .env file if it exists
try:
    from dotenv import load_dotenv
    import logging
    _logger = logging.getLogger(__name__)
    _env_path = Path(__file__).parent / ".env"
    if _env_path.exists():
        _logger.info("Loading .env from: %s", _env_path)
        result = load_dotenv(_env_path, override=True)
        _logger.info("load_dotenv result: %s", result)
except ImportError:
    pass  # python-dotenv not installed
@dataclass(frozen=True)
class ElasticSettings:
@@ -39,11 +59,27 @@ class YoutubeSettings:
    api_key: Optional[str]
@dataclass(frozen=True)
class RateLimitSettings:
    enabled: bool
    requests: int
    window_seconds: int
@dataclass(frozen=True)
class AppConfig:
    elastic: ElasticSettings
    data: DataSettings
    youtube: YoutubeSettings
    rate_limit: RateLimitSettings
    qdrant_url: str
    qdrant_collection: str
    qdrant_vector_name: Optional[str]
    qdrant_vector_size: int
    qdrant_embed_model: str
    channels_path: Path
    rss_feed_url: str
    rss_feed_upstream: str
def _env(name: str, default: Optional[str] = None) -> Optional[str]:
@@ -75,7 +111,30 @@ def load_config() -> AppConfig:
    )
    data = DataSettings(root=data_root)
    youtube = YoutubeSettings(api_key=_env("YOUTUBE_API_KEY"))
    return AppConfig(elastic=elastic, data=data, youtube=youtube)
    rate_limit = RateLimitSettings(
        enabled=_env("RATE_LIMIT_ENABLED", "1") in {"1", "true", "True"},
        requests=max(int(_env("RATE_LIMIT_REQUESTS", "60")), 0),
        window_seconds=max(int(_env("RATE_LIMIT_WINDOW_SECONDS", "60")), 1),
    )
    channels_path = Path(
        _env("CHANNELS_PATH", str(Path(__file__).parent / "channels.yml"))
    ).expanduser()
    rss_feed_url = _env("RSS_FEED_URL", "/rss/youtube-unified")
    rss_feed_upstream = _env("RSS_FEED_UPSTREAM", "http://localhost:8097")
    return AppConfig(
        elastic=elastic,
        data=data,
        youtube=youtube,
        rate_limit=rate_limit,
        qdrant_url=_env("QDRANT_URL", "http://localhost:6333"),
        qdrant_collection=_env("QDRANT_COLLECTION", "tlc_embeddings"),
        qdrant_vector_name=_env("QDRANT_VECTOR_NAME"),
        qdrant_vector_size=int(_env("QDRANT_VECTOR_SIZE", "1024")),
        qdrant_embed_model=_env("QDRANT_EMBED_MODEL", "BAAI/bge-large-en-v1.5"),
        channels_path=channels_path,
        rss_feed_url=rss_feed_url or "",
        rss_feed_upstream=rss_feed_upstream or "",
    )
CONFIG = load_config()
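`load_config` enables rate limiting only when `RATE_LIMIT_ENABLED` is exactly `1`, `true`, or `True`. It also clamps the numeric settings: `requests` is kept at 0 or above and `window_seconds` at 1 or above. The sketch below reproduces only that parsing, reading from an explicit mapping rather than `os.environ` so it can be tried in isolation. `parse_rate_limit` and the plain `env` dict are assumptions made for illustration. The real `_env` helper's body is not shown in this hunk and may differ, for example in how it treats empty strings.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitSettings:
    # Local copy of the dataclass above so the sketch is self-contained.
    enabled: bool
    requests: int
    window_seconds: int


def parse_rate_limit(env: Mapping[str, str]) -> RateLimitSettings:
    """Mirror the RATE_LIMIT_* handling in load_config (hypothetical helper)."""
    return RateLimitSettings(
        # Only these exact spellings enable the limiter; "TRUE" or "yes" do not.
        enabled=env.get("RATE_LIMIT_ENABLED", "1") in {"1", "true", "True"},
        # Negative request budgets are clamped to 0.
        requests=max(int(env.get("RATE_LIMIT_REQUESTS", "60")), 0),
        # Windows shorter than one second are clamped to 1.
        window_seconds=max(int(env.get("RATE_LIMIT_WINDOW_SECONDS", "60")), 1),
    )
```

`CONFIG = load_config()` runs at import time. A non-numeric value such as `RATE_LIMIT_REQUESTS=abc` therefore makes `int()` raise `ValueError` as soon as the module is imported, not when the first request arrives.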

docker-compose.yml Normal file
@@ -0,0 +1,69 @@
version: "3.9"
# TLC Search + Feed Master - Complete YouTube content indexing & RSS aggregation
# Provide ELASTIC_URL / QDRANT_URL (and related) via environment or a .env file.
services:
  # RSS Bridge - Converts YouTube channels to RSS feeds
  rss-bridge:
    image: rssbridge/rss-bridge:latest
    container_name: tlc-rss-bridge
    hostname: rss-bridge
    restart: unless-stopped
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    ports:
      - "3001:80"
  # Feed Master - Aggregates multiple RSS feeds into unified feed
  feed-master:
    image: umputun/feed-master:latest
    container_name: tlc-feed-master
    hostname: feed-master
    restart: unless-stopped
    depends_on:
      - rss-bridge
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    environment:
      - DEBUG=false
      - FM_DB=/srv/var/feed-master.bdb
      - FM_CONF=/srv/etc/fm.yml
    volumes:
      - ./feed-master-config:/srv/etc
      - ./feed-master-config/var:/srv/var
      - ./feed-master-config/images:/srv/images
    ports:
      - "8097:8080"
  # TLC Search - Flask app for searching YouTube transcripts
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      ELASTIC_URL: ${ELASTIC_URL:?set ELASTIC_URL to your remote Elasticsearch URL}
      ELASTIC_INDEX: ${ELASTIC_INDEX:-this_little_corner_py}
      ELASTIC_USERNAME: ${ELASTIC_USERNAME:-}
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      ELASTIC_API_KEY: ${ELASTIC_API_KEY:-}
      ELASTIC_VERIFY_CERTS: ${ELASTIC_VERIFY_CERTS:-0}
      CHANNELS_PATH: ${CHANNELS_PATH:-/app/python_app/channels.yml}
      RSS_FEED_URL: ${RSS_FEED_URL:-/rss/youtube-unified}
      RSS_FEED_UPSTREAM: ${RSS_FEED_UPSTREAM:-http://feed-master:8080}
      QDRANT_URL: ${QDRANT_URL:?set QDRANT_URL to your remote Qdrant URL}
      QDRANT_COLLECTION: ${QDRANT_COLLECTION:-tlc-captions-full}
      QDRANT_VECTOR_NAME: ${QDRANT_VECTOR_NAME:-}
      QDRANT_VECTOR_SIZE: ${QDRANT_VECTOR_SIZE:-1024}
      QDRANT_EMBED_MODEL: ${QDRANT_EMBED_MODEL:-BAAI/bge-large-en-v1.5}
      LOCAL_DATA_DIR: ${LOCAL_DATA_DIR:-/app/data/video_metadata}
    volumes:
      - ./channels.yml:/app/python_app/channels.yml:ro
      - ./data:/app/data:ro

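The `app` service uses two Compose interpolation forms:

- `${VAR:-default}` substitutes `default` when `VAR` is unset or empty.
- `${VAR:?message}` aborts with `message` under the same condition. This is how `ELASTIC_URL` and `QDRANT_URL` are made mandatory.

The following is a small Python sketch of just those two forms. It can help when reasoning about which value a container will receive. It is not Compose's implementation, and it deliberately ignores the other forms (`${VAR}`, `${VAR-default}`, `$$` escaping). `interpolate` and `MissingVariable` are hypothetical names.

```python
import re
from typing import Mapping

# Matches ${NAME:-default} and ${NAME:?message}; other Compose forms are ignored.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):(?P<op>[-?])(?P<arg>[^}]*)\}"
)


class MissingVariable(Exception):
    """Raised for ${NAME:?message} when NAME is unset or empty."""


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Expand the two interpolation forms used in docker-compose.yml (sketch)."""

    def replace(match: re.Match) -> str:
        name, op, arg = match.group("name", "op", "arg")
        current = env.get(name, "")
        if current:  # set and non-empty: use it unchanged
            return current
        if op == "-":  # ${NAME:-default}
            return arg
        raise MissingVariable(f"{name}: {arg}")  # ${NAME:?message}

    return _PATTERN.sub(replace, value)
```

For example, `RSS_FEED_UPSTREAM=` (set but empty) still resolves to `http://feed-master:8080`. The app then proxies the feed to the `feed-master` service over the Compose network instead of to the `localhost:8097` default in `config.py`.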
feed-master-config/fm.yml Normal file
@@ -0,0 +1,168 @@
# Feed Master Configuration
# Auto-generated from channels.yml
# Do not edit manually - regenerate using generate_feed_config_simple.py

feeds:
  youtube-unified:
    title: YouTube Unified Feed
    description: Aggregated feed from all YouTube channels
    link: https://youtube.com
    language: "en-us"
    sources:
      - name: A Quality Existence
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC6vg0HkKKlgsWk-3HfV-vnw&format=Mrss
      - name: Adams Fall
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCbD1Pm0TOcRK2zaCrwgcTTg&format=Mrss
      - name: Andrea with the Bangs
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCeWWxwzgLYUbfjWowXhVdYw&format=Mrss
      - name: Aphrael Pilotson
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCGHuURJ1XFHzPSeokf6510A&format=Mrss
      - name: Cassidy van der Kamp
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCM9Z05vuQhMEwsV03u6DrLA&format=Mrss
      - name: Channel UCCebR16tXbv
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCCebR16tXbv5Ykk9_WtCCug&format=Mrss
      - name: Channel UCiJmdXTb76i
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCiJmdXTb76i8eIPXdJyf8ZQ&format=Mrss
      - name: Charlie's Little Corner
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC952hDf_C4nYJdqwK7VzTxA&format=Mrss
      - name: Chris Howard
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC704NVL2DyzYg3rMU9r1f7A&format=Mrss
      - name: Christian Baxter
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCU5SNBfTo4umhjYz6M0Jsmg&format=Mrss
      - name: Climbing Mt Sophia
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCk9O91WwruXmgu1NQrKZZEw&format=Mrss
      - name: Corner Citizen
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCAXyF_HFeMgwS8nkGVeroAA&format=Mrss
      - name: Davidbusuttil9086
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCxV18lwwh29DiWuooz7UCvg&format=Mrss
      - name: Ein Sof - Infinite Reflections
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC4Rmxg7saTfwIpvq3QEzylQ&format=Mrss
      - name: Emily Rajeh
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCn5mf-fcpBmkepIpZ8eFRng&format=Mrss
      - name: Eric Seitz
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCTdH4nh6JTcfKUAWvmnPoIQ&format=Mrss
      - name: Ethan Caughey
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCv2Qft5mZrmA9XAwnl9PU-g&format=Mrss
      - name: faturechi
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCprytROeCztMOMe8plyJRMg&format=Mrss
      - name: Finding Ideas
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC6Tvr9mBXNaAxLGRA_sUSRA&format=Mrss
      - name: Free Rilian
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCT8Lq3ufaGEnCSS8WpFatqw&format=Mrss
      - name: From Whom All Blessings Flow
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC_-YQbnPfBbIezMr1adZZiQ&format=Mrss
      - name: Grahampardun
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCB0C8DEIQlQzvSGuGriBxtA&format=Mrss
      - name: Grail Country
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCsi_x8c12NW9FR7LL01QXKA&format=Mrss
      - name: Grey Hamilton
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCrZyTWGMdRM9_P26RKPvh3A&format=Mrss
      - name: Grizwald Grim
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCAqTQ5yLHHH44XWwWXLkvHQ&format=Mrss
      - name: Jesspurviance
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCQ7rVoApmYIpcmU7fB9RPyw&format=Mrss
      - name: John Vervaeke
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCpqDUjTsof-kTNpnyWper_Q&format=Mrss
      - name: Jonathan Dumeer
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC4GiA5Hnwy415uVRymxPK-w&format=Mrss
      - name: Jonathan Pageau
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCtCTSf3UwRU14nYWr_xm-dQ&format=Mrss
      - name: Jordan B Peterson
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCL_f53ZEJxp8TtlOkHwMV9Q&format=Mrss
      - name: Jordan Hall
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCMzT-mdCqoyEv_-YZVtE7MQ&format=Mrss
      - name: Joseph Lambrecht
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCiOZYvBGHw1Y6wyzffwEp9g&format=Mrss
      - name: Justinsmorningcoffee
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCPUVeoQYyq8cndWwyczX6RA&format=Mrss
      - name: Kale Zelden
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCiukuaNd_qzRDTW9qe2OC1w&format=Mrss
      - name: Lancecleaver227
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCwF5LWNOFou_50bT65bq4Bg&format=Mrss
      - name: Lucas Vos
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCez1fzMRGctojfis2lfRYug&format=Mrss
      - name: Luke Thompson
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC_dnk5D4tFCRYCrKIcQlcfw&format=Mrss
      - name: Marc Jackson
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCMJCtS8jKouJ2d8UIYzW3vg&format=Mrss
      - name: Mark D Parker
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCVdSgEf9bLXFMBGSMhn7x4Q&format=Mrss
      - name: Mark S
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC977g6oGYIJDQnsZOGjQBBA&format=Mrss
      - name: Mary Kochan
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC2leFZRD0ZlQDQxpR2Zd8oA&format=Mrss
      - name: Matthewparlato5626
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCosBhpwwGh_ueYq4ZSi5dGw&format=Mrss
      - name: mcmosav
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCnojyPW0IgLWTQ0SaDQ1KBA&format=Mrss
      - name: Michaelmartin8681
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCpLJJLVB_7v4Igq-9arja1A&format=Mrss
      - name: More Christ
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCEPOn4cgvrrerg_-q_Ygw1A&format=Mrss
      - name: Neal Daedalus
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC8mJqpS_EBbMcyuzZDF0TEw&format=Mrss
      - name: Nechama Gluck
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC5goUoFM4LPim4eY4pwRXYw&format=Mrss
      - name: OG Rose
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC6zHDj4D323xJkblnPTvY3Q&format=Mrss
      - name: Paul Anleitner
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC2yCyOMUeem-cYwliC-tLJg&format=Mrss
      - name: Paul Rene Nichols
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCDCfI162vhPvwdxW6X4nmiw&format=Mrss
      - name: Paul VanderKlay
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCGsDIP_K6J6VSTqlq-9IPlg&format=Mrss
      - name: President Foxman
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCMVG5eqpYFVEB-a9IqAOuHA&format=Mrss
      - name: Rafekelley
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCDIPXp88qjAV3TiaR5Uo3iQ&format=Mrss
      - name: Randos United
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCEzWTLDYmL8soRdQec9Fsjw&format=Mrss
      - name: Randos United 2
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC1KgNsMdRoIA_njVmaDdHgA&format=Mrss
      - name: Rebel Wisdom
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCFQ6Gptuq-sLflbJ4YY3Umw&format=Mrss
      - name: Restoring Meaning
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCzX6R3ZLQh5Zma_5AsPcqPA&format=Mrss
      - name: Rigel Windsong Thurston
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCWehDXDEdUpB58P7-Bg1cHg&format=Mrss
      - name: Rightinchrist
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCZA5mUAyYcCL1kYgxbeMNrA&format=Mrss
      - name: Ron Copperman
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC5yLuFQCms4nb9K2bGQLqIw&format=Mrss
      - name: Sartori Studios
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC8SErJkYnDsYGh1HxoZkl-g&format=Mrss
      - name: Secular Koranism
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCFLovlJ8RFApfjrf2y157xg&format=Mrss
      - name: Shoulder Serf
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UChptV-kf8lnncGh7DA2m8Pw&format=Mrss
      - name: Skankenstein
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCUSyTPWW4JaG1YfUPddw47Q&format=Mrss
      - name: Strange Theology
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCEY1vGNBPsC3dCatZyK3Jkw&format=Mrss
      - name: The Anadromist
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCIAtCuzdvgNJvSYILnHtdWA&format=Mrss
      - name: The Chris Show
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UClIDP7_Kzv_7tDQjTv9EhrA&format=Mrss
      - name: The Meaning Code
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCgp_r6WlBwDSJrP43Mz07GQ&format=Mrss
      - name: the plebistocrat
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCaJ0CqiiMSTq4X0rycUOIjw&format=Mrss
      - name: The Young Levite
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC1a4VtU_SMSfdRiwMJR33YQ&format=Mrss
      - name: TheCommonToad
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC-QiBn6GsM3JZJAeAQpaGAA&format=Mrss
      - name: TheScrollersPodcast
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC5uv-BxzCrN93B_5qbOdRWw&format=Mrss
      - name: Transfigured
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCg7Ed0lecvko58ibuX1XHng&format=Mrss
      - name: UpCycleClub
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCzw2FNI3IRphcAoVcUENOgQ&format=Mrss
      - name: Wavesofobsession
        url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCedgru6YCto3zyXjlbuQuqA&format=Mrss

system:
  update: 5m
  max_per_feed: 5
  max_total: 200
  max_keep: 1000
  base_url: http://localhost:8097

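Every `url` above has the same shape: RSS-Bridge's `YoutubeBridge` in the `By channel id` context, returning MRSS. `build_rss_bridge_url` in `channel_config` is not part of this diff. The following is therefore only a hypothetical reconstruction that reproduces the URLs in this file. How the real helper treats handle-only entries such as `LeviathanForPlay` is unknown. This sketch returns `None` for them, and the generator counts a `None` result as skipped.

```python
from typing import Optional
from urllib.parse import urlencode


def rss_bridge_url(channel: dict, rss_bridge_host: str = "rss-bridge") -> Optional[str]:
    """Build an RSS-Bridge YoutubeBridge feed URL for a channel ID (hypothetical)."""
    channel_id = channel.get("id")
    if not channel_id:
        return None  # assumption: handle-only entries are not bridged here
    query = urlencode(
        {
            "action": "display",
            "bridge": "YoutubeBridge",
            "context": "By channel id",  # urlencode encodes spaces as '+'
            "c": channel_id,  # '-' and '_' in channel IDs are left unescaped
            "format": "Mrss",
        }
    )
    return f"http://{rss_bridge_host}/?{query}"
```

The host `rss-bridge` resolves only inside the Compose network (it is the service's `hostname`). That suits feed-master, which fetches these URLs from its own container.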
generate_feed_config.py Normal file
@@ -0,0 +1,91 @@
#!/usr/bin/env python3
"""
Generate feed-master configuration from channels.yml.
This ensures a single source of truth for the YouTube channels.
"""
import sys
from pathlib import Path

try:
    from .channel_config import build_rss_bridge_url, load_channel_entries
except ImportError:  # executed directly as a script, not via `python -m`
    from channel_config import build_rss_bridge_url, load_channel_entries


def generate_fm_config(channels_file, output_file, rss_bridge_host="rss-bridge"):
    """Generate feed-master YAML configuration from channels.yml"""
    print(f"Reading channels from {channels_file}")
    channels = load_channel_entries(Path(channels_file))
    print(f"Found {len(channels)} channels")
    # Generate feed configuration
    config = []
    config.append("# Feed Master Configuration")
    config.append("# Auto-generated from channels.yml")
    config.append("# Do not edit manually - regenerate using generate_feed_config.py")
    config.append("")
    config.append("feeds:")
    config.append("  youtube-unified:")
    config.append("    title: YouTube Unified Feed")
    config.append("    description: Aggregated feed from all YouTube channels")
    config.append("    link: https://youtube.com")
    config.append('    language: "en-us"')
    config.append("    sources:")
    processed = 0
    skipped = 0
    for channel in channels:
        if not channel.get("rss_enabled", True):
            skipped += 1
            continue
        bridge_url = build_rss_bridge_url(channel, rss_bridge_host=rss_bridge_host)
        if not bridge_url:
            skipped += 1
            continue
        name = channel.get("name", "Unknown")
        config.append(f"      - name: {name}")
        config.append(f"        url: {bridge_url}")
        processed += 1
    # Add system configuration
    config.append("")
    config.append("system:")
    config.append("  update: 5m")
    config.append("  max_per_feed: 5")
    config.append("  max_total: 200")
    config.append("  max_keep: 1000")
    config.append("  base_url: http://localhost:8097")
    # Write output as UTF-8 (channel names may be non-ASCII), ending with a newline
    print(f"\nProcessed {processed} channels, skipped {skipped}")
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write('\n'.join(config) + '\n')
    print(f"Configuration written to {output_file}")
    print("\nTo apply this configuration:")
    print(f"  1. Copy {output_file} to feed-master/etc/fm.yml")
    print("  2. Restart the feed-master service")


if __name__ == "__main__":
    # Default paths
    script_dir = Path(__file__).parent
    channels_file = script_dir / "channels.yml"
    output_file = script_dir / "feed-master-config" / "fm.yml"
    # Allow overriding via command line
    if len(sys.argv) > 1:
        channels_file = Path(sys.argv[1])
    if len(sys.argv) > 2:
        output_file = Path(sys.argv[2])
    if not channels_file.exists():
        print(f"Error: {channels_file} not found", file=sys.stderr)
        print(f"\nUsage: {sys.argv[0]} [channels.yml] [output.yml]", file=sys.stderr)
        sys.exit(1)
    # Ensure output directory exists
    output_file.parent.mkdir(parents=True, exist_ok=True)
    generate_fm_config(channels_file, output_file)

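The generator writes each `name` straight into YAML as a plain scalar. That works for every name in `channels.yml` today. However, a name containing `: `, a ` #`, or starting with a quote character would change how the line parses. One option is to emit names as double-quoted scalars. JSON string escapes (`\"`, `\\`, `\n`, `\uXXXX`) are also valid inside YAML double-quoted scalars, so `json.dumps` is a convenient escaper. `source_lines` is a hypothetical helper, and the six- and eight-space indentation is an assumption about the intended `sources:` layout in `fm.yml`.

```python
import json


def source_lines(name: str, url: str) -> list[str]:
    """Render one feed-master source entry with a safely quoted name (sketch).

    json.dumps yields a double-quoted string whose escapes YAML also accepts;
    ensure_ascii=False keeps non-ASCII channel names readable in the file.
    """
    return [
        f"      - name: {json.dumps(name, ensure_ascii=False)}",
        f"        url: {url}",
    ]
```

The URLs are left unquoted because the RSS-Bridge query strings contain no characters that are significant in a YAML plain scalar at those positions.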
generate_feed_config_simple.py Executable file
@@ -0,0 +1,88 @@
#!/usr/bin/env python3
"""
Generate feed-master configuration from channels.yml.
Simplified version that doesn't require RSS-Bridge to be running.
"""
import sys
from pathlib import Path

try:
    from .channel_config import build_rss_bridge_url, load_channel_entries
except ImportError:  # executed directly as a script, not via `python -m`
    from channel_config import build_rss_bridge_url, load_channel_entries


def generate_fm_config(channels_file, output_file, rss_bridge_host="rss-bridge"):
    """Generate feed-master YAML configuration from channels.yml"""
    print(f"Reading channels from {channels_file}")
    channels = load_channel_entries(Path(channels_file))
    print(f"Found {len(channels)} channels")
    # Generate feed configuration
    config = []
    config.append("# Feed Master Configuration")
    config.append("# Auto-generated from channels.yml")
    config.append("# Do not edit manually - regenerate using generate_feed_config_simple.py")
    config.append("")
    config.append("feeds:")
    config.append("  youtube-unified:")
    config.append("    title: YouTube Unified Feed")
    config.append("    description: Aggregated feed from all YouTube channels")
    config.append("    link: https://youtube.com")
    config.append('    language: "en-us"')
    config.append("    sources:")
    processed = 0
    skipped = 0
    for channel in channels:
        if not channel.get("rss_enabled", True):
            skipped += 1
            continue
        bridge_url = build_rss_bridge_url(channel, rss_bridge_host=rss_bridge_host)
        if not bridge_url:
            skipped += 1
            continue
        name = channel.get("name", "Unknown")
        config.append(f"      - name: {name}")
        config.append(f"        url: {bridge_url}")
        processed += 1
    # Add system configuration
    config.append("")
    config.append("system:")
    config.append("  update: 5m")
    config.append("  max_per_feed: 5")
    config.append("  max_total: 200")
    config.append("  max_keep: 1000")
    config.append("  base_url: http://localhost:8097")
    # Write output as UTF-8 (channel names may be non-ASCII), ending with a newline
    print(f"\nProcessed {processed} channels, skipped {skipped}")
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write('\n'.join(config) + '\n')
    print(f"Configuration written to {output_file}")


if __name__ == "__main__":
    # Default paths
    script_dir = Path(__file__).parent
    channels_file = script_dir / "channels.yml"
    output_file = script_dir / "feed-master-config" / "fm.yml"
    # Allow overriding via command line
    if len(sys.argv) > 1:
        channels_file = Path(sys.argv[1])
    if len(sys.argv) > 2:
        output_file = Path(sys.argv[2])
    if not channels_file.exists():
        print(f"Error: {channels_file} not found", file=sys.stderr)
        print(f"\nUsage: {sys.argv[0]} [channels.yml] [output.yml]", file=sys.stderr)
        sys.exit(1)
    # Ensure output directory exists
    output_file.parent.mkdir(parents=True, exist_ok=True)
    generate_fm_config(channels_file, output_file)

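The committed `fm.yml` can lag behind `channels.yml`, as this diff shows. `fm.yml` still lists placeholder names such as `Channel UCiJmdXTb76i`, which `channels.yml` now calls `Bridges of Meaning Hub`. It also omits `Wholly Unfocused` and `CMA`. A check mode could compare a fresh render of the config text against the file on disk before anything is written. Neither script has such a mode today. `config_drift` is a hypothetical helper built on the standard-library `difflib`.

```python
import difflib
from pathlib import Path


def config_drift(rendered: str, output_file: Path) -> list[str]:
    """Return unified-diff lines between the file on disk and a fresh render.

    An empty list means the file is already up to date. A missing file is
    treated as empty, so every rendered line shows up as an addition.
    """
    try:
        current = output_file.read_text(encoding="utf-8")
    except FileNotFoundError:
        current = ""
    return list(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            rendered.splitlines(keepends=True),
            fromfile=str(output_file),
            tofile=f"{output_file} (regenerated)",
        )
    )
```

In CI, a non-empty result could fail the build with the diff printed. This would flag a stale `fm.yml` without rewriting it.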
@@ -90,6 +90,10 @@ def build_bulk_actions(
"transcript_full": transcript_full,
"transcript_secondary_full": doc.get("transcript_secondary_full"),
"transcript_parts": parts,
"internal_references": doc.get("internal_references", []),
"internal_references_count": doc.get("internal_references_count", 0),
"referenced_by": doc.get("referenced_by", []),
"referenced_by_count": doc.get("referenced_by_count", 0),
},
}
@@ -121,6 +125,10 @@ def ensure_index(client: "Elasticsearch", index: str) -> None:
"text": {"type": "text"},
},
},
"internal_references": {"type": "keyword"},
"internal_references_count": {"type": "integer"},
"referenced_by": {"type": "keyword"},
"referenced_by_count": {"type": "integer"},
}
},
)
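The new mapping stores both directions of the reference graph. `internal_references` holds the video IDs a document points at, and `referenced_by` holds the IDs that point at it. Each has a precomputed count. The fields are mapped as `keyword` and `integer`, so they support exact-match filters and sorting. This diff does not show how `referenced_by` is populated. Assuming it is the inverse of `internal_references` across the corpus, the following is a minimal sketch of deriving all four fields. `invert_references` is a hypothetical helper. Dropping self-references and references to unknown IDs is a design choice made for the sketch.

```python
from collections import defaultdict


def invert_references(docs: dict[str, list[str]]) -> dict[str, dict]:
    """Compute the four reference fields for every video (hypothetical helper).

    `docs` maps a video_id to the video_ids it mentions. Duplicate mentions
    and self-references are dropped; IDs not present in `docs` are ignored.
    """
    referenced_by: dict[str, set[str]] = defaultdict(set)
    cleaned: dict[str, list[str]] = {}
    for video_id, refs in docs.items():
        targets = sorted({ref for ref in refs if ref != video_id and ref in docs})
        cleaned[video_id] = targets
        for target in targets:
            referenced_by[target].add(video_id)  # record the reverse edge
    return {
        video_id: {
            "internal_references": targets,
            "internal_references_count": len(targets),
            "referenced_by": sorted(referenced_by[video_id]),
            "referenced_by_count": len(referenced_by[video_id]),
        }
        for video_id, targets in cleaned.items()
    }
```

Each returned dict could be merged into the corresponding document before `build_bulk_actions` runs. That function already reads these four keys with `doc.get(...)` and falls back to empty lists and zero counts.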

@@ -2,3 +2,5 @@ Flask>=2.3
elasticsearch>=7.0.0,<9.0.0
youtube-transcript-api>=0.6
google-api-python-client>=2.0.0
python-dotenv>=0.19.0
requests>=2.31.0

File diff suppressed because it is too large

File diff suppressed because it is too large

static/favicon.png Normal file

Binary file not shown. Size: 1.9 KiB

@@ -4,6 +4,7 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Term Frequency Explorer</title>
<link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="/static/style.css" />
<style>
#chart {
@@ -65,4 +66,3 @@
<script src="/static/frequency.js"></script>
</body>
</html>

static/graph.html Normal file
@@ -0,0 +1,96 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>TLC Reference Graph</title>
<link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="https://unpkg.com/xp.css" />
<link rel="stylesheet" href="/static/style.css" />
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
</head>
<body>
<div class="window graph-window" style="max-width: 1100px; margin: 20px auto;">
<div class="title-bar">
<div class="title-bar-text">Reference Graph</div>
<div class="title-bar-controls">
<a class="title-bar-link" href="/">⬅ Search</a>
</div>
</div>
<div class="window-body">
<p>
Explore how videos reference each other. Enter a <code>video_id</code> to see its immediate
neighbors (referenced and referencing videos). Choose a larger depth to expand the graph.
</p>
<form id="graphForm" class="graph-controls">
<div class="field-group">
<label for="graphVideoId">Video ID</label>
<input
id="graphVideoId"
name="video_id"
type="text"
placeholder="e.g. dQw4w9WgXcQ"
required
/>
</div>
<div class="field-group">
<label for="graphDepth">Depth</label>
<select id="graphDepth" name="depth">
<option value="1">1 hop</option>
<option value="2">2 hops</option>
<option value="3">3 hops</option>
</select>
</div>
<div class="field-group">
<label for="graphMaxNodes">Max nodes</label>
<select id="graphMaxNodes" name="max_nodes">
<option value="100">100</option>
<option value="150">150</option>
<option value="200" selected>200</option>
<option value="300">300</option>
</select>
</div>
<div class="field-group">
<label class="checkbox">
<input type="checkbox" id="graphFullToggle" name="full_graph" />
Attempt entire reference graph
</label>
<p class="field-hint">
Includes every video that references another (ignores depth; may be slow). Max nodes still
applies.
</p>
</div>
<div class="field-group">
<label for="graphLabelSize">Labels</label>
<select id="graphLabelSize" name="label_size">
<option value="off">Off</option>
<option value="tiny" selected>Tiny</option>
<option value="small">Small</option>
<option value="normal">Normal</option>
<option value="medium">Medium</option>
<option value="large">Large</option>
<option value="xlarge">Extra large</option>
</select>
</div>
<button type="submit">Build graph</button>
</form>
<div id="graphStatus" class="graph-status">Enter a video ID to begin.</div>
<div id="graphContainer" class="graph-container"></div>
</div>
<div class="status-bar">
<p class="status-bar-field">Click nodes to open the video on YouTube</p>
<p class="status-bar-field">Colors represent channels</p>
</div>
</div>
<script src="/static/graph.js"></script>
</body>
</html>
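The page exposes two limits. Depth is how many hops out from the starting video to go, and max nodes is a hard cap on graph size. The graph endpoint and the request code in `graph.js` are not shown in this section. The following is therefore only an assumed model of those semantics: a breadth-first expansion that follows edges in both directions ("referenced and referencing videos") and stops at `depth` hops or `max_nodes` nodes, whichever comes first. `neighborhood` is a hypothetical helper.

```python
from collections import deque


def neighborhood(
    start: str,
    references: dict[str, list[str]],
    depth: int = 1,
    max_nodes: int = 200,
) -> tuple[set[str], set[tuple[str, str]]]:
    """Breadth-first expansion of the reference graph around `start` (sketch).

    `references` maps video_id -> referenced video_ids (directed edges).
    Returns the kept nodes and the directed edges whose endpoints were both kept.
    """
    # Undirected adjacency view, so both referenced and referencing videos count.
    neighbors: dict[str, set[str]] = {}
    for source, targets in references.items():
        for target in targets:
            neighbors.setdefault(source, set()).add(target)
            neighbors.setdefault(target, set()).add(source)

    visited = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_nodes:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for other in sorted(neighbors.get(node, ())):  # sorted for determinism
            if other in visited:
                continue
            if len(visited) >= max_nodes:
                break
            visited.add(other)
            queue.append((other, hops + 1))

    edges = {
        (source, target)
        for source, targets in references.items()
        for target in targets
        if source in visited and target in visited
    }
    return visited, edges
```

Because expansion is breadth-first, hitting `max_nodes` drops the most distant videos first. This matches the page's note that the cap still applies even when a larger graph is requested.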

static/graph.js Normal file
@@ -0,0 +1,842 @@
(() => {
const global = window;
const GraphUI = (global.GraphUI = global.GraphUI || {});
GraphUI.ready = false;
const form = document.getElementById("graphForm");
const videoInput = document.getElementById("graphVideoId");
const depthInput = document.getElementById("graphDepth");
const maxNodesInput = document.getElementById("graphMaxNodes");
const labelSizeInput = document.getElementById("graphLabelSize");
const fullGraphToggle = document.getElementById("graphFullToggle");
const statusEl = document.getElementById("graphStatus");
const container = document.getElementById("graphContainer");
const isEmbedded =
container && container.dataset && container.dataset.embedded === "true";
if (!form || !videoInput || !depthInput || !maxNodesInput || !labelSizeInput || !container) {
console.error("Graph: required DOM elements missing.");
return;
}
const color = d3.scaleOrdinal(d3.schemeTableau10);
const colorRange = typeof color.range === "function" ? color.range() : [];
const paletteSizeDefault = colorRange.length || 10;
const PATTERN_TYPES = [
{ key: "none", legendClass: "none" },
{ key: "diag-forward", legendClass: "diag-forward" },
{ key: "diag-back", legendClass: "diag-back" },
{ key: "cross", legendClass: "cross" },
{ key: "dots", legendClass: "dots" },
];
const ADDITIONAL_PATTERNS = PATTERN_TYPES.filter((pattern) => pattern.key !== "none");
const sanitizeDepth = (value) => {
const parsed = parseInt(value, 10);
if (Number.isNaN(parsed)) return 1;
return Math.max(0, Math.min(parsed, 3));
};
const sanitizeMaxNodes = (value) => {
const parsed = parseInt(value, 10);
if (Number.isNaN(parsed)) return 200;
return Math.max(10, Math.min(parsed, 400));
};
const LABEL_SIZE_VALUES = ["off", "tiny", "small", "normal", "medium", "large", "xlarge"];
const LABEL_FONT_SIZES = {
tiny: "7px",
small: "8px",
normal: "9px",
medium: "10px",
large: "11px",
xlarge: "13px",
};
const DEFAULT_LABEL_SIZE = "tiny";
const isValidLabelSize = (value) => LABEL_SIZE_VALUES.includes(value);
const getLabelSize = () => {
if (!labelSizeInput) return DEFAULT_LABEL_SIZE;
const value = labelSizeInput.value;
return isValidLabelSize(value) ? value : DEFAULT_LABEL_SIZE;
};
function setLabelSizeInput(value) {
if (!labelSizeInput) return;
labelSizeInput.value = isValidLabelSize(value) ? value : DEFAULT_LABEL_SIZE;
}
const getChannelLabel = (node) =>
(node && (node.channel_name || node.channel_id)) || "Unknown";
function appendPatternContent(pattern, baseColor, patternKey) {
pattern.append("rect").attr("width", 8).attr("height", 8).attr("fill", baseColor);
const strokeColor = "#1f1f1f";
const strokeOpacity = 0.35;
const addForward = () => {
pattern
.append("path")
.attr("d", "M-2,6 L2,2 M0,8 L8,0 M6,10 L10,4")
.attr("stroke", strokeColor)
.attr("stroke-width", 1)
.attr("stroke-opacity", strokeOpacity)
.attr("fill", "none");
};
const addBackward = () => {
pattern
.append("path")
.attr("d", "M-2,2 L2,6 M0,0 L8,8 M6,-2 L10,2")
.attr("stroke", strokeColor)
.attr("stroke-width", 1)
.attr("stroke-opacity", strokeOpacity)
.attr("fill", "none");
};
switch (patternKey) {
case "diag-forward":
addForward();
break;
case "diag-back":
addBackward();
break;
case "cross":
addForward();
addBackward();
break;
case "dots":
pattern
.append("circle")
.attr("cx", 4)
.attr("cy", 4)
.attr("r", 1.5)
.attr("fill", strokeColor)
.attr("fill-opacity", strokeOpacity);
break;
default:
break;
}
}
function createChannelStyle(label, baseColor, patternKey) {
const patternInfo =
PATTERN_TYPES.find((pattern) => pattern.key === patternKey) || PATTERN_TYPES[0];
return {
baseColor,
hatch: patternInfo ? patternInfo.key : "none",
legendClass: patternInfo ? patternInfo.legendClass : "none",
};
}
let currentGraphData = null;
let currentChannelStyles = new Map();
let currentDepth = sanitizeDepth(depthInput.value);
let currentMaxNodes = sanitizeMaxNodes(maxNodesInput.value);
let currentSimulation = null;
let currentFullGraph = false;
let currentIncludeExternal = true;
let previousMaxNodesValue = maxNodesInput ? maxNodesInput.value : "200";
function setStatus(message, isError = false) {
if (!statusEl) return;
statusEl.textContent = message;
if (isError) {
statusEl.classList.add("error");
} else {
statusEl.classList.remove("error");
}
}
function sanitizeId(value) {
return (value || "").trim();
}
function isFullGraphMode(forceValue) {
if (typeof forceValue === "boolean") {
return forceValue;
}
return fullGraphToggle ? !!fullGraphToggle.checked : false;
}
function applyFullGraphState(forceValue) {
const enabled = isFullGraphMode(forceValue);
if (typeof forceValue === "boolean" && fullGraphToggle) {
fullGraphToggle.checked = forceValue;
}
if (depthInput) {
depthInput.disabled = enabled;
}
if (maxNodesInput) {
if (enabled) {
previousMaxNodesValue = maxNodesInput.value || previousMaxNodesValue || "200";
maxNodesInput.value = "0";
maxNodesInput.disabled = true;
} else {
if (maxNodesInput.disabled) {
maxNodesInput.value = previousMaxNodesValue || "200";
}
maxNodesInput.disabled = false;
}
}
if (videoInput) {
if (enabled) {
videoInput.removeAttribute("required");
} else {
videoInput.setAttribute("required", "required");
}
}
}
async function fetchGraph(
videoId,
depth,
maxNodes,
fullGraphMode = false,
includeExternal = true
) {
const params = new URLSearchParams();
if (videoId) {
params.set("video_id", videoId);
}
if (fullGraphMode) {
params.set("full_graph", "1");
params.set("max_nodes", "0");
} else {
params.set("depth", String(depth));
params.set("max_nodes", String(maxNodes));
}
params.set("external", includeExternal ? "1" : "0");
const response = await fetch(`/api/graph?${params.toString()}`);
if (!response.ok) {
const errorPayload = await response.json().catch(() => ({}));
const errorMessage =
errorPayload.error ||
`Graph request failed (${response.status} ${response.statusText})`;
throw new Error(errorMessage);
}
return response.json();
}
function resizeContainer() {
if (!container) return;
const minHeight = 520;
const viewportHeight = window.innerHeight;
container.style.height = `${Math.max(minHeight, Math.round(viewportHeight * 0.6))}px`;
}
function renderGraph(data, labelSize = "normal") {
if (!container) return;
if (currentSimulation) {
currentSimulation.stop();
currentSimulation = null;
}
container.innerHTML = "";
const width = container.clientWidth || 900;
const height = container.clientHeight || 600;
const svg = d3
.select(container)
.append("svg")
.attr("viewBox", [0, 0, width, height])
.attr("width", "100%")
.attr("height", height);
const defs = svg.append("defs");
defs
.append("marker")
.attr("id", "arrow-references")
.attr("viewBox", "0 -5 10 10")
.attr("refX", 18)
.attr("refY", 0)
.attr("markerWidth", 6)
.attr("markerHeight", 6)
.attr("orient", "auto")
.append("path")
.attr("d", "M0,-5L10,0L0,5")
.attr("fill", "#6c83c7");
defs
.append("marker")
.attr("id", "arrow-referenced-by")
.attr("viewBox", "0 -5 10 10")
.attr("refX", 18)
.attr("refY", 0)
.attr("markerWidth", 6)
.attr("markerHeight", 6)
.attr("orient", "auto")
.append("path")
.attr("d", "M0,-5L10,0L0,5")
.attr("fill", "#c76c6c");
const contentGroup = svg.append("g").attr("class", "graph-content");
const linkGroup = contentGroup.append("g").attr("class", "graph-links");
const nodeGroup = contentGroup.append("g").attr("class", "graph-nodes");
const labelGroup = contentGroup.append("g").attr("class", "graph-labels");
const links = data.links || [];
const nodes = data.nodes || [];
currentChannelStyles = new Map();
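const uniqueChannels = [];
// currentChannelStyles is still empty at this point, so track seen labels separately
const seenChannels = new Set();
nodes.forEach((node) => {
const label = getChannelLabel(node);
if (!seenChannels.has(label)) {
seenChannels.add(label);
uniqueChannels.push(label);
}
});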
const additionalPatternCount = ADDITIONAL_PATTERNS.length;
uniqueChannels.forEach((label, idx) => {
const baseColor = color(label);
let patternKey = "none";
if (idx >= paletteSizeDefault && additionalPatternCount > 0) {
const patternInfo =
ADDITIONAL_PATTERNS[(idx - paletteSizeDefault) % additionalPatternCount];
patternKey = patternInfo.key;
}
const style = createChannelStyle(label, baseColor, patternKey);
currentChannelStyles.set(label, style);
});
const linkSelection = linkGroup
.selectAll("line")
.data(links)
.enter()
.append("line")
.attr("stroke-width", 1.2)
.attr("stroke", (d) =>
d.relation === "references" ? "#6c83c7" : "#c76c6c"
)
.attr("stroke-opacity", 0.7)
.attr("marker-end", (d) =>
d.relation === "references" ? "url(#arrow-references)" : "url(#arrow-referenced-by)"
);
let nodePatternCounter = 0;
const nodePatternRefs = new Map();
const getNodeFill = (node) => {
const style = currentChannelStyles.get(getChannelLabel(node));
if (!style) {
return color(getChannelLabel(node));
}
if (!style.hatch || style.hatch === "none") {
return style.baseColor;
}
const patternId = `node-pattern-${nodePatternCounter++}`;
const pattern = defs
.append("pattern")
.attr("id", patternId)
.attr("patternUnits", "userSpaceOnUse")
.attr("width", 8)
.attr("height", 8);
appendPatternContent(pattern, style.baseColor, style.hatch);
pattern.attr("patternTransform", "translate(0,0)");
nodePatternRefs.set(node.id, pattern);
return `url(#${patternId})`;
};
const nodeSelection = nodeGroup
.selectAll("circle")
.data(nodes, (d) => d.id)
.enter()
.append("circle")
.attr("r", (d) => (d.is_root ? 10 : 7))
.attr("fill", (d) => getNodeFill(d))
.attr("stroke", "#1f1f1f")
.attr("stroke-width", (d) => (d.is_root ? 2 : 1))
.call(
d3
.drag()
.on("start", (event, d) => {
if (!event.active) simulation.alphaTarget(0.3).restart();
d.fx = d.x;
d.fy = d.y;
})
.on("drag", (event, d) => {
d.fx = event.x;
d.fy = event.y;
})
.on("end", (event, d) => {
if (!event.active) simulation.alphaTarget(0);
d.fx = null;
d.fy = null;
})
)
.on("click", (event, d) => {
if (d.url) {
window.open(d.url, "_blank", "noopener");
}
})
.on("contextmenu", (event, d) => {
event.preventDefault();
loadGraph(d.id, currentDepth, currentMaxNodes, {
updateInputs: true,
includeExternal: currentIncludeExternal,
});
});
nodeSelection
.append("title")
.text((d) => {
const parts = [];
parts.push(d.title || d.id);
if (d.channel_name) {
parts.push(`Channel: ${d.channel_name}`);
}
if (d.date) {
parts.push(`Date: ${d.date}`);
}
return parts.join("\n");
});
const labelSelection = labelGroup
.selectAll("text")
.data(nodes, (d) => d.id)
.enter()
.append("text")
.attr("class", "graph-node-label")
.attr("text-anchor", "middle")
.attr("fill", "#1f1f1f")
.attr("pointer-events", "none")
.text((d) => d.title || d.id);
applyLabelAppearance(labelSelection, labelSize);
const simulation = d3
.forceSimulation(nodes)
.force(
"link",
d3
.forceLink(links)
.id((d) => d.id)
.distance(120)
.strength(0.8)
)
.force("charge", d3.forceManyBody().strength(-320))
.force("center", d3.forceCenter(width / 2, height / 2))
.force(
"collide",
d3.forceCollide().radius((d) => (d.is_root ? 20 : 14)).iterations(2)
);
simulation.on("tick", () => {
linkSelection
.attr("x1", (d) => d.source.x)
.attr("y1", (d) => d.source.y)
.attr("x2", (d) => d.target.x)
.attr("y2", (d) => d.target.y);
nodeSelection.attr("cx", (d) => d.x).attr("cy", (d) => d.y);
labelSelection.attr("x", (d) => d.x).attr("y", (d) => d.y - (d.is_root ? 14 : 12));
nodeSelection.each(function (d) {
const pattern = nodePatternRefs.get(d.id);
if (pattern) {
const safeX = Number.isFinite(d.x) ? d.x : 0;
const safeY = Number.isFinite(d.y) ? d.y : 0;
pattern.attr("patternTransform", `translate(${safeX}, ${safeY})`);
}
});
});
const zoomBehavior = d3
.zoom()
.scaleExtent([0.3, 3])
.on("zoom", (event) => {
contentGroup.attr("transform", event.transform);
});
svg.call(zoomBehavior);
currentSimulation = simulation;
}
async function loadGraph(
videoId,
depth,
maxNodes,
{ updateInputs = false, fullGraph, includeExternal } = {}
) {
const wantsFull = isFullGraphMode(
typeof fullGraph === "boolean" ? fullGraph : undefined
);
const includeFlag =
typeof includeExternal === "boolean" ? includeExternal : currentIncludeExternal;
currentIncludeExternal = includeFlag;
const sanitizedId = sanitizeId(videoId);
if (!wantsFull && !sanitizedId) {
setStatus("Please enter a video ID.", true);
return;
}
const safeDepth = wantsFull ? currentDepth || 1 : sanitizeDepth(depth);
const safeMaxNodes = wantsFull ? 0 : sanitizeMaxNodes(maxNodes);
if (updateInputs) {
videoInput.value = sanitizedId;
depthInput.value = String(wantsFull ? currentDepth || 1 : safeDepth);
maxNodesInput.value = String(safeMaxNodes);
applyFullGraphState(wantsFull);
} else {
applyFullGraphState();
}
setStatus(wantsFull ? "Loading full reference graph…" : "Loading graph…");
try {
const data = await fetchGraph(
sanitizedId,
safeDepth,
safeMaxNodes,
wantsFull,
includeFlag
);
if (!data.nodes || data.nodes.length === 0) {
setStatus("No nodes returned for this video.", true);
container.innerHTML = "";
currentGraphData = null;
currentChannelStyles = new Map();
renderLegend([]);
return;
}
currentGraphData = data;
currentDepth = safeDepth;
currentMaxNodes = safeMaxNodes;
currentFullGraph = wantsFull;
renderGraph(data, getLabelSize());
renderLegend(data.nodes);
setStatus(
`Showing ${data.nodes.length} nodes and ${(data.links || []).length} links (${
data.meta?.mode === "full" ? "full graph" : `depth ${data.depth}`
})`
);
updateUrlState(
sanitizedId,
safeDepth,
safeMaxNodes,
getLabelSize(),
wantsFull,
includeFlag
);
} catch (err) {
console.error(err);
setStatus(err.message || "Failed to build graph.", true);
container.innerHTML = "";
currentGraphData = null;
currentChannelStyles = new Map();
renderLegend([]);
}
}
async function handleSubmit(event) {
event.preventDefault();
await loadGraph(videoInput.value, depthInput.value, maxNodesInput.value, {
updateInputs: true,
fullGraph: isFullGraphMode(),
includeExternal: currentIncludeExternal,
});
}
function renderLegend(nodes) {
let legend = document.getElementById("graphLegend");
if (!legend) {
legend = document.createElement("div");
legend.id = "graphLegend";
legend.className = "graph-legend";
if (statusEl && statusEl.parentNode) {
statusEl.insertAdjacentElement("afterend", legend);
} else {
container.parentElement?.insertBefore(legend, container);
}
}
legend.innerHTML = "";
const edgesSection = document.createElement("div");
edgesSection.className = "graph-legend-section";
const edgesTitle = document.createElement("div");
edgesTitle.className = "graph-legend-title";
edgesTitle.textContent = "Edges";
edgesSection.appendChild(edgesTitle);
const createEdgeRow = (swatchClass, text) => {
const row = document.createElement("div");
row.className = "graph-legend-row";
const swatch = document.createElement("span");
swatch.className = `graph-legend-swatch ${swatchClass}`;
const label = document.createElement("span");
label.textContent = text;
row.appendChild(swatch);
row.appendChild(label);
return row;
};
edgesSection.appendChild(
createEdgeRow(
"graph-legend-swatch--references",
"Outgoing reference (this video references another)"
)
);
edgesSection.appendChild(
createEdgeRow(
"graph-legend-swatch--referenced",
"Incoming reference (another video references this one)"
)
);
legend.appendChild(edgesSection);
const channelSection = document.createElement("div");
channelSection.className = "graph-legend-section";
const channelTitle = document.createElement("div");
channelTitle.className = "graph-legend-title";
channelTitle.textContent = "Channels in view";
channelSection.appendChild(channelTitle);
const channelList = document.createElement("div");
channelList.className = "graph-legend-channel-list";
const channelEntries = Array.from(currentChannelStyles.entries()).sort((a, b) =>
a[0].localeCompare(b[0], undefined, { sensitivity: "base" })
);
const maxChannelItems = 20;
channelEntries.slice(0, maxChannelItems).forEach(([label, style]) => {
const item = document.createElement("div");
item.className = `graph-legend-channel graph-legend-channel--${
style.legendClass || "none"
}`;
const swatch = document.createElement("span");
swatch.className = "graph-legend-swatch graph-legend-channel-swatch";
swatch.style.backgroundColor = style.baseColor;
const text = document.createElement("span");
text.textContent = label;
item.appendChild(swatch);
item.appendChild(text);
channelList.appendChild(item);
});
const totalChannels = channelEntries.length;
if (channelList.childElementCount) {
channelSection.appendChild(channelList);
if (totalChannels > maxChannelItems) {
const note = document.createElement("div");
note.className = "graph-legend-note";
note.textContent = `+${totalChannels - maxChannelItems} more channels`;
channelSection.appendChild(note);
}
} else {
const empty = document.createElement("div");
empty.className = "graph-legend-note";
empty.textContent = "No channel data available.";
channelSection.appendChild(empty);
}
legend.appendChild(channelSection);
}
function applyLabelAppearance(selection, labelSize) {
if (labelSize === "off") {
selection.style("display", "none");
} else {
selection
.style("display", null)
.attr("font-size", LABEL_FONT_SIZES[labelSize] || LABEL_FONT_SIZES.normal);
}
}
function updateUrlState(
videoId,
depth,
maxNodes,
labelSize,
fullGraphMode,
includeExternal
) {
if (isEmbedded) {
return;
}
const next = new URL(window.location.href);
if (videoId) {
next.searchParams.set("video_id", videoId);
} else {
next.searchParams.delete("video_id");
}
if (fullGraphMode) {
next.searchParams.set("full_graph", "1");
next.searchParams.delete("depth");
next.searchParams.set("max_nodes", "0");
} else {
next.searchParams.set("depth", String(depth));
next.searchParams.delete("full_graph");
next.searchParams.set("max_nodes", String(maxNodes));
}
if (!includeExternal) {
next.searchParams.set("external", "0");
} else {
next.searchParams.delete("external");
}
if (labelSize && labelSize !== "normal") {
next.searchParams.set("label_size", labelSize);
} else {
next.searchParams.delete("label_size");
}
history.replaceState({}, "", next.toString());
}
function initFromQuery() {
const params = new URLSearchParams(window.location.search);
const videoId = sanitizeId(params.get("video_id"));
const depth = sanitizeDepth(params.get("depth") || "");
const rawMaxNodes = params.get("max_nodes");
let maxNodes = sanitizeMaxNodes(rawMaxNodes || "");
if (rawMaxNodes && rawMaxNodes.trim() === "0") {
maxNodes = 0;
}
const labelSizeParam = params.get("label_size");
const fullGraphParam = params.get("full_graph");
const viewFull =
fullGraphParam && ["1", "true", "yes"].includes(fullGraphParam.toLowerCase());
const externalParam = params.get("external");
const includeExternal =
!externalParam ||
!["0", "false", "no"].includes(externalParam.toLowerCase());
currentIncludeExternal = includeExternal;
if (videoId) {
videoInput.value = videoId;
}
depthInput.value = String(depth);
maxNodesInput.value = String(viewFull ? 0 : maxNodes);
if (fullGraphToggle) {
fullGraphToggle.checked = !!viewFull;
}
applyFullGraphState();
if (labelSizeParam && isValidLabelSize(labelSizeParam)) {
setLabelSizeInput(labelSizeParam);
} else {
setLabelSizeInput(getLabelSize());
}
if ((isEmbedded && !viewFull) || (!videoId && !viewFull)) {
return;
}
loadGraph(videoId, depth, maxNodes, {
updateInputs: false,
fullGraph: viewFull,
includeExternal,
});
}
resizeContainer();
window.addEventListener("resize", resizeContainer);
form.addEventListener("submit", handleSubmit);
if (fullGraphToggle) {
fullGraphToggle.addEventListener("change", () => {
applyFullGraphState();
});
}
labelSizeInput.addEventListener("change", () => {
const size = getLabelSize();
if (currentGraphData) {
renderGraph(currentGraphData, size);
renderLegend(currentGraphData.nodes);
}
updateUrlState(
sanitizeId(videoInput.value),
currentDepth,
currentMaxNodes,
size,
currentFullGraph,
currentIncludeExternal
);
});
initFromQuery();
Object.assign(GraphUI, {
load(videoId, depth, maxNodes, options = {}) {
const targetDepth = depth != null ? depth : currentDepth;
const targetMax = maxNodes != null ? maxNodes : currentMaxNodes;
const explicitFull =
typeof options.fullGraph === "boolean"
? options.fullGraph
: undefined;
if (fullGraphToggle && typeof explicitFull === "boolean") {
fullGraphToggle.checked = explicitFull;
}
applyFullGraphState(
typeof explicitFull === "boolean" ? explicitFull : undefined
);
const fullFlag =
typeof explicitFull === "boolean"
? explicitFull
: isFullGraphMode();
const explicitInclude =
typeof options.includeExternal === "boolean"
? options.includeExternal
: undefined;
if (typeof explicitInclude === "boolean") {
currentIncludeExternal = explicitInclude;
}
return loadGraph(videoId, targetDepth, targetMax, {
updateInputs: options.updateInputs !== false,
fullGraph: fullFlag,
includeExternal:
typeof explicitInclude === "boolean"
? explicitInclude
: currentIncludeExternal,
});
},
setLabelSize(size) {
if (!labelSizeInput || !size) return;
setLabelSizeInput(size);
labelSizeInput.dispatchEvent(new Event("change", { bubbles: true }));
},
setDepth(value) {
if (!depthInput) return;
const safe = sanitizeDepth(value);
depthInput.value = String(safe);
currentDepth = safe;
},
setMaxNodes(value) {
if (!maxNodesInput) return;
const safe = sanitizeMaxNodes(value);
maxNodesInput.value = String(safe);
currentMaxNodes = safe;
},
focusInput() {
if (videoInput) {
videoInput.focus();
videoInput.select();
}
},
stop() {
if (currentSimulation) {
currentSimulation.stop();
currentSimulation = null;
}
},
getState() {
return {
depth: currentDepth,
maxNodes: currentMaxNodes,
labelSize: getLabelSize(),
nodes: currentGraphData ? currentGraphData.nodes.slice() : [],
links: currentGraphData ? currentGraphData.links.slice() : [],
fullGraph: currentFullGraph,
includeExternal: currentIncludeExternal,
};
},
setIncludeExternal(value) {
if (typeof value !== "boolean") return;
currentIncludeExternal = value;
},
isEmbedded,
});
GraphUI.ready = true;
setTimeout(() => {
window.dispatchEvent(new CustomEvent("graph-ui-ready"));
}, 0);
})();
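`renderGraph` assigns each distinct channel a color from `d3.schemeTableau10`. Once the ten colors run out, it layers one of the hatch patterns from `ADDITIONAL_PATTERNS` on top. The sketch below isolates that rule without d3. It assumes colors are taken positionally, which matches `d3.scaleOrdinal` only while the scale is seeing these labels for the first time and in this order. `assignChannelStyles`, `channelLabel`, and the short `palette` argument are illustrative stand-ins.

```javascript
// Sketch of the channel-style rule in renderGraph(): unique labels in first-seen order,
// palette colors cycle, and labels beyond the palette size get a hatch pattern.
const HATCHES = ["diag-forward", "diag-back", "cross", "dots"]; // ADDITIONAL_PATTERNS keys

function channelLabel(node) {
  return (node && (node.channel_name || node.channel_id)) || "Unknown";
}

function assignChannelStyles(nodes, palette) {
  // De-duplicate with a Set so each channel consumes exactly one slot.
  const seen = new Set();
  const labels = [];
  for (const node of nodes) {
    const label = channelLabel(node);
    if (!seen.has(label)) {
      seen.add(label);
      labels.push(label);
    }
  }
  const styles = new Map();
  labels.forEach((label, idx) => {
    const baseColor = palette[idx % palette.length];
    const hatch =
      idx < palette.length ? "none" : HATCHES[(idx - palette.length) % HATCHES.length];
    styles.set(label, { baseColor, hatch });
  });
  return styles;
}

const demo = assignChannelStyles(
  [{ channel_name: "A" }, { channel_name: "A" }, { channel_id: "UC123" }, {}, { channel_name: "D" }],
  ["red", "green", "blue"]
);
console.log(Object.fromEntries(demo));
```

Counting each label only once keeps the hatch assignment aligned with the palette. If a channel with several videos were counted once per node, later channels would be pushed past the palette size too early and would receive hatches they do not need.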

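`initFromQuery` restores the graph view from the page URL, and `updateUrlState` writes it back. The sketch below is a pure version of the parsing half. It assumes the same parameter names and spellings used in static/graph.js: `1`, `true`, or `yes` enable `full_graph`, and `0`, `false`, or `no` disable `external`. `parseGraphQuery` and `clamp` are hypothetical names.

```javascript
// Hypothetical pure version of initFromQuery()'s parsing in static/graph.js.
function clamp(raw, min, max, fallback) {
  const n = parseInt(raw, 10);
  return Number.isNaN(n) ? fallback : Math.max(min, Math.min(n, max));
}

function parseGraphQuery(search) {
  const params = new URLSearchParams(search);
  const fullParam = (params.get("full_graph") || "").toLowerCase();
  const externalParam = (params.get("external") || "").toLowerCase();
  const fullGraph = ["1", "true", "yes"].includes(fullParam);
  const rawMax = params.get("max_nodes");
  let maxNodes = clamp(rawMax, 10, 400, 200);
  // max_nodes=0 is kept as-is, matching what full-graph mode sends to the API.
  if (fullGraph || (rawMax !== null && rawMax.trim() === "0")) maxNodes = 0;
  return {
    videoId: (params.get("video_id") || "").trim(),
    depth: clamp(params.get("depth"), 0, 3, 1),
    maxNodes,
    fullGraph,
    // External items are included unless explicitly disabled.
    includeExternal: !["0", "false", "no"].includes(externalParam),
  };
}

console.log(parseGraphQuery("?video_id=abc&depth=2&external=no"));
```

Separating parsing from the DOM writes makes the URL contract easy to test on its own. The real function also updates the form controls and may call `loadGraph`.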

@@ -3,22 +3,40 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>This Little Corner (Python)</title>
<link rel="stylesheet" href="https://unpkg.com/xp.css" />
<title>TLC Search</title>
<link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="https://unpkg.com/xp.css" integrity="sha384-isKk8ZXKlU28/m3uIrnyTfuPaamQIF4ONLeGSfsWGEe3qBvaeLU5wkS4J7cTIwxI" crossorigin="anonymous" />
<link rel="stylesheet" href="/static/style.css" />
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js" integrity="sha384-CjloA8y00+1SDAUkjs099PVfnY2KmDC2BZnws9kh8D/lX1s46w6EPhpXdqMfjK6i" crossorigin="anonymous"></script>
</head>
<body>
<div class="window" style="max-width: 1200px; margin: 20px auto;">
<div class="title-bar">
<div class="title-bar-text">This Little Corner — Elastic Search</div>
<div class="title-bar-text">This Little Corner</div>
<div class="title-bar-controls">
<button id="aboutBtn" aria-label="About">?</button>
<button id="minimizeBtn" aria-label="Minimize"></button>
<button aria-label="Maximize"></button>
<button aria-label="Close"></button>
</div>
</div>
<div class="window-body">
<div class="window-actions">
<a
id="rssButton"
class="rss-button"
href="/rss"
target="_blank"
rel="noopener"
title="Unified RSS feed"
aria-label="Unified RSS feed"
>
<svg class="rss-button__icon" viewBox="0 0 24 24" aria-hidden="true">
<path d="M6 18a2 2 0 1 0 0 4a2 2 0 0 0 0-4zm-4 6a4 4 0 0 1 4-4a4 4 0 0 1 4 4h-2a2 2 0 0 0-2-2a2 2 0 0 0-2 2zm0-8v-2c6.627 0 12 5.373 12 12h-2c0-5.523-4.477-10-10-10zm0-4V4c11.046 0 20 8.954 20 20h-2c0-9.941-8.059-18-18-18z"/>
</svg>
<span class="rss-button__label">RSS</span>
</a>
</div>
<p>Enter a phrase to query title, description, and transcript text.</p>
<fieldset>
@@ -30,19 +48,22 @@
</div>
<div class="field-row" style="margin-bottom: 8px; align-items: center;">
<label style="width: 60px;">Channel:</label>
<details id="channelDropdown" class="channel-dropdown" style="flex: 1;">
<summary id="channelSummary">All Channels</summary>
<div id="channelOptions" class="channel-options">
<div>Loading channels…</div>
</div>
</details>
<label for="channel" style="width: 60px;">Channel:</label>
<select id="channel" style="flex: 1;">
<option value="">All Channels</option>
</select>
<label for="year" style="margin-left: 8px;">Year:</label>
<select id="year">
<option value="">All Years</option>
</select>
<label for="sort" style="margin-left: 8px;">Sort:</label>
<select id="sort">
<option value="relevant">Most relevant</option>
<option value="newer">Newest first</option>
<option value="older">Oldest first</option>
<option value="referenced">Most referenced</option>
</select>
<label for="size" style="margin-left: 8px;">Size:</label>
@@ -53,18 +74,36 @@
</select>
</div>
<div class="field-row">
<input type="checkbox" id="exactToggle" checked />
<label for="exactToggle">Exact</label>
<div class="field-row toggle-row">
<div class="toggle-item toggle-item--first">
<input type="checkbox" id="exactToggle" checked />
<label for="exactToggle">Exact</label>
<span class="toggle-help">Match all terms exactly.</span>
</div>
<input type="checkbox" id="fuzzyToggle" checked />
<label for="fuzzyToggle">Fuzzy</label>
<div class="toggle-item">
<input type="checkbox" id="fuzzyToggle" checked />
<label for="fuzzyToggle">Fuzzy</label>
<span class="toggle-help">Allow small typos and variations.</span>
</div>
<input type="checkbox" id="phraseToggle" checked />
<label for="phraseToggle">Phrase</label>
<div class="toggle-item">
<input type="checkbox" id="phraseToggle" checked />
<label for="phraseToggle">Phrase</label>
<span class="toggle-help">Boost exact phrases inside transcripts.</span>
</div>
<input type="checkbox" id="queryStringToggle" />
<label for="queryStringToggle">Query string mode</label>
<div class="toggle-item">
<input type="checkbox" id="externalToggle" />
<label for="externalToggle">External</label>
<span class="toggle-help">Include externally referenced items.</span>
</div>
<div class="toggle-item">
<input type="checkbox" id="queryStringToggle" />
<label for="queryStringToggle">Query string mode</label>
<span class="toggle-help">Use raw Lucene syntax (overrides other toggles).</span>
</div>
</div>
</fieldset>
@@ -78,7 +117,7 @@
</fieldset>
</div>
<div class="summary-right">
<fieldset style="height: 100%;">
<fieldset>
<legend>Timeline</legend>
<div id="frequencySummary" style="font-size: 11px; margin-bottom: 8px;"></div>
<div id="frequencyChart"></div>
@@ -92,11 +131,119 @@
</fieldset>
</div>
<div class="status-bar">
<p class="status-bar-field">Ready</p>
<div class="status-bar">
<p class="status-bar-field">Ready</p>
</div>
</div>
<div class="about-panel" id="aboutPanel" hidden>
<div class="about-panel__header">
<strong>About This App</strong>
<button id="aboutCloseBtn" aria-label="Close about panel">×</button>
</div>
<div class="about-panel__body">
<p>Use the toggles to choose exact, fuzzy, or phrase matching. Query string mode accepts raw Lucene syntax.</p>
<p>Results are ranked by your chosen sort order; the timeline summarizes the same query.</p>
<p>You can download transcripts, copy MLA citations, or explore references via the graph button.</p>
<div class="about-panel__section">
<div class="about-panel__label">Unified RSS feed</div>
<a id="rssFeedLink" href="#" target="_blank" rel="noopener">Loading…</a>
</div>
<div class="about-panel__section">
<div class="about-panel__label">Channel list</div>
<a id="channelListLink" href="/api/channel-list" target="_blank" rel="noopener">View JSON</a>
<div id="channelCount" class="about-panel__meta"></div>
</div>
</div>
</div>
<div
id="graphModalOverlay"
class="graph-modal-overlay"
aria-hidden="true"
>
<div
class="window graph-window graph-modal-window"
id="graphModalWindow"
role="dialog"
aria-modal="true"
aria-labelledby="graphModalTitle"
>
<div class="title-bar">
<div class="title-bar-text" id="graphModalTitle">Reference Graph</div>
<div class="title-bar-controls">
<button id="graphModalClose" aria-label="Close"></button>
</div>
</div>
<div class="window-body">
<p>
Explore how this video links with its neighbors. Adjust depth or node cap to expand the graph.
</p>
<form id="graphForm" class="graph-controls">
<div class="field-group">
<label for="graphVideoId">Video ID</label>
<input
id="graphVideoId"
name="video_id"
type="text"
placeholder="e.g. dQw4w9WgXcQ"
required
/>
</div>
<div class="field-group">
<label for="graphDepth">Depth</label>
<select id="graphDepth" name="depth">
<option value="1" selected>1 hop</option>
<option value="2">2 hops</option>
<option value="3">3 hops</option>
</select>
</div>
<div class="field-group">
<label for="graphMaxNodes">Max nodes</label>
<select id="graphMaxNodes" name="max_nodes">
<option value="100">100</option>
<option value="150">150</option>
<option value="200" selected>200</option>
<option value="300">300</option>
<option value="400">400</option>
</select>
</div>
<div class="field-group">
<label for="graphLabelSize">Labels</label>
<select id="graphLabelSize" name="label_size">
<option value="off">Off</option>
<option value="tiny" selected>Tiny</option>
<option value="small">Small</option>
<option value="normal">Normal</option>
<option value="medium">Medium</option>
<option value="large">Large</option>
<option value="xlarge">Extra large</option>
</select>
</div>
<button type="submit">Build graph</button>
</form>
<div id="graphStatus" class="graph-status">Enter a video ID to begin.</div>
<div
id="graphContainer"
class="graph-container"
data-embedded="true"
></div>
</div>
<div class="status-bar">
<p class="status-bar-field">Right-click a node to set a new root</p>
<p class="status-bar-field">Colors (and hatches) represent channels</p>
</div>
</div>
</div>
<script src="/static/graph.js"></script>
<script src="/static/app.js"></script>
</body>
</html>
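static/graph.js publishes its API on `window.GraphUI` and sets `GraphUI.ready = true`. It then dispatches a `graph-ui-ready` event on `window` from a zero-delay `setTimeout`. index.html loads graph.js before app.js, but app.js is not part of this diff, so how it waits is unknown. The sketch below shows a consumer that handles both orderings: the flag is already set, or the event is still pending. `whenGraphUIReady` is a hypothetical name.

```javascript
// Sketch: resolve once static/graph.js has published window.GraphUI.
// Assumes only what graph.js does: sets GraphUI.ready = true, then dispatches
// a "graph-ui-ready" event on window from a setTimeout.
function whenGraphUIReady(win) {
  return new Promise((resolve) => {
    if (win.GraphUI && win.GraphUI.ready) {
      resolve(win.GraphUI);
      return;
    }
    win.addEventListener("graph-ui-ready", () => resolve(win.GraphUI), { once: true });
  });
}

// Browser usage: build the graph for a video once the UI is ready.
// whenGraphUIReady(window).then((ui) =>
//   ui.load("dQw4w9WgXcQ", 1, 200, { includeExternal: false })
// );
```

If graph.js returns early because required elements are missing, `GraphUI.ready` stays `false` and the event never fires. A caller that cannot wait indefinitely should add its own timeout.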

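The modal markup starts with `aria-hidden="true"`. The stylesheet in this change shows the overlay only when it carries the `active` class, and it locks page scrolling with `body.modal-open`. The toggle itself presumably lives in app.js, which is not shown. The following is therefore a hypothetical sketch of the class and attribute changes those rules imply; `setGraphModalOpen` is an illustrative name.

```javascript
// Hypothetical helper: the real toggle lives in app.js, which this diff does not show.
// Assumes the CSS in this change: .graph-modal-overlay.active shows the overlay and
// body.modal-open stops the page behind it from scrolling.
function setGraphModalOpen(overlay, body, open) {
  overlay.classList.toggle("active", open);
  overlay.setAttribute("aria-hidden", open ? "false" : "true");
  body.classList.toggle("modal-open", open);
}

// Browser usage:
// const overlay = document.getElementById("graphModalOverlay");
// document.getElementById("graphModalClose").addEventListener("click", () => {
//   setGraphModalOpen(overlay, document.body, false);
//   window.GraphUI?.stop(); // halt the force simulation while the modal is hidden
// });
```

Passing the elements in, rather than looking them up inside the function, keeps the helper independent of the page and easy to exercise with stand-in objects.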
static/notes.html

@@ -0,0 +1,61 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Notes</title>
<link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="https://unpkg.com/xp.css" integrity="sha384-isKk8ZXKlU28/m3uIrnyTfuPaamQIF4ONLeGSfsWGEe3qBvaeLU5wkS4J7cTIwxI" crossorigin="anonymous" />
<link rel="stylesheet" href="/static/style.css" />
<style>
.notes-content {
line-height: 1.6;
}
.notes-content h2 {
margin-top: 1.5em;
margin-bottom: 0.5em;
border-bottom: 1px solid #ccc;
padding-bottom: 0.25em;
}
.notes-content h2:first-child {
margin-top: 0;
}
.notes-content p {
margin: 0.75em 0;
}
.notes-content ul, .notes-content ol {
margin: 0.75em 0;
padding-left: 1.5em;
}
.notes-content li {
margin: 0.25em 0;
}
</style>
</head>
<body>
<div class="window" style="max-width: 800px; margin: 20px auto;">
<div class="title-bar">
<div class="title-bar-text">Notes</div>
<div class="title-bar-controls">
<button aria-label="Minimize"></button>
<button aria-label="Maximize"></button>
<button aria-label="Close"></button>
</div>
</div>
<div class="window-body">
<p style="margin-bottom: 16px;"><a href="/">← Back to search</a></p>
<div class="notes-content">
<h2>Welcome</h2>
<p>This is a space for thoughts, observations, and notes related to this project and beyond.</p>
<!-- Add your notes below -->
</div>
</div>
<div class="status-bar">
<p class="status-bar-field">Last updated: January 2026</p>
</div>
</div>
</body>
</html>


@@ -63,7 +63,7 @@ body.dimmed {
}
.field-row input[type="text"],
.field-row .channel-dropdown {
.field-row select#channel {
flex: 1 1 100% !important;
min-width: 0 !important;
max-width: 100% !important;
@@ -86,63 +86,73 @@ body.dimmed {
max-width: 100%;
min-width: 100%;
}
.graph-controls {
flex-direction: column;
align-items: stretch;
}
.graph-controls .field-group,
.graph-controls input,
.graph-controls select {
width: 100%;
min-width: 0;
}
}
/* Channel dropdown custom styling */
.channel-dropdown {
position: relative;
display: inline-block;
.toggle-row {
flex-direction: column;
align-items: flex-start;
gap: 4px;
margin-top: 8px;
}
.channel-dropdown summary {
list-style: none;
cursor: pointer;
padding: 3px 4px;
background: ButtonFace;
border: 1px solid;
border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
min-width: 180px;
text-align: left;
.toggle-row > * {
margin-left: 0 !important;
}
.channel-dropdown summary::-webkit-details-marker {
display: none;
}
.channel-dropdown summary::after {
content: ' ▼';
font-size: 8px;
float: right;
}
.channel-dropdown[open] summary::after {
content: ' ▲';
}
.channel-options {
position: absolute;
margin-top: 2px;
padding: 4px;
background: ButtonFace;
border: 1px solid;
border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
max-height: 300px;
overflow-y: auto;
box-shadow: 2px 2px 0 rgba(0, 0, 0, 0.2);
z-index: 100;
min-width: 220px;
}
.channel-option {
.toggle-item {
display: flex;
align-items: center;
gap: 6px;
margin-bottom: 4px;
font-size: 11px;
user-select: none;
}
.channel-option:last-child {
margin-bottom: 0;
.toggle-item label {
cursor: pointer;
width: auto !important;
}
.toggle-item--first {
margin-left: 0;
}
.toggle-item input[type="checkbox"] {
margin: 0;
}
.toggle-item input[type="checkbox"]:disabled {
cursor: not-allowed;
}
.toggle-item input[type="checkbox"]:disabled + label {
color: GrayText;
opacity: 0.7;
cursor: not-allowed;
}
.description-block {
background: Window;
border: 1px solid #919b9c;
padding: 6px 8px;
margin-top: 6px;
font-size: 11px;
white-space: pre-wrap;
max-height: 6em;
overflow-y: auto;
}
/* Layout helpers */
@@ -163,15 +173,440 @@ body.dimmed {
min-width: 300px;
}
.graph-window {
width: 95%;
}
.graph-controls {
display: flex;
flex-wrap: wrap;
gap: 12px;
align-items: flex-end;
margin-bottom: 12px;
}
.graph-controls .field-group {
display: flex;
flex-direction: column;
gap: 4px;
}
.graph-controls label {
font-size: 11px;
font-weight: bold;
}
.graph-controls .field-hint {
font-size: 10px;
color: #3c3c3c;
margin: 0;
max-width: 280px;
}
.graph-controls input,
.graph-controls select {
min-width: 160px;
}
.graph-status {
font-size: 11px;
margin-bottom: 8px;
color: #1f1f1f;
}
.graph-status.error {
color: #b00020;
}
.graph-container {
background: Window;
border: 1px solid #919b9c;
box-shadow: inset -1px -1px #0a0a0a, inset 1px 1px #fff;
position: relative;
width: 100%;
min-height: 520px;
height: auto;
overflow: visible;
}
.graph-modal-overlay {
position: fixed;
inset: 0;
display: none;
align-items: center;
justify-content: center;
padding: 24px;
background: rgba(0, 0, 0, 0.35);
z-index: 2000;
}
.graph-modal-overlay.active {
display: flex;
}
.graph-modal-window {
width: min(960px, 100%);
max-height: calc(100vh - 48px);
}
.graph-modal-window .window-body {
max-height: calc(100vh - 180px);
overflow-y: auto;
}
.graph-modal-window .graph-container {
height: 560px;
}
body.modal-open {
overflow: hidden;
}
.result-header {
display: flex;
justify-content: flex-start;
gap: 6px;
flex-wrap: wrap;
align-items: flex-start;
}
.result-header-main {
flex: 1 1 auto;
min-width: 220px;
}
.result-actions {
display: flex;
align-items: flex-start;
gap: 6px;
margin-left: auto;
}
.result-action-btn {
white-space: nowrap;
font-family: "Tahoma", "MS Sans Serif", sans-serif;
font-size: 11px;
padding: 4px 10px;
}
.result-meta {
display: flex;
align-items: center;
flex-wrap: wrap;
gap: 4px;
}
.result-status {
display: inline-flex;
align-items: center;
gap: 4px;
padding: 1px 6px;
border-radius: 3px;
font-size: 10px;
line-height: 1.3;
border: 1px solid #c4a3a3;
background: #fff6f6;
color: #6b1f1f;
}
.result-status::before {
content: "⚠";
font-size: 10px;
line-height: 1;
}
.result-status--deleted {
border-color: #d1a6a6;
background: #fff8f8;
color: #6b1f1f;
}
.graph-launch-btn {
white-space: nowrap;
}
.graph-node-label {
text-shadow: -1px -1px 0 #fff, 1px -1px 0 #fff, -1px 1px 0 #fff, 1px 1px 0 #fff;
}
.graph-nodes circle {
cursor: pointer;
}
.graph-legend {
margin: 12px 0;
font-size: 11px;
background: Window;
border: 1px solid #919b9c;
padding: 8px 10px;
display: inline-flex;
flex-direction: column;
gap: 4px;
box-shadow: inset -1px -1px #0a0a0a, inset 1px 1px #fff;
}
.graph-legend-section {
display: flex;
flex-direction: column;
gap: 4px;
}
.graph-legend-title {
font-weight: bold;
color: #1f1f1f;
}
.graph-legend-row {
display: flex;
align-items: center;
gap: 8px;
}
.graph-legend-swatch {
display: inline-block;
width: 18px;
height: 12px;
border: 1px solid #1f1f1f;
}
.graph-legend-swatch--references {
background: #6c83c7;
}
.graph-legend-swatch--referenced {
background: #c76c6c;
}
.graph-legend-channel-list {
display: flex;
flex-wrap: wrap;
gap: 8px;
}
.graph-legend-channel {
display: flex;
align-items: center;
gap: 6px;
}
.graph-legend-channel-swatch {
width: 14px;
height: 14px;
background-repeat: repeat;
background-position: 0 0;
background-size: 6px 6px;
}
.graph-legend-channel--none .graph-legend-channel-swatch {
background-image: none;
}
.graph-legend-channel--diag-forward .graph-legend-channel-swatch {
background-image: repeating-linear-gradient(
45deg,
rgba(0, 0, 0, 0.35) 0,
rgba(0, 0, 0, 0.35) 2px,
transparent 2px,
transparent 4px
);
background-blend-mode: multiply;
}
.graph-legend-channel--diag-back .graph-legend-channel-swatch {
background-image: repeating-linear-gradient(
-45deg,
rgba(0, 0, 0, 0.35) 0,
rgba(0, 0, 0, 0.35) 2px,
transparent 2px,
transparent 4px
);
background-blend-mode: multiply;
}
.graph-legend-channel--cross .graph-legend-channel-swatch {
background-image:
repeating-linear-gradient(
45deg,
rgba(0, 0, 0, 0.25) 0,
rgba(0, 0, 0, 0.25) 2px,
transparent 2px,
transparent 4px
),
repeating-linear-gradient(
-45deg,
rgba(0, 0, 0, 0.25) 0,
rgba(0, 0, 0, 0.25) 2px,
transparent 2px,
transparent 4px
);
background-blend-mode: multiply;
}
.graph-legend-channel--dots .graph-legend-channel-swatch {
background-image: radial-gradient(rgba(0, 0, 0, 0.35) 30%, transparent 31%);
background-size: 6px 6px;
background-blend-mode: multiply;
}
.graph-legend-note {
font-size: 10px;
color: #555;
font-style: italic;
}
.title-bar-link {
display: inline-block;
color: inherit;
text-decoration: none;
font-size: 11px;
padding: 2px 6px;
border: 1px solid;
border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
background: ButtonFace;
}
.title-bar-controls #aboutBtn {
font-weight: bold;
font-size: 12px;
padding: 0 6px;
margin-right: 6px;
}
.toggle-item {
display: flex;
align-items: center;
gap: 6px;
}
.toggle-help {
font-size: 10px;
color: #555;
}
.about-panel {
position: fixed;
top: 20px;
right: 20px;
width: 280px;
background: Window;
border: 2px solid #919b9c;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.25);
z-index: 2100;
font-size: 11px;
}
.about-panel__header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 6px 8px;
background: #0055aa;
color: #fff;
}
.about-panel__body {
padding: 8px;
background: Window;
color: #000;
}
.about-panel__section {
margin-top: 8px;
padding-top: 6px;
border-top: 1px solid #c0c0c0;
}
.about-panel__label {
font-weight: bold;
margin-bottom: 2px;
}
.about-panel__meta {
font-size: 10px;
color: #555;
}
.about-panel__header button {
border: none;
background: transparent;
color: inherit;
font-weight: bold;
cursor: pointer;
}
/* Results styling */
#results .item {
border-bottom: 1px solid ButtonShadow;
padding: 12px 0;
background: Window;
border: 2px solid #919b9c;
padding: 12px;
margin-bottom: 8px;
max-width: 100%;
overflow: hidden;
word-wrap: break-word;
box-sizing: border-box;
box-shadow: 2px 2px 0 rgba(0, 0, 0, 0.15);
}
#results .item:last-child {
border-bottom: none;
margin-bottom: 0;
}
#results .item strong {
word-break: break-word;
max-width: 100%;
display: inline-block;
}
.window-body {
max-width: 100%;
overflow-x: hidden;
margin: 0;
padding: 1rem;
box-sizing: border-box;
}
.window-actions {
display: flex;
justify-content: flex-end;
margin-bottom: 6px;
}
.rss-button {
display: inline-flex;
align-items: center;
gap: 4px;
padding: 2px 6px;
border: 1px solid;
border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
background: ButtonFace;
color: #000;
text-decoration: none;
font-size: 11px;
cursor: pointer;
}
.rss-button:hover {
background: #f3f3f3;
}
.rss-button:active {
border-color: ButtonShadow ButtonHighlight ButtonHighlight ButtonShadow;
}
.rss-button.is-disabled {
opacity: 0.5;
cursor: default;
pointer-events: none;
}
.rss-button__icon {
width: 14px;
height: 14px;
fill: #f38b00;
}
.rss-button__label {
font-weight: bold;
}
/* Badges */
@@ -180,6 +615,8 @@ body.dimmed {
display: flex;
gap: 4px;
flex-wrap: wrap;
max-width: 100%;
overflow: hidden;
}
.badge {
@@ -189,6 +626,31 @@ body.dimmed {
padding: 2px 6px;
font-size: 10px;
font-weight: bold;
white-space: nowrap;
word-break: keep-all;
}
.badge--transcript-primary {
background: #0b6efd;
}
.badge--transcript-secondary {
background: #8f4bff;
}
.badge--external {
background: #f5d08a;
color: #000;
border: 1px solid #cfa74f;
}
.badge-clickable {
cursor: pointer;
}
.badge-clickable:focus {
outline: 2px solid rgba(11, 110, 253, 0.6);
outline-offset: 1px;
}
/* Transcript and highlights */
@@ -212,9 +674,14 @@ body.dimmed {
}
.highlight-row {
padding: 4px;
padding: 4px 6px;
cursor: pointer;
border: 1px solid transparent;
display: flex;
align-items: flex-start;
gap: 8px;
max-width: 100%;
box-sizing: border-box;
}
.highlight-row:hover {
@@ -223,6 +690,77 @@ body.dimmed {
border: 1px dotted WindowText;
}
.highlight-text {
flex: 1 1 auto;
word-break: break-word;
overflow-wrap: anywhere;
}
.highlight-source-indicator {
width: 10px;
height: 10px;
border-radius: 2px;
border: 1px solid transparent;
margin-left: auto;
flex: 0 0 auto;
}
.highlight-source-indicator--primary {
background: #0b6efd;
border-color: #084bb5;
}
.highlight-source-indicator--secondary {
background: #8f4bff;
border-color: #5d2db3;
}
.vector-chunk {
margin-top: 8px;
padding: 8px;
background: #f3f7ff;
border: 1px solid #c7d0e2;
font-size: 11px;
line-height: 1.5;
word-break: break-word;
}
@media screen and (max-width: 640px) {
.result-header {
flex-direction: column;
gap: 6px;
}
.result-header-main {
flex: 1 1 auto;
min-width: 0;
width: 100%;
}
.result-actions {
width: auto;
align-self: flex-start;
justify-content: flex-start;
flex-wrap: wrap;
gap: 4px;
margin-left: 0;
}
.result-action-btn {
width: 100%;
text-align: left;
}
.highlight-row {
flex-direction: column;
gap: 4px;
}
.highlight-source-indicator {
align-self: flex-end;
}
}
mark {
background: yellow;
color: black;
@@ -237,8 +775,7 @@ mark {
margin-top: 12px;
padding: 8px;
background: Window;
border: 2px solid;
border-color: ButtonShadow ButtonHighlight ButtonHighlight ButtonShadow;
border: 2px solid #919b9c;
max-height: 400px;
overflow-y: auto;
font-size: 11px;
@@ -250,6 +787,10 @@ mark {
border-bottom: 1px solid ButtonShadow;
}
.transcript-segment--matched {
background: #fff6cc;
}
.transcript-segment:last-child {
border-bottom: none;
margin-bottom: 0;
@@ -294,27 +835,9 @@ mark {
line-height: 1.4;
}
.transcript-header {
font-weight: bold;
margin-bottom: 8px;
display: flex;
align-items: center;
justify-content: space-between;
background: ActiveCaption;
color: CaptionText;
padding: 2px 4px;
}
.transcript-header,
.transcript-close {
cursor: pointer;
font-size: 16px;
padding: 0 4px;
font-weight: bold;
}
.transcript-close:hover {
background: Highlight;
color: HighlightText;
display: none;
}
/* Chart styling */

sync_qdrant_channels.py Normal file

@@ -0,0 +1,188 @@
"""
Utility to backfill missing channel names/titles in Qdrant payloads.

Usage:
    python -m python_app.sync_qdrant_channels \
        --batch-size 512 \
        --max-points 200 \
        --dry-run
"""
from __future__ import annotations

import argparse
import logging
import time
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple

import requests

from .config import CONFIG
from .search_app import _ensure_client

LOGGER = logging.getLogger(__name__)


def chunked(iterable: Iterable, size: int):
    """Yield lists of up to `size` items from `iterable`."""
    chunk: List = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) >= size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def resolve_channels(channel_ids: Iterable[str]) -> Dict[str, str]:
    """Map channel_id -> channel_name using the Elasticsearch video index."""
    client = _ensure_client(CONFIG)
    ids = list(set(channel_ids))
    if not ids:
        return {}
    body = {
        # Collapse on the channel ID so each channel contributes one hit.
        # Without it, a channel with many videos can fill every slot and
        # leave the other IDs unresolved.
        "size": len(ids),
        "_source": ["channel_id", "channel_name"],
        "query": {"terms": {"channel_id.keyword": ids}},
        "collapse": {"field": "channel_id.keyword"},
    }
    response = client.search(index=CONFIG.elastic.index, body=body)
    resolved: Dict[str, str] = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source") or {}
        cid = source.get("channel_id")
        cname = source.get("channel_name")
        if cid and cname and cid not in resolved:
            resolved[cid] = cname
    return resolved


def upsert_channel_payload(
    qdrant_url: str,
    collection: str,
    channel_id: str,
    channel_name: str,
    *,
    dry_run: bool = False,
) -> bool:
    """Set channel_name/channel_title for all vectors with this channel_id."""
    payload = {"channel_name": channel_name, "channel_title": channel_name}
    body = {
        "payload": payload,
        "filter": {"must": [{"key": "channel_id", "match": {"value": channel_id}}]},
    }
    LOGGER.info("Updating channel_id=%s -> %s", channel_id, channel_name)
    if dry_run:
        return True
    resp = requests.post(
        f"{qdrant_url}/collections/{collection}/points/payload",
        json=body,
        timeout=120,
    )
    if resp.status_code >= 400:
        LOGGER.error("Failed to update %s: %s", channel_id, resp.text)
        return False
    return True


def scroll_missing_payloads(
    qdrant_url: str,
    collection: str,
    batch_size: int,
    *,
    max_points: Optional[int] = None,
) -> Iterable[List[Tuple[str, Dict[str, Any]]]]:
    """Yield batches of (point_id, payload) missing channel names."""
    fetched = 0
    next_page = None
    while True:
        current_limit = batch_size
        # Retry the same page, halving the limit on HTTP errors (floor: 5).
        while True:
            body = {
                "limit": current_limit,
                "with_payload": True,
                "filter": {"must": [{"is_empty": {"key": "channel_name"}}]},
            }
            if next_page is not None:
                body["offset"] = next_page
            try:
                resp = requests.post(
                    f"{qdrant_url}/collections/{collection}/points/scroll",
                    json=body,
                    timeout=120,
                )
                resp.raise_for_status()
                break
            except requests.HTTPError as exc:
                LOGGER.warning(
                    "Scroll request failed at limit=%s: %s", current_limit, exc
                )
                if current_limit <= 5:
                    raise
                current_limit = max(5, current_limit // 2)
                LOGGER.info("Reducing scroll batch size to %s", current_limit)
                time.sleep(2)
            except requests.RequestException as exc:
                # Connection errors and timeouts: retry the same request.
                LOGGER.warning("Transient scroll error: %s", exc)
                time.sleep(2)
        payload = resp.json().get("result", {})
        points = payload.get("points", [])
        if not points:
            break
        batch: List[Tuple[str, Dict[str, Any]]] = []
        for point in points:
            pid = point.get("id")
            p_payload = point.get("payload") or {}
            batch.append((pid, p_payload))
        yield batch
        fetched += len(points)
        if max_points and fetched >= max_points:
            break
        next_page = payload.get("next_page_offset")
        if next_page is None:
            break


def main() -> None:
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    parser = argparse.ArgumentParser(
        description="Backfill missing channel_name/channel_title in Qdrant payloads"
    )
    parser.add_argument("--batch-size", type=int, default=512)
    parser.add_argument(
        "--max-points",
        type=int,
        default=None,
        help="Limit processing to the first N points for testing",
    )
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()
    q_url = CONFIG.qdrant_url
    collection = CONFIG.qdrant_collection
    total_updates = 0
    for batch in scroll_missing_payloads(
        q_url, collection, args.batch_size, max_points=args.max_points
    ):
        # Collect the distinct channel IDs present in this batch.
        channel_ids: Set[str] = set()
        for _, payload in batch:
            cid = payload.get("channel_id")
            if cid:
                channel_ids.add(str(cid))
        if not channel_ids:
            continue
        resolved = resolve_channels(channel_ids)
        if not resolved:
            LOGGER.warning("No channel names resolved for ids: %s", channel_ids)
            continue
        # One filtered set-payload call per channel updates all of its points.
        for cid, name in resolved.items():
            if upsert_channel_payload(
                q_url, collection, cid, name, dry_run=args.dry_run
            ):
                total_updates += 1
        LOGGER.info("Updated %s channel payloads so far", total_updates)
    LOGGER.info("Finished. Total successful channel updates: %s", total_updates)


if __name__ == "__main__":
    main()
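`scroll_missing_payloads` combines two ideas: cursor pagination via Qdrant's `next_page_offset`, and shrinking the page size when a request fails. The sketch below shows only that control flow, decoupled from HTTP so it can run offline. `fetch_page` and `PageError` are hypothetical stand-ins for the `/points/scroll` POST and `requests.HTTPError`. The script's sleeps and transient-error retries are left out.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple


class PageError(Exception):
    """Hypothetical stand-in for requests.HTTPError."""


# Hypothetical signature: fetch_page(limit, offset) -> (points, next_offset).
FetchPage = Callable[[int, Optional[Any]], Tuple[List[Dict[str, Any]], Optional[Any]]]


def scroll_with_backoff(
    fetch_page: FetchPage,
    batch_size: int,
    *,
    min_limit: int = 5,
    max_points: Optional[int] = None,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield pages of points, halving the page size after each failed request."""
    fetched = 0
    offset: Optional[Any] = None
    limit = batch_size
    while True:
        try:
            points, next_offset = fetch_page(limit, offset)
        except PageError:
            if limit <= min_limit:
                raise  # give up once the floor has been reached
            limit = max(min_limit, limit // 2)
            continue  # retry the same offset with a smaller page
        if not points:
            return
        yield points
        fetched += len(points)
        if max_points is not None and fetched >= max_points:
            return
        if next_offset is None:  # no further pages
            return
        offset = next_offset
```

There is one design difference from the script. The script resets to `batch_size` at the start of every page, whereas this sketch keeps the reduced limit once a size has failed. That avoids repeating request sizes already known to fail, at the cost of never growing the page size back.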

urls.txt Normal file

@@ -0,0 +1,78 @@
https://www.youtube.com/channel/UCCebR16tXbv5Ykk9_WtCCug/videos
https://www.youtube.com/channel/UC6vg0HkKKlgsWk-3HfV-vnw/videos
https://www.youtube.com/channel/UCeWWxwzgLYUbfjWowXhVdYw/videos
https://www.youtube.com/channel/UC952hDf_C4nYJdqwK7VzTxA/videos
https://www.youtube.com/channel/UCU5SNBfTo4umhjYz6M0Jsmg/videos
https://www.youtube.com/channel/UC6Tvr9mBXNaAxLGRA_sUSRA/videos
https://www.youtube.com/channel/UC4Rmxg7saTfwIpvq3QEzylQ/videos
https://www.youtube.com/channel/UCTdH4nh6JTcfKUAWvmnPoIQ/videos
https://www.youtube.com/channel/UCsi_x8c12NW9FR7LL01QXKA/videos
https://www.youtube.com/channel/UCAqTQ5yLHHH44XWwWXLkvHQ/videos
https://www.youtube.com/channel/UCprytROeCztMOMe8plyJRMg/videos
https://www.youtube.com/channel/UCpqDUjTsof-kTNpnyWper_Q/videos
https://www.youtube.com/channel/UCL_f53ZEJxp8TtlOkHwMV9Q/videos
https://www.youtube.com/channel/UCez1fzMRGctojfis2lfRYug/videos
https://www.youtube.com/channel/UC2leFZRD0ZlQDQxpR2Zd8oA/videos
https://www.youtube.com/channel/UC8SErJkYnDsYGh1HxoZkl-g/videos
https://www.youtube.com/channel/UCEPOn4cgvrrerg_-q_Ygw1A/videos
https://www.youtube.com/channel/UC2yCyOMUeem-cYwliC-tLJg/videos
https://www.youtube.com/channel/UCGsDIP_K6J6VSTqlq-9IPlg/videos
https://www.youtube.com/channel/UCEzWTLDYmL8soRdQec9Fsjw/videos
https://www.youtube.com/channel/UC1KgNsMdRoIA_njVmaDdHgA/videos
https://www.youtube.com/channel/UCFQ6Gptuq-sLflbJ4YY3Umw/videos
https://www.youtube.com/channel/UCEY1vGNBPsC3dCatZyK3Jkw/videos
https://www.youtube.com/channel/UCIAtCuzdvgNJvSYILnHtdWA/videos
https://www.youtube.com/channel/UClIDP7_Kzv_7tDQjTv9EhrA/videos
https://www.youtube.com/channel/UC-QiBn6GsM3JZJAeAQpaGAA/videos
https://www.youtube.com/channel/UCiJmdXTb76i8eIPXdJyf8ZQ/videos
https://www.youtube.com/channel/UCM9Z05vuQhMEwsV03u6DrLA/videos
https://www.youtube.com/channel/UCgp_r6WlBwDSJrP43Mz07GQ/videos
https://www.youtube.com/channel/UC5uv-BxzCrN93B_5qbOdRWw/videos
https://www.youtube.com/channel/UCtCTSf3UwRU14nYWr_xm-dQ/videos
https://www.youtube.com/channel/UC1a4VtU_SMSfdRiwMJR33YQ/videos
https://www.youtube.com/channel/UCg7Ed0lecvko58ibuX1XHng/videos
https://www.youtube.com/channel/UCMVG5eqpYFVEB-a9IqAOuHA/videos
https://www.youtube.com/channel/UC8mJqpS_EBbMcyuzZDF0TEw/videos
https://www.youtube.com/channel/UCGHuURJ1XFHzPSeokf6510A/videos
https://www.youtube.com/@chrishoward8473/videos
https://www.youtube.com/channel/UChptV-kf8lnncGh7DA2m8Pw/videos
https://www.youtube.com/channel/UCzX6R3ZLQh5Zma_5AsPcqPA/videos
https://www.youtube.com/channel/UCiukuaNd_qzRDTW9qe2OC1w/videos
https://www.youtube.com/channel/UC5yLuFQCms4nb9K2bGQLqIw/videos
https://www.youtube.com/channel/UCVdSgEf9bLXFMBGSMhn7x4Q/videos
https://www.youtube.com/channel/UC_dnk5D4tFCRYCrKIcQlcfw/videos
https://www.youtube.com/@Freerilian/videos
https://www.youtube.com/@marks.-ry7bm/videos
https://www.youtube.com/@Adams-Fall/videos
https://www.youtube.com/@mcmosav/videos
https://www.youtube.com/@Landbeorht/videos
https://www.youtube.com/@Corner_Citizen/videos
https://www.youtube.com/@ethan.caughey/videos
https://www.youtube.com/@MarcInTbilisi/videos
https://www.youtube.com/@climbingmt.sophia/videos
https://www.youtube.com/@Skankenstein/videos
https://www.youtube.com/@UpCycleClub/videos
https://www.youtube.com/@JessPurviance/videos
https://www.youtube.com/@greyhamilton52/videos
https://www.youtube.com/@paulrenenichols/videos
https://www.youtube.com/@OfficialSecularKoranism/videos
https://www.youtube.com/@FromWhomAllBlessingsFlow/videos
https://www.youtube.com/@FoodTruckEmily/videos
https://www.youtube.com/@O.G.Rose.Michelle.and.Daniel/videos
https://www.youtube.com/@JonathanDumeer/videos
https://www.youtube.com/@JordanGreenhall/videos
https://www.youtube.com/@NechamaGluck/videos
https://www.youtube.com/@justinsmorningcoffee/videos
https://www.youtube.com/@grahampardun/videos
https://www.youtube.com/@michaelmartin8681/videos
https://www.youtube.com/@davidbusuttil9086/videos
https://www.youtube.com/@matthewparlato5626/videos
https://www.youtube.com/@lancecleaver227/videos
https://www.youtube.com/@theplebistocrat/videos
https://www.youtube.com/@rigelwindsongthurston/videos
https://www.youtube.com/@RightInChrist/videos
https://www.youtube.com/@RafeKelley/videos
https://www.youtube.com/@WavesOfObsession/videos
https://www.youtube.com/@LeviathanForPlay/videos
https://www.youtube.com/channel/UCehAungJpAeC-F3R5FwvvCQ/videos
https://www.youtube.com/channel/UC4YwC5zA9S_2EwthE27Xlew/videos
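`urls.txt` mixes two URL shapes:

- `/channel/<ID>/videos`, which carries the channel ID directly.
- `/@handle/videos`, which carries a handle that still has to be mapped to an ID.

The sketch below classifies each line. It assumes every entry follows one of these two shapes and that channel IDs are `UC` plus 22 URL-safe characters. `parse_channel_url` and `parse_url_list` are hypothetical helpers, not part of the repository.

```python
import re
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlparse

# Assumed channel ID format: "UC" followed by 22 URL-safe base64 characters.
_CHANNEL_ID = re.compile(r"^UC[A-Za-z0-9_-]{22}$")


def parse_channel_url(url: str) -> Optional[Tuple[str, str]]:
    """Return ("id", value) or ("handle", value), or None if unrecognised."""
    parts = [p for p in urlparse(url.strip()).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel" and _CHANNEL_ID.match(parts[1]):
        return ("id", parts[1])
    if parts and parts[0].startswith("@") and len(parts[0]) > 1:
        return ("handle", parts[0][1:])  # drop the leading "@"
    return None


def parse_url_list(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse non-blank lines, skipping any that match neither shape."""
    result: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip():
            continue
        parsed = parse_channel_url(line)
        if parsed is not None:
            result.append(parsed)
    return result
```

Entries tagged `"handle"` would still need a separate lookup before they can be matched against the `channel_id` values stored in Elasticsearch and Qdrant.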