Compare commits

...

34 Commits

Author SHA1 Message Date
d23888c68d Add last_posted date to /api/channel-list from Elasticsearch
Some checks failed
docker-build / build (push) Has been cancelled
Queries the latest video date per channel and includes it in the
channel-list JSON response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:14:53 -04:00
c019730666 Fix remaining placeholder channel names
Some checks failed
docker-build / build (push) Has been cancelled
- UCCebR16tXbv5Ykk9_WtCCug -> Christian T. Golden
- UC4YwC5zA9S_2EwthE27Xlew -> CMA

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:04:50 -04:00
bb2850ef98 Add /channels HTML page and fix placeholder channel names
Some checks failed
docker-build / build (push) Has been cancelled
- Add /channels route serving a simple HTML page with channel names
  linked to their YouTube pages
- Fix names for UCehAungJpAeC (Wholly Unfocused) and UCiJmdXTb76i
  (Bridges of Meaning Hub) from Elasticsearch data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:01:45 -04:00
7fdb31bf18 Add 3 missing channels from jet-alone to channels.yml source of truth
Some checks failed
docker-build / build (push) Has been cancelled
Syncs channels.yml (canonical) and urls.txt with channels that existed
only on jet-alone: LeviathanForPlay, UCehAungJpAeC-F3R5FwvvCQ,
UC4YwC5zA9S_2EwthE27Xlew.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:39:06 -04:00
Ubuntu
090f5943c3 Add notes page
Some checks failed
docker-build / build (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 20:40:53 +00:00
d168287636 Add Rigel Windsong Thurston
Some checks failed
docker-build / build (push) Has been cancelled
2026-01-10 13:36:10 -05:00
6534db6f64 Ignore .gemini artifacts
Some checks failed
docker-build / build (push) Has been cancelled
2026-01-08 22:55:33 -05:00
30503628b5 Add unified channel feed 2026-01-08 22:53:30 -05:00
63fe922860 Document channel feeds 2026-01-08 22:46:30 -05:00
1ac076e5f2 Harden search responses 2026-01-08 15:42:21 -05:00
1c95f47766 Add API rate limits 2026-01-08 15:24:05 -05:00
6a3d1ee491 Disable vector search 2026-01-08 15:20:06 -05:00
8e4c57a93a Security: add security headers, CSP, request size limits 2026-01-08 14:53:44 -05:00
1565c8db38 Security: disable debug mode, sanitize query input, validate Qdrant filters, add size/offset bounds 2026-01-08 14:41:42 -05:00
d26edda029 Add graph traversal endpoints and sort metrics by channel name 2026-01-08 14:22:01 -05:00
9dd74111e7 Change default sort to newer first 2026-01-08 14:12:15 -05:00
93774c025f Respect external filter in metrics and graph
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-20 09:54:41 -05:00
b0c9d319ef Remove full graph node cap
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-20 09:42:14 -05:00
82c334b131 Add full reference graph mode
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-19 15:23:21 -05:00
7f74aaced8 Persist search settings locally
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-19 10:20:00 -05:00
c88d1886c9 Fix backlink badge query to target referencing videos
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 23:47:07 -05:00
c6b46edacc Default external off and filter channels/backlink queries
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 23:42:49 -05:00
4c20329f36 Add external reference toggle and badges
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 23:07:13 -05:00
b267a0ecc6 Add Gitea workflow for Docker image builds
Some checks failed
docker-build / build (push) Has been cancelled
2025-11-18 19:14:20 -05:00
f299126ab2 Point compose to remote Elasticsearch and Qdrant 2025-11-18 13:25:41 -05:00
86fd017f3c Add Docker and compose setup 2025-11-18 13:21:14 -05:00
40d4f41f6e Add graph and vector search features 2025-11-09 14:24:50 -05:00
14d37f23e4 Add clickable reference badges and improve UI layout
- Add clickable badges for backlinks and references that trigger query string searches
- Improve toggle checkbox layout with better styling
- Add description block styling with scrollable container
- Update results styling with bordered cards and shadows
- Add favicon support across pages
- Enhance .env loading with logging for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:56:43 -05:00
d8d2c5e34c Fix results overflow and add debug logging for reference badges
CSS Changes:
- Added max-width and overflow handling to .badge-row
- Added word-wrap and overflow protection to .item
- Added overflow-x: hidden to .window-body
- Badges now use white-space: nowrap to prevent text wrapping
- Item titles now break words properly with word-break

JavaScript Changes:
- Added console.log debugging for reference counts
- Logs show whether fields are present and their values
- Helps diagnose why badges aren't appearing

This should fix the overflow issue and help debug badge visibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 11:18:17 -05:00
595b19f7c7 Fix sorting by referenced_by_count with unmapped_type handling
- Added unmapped_type parameter to referenced_by_count sort
- This handles documents that don't have the field yet
- Updated ingest.py to include reference fields when indexing:
  * internal_references
  * internal_references_count
  * referenced_by
  * referenced_by_count
- Updated index mapping to include reference fields
- Documents without the field will sort as 0 (appear last)

Fixes BadRequestError: No mapping found for [referenced_by_count]

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 11:10:56 -05:00
d616b87701 Add python-dotenv support for automatic .env loading
- Added python-dotenv to requirements.txt
- Config now automatically loads .env file if present
- Allows local development without manually exporting env vars
- Gracefully falls back if python-dotenv not installed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 11:03:42 -05:00
7988e2751a Add video reference tracking and display
- Add "Most referenced" sort option to sort by backlink count
- Backend now supports sorting by referenced_by_count field
- Search results now display reference counts as badges:
  - Shows number of backlinks (videos linking to this one)
  - Shows number of internal references (outbound links)
- Reference badges appear alongside transcript source badges

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 10:52:00 -05:00
2846e13a81 Fix timestamp parsing for string format timestamps
Both primary and secondary transcripts use 'timestamp' field
with string format "HH:MM:SS.mmm" instead of numeric seconds.

Changes:
- Add parseTimestampToSeconds() to handle string timestamps
- Parse "HH:MM:SS.mmm" format (e.g., "00:00:39.480")
- Also handle "MM:SS" format
- Still support numeric timestamps (seconds or milliseconds)
- Check 'timestamp' field first (primary format in data)

This fixes the NaN issue and displays correct timestamps
for both primary and secondary transcripts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 01:16:26 -05:00
e241d206c5 Fix NaN timestamps with proper type checking
Previous || chain could pass through invalid values causing NaN.
Now explicitly checks each possible timestamp field with:
- null check (field != null)
- NaN check (!isNaN(parseFloat(field)))
- Takes first valid numeric value found

This ensures timestamps always have a valid number, defaulting
to 0 if no valid timestamp field is found.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 01:09:21 -05:00
28 changed files with 5411 additions and 391 deletions

13
.dockerignore Normal file

@@ -0,0 +1,13 @@
.git
.gitignore
.venv
__pycache__
*.pyc
*.pyo
.DS_Store
node_modules
data
videos
*.log
feed-master-config/var
feed-master-config/images

37
.gitea/workflows/docker-build.yml Normal file

@@ -0,0 +1,37 @@
# Build and push the TLC Search Docker image whenever changes land on master.
name: docker-build
on:
  push:
    branches:
      - master
env:
  IMAGE_NAME: gitea.ghost.tel/knight/tlc-search
jobs:
  build:
    runs-on: docker
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to registry
        uses: docker/login-action@v2
        with:
          registry: gitea.ghost.tel
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile
          push: true
          tags: |
            ${{ env.IMAGE_NAME }}:latest
            ${{ env.IMAGE_NAME }}:${{ github.sha }}

5
.gitignore vendored

@@ -33,6 +33,7 @@ env/
# IDE
.vscode/
.idea/
.gemini/
*.swp
*.swo
*~
@@ -51,6 +52,10 @@ Thumbs.db
# Logs
*.log
# Feed Master runtime cache
feed-master-config/var/
feed-master-config/images/
# Testing
.pytest_cache/
.coverage

31
AGENTS.md Normal file

@@ -0,0 +1,31 @@
# Repository Guidelines
## Project Structure & Module Organization
- Core modules live under `python_app/`: `config.py` centralizes settings, `transcript_collector.py` gathers transcripts, `ingest.py` handles Elasticsearch bulk loads, and `search_app.py` exposes the Flask UI.
- Static assets belong in `static/` (`index.html`, `frequency.html`, companion JS/CSS). Keep HTML here and wire it up through Flask routes.
- Runtime artifacts land in `data/` (`raw/` for downloads, `video_metadata/` for cleaned payloads). Preserve the JSON schema emitted by the collector.
- When adding utilities, place them in `python_app/` and use package-relative imports so scripts continue to run via `python -m`.
## Build, Test, and Development Commands
- `python -m venv .venv && source .venv/bin/activate`: bootstrap the virtualenv used by all scripts.
- `pip install -r requirements.txt`: install Flask, Elasticsearch tooling, Google API clients, and dotenv support.
- `python -m python_app.transcript_collector --channel UC... --output data/raw`: fetch transcript JSON for a channel; rerun to refresh cached data.
- `python -m python_app.ingest --source data/video_metadata --index this_little_corner_py`: index prepared metadata and auto-create mappings when needed.
- `python -m python_app.search_app`: launch the Flask server on port 8080 for UI smoke tests.
## Coding Style & Naming Conventions
- Follow PEP 8 with 4-space indentation, `snake_case` for functions/modules, and `CamelCase` for classes; reserve UPPER_SNAKE_CASE for configuration constants.
- Keep Elasticsearch payload keys lower-case with underscores, and centralize shared values in `config.py` rather than scattering literals.
## Testing Guidelines
- No automated suite is committed yet; when adding coverage, create `tests/` modules using `pytest` with files named `test_*.py`.
- Focus tests on collector pagination, ingest transformations, and Flask route helpers, and run `python -m pytest` locally before opening a PR.
- Manually verify by ingesting a small sample into a local Elasticsearch node and checking facets, highlights, and transcript retrieval via the UI.
## Commit & Pull Request Guidelines
- Mirror the existing history: short, imperative commit subjects (e.g. “Fix results overflow”, “Add video reference tracking”).
- PRs should describe scope, list environment variables or indices touched, link issues, and attach before/after screenshots whenever UI output changes. Highlight Elasticsearch mapping or data migration impacts for both search and frontend reviewers.
## Configuration & Security Tips
- Load credentials through environment variables (`ELASTIC_URL`, `ELASTIC_USERNAME`, `ELASTIC_PASSWORD`, `ELASTIC_API_KEY`, `YOUTUBE_API_KEY`) or a `.env` file, and keep secrets out of version control.
- Adjust `ELASTIC_VERIFY_CERTS`, `ELASTIC_CA_CERT`, and `ELASTIC_DEBUG` only while debugging, and prefer branch-specific indices (`this_little_corner_py_<initials>`) to avoid clobbering shared data.
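
The configuration tips above can be sketched roughly as follows. This is a minimal illustration, not the contents of `config.py`: the helper names `load_elastic_settings` and `_flag`, the fallback URL `http://localhost:9200`, and the set of strings accepted as "true" are all assumptions made for the example. Only the environment variable names come from the guidelines.

```python
import os
from typing import Dict

try:
    # python-dotenv is optional; without it only real environment variables are read.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def _flag(name: str, default: bool) -> bool:
    """Interpret an environment variable as a boolean (accepted spellings are assumed)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


def load_elastic_settings() -> Dict[str, object]:
    """Hypothetical helper: collect Elasticsearch settings from the environment."""
    if load_dotenv is not None:
        # Reads a .env file if one is found; variables already set are not overridden.
        load_dotenv()
    return {
        "url": os.environ.get("ELASTIC_URL", "http://localhost:9200"),  # assumed default
        "username": os.environ.get("ELASTIC_USERNAME"),
        "password": os.environ.get("ELASTIC_PASSWORD"),
        "api_key": os.environ.get("ELASTIC_API_KEY"),
        "verify_certs": _flag("ELASTIC_VERIFY_CERTS", default=True),
    }


# Example: values exported in the shell (or listed in .env) drive the settings.
os.environ["ELASTIC_URL"] = "https://es.example.internal:9200"
os.environ["ELASTIC_VERIFY_CERTS"] = "0"
settings = load_elastic_settings()
print(settings["url"], settings["verify_certs"])
```

Keeping the optional import in a `try`/`except` means the same code path works whether or not `python-dotenv` is installed, which matches the "gracefully falls back" behaviour described in the commit history.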

32
Dockerfile Normal file

@@ -0,0 +1,32 @@
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# System deps kept lean to support torch/sentence-transformers wheels.
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential git curl \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt
# Copy the package into /app/python_app so `python -m python_app.search_app` works.
COPY . /app/python_app
ENV ELASTIC_URL=http://elasticsearch:9200 \
ELASTIC_INDEX=this_little_corner_py \
ELASTIC_VERIFY_CERTS=0 \
QDRANT_URL=http://qdrant:6333 \
QDRANT_COLLECTION=tlc-captions-full \
QDRANT_VECTOR_NAME= \
QDRANT_VECTOR_SIZE=1024 \
QDRANT_EMBED_MODEL=BAAI/bge-large-en-v1.5 \
LOCAL_DATA_DIR=/app/data/video_metadata
EXPOSE 8080
WORKDIR /app
CMD ["python", "-m", "python_app.search_app"]

87
Makefile Normal file

@@ -0,0 +1,87 @@
# Makefile for TLC Search + Feed Master
.PHONY: help config up down restart logs status update-channels
help:
	@echo "TLC Search + Feed Master - Management Commands"
	@echo ""
	@echo "Configuration:"
	@echo " make config - Regenerate feed-master configuration from channels.yml"
	@echo ""
	@echo "Service Management:"
	@echo " make up - Start all services"
	@echo " make down - Stop all services"
	@echo " make restart - Restart all services"
	@echo " make logs - View all service logs"
	@echo " make status - Check service status"
	@echo ""
	@echo "Updates:"
	@echo " make update-channels - Regenerate config and restart feed-master"
	@echo ""
	@echo "Individual Services:"
	@echo " make logs-feed - View feed-master logs"
	@echo " make logs-bridge - View rss-bridge logs"
	@echo " make logs-app - View TLC Search logs"
	@echo " make restart-feed - Restart feed-master only"
# Generate feed-master configuration from channels.yml
config:
	@echo "Generating feed-master configuration..."
	python3 -m python_app.generate_feed_config_simple
	@echo "Configuration updated!"
# Start all services
up:
	docker compose up -d
	@echo ""
	@echo "Services started!"
	@echo " - RSS Bridge: http://localhost:3001"
	@echo " - Feed Master: http://localhost:8097/rss/youtube-unified"
	@echo " - TLC Search: http://localhost:8080"
# Stop all services
down:
	docker compose down
# Restart all services
restart:
	docker compose restart
# View all logs
logs:
	docker compose logs -f
# View feed-master logs
logs-feed:
	docker compose logs -f feed-master
# View rss-bridge logs
logs-bridge:
	docker compose logs -f rss-bridge
# View TLC Search logs
logs-app:
	docker compose logs -f app
# Check service status
status:
	@docker compose ps
	@echo ""
	@echo "Endpoints:"
	@echo " - RSS Bridge: http://localhost:3001"
	@echo " - Feed Master: http://localhost:8097/rss/youtube-unified"
	@echo " - TLC Search: http://localhost:8080"
# Restart only feed-master
restart-feed:
	docker compose restart feed-master
# Pull latest channel URLs and regenerate configuration
update-channels:
	@echo "Regenerating feed-master configuration..."
	python3 -m python_app.generate_feed_config_simple
	@echo ""
	@echo "Restarting feed-master..."
	docker compose restart feed-master
	@echo ""
	@echo "Update complete!"

209
README-FEED-MASTER.md Normal file

@@ -0,0 +1,209 @@
# TLC Search + Feed Master Integration
This directory contains an integrated setup combining:
- **TLC Search**: Flask app for searching YouTube transcripts (Elasticsearch/Qdrant)
- **Feed Master**: RSS aggregator for YouTube channels
- **RSS Bridge**: Converts YouTube channels to RSS feeds
All services share the same source of truth for YouTube channels from `channels.yml` and the adjacent
`urls.txt` in this repository.
## Architecture
```
┌─────────────────────┐
│    channels.yml     │  Source of truth (this repo)
│  (python_app repo)  │
└──────────┬──────────┘
           ├─────────────────────────────┬─────────────────────────┐
           │                             │                         │
           v                             v                         v
    ┌──────────────┐              ┌──────────────┐        ┌─────────────────┐
    │ TLC Search   │              │ RSS Bridge   │        │ Feed Master     │
    │ (Flask App)  │              │ (Port 3001)  │───────>│ (Port 8097)     │
    │ Port 8080    │              └──────────────┘        └─────────────────┘
    │              │                                               │
    │ Elasticsearch│                                               │
    │ Qdrant       │                                               │
    └──────────────┘                                               │
                                                                   v
                                               http://localhost:8097/rss/youtube-unified
```
## Services
### 1. TLC Search (Port 8080)
- Indexes and searches YouTube transcripts
- Uses Elasticsearch for metadata and Qdrant for vector search
- Connects to remote Elasticsearch/Qdrant instances
### 2. RSS Bridge (Port 3001)
- Converts YouTube channels to RSS feeds
- Supports both channel IDs and @handles
- Used by Feed Master to aggregate feeds
### 3. Feed Master (Port 8097)
- Aggregates all YouTube channel RSS feeds into one unified feed
- Updates every 5 minutes
- Keeps the most recent 200 items from all channels
## Setup
### Prerequisites
- Docker and Docker Compose
- Python 3.x
### Configuration
1. **Environment Variables**: Create `.env` file with:
```bash
# Elasticsearch
ELASTIC_URL=https://your-elasticsearch-url
ELASTIC_INDEX=this_little_corner_py
ELASTIC_USERNAME=your_username
ELASTIC_PASSWORD=your_password
# Qdrant
QDRANT_URL=https://your-qdrant-url
QDRANT_COLLECTION=tlc-captions-full
# Optional UI links
RSS_FEED_URL=/rss/youtube-unified
CHANNELS_PATH=/app/python_app/channels.yml
RSS_FEED_UPSTREAM=http://feed-master:8080
```
2. **Generate Feed Configuration**:
```bash
# Regenerate feed-master config from the channels list
python3 -m python_app.generate_feed_config_simple
```
This reads `channels.yml` and generates `feed-master-config/fm.yml`.
### Starting Services
```bash
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# View specific service logs
docker compose logs -f feed-master
docker compose logs -f rss-bridge
docker compose logs -f app
```
### Stopping Services
```bash
# Stop all services
docker compose down
# Stop specific service
docker compose stop feed-master
```
## Usage
### Unified RSS Feed
Access the aggregated feed through the TLC app (recommended):
- **URL**: http://localhost:8080/rss
- **Format**: RSS/Atom XML
- **Behavior**: Filters RSS-Bridge error items and prefixes titles with channel name
- **Updates**: Every 5 minutes (feed-master schedule)
- **Items**: Most recent 200 items across all channels
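
The behaviour listed above (drop RSS-Bridge error items, prefix each title with its channel name, keep the newest 200) could look roughly like this. It is a sketch under stated assumptions, not the app's actual `/rss` handler: the item dictionary shape, the `build_unified_items` name, the `"Bridge returned error"` title marker, and the `Channel: Title` prefix format are all hypothetical, and the sample titles are made up.

```python
from datetime import datetime, timezone
from typing import Dict, List

ERROR_TITLE_PREFIX = "Bridge returned error"  # assumed marker for RSS-Bridge error items
MAX_ITEMS = 200  # item cap stated in the README


def build_unified_items(
    items: List[Dict[str, object]], limit: int = MAX_ITEMS
) -> List[Dict[str, object]]:
    """Filter error items, prefix titles with the channel name, keep the newest `limit`."""
    kept = []
    for item in items:
        title = str(item["title"])
        if title.startswith(ERROR_TITLE_PREFIX):
            continue  # skip placeholder items emitted when a bridge request failed
        kept.append({**item, "title": f"{item['channel']}: {title}"})
    # Newest first, then truncate to the cap.
    kept.sort(key=lambda entry: entry["published"], reverse=True)
    return kept[:limit]


items = [
    {"channel": "Paul VanderKlay", "title": "Sample video A",
     "published": datetime(2026, 1, 8, tzinfo=timezone.utc)},
    {"channel": "Grail Country", "title": "Bridge returned error 404",
     "published": datetime(2026, 1, 9, tzinfo=timezone.utc)},
    {"channel": "Transfigured", "title": "Sample video B",
     "published": datetime(2026, 1, 10, tzinfo=timezone.utc)},
]
for entry in build_unified_items(items):
    print(entry["title"])
```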
Direct feed-master access still works:
- **URL**: http://localhost:8097/rss/youtube-unified
### TLC Search
Access the search interface at:
- **URL**: http://localhost:8080
### Channel List Endpoints
- **Plain text list**: http://localhost:8080/channels.txt
- **JSON metadata**: http://localhost:8080/api/channel-list
### RSS Bridge
Access individual channel feeds or the web interface at:
- **URL**: http://localhost:3001
## Updating Channel List
When channels are added/removed from `channels.yml`:
```bash
# 1. Regenerate feed configuration
cd /var/core/this-little-corner/src/python_app
python3 -m python_app.generate_feed_config_simple
# 2. Restart feed-master to pick up changes
docker compose restart feed-master
```
## File Structure
```
python_app/
├── docker-compose.yml # All services configuration
├── channels.yml # Canonical YouTube channel list
├── urls.txt # URL list kept in sync with channels.yml
├── generate_feed_config_simple.py # Config generator script (run via python -m)
├── feed-master-config/
│ ├── fm.yml # Feed Master configuration (auto-generated)
│ ├── var/ # Feed Master database
│ └── images/ # Cached images
├── data/ # TLC Search data (read-only)
└── README-FEED-MASTER.md # This file
```
## Troubleshooting
### Feed Master not updating
```bash
# Check if RSS Bridge is accessible
curl http://localhost:3001
# Restart both services in order
docker compose restart rss-bridge
sleep 10
docker compose restart feed-master
```
### Configuration issues
```bash
# Regenerate configuration
python3 -m python_app.generate_feed_config_simple
# Inspect the generated YAML
cat feed-master-config/fm.yml
# Restart feed-master
docker compose restart feed-master
```
### View feed-master logs
```bash
docker compose logs -f feed-master | grep -E "(ERROR|WARN|youtube)"
```
## Integration Notes
- **Single Source of Truth**: All channel URLs come from `channels.yml` and `urls.txt` in this repo
- **Automatic Regeneration**: Run `python3 -m python_app.generate_feed_config_simple` when `channels.yml` changes
- **No Manual Editing**: Don't edit `fm.yml` directly - regenerate it from the script
- **Handle Support**: Supports both `/channel/ID` and `/@handle` URL formats
- **Shared Channels**: Same channels used for transcript indexing (TLC Search) and RSS aggregation (Feed Master)
- **Skip Broken RSS**: Set `rss: false` in `channels.yml` to exclude a channel from RSS aggregation
## Future Enhancements
- [ ] Automated config regeneration on git pull
- [ ] Channel name lookup from YouTube API
- [ ] Integration with TLC Search for unified UI
- [ ] Webhook notifications for new videos
- [ ] OPML export for other RSS readers

View File

@@ -85,3 +85,34 @@ Visit <http://localhost:8080/> and you'll see a barebones UI that:
Feel free to expand on this scaffold (add proper logging, schedule transcript
updates, or flesh out the UI) once you're happy with the baseline behaviour.
## Run with Docker Compose (App Only; Remote ES/Qdrant)
The provided compose file builds and runs only the Flask app and expects **remote** Elasticsearch/Qdrant endpoints. Supply them via environment variables (directly or via a `.env` file alongside `docker-compose.yml`):
```bash
ELASTIC_URL=https://your-es-host:9200 \
QDRANT_URL=https://your-qdrant-host:6333 \
docker compose up --build
```
Other tunables (defaults shown in compose):
- `ELASTIC_INDEX` (default `this_little_corner_py`)
- `ELASTIC_USERNAME` / `ELASTIC_PASSWORD` or `ELASTIC_API_KEY`
- `ELASTIC_VERIFY_CERTS` (set to `1` for real TLS verification)
- `QDRANT_COLLECTION` (default `tlc-captions-full`)
- `QDRANT_VECTOR_NAME` / `QDRANT_VECTOR_SIZE` / `QDRANT_EMBED_MODEL`
- `RATE_LIMIT_ENABLED` (default `1`)
- `RATE_LIMIT_REQUESTS` (default `60`)
- `RATE_LIMIT_WINDOW_SECONDS` (default `60`)
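
To make the three rate-limit tunables concrete, here is a minimal fixed-window limiter using the same defaults (60 requests per 60 seconds). The diff does not show how the app enforces its limits, so the algorithm, the `FixedWindowLimiter` class, and keying by client address are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `max_requests` per client within each `window_seconds` window."""

    def __init__(
        self,
        max_requests: int = 60,       # mirrors RATE_LIMIT_REQUESTS
        window_seconds: int = 60,     # mirrors RATE_LIMIT_WINDOW_SECONDS
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        # client key -> (window index, requests counted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counters.get(client, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, so the counter resets
        if count >= self.max_requests:
            self._counters[client] = (window, count)
            return False
        self._counters[client] = (window, count + 1)
        return True


# With a limit of 2 and a frozen clock, the third request in the window is rejected.
limiter = FixedWindowLimiter(max_requests=2, window_seconds=60, clock=lambda: 0.0)
print([limiter.allow("203.0.113.7") for _ in range(3)])
```

A fixed window is the simplest scheme to reason about; it permits short bursts at window boundaries, which a sliding-window or token-bucket variant would smooth out.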
Port 8080 on the host is forwarded to the app. Mount `./data` (read-only) if you want local fallbacks for metrics (`LOCAL_DATA_DIR=/app/data/video_metadata`); otherwise the app will rely purely on the remote backends. Stop the container with `docker compose down`.
## CI (Docker build)
A Gitea Actions workflow (`.gitea/workflows/docker-build.yml`) builds and pushes the Docker image on every push to `master`. Configure the following repository secrets in Gitea:
- `DOCKER_USERNAME`
- `DOCKER_PASSWORD`
The image is tagged as `gitea.ghost.tel/knight/tlc-search:latest` and with the commit SHA. Adjust `IMAGE_NAME` in the workflow if you need a different registry/repo.

162
channel_config.py Normal file

@@ -0,0 +1,162 @@
from __future__ import annotations
import json
import re
from pathlib import Path
from typing import Any, Dict, List, Optional
_CHANNEL_ID_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/channel/([^/?#]+)")
_HANDLE_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/@([^/?#]+)")


def _strip_quotes(value: str) -> str:
    if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
        return value[1:-1]
    return value


def _parse_yaml_channels(text: str) -> List[Dict[str, str]]:
    channels: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "channels:":
            continue
        if line.startswith("- "):
            if current:
                channels.append(current)
            current = {}
            line = line[2:].strip()
            if not line:
                continue
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        current[key.strip()] = _strip_quotes(value.strip())
    if current:
        channels.append(current)
    return channels


def _extract_from_url(url: str) -> Dict[str, Optional[str]]:
    channel_id = None
    handle = None
    channel_match = _CHANNEL_ID_PATTERN.search(url)
    if channel_match:
        channel_id = channel_match.group(1)
    handle_match = _HANDLE_PATTERN.search(url)
    if handle_match:
        handle = handle_match.group(1)
    return {"id": channel_id, "handle": handle}


def _normalize_handle(handle: Optional[str]) -> Optional[str]:
    if not handle:
        return None
    return handle.lstrip("@").strip() or None


def _parse_bool(value: Optional[object]) -> Optional[bool]:
    if isinstance(value, bool):
        return value
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "y"}:
        return True
    if text in {"0", "false", "no", "n"}:
        return False
    return None


def _normalize_entry(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    channel_id = entry.get("id") or entry.get("channel_id")
    handle = _normalize_handle(entry.get("handle") or entry.get("username"))
    url = entry.get("url")
    name = entry.get("name")
    # Take the first key that carries a value. A plain `or` chain would skip an
    # explicit boolean False (e.g. `"rss": false` in a JSON channel file) and
    # leave the channel enabled.
    rss_raw = next(
        (
            entry[key]
            for key in ("rss_enabled", "rss", "include_in_feed")
            if entry.get(key) not in (None, "")
        ),
        None,
    )
    rss_flag = _parse_bool(rss_raw)
    if url:
        extracted = _extract_from_url(url)
        channel_id = channel_id or extracted.get("id")
        handle = handle or extracted.get("handle")
    if not url:
        if channel_id:
            url = f"https://www.youtube.com/channel/{channel_id}"
        elif handle:
            url = f"https://www.youtube.com/@{handle}"
    if not name:
        name = handle or channel_id
    if not name or not url:
        return None
    normalized = {
        "id": channel_id or "",
        "handle": handle or "",
        "name": name,
        "url": url,
        "rss_enabled": True if rss_flag is None else rss_flag,
    }
    return normalized


def load_channel_entries(path: Path) -> List[Dict[str, str]]:
    if not path.exists():
        raise FileNotFoundError(path)
    if path.suffix.lower() == ".json":
        payload = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(payload, dict):
            raw_entries = payload.get("channels", [])
        else:
            raw_entries = payload
    else:
        raw_entries = _parse_yaml_channels(path.read_text(encoding="utf-8"))
    entries: List[Dict[str, str]] = []
    for raw in raw_entries:
        if not isinstance(raw, dict):
            continue
        raw_payload: Dict[str, Any] = {}
        for key, value in raw.items():
            if value is None:
                continue
            if isinstance(value, bool):
                raw_payload[str(key).strip()] = value
            else:
                raw_payload[str(key).strip()] = str(value).strip()
        normalized = _normalize_entry(raw_payload)
        if normalized:
            entries.append(normalized)
    entries.sort(key=lambda item: item["name"].lower())
    return entries


def build_rss_bridge_url(entry: Dict[str, str], rss_bridge_host: str = "rss-bridge") -> Optional[str]:
    channel_id = entry.get("id") or ""
    handle = _normalize_handle(entry.get("handle"))
    if channel_id:
        return (
            f"http://{rss_bridge_host}/?action=display&bridge=YoutubeBridge"
            f"&context=By+channel+id&c={channel_id}&format=Mrss"
        )
    if handle:
        return (
            f"http://{rss_bridge_host}/?action=display&bridge=YoutubeBridge"
            f"&context=By+username&u={handle}&format=Mrss"
        )
    return None
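
As a rough illustration of how the URL patterns in `channel_config.py` behave, the following self-contained sketch copies the two regular expressions and applies them to the two URL styles found in `channels.yml`. The names `extract_ids` and `bridge_url` are hypothetical stand-ins for `_extract_from_url` and `build_rss_bridge_url`, rewritten here only so the example runs on its own.

```python
import re
from typing import Dict, Optional

# Patterns copied from channel_config.py.
CHANNEL_ID_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/channel/([^/?#]+)")
HANDLE_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/@([^/?#]+)")


def extract_ids(url: str) -> Dict[str, Optional[str]]:
    """Hypothetical stand-in for _extract_from_url."""
    channel = CHANNEL_ID_PATTERN.search(url)
    handle = HANDLE_PATTERN.search(url)
    return {
        "id": channel.group(1) if channel else None,
        "handle": handle.group(1) if handle else None,
    }


def bridge_url(channel_id: str = "", handle: str = "", host: str = "rss-bridge") -> Optional[str]:
    """Hypothetical stand-in for build_rss_bridge_url; a channel ID wins over a handle."""
    base = f"http://{host}/?action=display&bridge=YoutubeBridge"
    if channel_id:
        return f"{base}&context=By+channel+id&c={channel_id}&format=Mrss"
    if handle:
        return f"{base}&context=By+username&u={handle}&format=Mrss"
    return None


by_id = extract_ids("https://www.youtube.com/channel/UCGsDIP_K6J6VSTqlq-9IPlg/videos")
by_handle = extract_ids("https://www.youtube.com/@Freerilian/videos")
print(by_id)
print(by_handle)
print(bridge_url(channel_id=by_id["id"]))
print(bridge_url(handle=by_handle["handle"]))
```

Because both capture groups stop at `/`, `?`, or `#`, the trailing `/videos` segment used throughout `channels.yml` is ignored, and handles containing dots or hyphens (for example `marks.-ry7bm`) are captured whole.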

271
channels.yml Normal file

@@ -0,0 +1,271 @@
# Shared YouTube Channel Configuration
# Used by both TLC Search (transcript collection) and Feed Master (RSS aggregation)
channels:
- id: UCCebR16tXbv5Ykk9_WtCCug
name: Christian T. Golden
url: https://www.youtube.com/channel/UCCebR16tXbv5Ykk9_WtCCug/videos
- id: UC6vg0HkKKlgsWk-3HfV-vnw
name: A Quality Existence
url: https://www.youtube.com/channel/UC6vg0HkKKlgsWk-3HfV-vnw/videos
- id: UCeWWxwzgLYUbfjWowXhVdYw
name: Andrea with the Bangs
url: https://www.youtube.com/channel/UCeWWxwzgLYUbfjWowXhVdYw/videos
- id: UC952hDf_C4nYJdqwK7VzTxA
name: Charlie's Little Corner
url: https://www.youtube.com/channel/UC952hDf_C4nYJdqwK7VzTxA/videos
- id: UCU5SNBfTo4umhjYz6M0Jsmg
name: Christian Baxter
url: https://www.youtube.com/channel/UCU5SNBfTo4umhjYz6M0Jsmg/videos
- id: UC6Tvr9mBXNaAxLGRA_sUSRA
name: Finding Ideas
url: https://www.youtube.com/channel/UC6Tvr9mBXNaAxLGRA_sUSRA/videos
- id: UC4Rmxg7saTfwIpvq3QEzylQ
name: Ein Sof - Infinite Reflections
url: https://www.youtube.com/channel/UC4Rmxg7saTfwIpvq3QEzylQ/videos
- id: UCTdH4nh6JTcfKUAWvmnPoIQ
name: Eric Seitz
url: https://www.youtube.com/channel/UCTdH4nh6JTcfKUAWvmnPoIQ/videos
- id: UCsi_x8c12NW9FR7LL01QXKA
name: Grail Country
url: https://www.youtube.com/channel/UCsi_x8c12NW9FR7LL01QXKA/videos
- id: UCAqTQ5yLHHH44XWwWXLkvHQ
name: Grizwald Grim
url: https://www.youtube.com/channel/UCAqTQ5yLHHH44XWwWXLkvHQ/videos
- id: UCprytROeCztMOMe8plyJRMg
name: faturechi
url: https://www.youtube.com/channel/UCprytROeCztMOMe8plyJRMg/videos
- id: UCpqDUjTsof-kTNpnyWper_Q
name: John Vervaeke
url: https://www.youtube.com/channel/UCpqDUjTsof-kTNpnyWper_Q/videos
- id: UCL_f53ZEJxp8TtlOkHwMV9Q
name: Jordan B Peterson
url: https://www.youtube.com/channel/UCL_f53ZEJxp8TtlOkHwMV9Q/videos
- id: UCez1fzMRGctojfis2lfRYug
name: Lucas Vos
url: https://www.youtube.com/channel/UCez1fzMRGctojfis2lfRYug/videos
- id: UC2leFZRD0ZlQDQxpR2Zd8oA
name: Mary Kochan
url: https://www.youtube.com/channel/UC2leFZRD0ZlQDQxpR2Zd8oA/videos
- id: UC8SErJkYnDsYGh1HxoZkl-g
name: Sartori Studios
url: https://www.youtube.com/channel/UC8SErJkYnDsYGh1HxoZkl-g/videos
- id: UCEPOn4cgvrrerg_-q_Ygw1A
name: More Christ
url: https://www.youtube.com/channel/UCEPOn4cgvrrerg_-q_Ygw1A/videos
- id: UC2yCyOMUeem-cYwliC-tLJg
name: Paul Anleitner
url: https://www.youtube.com/channel/UC2yCyOMUeem-cYwliC-tLJg/videos
- id: UCGsDIP_K6J6VSTqlq-9IPlg
name: Paul VanderKlay
url: https://www.youtube.com/channel/UCGsDIP_K6J6VSTqlq-9IPlg/videos
- id: UCEzWTLDYmL8soRdQec9Fsjw
name: Randos United
url: https://www.youtube.com/channel/UCEzWTLDYmL8soRdQec9Fsjw/videos
- id: UC1KgNsMdRoIA_njVmaDdHgA
name: Randos United 2
url: https://www.youtube.com/channel/UC1KgNsMdRoIA_njVmaDdHgA/videos
- id: UCFQ6Gptuq-sLflbJ4YY3Umw
name: Rebel Wisdom
url: https://www.youtube.com/channel/UCFQ6Gptuq-sLflbJ4YY3Umw/videos
- id: UCEY1vGNBPsC3dCatZyK3Jkw
name: Strange Theology
url: https://www.youtube.com/channel/UCEY1vGNBPsC3dCatZyK3Jkw/videos
- id: UCIAtCuzdvgNJvSYILnHtdWA
name: The Anadromist
url: https://www.youtube.com/channel/UCIAtCuzdvgNJvSYILnHtdWA/videos
- id: UClIDP7_Kzv_7tDQjTv9EhrA
name: The Chris Show
url: https://www.youtube.com/channel/UClIDP7_Kzv_7tDQjTv9EhrA/videos
- id: UC-QiBn6GsM3JZJAeAQpaGAA
name: TheCommonToad
url: https://www.youtube.com/channel/UC-QiBn6GsM3JZJAeAQpaGAA/videos
- id: UCiJmdXTb76i8eIPXdJyf8ZQ
name: Bridges of Meaning Hub
url: https://www.youtube.com/channel/UCiJmdXTb76i8eIPXdJyf8ZQ/videos
- id: UCM9Z05vuQhMEwsV03u6DrLA
name: Cassidy van der Kamp
url: https://www.youtube.com/channel/UCM9Z05vuQhMEwsV03u6DrLA/videos
- id: UCgp_r6WlBwDSJrP43Mz07GQ
name: The Meaning Code
url: https://www.youtube.com/channel/UCgp_r6WlBwDSJrP43Mz07GQ/videos
- id: UC5uv-BxzCrN93B_5qbOdRWw
name: TheScrollersPodcast
url: https://www.youtube.com/channel/UC5uv-BxzCrN93B_5qbOdRWw/videos
- id: UCtCTSf3UwRU14nYWr_xm-dQ
name: Jonathan Pageau
url: https://www.youtube.com/channel/UCtCTSf3UwRU14nYWr_xm-dQ/videos
- id: UC1a4VtU_SMSfdRiwMJR33YQ
name: The Young Levite
url: https://www.youtube.com/channel/UC1a4VtU_SMSfdRiwMJR33YQ/videos
- id: UCg7Ed0lecvko58ibuX1XHng
name: Transfigured
url: https://www.youtube.com/channel/UCg7Ed0lecvko58ibuX1XHng/videos
- id: UCMVG5eqpYFVEB-a9IqAOuHA
name: President Foxman
url: https://www.youtube.com/channel/UCMVG5eqpYFVEB-a9IqAOuHA/videos
- id: UC8mJqpS_EBbMcyuzZDF0TEw
name: Neal Daedalus
url: https://www.youtube.com/channel/UC8mJqpS_EBbMcyuzZDF0TEw/videos
- id: UCGHuURJ1XFHzPSeokf6510A
name: Aphrael Pilotson
url: https://www.youtube.com/channel/UCGHuURJ1XFHzPSeokf6510A/videos
- id: UC704NVL2DyzYg3rMU9r1f7A
handle: chrishoward8473
name: Chris Howard
url: https://www.youtube.com/@chrishoward8473/videos
- id: UChptV-kf8lnncGh7DA2m8Pw
name: Shoulder Serf
url: https://www.youtube.com/channel/UChptV-kf8lnncGh7DA2m8Pw/videos
- id: UCzX6R3ZLQh5Zma_5AsPcqPA
name: Restoring Meaning
url: https://www.youtube.com/channel/UCzX6R3ZLQh5Zma_5AsPcqPA/videos
- id: UCiukuaNd_qzRDTW9qe2OC1w
name: Kale Zelden
url: https://www.youtube.com/channel/UCiukuaNd_qzRDTW9qe2OC1w/videos
- id: UC5yLuFQCms4nb9K2bGQLqIw
name: Ron Copperman
url: https://www.youtube.com/channel/UC5yLuFQCms4nb9K2bGQLqIw/videos
- id: UCVdSgEf9bLXFMBGSMhn7x4Q
name: Mark D Parker
url: https://www.youtube.com/channel/UCVdSgEf9bLXFMBGSMhn7x4Q/videos
- id: UC_dnk5D4tFCRYCrKIcQlcfw
name: Luke Thompson
url: https://www.youtube.com/channel/UC_dnk5D4tFCRYCrKIcQlcfw/videos
- id: UCT8Lq3ufaGEnCSS8WpFatqw
handle: Freerilian
name: Free Rilian
url: https://www.youtube.com/@Freerilian/videos
- id: UC977g6oGYIJDQnsZOGjQBBA
handle: marks.-ry7bm
name: Mark S
url: https://www.youtube.com/@marks.-ry7bm/videos
- id: UCbD1Pm0TOcRK2zaCrwgcTTg
handle: Adams-Fall
name: Adams Fall
url: https://www.youtube.com/@Adams-Fall/videos
- id: UCnojyPW0IgLWTQ0SaDQ1KBA
handle: mcmosav
name: mcmosav
url: https://www.youtube.com/@mcmosav/videos
- id: UCiOZYvBGHw1Y6wyzffwEp9g
handle: Landbeorht
name: Joseph Lambrecht
url: https://www.youtube.com/@Landbeorht/videos
- id: UCAXyF_HFeMgwS8nkGVeroAA
handle: Corner_Citizen
name: Corner Citizen
url: https://www.youtube.com/@Corner_Citizen/videos
- id: UCv2Qft5mZrmA9XAwnl9PU-g
handle: ethan.caughey
name: Ethan Caughey
url: https://www.youtube.com/@ethan.caughey/videos
- id: UCMJCtS8jKouJ2d8UIYzW3vg
handle: MarcInTbilisi
name: Marc Jackson
url: https://www.youtube.com/@MarcInTbilisi/videos
- id: UCk9O91WwruXmgu1NQrKZZEw
handle: climbingmt.sophia
name: Climbing Mt Sophia
url: https://www.youtube.com/@climbingmt.sophia/videos
- id: UCUSyTPWW4JaG1YfUPddw47Q
handle: Skankenstein
name: Skankenstein
url: https://www.youtube.com/@Skankenstein/videos
- id: UCzw2FNI3IRphcAoVcUENOgQ
handle: UpCycleClub
name: UpCycleClub
url: https://www.youtube.com/@UpCycleClub/videos
- id: UCQ7rVoApmYIpcmU7fB9RPyw
handle: JessPurviance
name: Jesspurviance
url: https://www.youtube.com/@JessPurviance/videos
- id: UCrZyTWGMdRM9_P26RKPvh3A
handle: greyhamilton52
name: Grey Hamilton
url: https://www.youtube.com/@greyhamilton52/videos
- id: UCDCfI162vhPvwdxW6X4nmiw
handle: paulrenenichols
name: Paul Rene Nichols
url: https://www.youtube.com/@paulrenenichols/videos
- id: UCFLovlJ8RFApfjrf2y157xg
handle: OfficialSecularKoranism
name: Secular Koranism
url: https://www.youtube.com/@OfficialSecularKoranism/videos
- id: UC_-YQbnPfBbIezMr1adZZiQ
handle: FromWhomAllBlessingsFlow
name: From Whom All Blessings Flow
url: https://www.youtube.com/@FromWhomAllBlessingsFlow/videos
- id: UCn5mf-fcpBmkepIpZ8eFRng
handle: FoodTruckEmily
name: Emily Rajeh
url: https://www.youtube.com/@FoodTruckEmily/videos
- id: UC6zHDj4D323xJkblnPTvY3Q
handle: O.G.Rose.Michelle.and.Daniel
name: OG Rose
url: https://www.youtube.com/@O.G.Rose.Michelle.and.Daniel/videos
- id: UC4GiA5Hnwy415uVRymxPK-w
handle: JonathanDumeer
name: Jonathan Dumeer
url: https://www.youtube.com/@JonathanDumeer/videos
- id: UCMzT-mdCqoyEv_-YZVtE7MQ
handle: JordanGreenhall
name: Jordan Hall
url: https://www.youtube.com/@JordanGreenhall/videos
- id: UC5goUoFM4LPim4eY4pwRXYw
handle: NechamaGluck
name: Nechama Gluck
url: https://www.youtube.com/@NechamaGluck/videos
- id: UCPUVeoQYyq8cndWwyczX6RA
handle: justinsmorningcoffee
name: Justinsmorningcoffee
url: https://www.youtube.com/@justinsmorningcoffee/videos
- id: UCB0C8DEIQlQzvSGuGriBxtA
handle: grahampardun
name: Grahampardun
url: https://www.youtube.com/@grahampardun/videos
- id: UCpLJJLVB_7v4Igq-9arja1A
handle: michaelmartin8681
name: Michaelmartin8681
url: https://www.youtube.com/@michaelmartin8681/videos
- id: UCxV18lwwh29DiWuooz7UCvg
handle: davidbusuttil9086
name: Davidbusuttil9086
url: https://www.youtube.com/@davidbusuttil9086/videos
- id: UCosBhpwwGh_ueYq4ZSi5dGw
handle: matthewparlato5626
name: Matthewparlato5626
url: https://www.youtube.com/@matthewparlato5626/videos
- id: UCwF5LWNOFou_50bT65bq4Bg
handle: lancecleaver227
name: Lancecleaver227
url: https://www.youtube.com/@lancecleaver227/videos
- id: UCaJ0CqiiMSTq4X0rycUOIjw
handle: theplebistocrat
name: the plebistocrat
url: https://www.youtube.com/@theplebistocrat/videos
- id: UCWehDXDEdUpB58P7-Bg1cHg
handle: rigelwindsongthurston
name: Rigel Windsong Thurston
url: https://www.youtube.com/@rigelwindsongthurston/videos
- id: UCZA5mUAyYcCL1kYgxbeMNrA
handle: RightInChrist
name: Rightinchrist
url: https://www.youtube.com/@RightInChrist/videos
- id: UCDIPXp88qjAV3TiaR5Uo3iQ
handle: RafeKelley
name: Rafekelley
url: https://www.youtube.com/@RafeKelley/videos
- id: UCedgru6YCto3zyXjlbuQuqA
handle: WavesOfObsession
name: Wavesofobsession
url: https://www.youtube.com/@WavesOfObsession/videos
- handle: LeviathanForPlay
name: LeviathanForPlay
url: https://www.youtube.com/@LeviathanForPlay/videos
- id: UCehAungJpAeC-F3R5FwvvCQ
name: Wholly Unfocused
url: https://www.youtube.com/channel/UCehAungJpAeC-F3R5FwvvCQ/videos
- id: UC4YwC5zA9S_2EwthE27Xlew
name: CMA
url: https://www.youtube.com/channel/UC4YwC5zA9S_2EwthE27Xlew/videos


@@ -6,7 +6,13 @@ Environment Variables:
     ELASTIC_USERNAME / ELASTIC_PASSWORD: Optional basic auth credentials.
     ELASTIC_INDEX: Target index name (default: this_little_corner_py).
     LOCAL_DATA_DIR: Root folder containing JSON metadata (default: ../data/video_metadata).
+    CHANNELS_PATH: Path to the canonical channel list (default: ./channels.yml).
+    RSS_FEED_URL: Public URL/path for the unified RSS feed (default: /rss/youtube-unified).
+    RSS_FEED_UPSTREAM: Base URL to proxy feed requests (default: http://localhost:8097).
     YOUTUBE_API_KEY: Optional API key for pulling metadata directly from YouTube.
+    RATE_LIMIT_ENABLED: Toggle API rate limiting (default: 1).
+    RATE_LIMIT_REQUESTS: Max requests per window per client (default: 60).
+    RATE_LIMIT_WINDOW_SECONDS: Window size in seconds (default: 60).
 """

 from __future__ import annotations
@@ -16,6 +22,20 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Optional

+# Load .env file if it exists
+try:
+    from dotenv import load_dotenv
+    import logging
+
+    _logger = logging.getLogger(__name__)
+    _env_path = Path(__file__).parent / ".env"
+    if _env_path.exists():
+        _logger.info("Loading .env from: %s", _env_path)
+        result = load_dotenv(_env_path, override=True)
+        _logger.info("load_dotenv result: %s", result)
+except ImportError:
+    pass  # python-dotenv not installed
+

 @dataclass(frozen=True)
 class ElasticSettings:
@@ -39,11 +59,27 @@ class YoutubeSettings:
     api_key: Optional[str]


+@dataclass(frozen=True)
+class RateLimitSettings:
+    enabled: bool
+    requests: int
+    window_seconds: int
+
+
 @dataclass(frozen=True)
 class AppConfig:
     elastic: ElasticSettings
     data: DataSettings
     youtube: YoutubeSettings
+    rate_limit: RateLimitSettings
+    qdrant_url: str
+    qdrant_collection: str
+    qdrant_vector_name: Optional[str]
+    qdrant_vector_size: int
+    qdrant_embed_model: str
+    channels_path: Path
+    rss_feed_url: str
+    rss_feed_upstream: str


 def _env(name: str, default: Optional[str] = None) -> Optional[str]:
@@ -75,7 +111,30 @@ def load_config() -> AppConfig:
     )
     data = DataSettings(root=data_root)
     youtube = YoutubeSettings(api_key=_env("YOUTUBE_API_KEY"))
-    return AppConfig(elastic=elastic, data=data, youtube=youtube)
+    rate_limit = RateLimitSettings(
+        enabled=_env("RATE_LIMIT_ENABLED", "1") in {"1", "true", "True"},
+        requests=max(int(_env("RATE_LIMIT_REQUESTS", "60")), 0),
+        window_seconds=max(int(_env("RATE_LIMIT_WINDOW_SECONDS", "60")), 1),
+    )
+    channels_path = Path(
+        _env("CHANNELS_PATH", str(Path(__file__).parent / "channels.yml"))
+    ).expanduser()
+    rss_feed_url = _env("RSS_FEED_URL", "/rss/youtube-unified")
+    rss_feed_upstream = _env("RSS_FEED_UPSTREAM", "http://localhost:8097")
+    return AppConfig(
+        elastic=elastic,
+        data=data,
+        youtube=youtube,
+        rate_limit=rate_limit,
+        qdrant_url=_env("QDRANT_URL", "http://localhost:6333"),
+        qdrant_collection=_env("QDRANT_COLLECTION", "tlc_embeddings"),
+        qdrant_vector_name=_env("QDRANT_VECTOR_NAME"),
+        qdrant_vector_size=int(_env("QDRANT_VECTOR_SIZE", "1024")),
+        qdrant_embed_model=_env("QDRANT_EMBED_MODEL", "BAAI/bge-large-en-v1.5"),
+        channels_path=channels_path,
+        rss_feed_url=rss_feed_url or "",
+        rss_feed_upstream=rss_feed_upstream or "",
+    )


 CONFIG = load_config()
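
Note on the rate-limit parsing above: only the exact strings "1", "true", and "True" enable limiting, and out-of-range numbers are clamped rather than rejected. The standalone sketch below reproduces that behaviour so it can be checked in isolation. `parse_rate_limit` and its `env` mapping argument are hypothetical names used for illustration, and the sketch assumes `_env` acts like a plain lookup with a default, which this diff does not show.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitSettings:
    enabled: bool
    requests: int
    window_seconds: int


def parse_rate_limit(env: Mapping[str, str]) -> RateLimitSettings:
    """Hypothetical helper mirroring the parsing shown in load_config()."""
    return RateLimitSettings(
        # Only these exact spellings count as "on"; "TRUE" or "yes" do not.
        enabled=env.get("RATE_LIMIT_ENABLED", "1") in {"1", "true", "True"},
        # A negative request count is clamped to 0.
        requests=max(int(env.get("RATE_LIMIT_REQUESTS", "60")), 0),
        # The window is clamped to at least one second.
        window_seconds=max(int(env.get("RATE_LIMIT_WINDOW_SECONDS", "60")), 1),
    )


defaults = parse_rate_limit({})
odd = parse_rate_limit(
    {
        "RATE_LIMIT_ENABLED": "yes",
        "RATE_LIMIT_REQUESTS": "-5",
        "RATE_LIMIT_WINDOW_SECONDS": "0",
    }
)
print(defaults)
print(odd)
```

With an empty mapping the defaults apply (enabled, 60 requests per 60 seconds). The second call shows that "yes" is read as disabled, which may surprise anyone setting the variable by hand.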

docker-compose.yml Normal file

@@ -0,0 +1,69 @@
version: "3.9"
# TLC Search + Feed Master - Complete YouTube content indexing & RSS aggregation
# Provide ELASTIC_URL / QDRANT_URL (and related) via environment or a .env file.

services:
  # RSS Bridge - Converts YouTube channels to RSS feeds
  rss-bridge:
    image: rssbridge/rss-bridge:latest
    container_name: tlc-rss-bridge
    hostname: rss-bridge
    restart: unless-stopped
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    ports:
      - "3001:80"

  # Feed Master - Aggregates multiple RSS feeds into unified feed
  feed-master:
    image: umputun/feed-master:latest
    container_name: tlc-feed-master
    hostname: feed-master
    restart: unless-stopped
    depends_on:
      - rss-bridge
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    environment:
      - DEBUG=false
      - FM_DB=/srv/var/feed-master.bdb
      - FM_CONF=/srv/etc/fm.yml
    volumes:
      - ./feed-master-config:/srv/etc
      - ./feed-master-config/var:/srv/var
      - ./feed-master-config/images:/srv/images
    ports:
      - "8097:8080"

  # TLC Search - Flask app for searching YouTube transcripts
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      ELASTIC_URL: ${ELASTIC_URL:?set ELASTIC_URL to your remote Elasticsearch URL}
      ELASTIC_INDEX: ${ELASTIC_INDEX:-this_little_corner_py}
      ELASTIC_USERNAME: ${ELASTIC_USERNAME:-}
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      ELASTIC_API_KEY: ${ELASTIC_API_KEY:-}
      ELASTIC_VERIFY_CERTS: ${ELASTIC_VERIFY_CERTS:-0}
      CHANNELS_PATH: ${CHANNELS_PATH:-/app/python_app/channels.yml}
      RSS_FEED_URL: ${RSS_FEED_URL:-/rss/youtube-unified}
      RSS_FEED_UPSTREAM: ${RSS_FEED_UPSTREAM:-http://feed-master:8080}
      QDRANT_URL: ${QDRANT_URL:?set QDRANT_URL to your remote Qdrant URL}
      QDRANT_COLLECTION: ${QDRANT_COLLECTION:-tlc-captions-full}
      QDRANT_VECTOR_NAME: ${QDRANT_VECTOR_NAME:-}
      QDRANT_VECTOR_SIZE: ${QDRANT_VECTOR_SIZE:-1024}
      QDRANT_EMBED_MODEL: ${QDRANT_EMBED_MODEL:-BAAI/bge-large-en-v1.5}
      LOCAL_DATA_DIR: ${LOCAL_DATA_DIR:-/app/data/video_metadata}
    volumes:
      - ./channels.yml:/app/python_app/channels.yml:ro
      - ./data:/app/data:ro

feed-master-config/fm.yml Normal file

@@ -0,0 +1,168 @@
# Feed Master Configuration
# Auto-generated from channels.yml
# Do not edit manually - regenerate using generate_feed_config_simple.py
feeds:
youtube-unified:
title: YouTube Unified Feed
description: Aggregated feed from all YouTube channels
link: https://youtube.com
language: "en-us"
sources:
- name: A Quality Existence
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC6vg0HkKKlgsWk-3HfV-vnw&format=Mrss
- name: Adams Fall
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCbD1Pm0TOcRK2zaCrwgcTTg&format=Mrss
- name: Andrea with the Bangs
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCeWWxwzgLYUbfjWowXhVdYw&format=Mrss
- name: Aphrael Pilotson
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCGHuURJ1XFHzPSeokf6510A&format=Mrss
- name: Cassidy van der Kamp
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCM9Z05vuQhMEwsV03u6DrLA&format=Mrss
- name: Channel UCCebR16tXbv
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCCebR16tXbv5Ykk9_WtCCug&format=Mrss
- name: Channel UCiJmdXTb76i
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCiJmdXTb76i8eIPXdJyf8ZQ&format=Mrss
- name: Charlie's Little Corner
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC952hDf_C4nYJdqwK7VzTxA&format=Mrss
- name: Chris Howard
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC704NVL2DyzYg3rMU9r1f7A&format=Mrss
- name: Christian Baxter
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCU5SNBfTo4umhjYz6M0Jsmg&format=Mrss
- name: Climbing Mt Sophia
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCk9O91WwruXmgu1NQrKZZEw&format=Mrss
- name: Corner Citizen
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCAXyF_HFeMgwS8nkGVeroAA&format=Mrss
- name: Davidbusuttil9086
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCxV18lwwh29DiWuooz7UCvg&format=Mrss
- name: Ein Sof - Infinite Reflections
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC4Rmxg7saTfwIpvq3QEzylQ&format=Mrss
- name: Emily Rajeh
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCn5mf-fcpBmkepIpZ8eFRng&format=Mrss
- name: Eric Seitz
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCTdH4nh6JTcfKUAWvmnPoIQ&format=Mrss
- name: Ethan Caughey
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCv2Qft5mZrmA9XAwnl9PU-g&format=Mrss
- name: faturechi
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCprytROeCztMOMe8plyJRMg&format=Mrss
- name: Finding Ideas
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC6Tvr9mBXNaAxLGRA_sUSRA&format=Mrss
- name: Free Rilian
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCT8Lq3ufaGEnCSS8WpFatqw&format=Mrss
- name: From Whom All Blessings Flow
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC_-YQbnPfBbIezMr1adZZiQ&format=Mrss
- name: Grahampardun
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCB0C8DEIQlQzvSGuGriBxtA&format=Mrss
- name: Grail Country
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCsi_x8c12NW9FR7LL01QXKA&format=Mrss
- name: Grey Hamilton
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCrZyTWGMdRM9_P26RKPvh3A&format=Mrss
- name: Grizwald Grim
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCAqTQ5yLHHH44XWwWXLkvHQ&format=Mrss
- name: Jesspurviance
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCQ7rVoApmYIpcmU7fB9RPyw&format=Mrss
- name: John Vervaeke
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCpqDUjTsof-kTNpnyWper_Q&format=Mrss
- name: Jonathan Dumeer
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC4GiA5Hnwy415uVRymxPK-w&format=Mrss
- name: Jonathan Pageau
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCtCTSf3UwRU14nYWr_xm-dQ&format=Mrss
- name: Jordan B Peterson
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCL_f53ZEJxp8TtlOkHwMV9Q&format=Mrss
- name: Jordan Hall
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCMzT-mdCqoyEv_-YZVtE7MQ&format=Mrss
- name: Joseph Lambrecht
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCiOZYvBGHw1Y6wyzffwEp9g&format=Mrss
- name: Justinsmorningcoffee
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCPUVeoQYyq8cndWwyczX6RA&format=Mrss
- name: Kale Zelden
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCiukuaNd_qzRDTW9qe2OC1w&format=Mrss
- name: Lancecleaver227
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCwF5LWNOFou_50bT65bq4Bg&format=Mrss
- name: Lucas Vos
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCez1fzMRGctojfis2lfRYug&format=Mrss
- name: Luke Thompson
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC_dnk5D4tFCRYCrKIcQlcfw&format=Mrss
- name: Marc Jackson
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCMJCtS8jKouJ2d8UIYzW3vg&format=Mrss
- name: Mark D Parker
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCVdSgEf9bLXFMBGSMhn7x4Q&format=Mrss
- name: Mark S
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC977g6oGYIJDQnsZOGjQBBA&format=Mrss
- name: Mary Kochan
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC2leFZRD0ZlQDQxpR2Zd8oA&format=Mrss
- name: Matthewparlato5626
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCosBhpwwGh_ueYq4ZSi5dGw&format=Mrss
- name: mcmosav
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCnojyPW0IgLWTQ0SaDQ1KBA&format=Mrss
- name: Michaelmartin8681
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCpLJJLVB_7v4Igq-9arja1A&format=Mrss
- name: More Christ
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCEPOn4cgvrrerg_-q_Ygw1A&format=Mrss
- name: Neal Daedalus
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC8mJqpS_EBbMcyuzZDF0TEw&format=Mrss
- name: Nechama Gluck
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC5goUoFM4LPim4eY4pwRXYw&format=Mrss
- name: OG Rose
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC6zHDj4D323xJkblnPTvY3Q&format=Mrss
- name: Paul Anleitner
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC2yCyOMUeem-cYwliC-tLJg&format=Mrss
- name: Paul Rene Nichols
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCDCfI162vhPvwdxW6X4nmiw&format=Mrss
- name: Paul VanderKlay
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCGsDIP_K6J6VSTqlq-9IPlg&format=Mrss
- name: President Foxman
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCMVG5eqpYFVEB-a9IqAOuHA&format=Mrss
- name: Rafekelley
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCDIPXp88qjAV3TiaR5Uo3iQ&format=Mrss
- name: Randos United
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCEzWTLDYmL8soRdQec9Fsjw&format=Mrss
- name: Randos United 2
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC1KgNsMdRoIA_njVmaDdHgA&format=Mrss
- name: Rebel Wisdom
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCFQ6Gptuq-sLflbJ4YY3Umw&format=Mrss
- name: Restoring Meaning
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCzX6R3ZLQh5Zma_5AsPcqPA&format=Mrss
- name: Rigel Windsong Thurston
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCWehDXDEdUpB58P7-Bg1cHg&format=Mrss
- name: Rightinchrist
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCZA5mUAyYcCL1kYgxbeMNrA&format=Mrss
- name: Ron Copperman
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC5yLuFQCms4nb9K2bGQLqIw&format=Mrss
- name: Sartori Studios
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC8SErJkYnDsYGh1HxoZkl-g&format=Mrss
- name: Secular Koranism
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCFLovlJ8RFApfjrf2y157xg&format=Mrss
- name: Shoulder Serf
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UChptV-kf8lnncGh7DA2m8Pw&format=Mrss
- name: Skankenstein
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCUSyTPWW4JaG1YfUPddw47Q&format=Mrss
- name: Strange Theology
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCEY1vGNBPsC3dCatZyK3Jkw&format=Mrss
- name: The Anadromist
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCIAtCuzdvgNJvSYILnHtdWA&format=Mrss
- name: The Chris Show
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UClIDP7_Kzv_7tDQjTv9EhrA&format=Mrss
- name: The Meaning Code
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCgp_r6WlBwDSJrP43Mz07GQ&format=Mrss
- name: the plebistocrat
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCaJ0CqiiMSTq4X0rycUOIjw&format=Mrss
- name: The Young Levite
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC1a4VtU_SMSfdRiwMJR33YQ&format=Mrss
- name: TheCommonToad
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC-QiBn6GsM3JZJAeAQpaGAA&format=Mrss
- name: TheScrollersPodcast
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UC5uv-BxzCrN93B_5qbOdRWw&format=Mrss
- name: Transfigured
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCg7Ed0lecvko58ibuX1XHng&format=Mrss
- name: UpCycleClub
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCzw2FNI3IRphcAoVcUENOgQ&format=Mrss
- name: Wavesofobsession
url: http://rss-bridge/?action=display&bridge=YoutubeBridge&context=By+channel+id&c=UCedgru6YCto3zyXjlbuQuqA&format=Mrss
system:
update: 5m
max_per_feed: 5
max_total: 200
max_keep: 1000
base_url: http://localhost:8097
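
Every source URL in fm.yml above follows one pattern: the RSS-Bridge host plus a fixed query string carrying the channel ID. The generator scripts get these URLs from `build_rss_bridge_url` in `channel_config`, which is not part of this diff. The sketch below is a hypothetical stand-in (`sketch_rss_bridge_url`) that reproduces the observed URL shape. Returning `None` for entries without an `id` is an assumption, not confirmed behaviour of the real helper.

```python
from typing import Optional
from urllib.parse import urlencode


def sketch_rss_bridge_url(channel: dict, rss_bridge_host: str = "rss-bridge") -> Optional[str]:
    """Hypothetical stand-in for build_rss_bridge_url (real helper not shown)."""
    channel_id = channel.get("id")
    if not channel_id:
        # Assumption: handle-only entries cannot use the "By channel id" context.
        return None
    query = urlencode(
        {
            "action": "display",
            "bridge": "YoutubeBridge",
            "context": "By channel id",  # urlencode renders spaces as "+"
            "c": channel_id,
            "format": "Mrss",
        }
    )
    return f"http://{rss_bridge_host}/?{query}"


url = sketch_rss_bridge_url({"id": "UCbD1Pm0TOcRK2zaCrwgcTTg", "name": "Adams Fall"})
print(url)
```

Building the query with `urlencode` rather than string concatenation keeps the escaping of the `context` value consistent with the URLs in the generated file.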

generate_feed_config.py Normal file

@@ -0,0 +1,91 @@
#!/usr/bin/env python3
"""
Generate feed-master configuration from channels.yml.
This ensures a single source of truth for the YouTube channels.
"""

import sys
from pathlib import Path

from .channel_config import build_rss_bridge_url, load_channel_entries


def generate_fm_config(channels_file, output_file, rss_bridge_host="rss-bridge"):
    """Generate feed-master YAML configuration from channels.yml"""
    print(f"Reading channels from {channels_file}")
    channels = load_channel_entries(Path(channels_file))
    print(f"Found {len(channels)} channels")

    # Generate feed configuration
    config = []
    config.append("# Feed Master Configuration")
    config.append("# Auto-generated from channels.yml")
    config.append("# Do not edit manually - regenerate using generate_feed_config.py")
    config.append("")
    config.append("feeds:")
    config.append("  youtube-unified:")
    config.append("    title: YouTube Unified Feed")
    config.append("    description: Aggregated feed from all YouTube channels")
    config.append("    link: https://youtube.com")
    config.append('    language: "en-us"')
    config.append("    sources:")

    processed = 0
    skipped = 0
    for channel in channels:
        if not channel.get("rss_enabled", True):
            skipped += 1
            continue
        bridge_url = build_rss_bridge_url(channel, rss_bridge_host=rss_bridge_host)
        if not bridge_url:
            skipped += 1
            continue
        name = channel.get("name", "Unknown")
        config.append(f"      - name: {name}")
        config.append(f"        url: {bridge_url}")
        processed += 1

    # Add system configuration
    config.append("")
    config.append("system:")
    config.append("  update: 5m")
    config.append("  max_per_feed: 5")
    config.append("  max_total: 200")
    config.append("  max_keep: 1000")
    config.append("  base_url: http://localhost:8097")

    # Write output
    print(f"\nProcessed {processed} channels, skipped {skipped}")
    with open(output_file, 'w') as f:
        f.write('\n'.join(config))
    print(f"Configuration written to {output_file}")
    print(f"\nTo apply this configuration:")
    print(f" 1. Copy {output_file} to feed-master/etc/fm.yml")
    print(f" 2. Restart the feed-master service")


if __name__ == "__main__":
    # Default paths
    script_dir = Path(__file__).parent
    channels_file = script_dir / "channels.yml"
    output_file = script_dir / "feed-master-config" / "fm.yml"

    # Allow overriding via command line
    if len(sys.argv) > 1:
        channels_file = Path(sys.argv[1])
    if len(sys.argv) > 2:
        output_file = Path(sys.argv[2])

    if not channels_file.exists():
        print(f"Error: {channels_file} not found", file=sys.stderr)
        print(f"\nUsage: {sys.argv[0]} [channels.yml] [output.yml]", file=sys.stderr)
        sys.exit(1)

    # Ensure output directory exists
    output_file.parent.mkdir(parents=True, exist_ok=True)
    generate_fm_config(channels_file, output_file)

generate_feed_config_simple.py Executable file

@@ -0,0 +1,88 @@
#!/usr/bin/env python3
"""
Generate feed-master configuration from channels.yml.
Simplified version that doesn't require RSS-Bridge to be running.
"""

import sys
from pathlib import Path

from .channel_config import build_rss_bridge_url, load_channel_entries


def generate_fm_config(channels_file, output_file, rss_bridge_host="rss-bridge"):
    """Generate feed-master YAML configuration from channels.yml"""
    print(f"Reading channels from {channels_file}")
    channels = load_channel_entries(Path(channels_file))
    print(f"Found {len(channels)} channels")

    # Generate feed configuration
    config = []
    config.append("# Feed Master Configuration")
    config.append("# Auto-generated from channels.yml")
    config.append("# Do not edit manually - regenerate using generate_feed_config_simple.py")
    config.append("")
    config.append("feeds:")
    config.append("  youtube-unified:")
    config.append("    title: YouTube Unified Feed")
    config.append("    description: Aggregated feed from all YouTube channels")
    config.append("    link: https://youtube.com")
    config.append('    language: "en-us"')
    config.append("    sources:")

    processed = 0
    skipped = 0
    for channel in channels:
        if not channel.get("rss_enabled", True):
            skipped += 1
            continue
        bridge_url = build_rss_bridge_url(channel, rss_bridge_host=rss_bridge_host)
        if not bridge_url:
            skipped += 1
            continue
        name = channel.get("name", "Unknown")
        config.append(f"      - name: {name}")
        config.append(f"        url: {bridge_url}")
        processed += 1

    # Add system configuration
    config.append("")
    config.append("system:")
    config.append("  update: 5m")
    config.append("  max_per_feed: 5")
    config.append("  max_total: 200")
    config.append("  max_keep: 1000")
    config.append("  base_url: http://localhost:8097")

    # Write output
    print(f"\nProcessed {processed} channels, skipped {skipped}")
    with open(output_file, 'w') as f:
        f.write('\n'.join(config))
    print(f"Configuration written to {output_file}")


if __name__ == "__main__":
    # Default paths
    script_dir = Path(__file__).parent
    channels_file = script_dir / "channels.yml"
    output_file = script_dir / "feed-master-config" / "fm.yml"

    # Allow overriding via command line
    if len(sys.argv) > 1:
        channels_file = Path(sys.argv[1])
    if len(sys.argv) > 2:
        output_file = Path(sys.argv[2])

    if not channels_file.exists():
        print(f"Error: {channels_file} not found", file=sys.stderr)
        print(f"\nUsage: {sys.argv[0]} [channels.yml] [output.yml]", file=sys.stderr)
        sys.exit(1)

    # Ensure output directory exists
    output_file.parent.mkdir(parents=True, exist_ok=True)
    generate_fm_config(channels_file, output_file)


@@ -90,6 +90,10 @@ def build_bulk_actions(
                 "transcript_full": transcript_full,
                 "transcript_secondary_full": doc.get("transcript_secondary_full"),
                 "transcript_parts": parts,
+                "internal_references": doc.get("internal_references", []),
+                "internal_references_count": doc.get("internal_references_count", 0),
+                "referenced_by": doc.get("referenced_by", []),
+                "referenced_by_count": doc.get("referenced_by_count", 0),
             },
         }
@@ -121,6 +125,10 @@ def ensure_index(client: "Elasticsearch", index: str) -> None:
                         "text": {"type": "text"},
                     },
                 },
+                "internal_references": {"type": "keyword"},
+                "internal_references_count": {"type": "integer"},
+                "referenced_by": {"type": "keyword"},
+                "referenced_by_count": {"type": "integer"},
             }
         },
     )
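
The four new fields hold a video's outgoing references, the reverse links, and a count for each. This diff only copies them from the source documents and does not show how they are produced. As a minimal sketch, assuming `referenced_by` is simply the inverse of `internal_references`, the reverse links could be derived as below. The function name `add_reverse_references` and the sample video IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def add_reverse_references(docs: Dict[str, dict]) -> None:
    """Fill the reverse-link and count fields in place (hypothetical helper)."""
    incoming: Dict[str, List[str]] = defaultdict(list)
    # First pass: record, for every target, which videos point at it.
    for video_id, doc in docs.items():
        for target in doc.get("internal_references", []):
            incoming[target].append(video_id)
    # Second pass: write both lists and their counts onto each document.
    for video_id, doc in docs.items():
        refs = doc.get("internal_references", [])
        doc["internal_references"] = refs
        doc["internal_references_count"] = len(refs)
        doc["referenced_by"] = sorted(incoming.get(video_id, []))
        doc["referenced_by_count"] = len(doc["referenced_by"])


docs = {
    "vidA": {"internal_references": ["vidB", "vidC"]},
    "vidB": {"internal_references": ["vidC"]},
    "vidC": {},
}
add_reverse_references(docs)
print(docs["vidC"]["referenced_by"])
```

Storing the counts alongside the lists matches the `integer` mappings above and would allow sorting or filtering by count without scripting, though whether the application uses them that way is not visible here.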


@@ -2,3 +2,5 @@ Flask>=2.3
 elasticsearch>=7.0.0,<9.0.0
 youtube-transcript-api>=0.6
 google-api-python-client>=2.0.0
+python-dotenv>=0.19.0
+requests>=2.31.0

File diff suppressed because it is too large

File diff suppressed because it is too large

static/favicon.png Normal file

Binary file not shown. Size: 1.9 KiB


@@ -4,6 +4,7 @@
     <meta charset="utf-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1" />
     <title>Term Frequency Explorer</title>
+    <link rel="icon" href="/static/favicon.png" type="image/png" />
     <link rel="stylesheet" href="/static/style.css" />
     <style>
       #chart {
@@ -65,4 +66,3 @@
     <script src="/static/frequency.js"></script>
   </body>
 </html>

static/graph.html Normal file

@@ -0,0 +1,96 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>TLC Reference Graph</title>
<link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="https://unpkg.com/xp.css" />
<link rel="stylesheet" href="/static/style.css" />
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
</head>
<body>
<div class="window graph-window" style="max-width: 1100px; margin: 20px auto;">
<div class="title-bar">
<div class="title-bar-text">Reference Graph</div>
<div class="title-bar-controls">
<a class="title-bar-link" href="/">⬅ Search</a>
</div>
</div>
<div class="window-body">
<p>
Explore how videos reference each other. Enter a <code>video_id</code> to see its immediate
neighbors (referenced and referencing videos). Choose a larger depth to expand the graph.
</p>
<form id="graphForm" class="graph-controls">
<div class="field-group">
<label for="graphVideoId">Video ID</label>
<input
id="graphVideoId"
name="video_id"
type="text"
placeholder="e.g. dQw4w9WgXcQ"
required
/>
</div>
<div class="field-group">
<label for="graphDepth">Depth</label>
<select id="graphDepth" name="depth">
<option value="1">1 hop</option>
<option value="2">2 hops</option>
<option value="3">3 hops</option>
</select>
</div>
<div class="field-group">
<label for="graphMaxNodes">Max nodes</label>
<select id="graphMaxNodes" name="max_nodes">
<option value="100">100</option>
<option value="150">150</option>
<option value="200" selected>200</option>
<option value="300">300</option>
</select>
</div>
<div class="field-group">
<label class="checkbox">
<input type="checkbox" id="graphFullToggle" name="full_graph" />
Attempt entire reference graph
</label>
<p class="field-hint">
Includes every video that references another (ignores depth; may be slow). Max nodes still
applies.
</p>
</div>
<div class="field-group">
<label for="graphLabelSize">Labels</label>
<select id="graphLabelSize" name="label_size">
<option value="off">Off</option>
<option value="tiny" selected>Tiny</option>
<option value="small">Small</option>
<option value="normal">Normal</option>
<option value="medium">Medium</option>
<option value="large">Large</option>
<option value="xlarge">Extra large</option>
</select>
</div>
<button type="submit">Build graph</button>
</form>
<div id="graphStatus" class="graph-status">Enter a video ID to begin.</div>
<div id="graphContainer" class="graph-container"></div>
</div>
<div class="status-bar">
<p class="status-bar-field">Click nodes to open the video on YouTube</p>
<p class="status-bar-field">Colors represent channels</p>
</div>
</div>
<script src="/static/graph.js"></script>
</body>
</html>

842
static/graph.js Normal file

@@ -0,0 +1,842 @@
(() => {
const global = window;
const GraphUI = (global.GraphUI = global.GraphUI || {});
GraphUI.ready = false;
const form = document.getElementById("graphForm");
const videoInput = document.getElementById("graphVideoId");
const depthInput = document.getElementById("graphDepth");
const maxNodesInput = document.getElementById("graphMaxNodes");
const labelSizeInput = document.getElementById("graphLabelSize");
const fullGraphToggle = document.getElementById("graphFullToggle");
const statusEl = document.getElementById("graphStatus");
const container = document.getElementById("graphContainer");
const isEmbedded =
container && container.dataset && container.dataset.embedded === "true";
if (!form || !videoInput || !depthInput || !maxNodesInput || !labelSizeInput || !container) {
console.error("Graph: required DOM elements missing.");
return;
}
const color = d3.scaleOrdinal(d3.schemeTableau10);
const colorRange = typeof color.range === "function" ? color.range() : [];
const paletteSizeDefault = colorRange.length || 10;
const PATTERN_TYPES = [
{ key: "none", legendClass: "none" },
{ key: "diag-forward", legendClass: "diag-forward" },
{ key: "diag-back", legendClass: "diag-back" },
{ key: "cross", legendClass: "cross" },
{ key: "dots", legendClass: "dots" },
];
const ADDITIONAL_PATTERNS = PATTERN_TYPES.filter((pattern) => pattern.key !== "none");
const sanitizeDepth = (value) => {
const parsed = parseInt(value, 10);
if (Number.isNaN(parsed)) return 1;
return Math.max(0, Math.min(parsed, 3));
};
const sanitizeMaxNodes = (value) => {
const parsed = parseInt(value, 10);
if (Number.isNaN(parsed)) return 200;
return Math.max(10, Math.min(parsed, 400));
};
const LABEL_SIZE_VALUES = ["off", "tiny", "small", "normal", "medium", "large", "xlarge"];
const LABEL_FONT_SIZES = {
tiny: "7px",
small: "8px",
normal: "9px",
medium: "10px",
large: "11px",
xlarge: "13px",
};
const DEFAULT_LABEL_SIZE = "tiny";
const isValidLabelSize = (value) => LABEL_SIZE_VALUES.includes(value);
const getLabelSize = () => {
if (!labelSizeInput) return DEFAULT_LABEL_SIZE;
const value = labelSizeInput.value;
return isValidLabelSize(value) ? value : DEFAULT_LABEL_SIZE;
};
function setLabelSizeInput(value) {
if (!labelSizeInput) return;
labelSizeInput.value = isValidLabelSize(value) ? value : DEFAULT_LABEL_SIZE;
}
const getChannelLabel = (node) =>
(node && (node.channel_name || node.channel_id)) || "Unknown";
function appendPatternContent(pattern, baseColor, patternKey) {
pattern.append("rect").attr("width", 8).attr("height", 8).attr("fill", baseColor);
const strokeColor = "#1f1f1f";
const strokeOpacity = 0.35;
const addForward = () => {
pattern
.append("path")
.attr("d", "M-2,6 L2,2 M0,8 L8,0 M6,10 L10,6")
.attr("stroke", strokeColor)
.attr("stroke-width", 1)
.attr("stroke-opacity", strokeOpacity)
.attr("fill", "none");
};
const addBackward = () => {
pattern
.append("path")
.attr("d", "M-2,2 L2,6 M0,0 L8,8 M6,-2 L10,2")
.attr("stroke", strokeColor)
.attr("stroke-width", 1)
.attr("stroke-opacity", strokeOpacity)
.attr("fill", "none");
};
switch (patternKey) {
case "diag-forward":
addForward();
break;
case "diag-back":
addBackward();
break;
case "cross":
addForward();
addBackward();
break;
case "dots":
pattern
.append("circle")
.attr("cx", 4)
.attr("cy", 4)
.attr("r", 1.5)
.attr("fill", strokeColor)
.attr("fill-opacity", strokeOpacity);
break;
default:
break;
}
}
function createChannelStyle(label, baseColor, patternKey) {
const patternInfo =
PATTERN_TYPES.find((pattern) => pattern.key === patternKey) || PATTERN_TYPES[0];
return {
baseColor,
hatch: patternInfo ? patternInfo.key : "none",
legendClass: patternInfo ? patternInfo.legendClass : "none",
};
}
let currentGraphData = null;
let currentChannelStyles = new Map();
let currentDepth = sanitizeDepth(depthInput.value);
let currentMaxNodes = sanitizeMaxNodes(maxNodesInput.value);
let currentSimulation = null;
let currentFullGraph = false;
let currentIncludeExternal = true;
let previousMaxNodesValue = maxNodesInput ? maxNodesInput.value : "200";
function setStatus(message, isError = false) {
if (!statusEl) return;
statusEl.textContent = message;
if (isError) {
statusEl.classList.add("error");
} else {
statusEl.classList.remove("error");
}
}
function sanitizeId(value) {
return (value || "").trim();
}
function isFullGraphMode(forceValue) {
if (typeof forceValue === "boolean") {
return forceValue;
}
return fullGraphToggle ? !!fullGraphToggle.checked : false;
}
function applyFullGraphState(forceValue) {
const enabled = isFullGraphMode(forceValue);
if (typeof forceValue === "boolean" && fullGraphToggle) {
fullGraphToggle.checked = forceValue;
}
if (depthInput) {
depthInput.disabled = enabled;
}
if (maxNodesInput) {
if (enabled) {
previousMaxNodesValue = maxNodesInput.value || previousMaxNodesValue || "200";
maxNodesInput.value = "0";
maxNodesInput.disabled = true;
} else {
if (maxNodesInput.disabled) {
maxNodesInput.value = previousMaxNodesValue || "200";
}
maxNodesInput.disabled = false;
}
}
if (videoInput) {
if (enabled) {
videoInput.removeAttribute("required");
} else {
videoInput.setAttribute("required", "required");
}
}
}
async function fetchGraph(
videoId,
depth,
maxNodes,
fullGraphMode = false,
includeExternal = true
) {
const params = new URLSearchParams();
if (videoId) {
params.set("video_id", videoId);
}
if (fullGraphMode) {
params.set("full_graph", "1");
params.set("max_nodes", "0");
} else {
params.set("depth", String(depth));
params.set("max_nodes", String(maxNodes));
}
params.set("external", includeExternal ? "1" : "0");
const response = await fetch(`/api/graph?${params.toString()}`);
if (!response.ok) {
const errorPayload = await response.json().catch(() => ({}));
const errorMessage =
errorPayload.error ||
`Graph request failed (${response.status} ${response.statusText})`;
throw new Error(errorMessage);
}
return response.json();
}
function resizeContainer() {
if (!container) return;
const minHeight = 520;
const viewportHeight = window.innerHeight;
container.style.height = `${Math.max(minHeight, Math.round(viewportHeight * 0.6))}px`;
}
function renderGraph(data, labelSize = "normal") {
if (!container) return;
if (currentSimulation) {
currentSimulation.stop();
currentSimulation = null;
}
container.innerHTML = "";
const width = container.clientWidth || 900;
const height = container.clientHeight || 600;
const svg = d3
.select(container)
.append("svg")
.attr("viewBox", [0, 0, width, height])
.attr("width", "100%")
.attr("height", height);
const defs = svg.append("defs");
defs
.append("marker")
.attr("id", "arrow-references")
.attr("viewBox", "0 -5 10 10")
.attr("refX", 18)
.attr("refY", 0)
.attr("markerWidth", 6)
.attr("markerHeight", 6)
.attr("orient", "auto")
.append("path")
.attr("d", "M0,-5L10,0L0,5")
.attr("fill", "#6c83c7");
defs
.append("marker")
.attr("id", "arrow-referenced-by")
.attr("viewBox", "0 -5 10 10")
.attr("refX", 18)
.attr("refY", 0)
.attr("markerWidth", 6)
.attr("markerHeight", 6)
.attr("orient", "auto")
.append("path")
.attr("d", "M0,-5L10,0L0,5")
.attr("fill", "#c76c6c");
const contentGroup = svg.append("g").attr("class", "graph-content");
const linkGroup = contentGroup.append("g").attr("class", "graph-links");
const nodeGroup = contentGroup.append("g").attr("class", "graph-nodes");
const labelGroup = contentGroup.append("g").attr("class", "graph-labels");
const links = data.links || [];
const nodes = data.nodes || [];
currentChannelStyles = new Map();
const uniqueChannels = [];
nodes.forEach((node) => {
const label = getChannelLabel(node);
if (!uniqueChannels.includes(label)) {
uniqueChannels.push(label);
}
});
const additionalPatternCount = ADDITIONAL_PATTERNS.length;
uniqueChannels.forEach((label, idx) => {
const baseColor = color(label);
let patternKey = "none";
if (idx >= paletteSizeDefault && additionalPatternCount > 0) {
const patternInfo =
ADDITIONAL_PATTERNS[(idx - paletteSizeDefault) % additionalPatternCount];
patternKey = patternInfo.key;
}
const style = createChannelStyle(label, baseColor, patternKey);
currentChannelStyles.set(label, style);
});
const linkSelection = linkGroup
.selectAll("line")
.data(links)
.enter()
.append("line")
.attr("stroke-width", 1.2)
.attr("stroke", (d) =>
d.relation === "references" ? "#6c83c7" : "#c76c6c"
)
.attr("stroke-opacity", 0.7)
.attr("marker-end", (d) =>
d.relation === "references" ? "url(#arrow-references)" : "url(#arrow-referenced-by)"
);
let nodePatternCounter = 0;
const nodePatternRefs = new Map();
const getNodeFill = (node) => {
const style = currentChannelStyles.get(getChannelLabel(node));
if (!style) {
return color(getChannelLabel(node));
}
if (!style.hatch || style.hatch === "none") {
return style.baseColor;
}
const patternId = `node-pattern-${nodePatternCounter++}`;
const pattern = defs
.append("pattern")
.attr("id", patternId)
.attr("patternUnits", "userSpaceOnUse")
.attr("width", 8)
.attr("height", 8);
appendPatternContent(pattern, style.baseColor, style.hatch);
pattern.attr("patternTransform", "translate(0,0)");
nodePatternRefs.set(node.id, pattern);
return `url(#${patternId})`;
};
const nodeSelection = nodeGroup
.selectAll("circle")
.data(nodes, (d) => d.id)
.enter()
.append("circle")
.attr("r", (d) => (d.is_root ? 10 : 7))
.attr("fill", (d) => getNodeFill(d))
.attr("stroke", "#1f1f1f")
.attr("stroke-width", (d) => (d.is_root ? 2 : 1))
.call(
d3
.drag()
.on("start", (event, d) => {
if (!event.active) simulation.alphaTarget(0.3).restart();
d.fx = d.x;
d.fy = d.y;
})
.on("drag", (event, d) => {
d.fx = event.x;
d.fy = event.y;
})
.on("end", (event, d) => {
if (!event.active) simulation.alphaTarget(0);
d.fx = null;
d.fy = null;
})
)
.on("click", (event, d) => {
if (d.url) {
window.open(d.url, "_blank", "noopener");
}
})
.on("contextmenu", (event, d) => {
event.preventDefault();
loadGraph(d.id, currentDepth, currentMaxNodes, {
updateInputs: true,
includeExternal: currentIncludeExternal,
});
});
nodeSelection
.append("title")
.text((d) => {
const parts = [];
parts.push(d.title || d.id);
if (d.channel_name) {
parts.push(`Channel: ${d.channel_name}`);
}
if (d.date) {
parts.push(`Date: ${d.date}`);
}
return parts.join("\n");
});
const labelSelection = labelGroup
.selectAll("text")
.data(nodes, (d) => d.id)
.enter()
.append("text")
.attr("class", "graph-node-label")
.attr("text-anchor", "middle")
.attr("fill", "#1f1f1f")
.attr("pointer-events", "none")
.text((d) => d.title || d.id);
applyLabelAppearance(labelSelection, labelSize);
const simulation = d3
.forceSimulation(nodes)
.force(
"link",
d3
.forceLink(links)
.id((d) => d.id)
.distance(120)
.strength(0.8)
)
.force("charge", d3.forceManyBody().strength(-320))
.force("center", d3.forceCenter(width / 2, height / 2))
.force(
"collide",
d3.forceCollide().radius((d) => (d.is_root ? 20 : 14)).iterations(2)
);
simulation.on("tick", () => {
linkSelection
.attr("x1", (d) => d.source.x)
.attr("y1", (d) => d.source.y)
.attr("x2", (d) => d.target.x)
.attr("y2", (d) => d.target.y);
nodeSelection.attr("cx", (d) => d.x).attr("cy", (d) => d.y);
labelSelection.attr("x", (d) => d.x).attr("y", (d) => d.y - (d.is_root ? 14 : 12));
nodeSelection.each(function (d) {
const pattern = nodePatternRefs.get(d.id);
if (pattern) {
const safeX = Number.isFinite(d.x) ? d.x : 0;
const safeY = Number.isFinite(d.y) ? d.y : 0;
pattern.attr("patternTransform", `translate(${safeX}, ${safeY})`);
}
});
});
const zoomBehavior = d3
.zoom()
.scaleExtent([0.3, 3])
.on("zoom", (event) => {
contentGroup.attr("transform", event.transform);
});
svg.call(zoomBehavior);
currentSimulation = simulation;
}
async function loadGraph(
videoId,
depth,
maxNodes,
{ updateInputs = false, fullGraph, includeExternal } = {}
) {
const wantsFull = isFullGraphMode(
typeof fullGraph === "boolean" ? fullGraph : undefined
);
const includeFlag =
typeof includeExternal === "boolean" ? includeExternal : currentIncludeExternal;
currentIncludeExternal = includeFlag;
const sanitizedId = sanitizeId(videoId);
if (!wantsFull && !sanitizedId) {
setStatus("Please enter a video ID.", true);
return;
}
const safeDepth = wantsFull ? currentDepth || 1 : sanitizeDepth(depth);
const safeMaxNodes = wantsFull ? 0 : sanitizeMaxNodes(maxNodes);
if (updateInputs) {
videoInput.value = sanitizedId;
depthInput.value = String(wantsFull ? currentDepth || 1 : safeDepth);
maxNodesInput.value = String(safeMaxNodes);
applyFullGraphState(wantsFull);
} else {
applyFullGraphState();
}
setStatus(wantsFull ? "Loading full reference graph…" : "Loading graph…");
try {
const data = await fetchGraph(
sanitizedId,
safeDepth,
safeMaxNodes,
wantsFull,
includeFlag
);
if (!data.nodes || data.nodes.length === 0) {
setStatus("No nodes returned for this video.", true);
container.innerHTML = "";
currentGraphData = null;
currentChannelStyles = new Map();
renderLegend([]);
return;
}
currentGraphData = data;
currentDepth = safeDepth;
currentMaxNodes = safeMaxNodes;
currentFullGraph = wantsFull;
renderGraph(data, getLabelSize());
renderLegend(data.nodes);
setStatus(
`Showing ${data.nodes.length} nodes and ${data.links.length} links (${
data.meta?.mode === "full" ? "full graph" : `depth ${data.depth}`
})`
);
updateUrlState(
sanitizedId,
safeDepth,
safeMaxNodes,
getLabelSize(),
wantsFull,
includeFlag
);
} catch (err) {
console.error(err);
setStatus(err.message || "Failed to build graph.", true);
container.innerHTML = "";
currentGraphData = null;
currentChannelStyles = new Map();
renderLegend([]);
}
}
async function handleSubmit(event) {
event.preventDefault();
await loadGraph(videoInput.value, depthInput.value, maxNodesInput.value, {
updateInputs: true,
fullGraph: isFullGraphMode(),
includeExternal: currentIncludeExternal,
});
}
function renderLegend(nodes) {
let legend = document.getElementById("graphLegend");
if (!legend) {
legend = document.createElement("div");
legend.id = "graphLegend";
legend.className = "graph-legend";
if (statusEl && statusEl.parentNode) {
statusEl.insertAdjacentElement("afterend", legend);
} else {
container.parentElement?.insertBefore(legend, container);
}
}
legend.innerHTML = "";
const edgesSection = document.createElement("div");
edgesSection.className = "graph-legend-section";
const edgesTitle = document.createElement("div");
edgesTitle.className = "graph-legend-title";
edgesTitle.textContent = "Edges";
edgesSection.appendChild(edgesTitle);
const createEdgeRow = (swatchClass, text) => {
const row = document.createElement("div");
row.className = "graph-legend-row";
const swatch = document.createElement("span");
swatch.className = `graph-legend-swatch ${swatchClass}`;
const label = document.createElement("span");
label.textContent = text;
row.appendChild(swatch);
row.appendChild(label);
return row;
};
edgesSection.appendChild(
createEdgeRow(
"graph-legend-swatch--references",
"Outgoing reference (video references other)"
)
);
edgesSection.appendChild(
createEdgeRow(
"graph-legend-swatch--referenced",
"Incoming reference (other video references this)"
)
);
legend.appendChild(edgesSection);
const channelSection = document.createElement("div");
channelSection.className = "graph-legend-section";
const channelTitle = document.createElement("div");
channelTitle.className = "graph-legend-title";
channelTitle.textContent = "Channels in view";
channelSection.appendChild(channelTitle);
const channelList = document.createElement("div");
channelList.className = "graph-legend-channel-list";
const channelEntries = Array.from(currentChannelStyles.entries()).sort((a, b) =>
a[0].localeCompare(b[0], undefined, { sensitivity: "base" })
);
const maxChannelItems = 20;
channelEntries.slice(0, maxChannelItems).forEach(([label, style]) => {
const item = document.createElement("div");
item.className = `graph-legend-channel graph-legend-channel--${
style.legendClass || "none"
}`;
const swatch = document.createElement("span");
swatch.className = "graph-legend-swatch graph-legend-channel-swatch";
swatch.style.backgroundColor = style.baseColor;
const text = document.createElement("span");
text.textContent = label;
item.appendChild(swatch);
item.appendChild(text);
channelList.appendChild(item);
});
const totalChannels = channelEntries.length;
if (channelList.childElementCount) {
channelSection.appendChild(channelList);
if (totalChannels > maxChannelItems) {
const note = document.createElement("div");
note.className = "graph-legend-note";
note.textContent = `+${totalChannels - maxChannelItems} more channels`;
channelSection.appendChild(note);
}
} else {
const empty = document.createElement("div");
empty.className = "graph-legend-note";
empty.textContent = "No channel data available.";
channelSection.appendChild(empty);
}
legend.appendChild(channelSection);
}
function applyLabelAppearance(selection, labelSize) {
if (labelSize === "off") {
selection.style("display", "none");
} else {
selection
.style("display", null)
.attr("font-size", LABEL_FONT_SIZES[labelSize] || LABEL_FONT_SIZES.normal);
}
}
function updateUrlState(
videoId,
depth,
maxNodes,
labelSize,
fullGraphMode,
includeExternal
) {
if (isEmbedded) {
return;
}
const next = new URL(window.location.href);
if (videoId) {
next.searchParams.set("video_id", videoId);
} else {
next.searchParams.delete("video_id");
}
if (fullGraphMode) {
next.searchParams.set("full_graph", "1");
next.searchParams.delete("depth");
next.searchParams.set("max_nodes", "0");
} else {
next.searchParams.set("depth", String(depth));
next.searchParams.delete("full_graph");
next.searchParams.set("max_nodes", String(maxNodes));
}
if (!includeExternal) {
next.searchParams.set("external", "0");
} else {
next.searchParams.delete("external");
}
if (labelSize && labelSize !== DEFAULT_LABEL_SIZE) {
next.searchParams.set("label_size", labelSize);
} else {
next.searchParams.delete("label_size");
}
history.replaceState({}, "", next.toString());
}
function initFromQuery() {
const params = new URLSearchParams(window.location.search);
const videoId = sanitizeId(params.get("video_id"));
const depth = sanitizeDepth(params.get("depth") || "");
const rawMaxNodes = params.get("max_nodes");
let maxNodes = sanitizeMaxNodes(rawMaxNodes || "");
if (rawMaxNodes && rawMaxNodes.trim() === "0") {
maxNodes = 0;
}
const labelSizeParam = params.get("label_size");
const fullGraphParam = params.get("full_graph");
const viewFull =
fullGraphParam && ["1", "true", "yes"].includes(fullGraphParam.toLowerCase());
const externalParam = params.get("external");
const includeExternal =
!externalParam ||
!["0", "false", "no"].includes(externalParam.toLowerCase());
currentIncludeExternal = includeExternal;
if (videoId) {
videoInput.value = videoId;
}
depthInput.value = String(depth);
maxNodesInput.value = String(viewFull ? 0 : maxNodes);
if (fullGraphToggle) {
fullGraphToggle.checked = !!viewFull;
}
applyFullGraphState();
if (labelSizeParam && isValidLabelSize(labelSizeParam)) {
setLabelSizeInput(labelSizeParam);
} else {
setLabelSizeInput(getLabelSize());
}
if ((isEmbedded && !viewFull) || (!videoId && !viewFull)) {
return;
}
loadGraph(videoId, depth, maxNodes, {
updateInputs: false,
fullGraph: viewFull,
includeExternal,
});
}
resizeContainer();
window.addEventListener("resize", resizeContainer);
form.addEventListener("submit", handleSubmit);
if (fullGraphToggle) {
fullGraphToggle.addEventListener("change", () => {
applyFullGraphState();
});
}
labelSizeInput.addEventListener("change", () => {
const size = getLabelSize();
if (currentGraphData) {
renderGraph(currentGraphData, size);
renderLegend(currentGraphData.nodes);
}
updateUrlState(
sanitizeId(videoInput.value),
currentDepth,
currentMaxNodes,
size,
currentFullGraph,
currentIncludeExternal
);
});
initFromQuery();
Object.assign(GraphUI, {
load(videoId, depth, maxNodes, options = {}) {
const targetDepth = depth != null ? depth : currentDepth;
const targetMax = maxNodes != null ? maxNodes : currentMaxNodes;
const explicitFull =
typeof options.fullGraph === "boolean"
? options.fullGraph
: undefined;
if (fullGraphToggle && typeof explicitFull === "boolean") {
fullGraphToggle.checked = explicitFull;
}
applyFullGraphState(
typeof explicitFull === "boolean" ? explicitFull : undefined
);
const fullFlag =
typeof explicitFull === "boolean"
? explicitFull
: isFullGraphMode();
const explicitInclude =
typeof options.includeExternal === "boolean"
? options.includeExternal
: undefined;
if (typeof explicitInclude === "boolean") {
currentIncludeExternal = explicitInclude;
}
return loadGraph(videoId, targetDepth, targetMax, {
updateInputs: options.updateInputs !== false,
fullGraph: fullFlag,
includeExternal:
typeof explicitInclude === "boolean"
? explicitInclude
: currentIncludeExternal,
});
},
setLabelSize(size) {
if (!labelSizeInput || !size) return;
setLabelSizeInput(size);
labelSizeInput.dispatchEvent(new Event("change", { bubbles: true }));
},
setDepth(value) {
if (!depthInput) return;
const safe = sanitizeDepth(value);
depthInput.value = String(safe);
currentDepth = safe;
},
setMaxNodes(value) {
if (!maxNodesInput) return;
const safe = sanitizeMaxNodes(value);
maxNodesInput.value = String(safe);
currentMaxNodes = safe;
},
focusInput() {
if (videoInput) {
videoInput.focus();
videoInput.select();
}
},
stop() {
if (currentSimulation) {
currentSimulation.stop();
currentSimulation = null;
}
},
getState() {
return {
depth: currentDepth,
maxNodes: currentMaxNodes,
labelSize: getLabelSize(),
nodes: currentGraphData ? currentGraphData.nodes.slice() : [],
links: currentGraphData ? currentGraphData.links.slice() : [],
fullGraph: currentFullGraph,
includeExternal: currentIncludeExternal,
};
},
setIncludeExternal(value) {
if (typeof value !== "boolean") return;
currentIncludeExternal = value;
},
isEmbedded,
});
GraphUI.ready = true;
setTimeout(() => {
window.dispatchEvent(new CustomEvent("graph-ui-ready"));
}, 0);
})();
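
Review note: the query-string rules in `fetchGraph` can be checked in isolation. The sketch below restates them as a pure function, assuming the same parameter names the diff sends to `/api/graph`. The helper name `buildGraphQuery` is hypothetical and does not appear in graph.js.

```javascript
// Hypothetical helper (not part of graph.js): mirrors the parameter rules
// that fetchGraph applies before calling /api/graph.
function buildGraphQuery({
  videoId,
  depth,
  maxNodes,
  fullGraph = false,
  includeExternal = true,
}) {
  const params = new URLSearchParams();
  if (videoId) {
    params.set("video_id", videoId);
  }
  if (fullGraph) {
    // Full-graph mode drops depth and sends max_nodes=0, as in the diff.
    params.set("full_graph", "1");
    params.set("max_nodes", "0");
  } else {
    params.set("depth", String(depth));
    params.set("max_nodes", String(maxNodes));
  }
  params.set("external", includeExternal ? "1" : "0");
  return `/api/graph?${params.toString()}`;
}

console.log(buildGraphQuery({ videoId: "dQw4w9WgXcQ", depth: 2, maxNodes: 200 }));
console.log(buildGraphQuery({ fullGraph: true, includeExternal: false }));
```

Keeping the URL construction separate from the `fetch` call would make the full-graph branch testable without a server, though whether that split is worth it here is a judgment call for the author.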


@@ -3,22 +3,40 @@
<head> <head>
<meta charset="utf-8" /> <meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="viewport" content="width=device-width, initial-scale=1" />
<title>This Little Corner (Python)</title> <title>TLC Search</title>
<link rel="stylesheet" href="https://unpkg.com/xp.css" /> <link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="https://unpkg.com/xp.css" integrity="sha384-isKk8ZXKlU28/m3uIrnyTfuPaamQIF4ONLeGSfsWGEe3qBvaeLU5wkS4J7cTIwxI" crossorigin="anonymous" />
<link rel="stylesheet" href="/static/style.css" /> <link rel="stylesheet" href="/static/style.css" />
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script> <script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js" integrity="sha384-CjloA8y00+1SDAUkjs099PVfnY2KmDC2BZnws9kh8D/lX1s46w6EPhpXdqMfjK6i" crossorigin="anonymous"></script>
</head> </head>
<body> <body>
<div class="window" style="max-width: 1200px; margin: 20px auto;"> <div class="window" style="max-width: 1200px; margin: 20px auto;">
<div class="title-bar"> <div class="title-bar">
<div class="title-bar-text">This Little Corner — Elastic Search</div> <div class="title-bar-text">This Little Corner</div>
<div class="title-bar-controls"> <div class="title-bar-controls">
<button id="aboutBtn" aria-label="About">?</button>
<button id="minimizeBtn" aria-label="Minimize"></button> <button id="minimizeBtn" aria-label="Minimize"></button>
<button aria-label="Maximize"></button> <button aria-label="Maximize"></button>
<button aria-label="Close"></button> <button aria-label="Close"></button>
</div> </div>
</div> </div>
<div class="window-body"> <div class="window-body">
<div class="window-actions">
<a
id="rssButton"
class="rss-button"
href="/rss"
target="_blank"
rel="noopener"
title="Unified RSS feed"
aria-label="Unified RSS feed"
>
<svg class="rss-button__icon" viewBox="0 0 24 24" aria-hidden="true">
<path d="M6 18a2 2 0 1 0 0 4a2 2 0 0 0 0-4zm-4 6a4 4 0 0 1 4-4a4 4 0 0 1 4 4h-2a2 2 0 0 0-2-2a2 2 0 0 0-2 2zm0-8v-2c6.627 0 12 5.373 12 12h-2c0-5.523-4.477-10-10-10zm0-4V4c11.046 0 20 8.954 20 20h-2c0-9.941-8.059-18-18-18z"/>
</svg>
<span class="rss-button__label">RSS</span>
</a>
</div>
<p>Enter a phrase to query title, description, and transcript text.</p> <p>Enter a phrase to query title, description, and transcript text.</p>
<fieldset> <fieldset>
@@ -30,19 +48,22 @@
</div> </div>
<div class="field-row" style="margin-bottom: 8px; align-items: center;"> <div class="field-row" style="margin-bottom: 8px; align-items: center;">
<label style="width: 60px;">Channel:</label> <label for="channel" style="width: 60px;">Channel:</label>
<details id="channelDropdown" class="channel-dropdown" style="flex: 1;"> <select id="channel" style="flex: 1;">
<summary id="channelSummary">All Channels</summary> <option value="">All Channels</option>
<div id="channelOptions" class="channel-options"> </select>
<div>Loading channels…</div>
</div> <label for="year" style="margin-left: 8px;">Year:</label>
</details> <select id="year">
<option value="">All Years</option>
</select>
<label for="sort" style="margin-left: 8px;">Sort:</label> <label for="sort" style="margin-left: 8px;">Sort:</label>
<select id="sort"> <select id="sort">
<option value="relevant">Most relevant</option> <option value="relevant">Most relevant</option>
<option value="newer">Newest first</option> <option value="newer">Newest first</option>
<option value="older">Oldest first</option> <option value="older">Oldest first</option>
<option value="referenced">Most referenced</option>
</select> </select>
<label for="size" style="margin-left: 8px;">Size:</label> <label for="size" style="margin-left: 8px;">Size:</label>
@@ -53,18 +74,36 @@
</select> </select>
</div> </div>
<div class="field-row"> <div class="field-row toggle-row">
<div class="toggle-item toggle-item--first">
<input type="checkbox" id="exactToggle" checked /> <input type="checkbox" id="exactToggle" checked />
<label for="exactToggle">Exact</label> <label for="exactToggle">Exact</label>
<span class="toggle-help">Match all terms exactly.</span>
</div>
<div class="toggle-item">
<input type="checkbox" id="fuzzyToggle" checked /> <input type="checkbox" id="fuzzyToggle" checked />
<label for="fuzzyToggle">Fuzzy</label> <label for="fuzzyToggle">Fuzzy</label>
<span class="toggle-help">Allow small typos and variations.</span>
</div>
<div class="toggle-item">
<input type="checkbox" id="phraseToggle" checked /> <input type="checkbox" id="phraseToggle" checked />
<label for="phraseToggle">Phrase</label> <label for="phraseToggle">Phrase</label>
<span class="toggle-help">Boost exact phrases inside transcripts.</span>
</div>
<div class="toggle-item">
<input type="checkbox" id="externalToggle" />
<label for="externalToggle">External</label>
<span class="toggle-help">Include externally referenced items.</span>
</div>
<div class="toggle-item">
<input type="checkbox" id="queryStringToggle" /> <input type="checkbox" id="queryStringToggle" />
<label for="queryStringToggle">Query string mode</label> <label for="queryStringToggle">Query string mode</label>
<span class="toggle-help">Use raw Lucene syntax (overrides other toggles).</span>
</div>
</div> </div>
</fieldset> </fieldset>
@@ -78,7 +117,7 @@
</fieldset> </fieldset>
</div> </div>
<div class="summary-right"> <div class="summary-right">
<fieldset style="height: 100%;"> <fieldset>
<legend>Timeline</legend> <legend>Timeline</legend>
<div id="frequencySummary" style="font-size: 11px; margin-bottom: 8px;"></div> <div id="frequencySummary" style="font-size: 11px; margin-bottom: 8px;"></div>
<div id="frequencyChart"></div> <div id="frequencyChart"></div>
@@ -97,6 +136,114 @@
</div> </div>
</div> </div>
<div class="about-panel" id="aboutPanel" hidden>
<div class="about-panel__header">
<strong>About This App</strong>
<button id="aboutCloseBtn" aria-label="Close about panel">×</button>
</div>
<div class="about-panel__body">
<p>Use the toggles to choose exact, fuzzy, or phrase matching. Query string mode accepts raw Lucene syntax.</p>
<p>Results are ranked by your chosen sort order; the timeline summarizes the same query.</p>
<p>You can download transcripts, copy MLA citations, or explore references via the graph button.</p>
<div class="about-panel__section">
<div class="about-panel__label">Unified RSS feed</div>
<a id="rssFeedLink" href="#" target="_blank" rel="noopener">Loading…</a>
</div>
<div class="about-panel__section">
<div class="about-panel__label">Channel list</div>
<a id="channelListLink" href="/api/channel-list" target="_blank" rel="noopener">View JSON</a>
<div id="channelCount" class="about-panel__meta"></div>
</div>
</div>
</div>
<div
id="graphModalOverlay"
class="graph-modal-overlay"
aria-hidden="true"
>
<div
class="window graph-window graph-modal-window"
id="graphModalWindow"
role="dialog"
aria-modal="true"
aria-labelledby="graphModalTitle"
>
<div class="title-bar">
<div class="title-bar-text" id="graphModalTitle">Reference Graph</div>
<div class="title-bar-controls">
<button id="graphModalClose" aria-label="Close"></button>
</div>
</div>
<div class="window-body">
<p>
Explore how this video links with its neighbors. Adjust depth or node cap to expand the graph.
</p>
<form id="graphForm" class="graph-controls">
<div class="field-group">
<label for="graphVideoId">Video ID</label>
<input
id="graphVideoId"
name="video_id"
type="text"
placeholder="e.g. dQw4w9WgXcQ"
required
/>
</div>
<div class="field-group">
<label for="graphDepth">Depth</label>
<select id="graphDepth" name="depth">
<option value="1" selected>1 hop</option>
<option value="2">2 hops</option>
<option value="3">3 hops</option>
</select>
</div>
<div class="field-group">
<label for="graphMaxNodes">Max nodes</label>
<select id="graphMaxNodes" name="max_nodes">
<option value="100">100</option>
<option value="150">150</option>
<option value="200" selected>200</option>
<option value="300">300</option>
<option value="400">400</option>
</select>
</div>
<div class="field-group">
<label for="graphLabelSize">Labels</label>
<select id="graphLabelSize" name="label_size">
<option value="off">Off</option>
<option value="tiny" selected>Tiny</option>
<option value="small">Small</option>
<option value="normal">Normal</option>
<option value="medium">Medium</option>
<option value="large">Large</option>
<option value="xlarge">Extra large</option>
</select>
</div>
<button type="submit">Build graph</button>
</form>
<div id="graphStatus" class="graph-status">Enter a video ID to begin.</div>
<div
id="graphContainer"
class="graph-container"
data-embedded="true"
></div>
</div>
<div class="status-bar">
<p class="status-bar-field">Right-click a node to set a new root</p>
<p class="status-bar-field">Colors (and hatches) represent channels</p>
</div>
</div>
</div>
<script src="/static/graph.js"></script>
<script src="/static/app.js"></script> <script src="/static/app.js"></script>
</body> </body>
</html> </html>
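
Review note: this page loads graph.js before app.js, and graph.js exposes a `GraphUI.ready` flag plus a `graph-ui-ready` window event. A caller that wants to drive the modal could wait on either one. The sketch below is an assumption about how that handshake might be consumed; `whenGraphReady` is a hypothetical name, and the actual wiring in app.js is not shown in this diff.

```javascript
// Hypothetical helper: run `callback` once GraphUI reports ready.
// `win` is the window object (passed in so the logic can be tested with a stub).
function whenGraphReady(win, callback) {
  const ui = win.GraphUI;
  if (ui && ui.ready) {
    // graph.js already finished initializing.
    callback(ui);
    return;
  }
  // Otherwise wait for the event graph.js dispatches after setup.
  win.addEventListener("graph-ui-ready", () => callback(win.GraphUI), {
    once: true,
  });
}

// Example use in a browser (the video ID is the placeholder from the form):
// whenGraphReady(window, (ui) => ui.load("dQw4w9WgXcQ", 1, 200));
```

Note that graph.js returns early without dispatching the event when its required DOM elements are missing, so a caller relying on this pattern would never fire in that case.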

61
static/notes.html Normal file

@@ -0,0 +1,61 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Notes</title>
<link rel="icon" href="/static/favicon.png" type="image/png" />
<link rel="stylesheet" href="https://unpkg.com/xp.css" integrity="sha384-isKk8ZXKlU28/m3uIrnyTfuPaamQIF4ONLeGSfsWGEe3qBvaeLU5wkS4J7cTIwxI" crossorigin="anonymous" />
<link rel="stylesheet" href="/static/style.css" />
<style>
.notes-content {
line-height: 1.6;
}
.notes-content h2 {
margin-top: 1.5em;
margin-bottom: 0.5em;
border-bottom: 1px solid #ccc;
padding-bottom: 0.25em;
}
.notes-content h2:first-child {
margin-top: 0;
}
.notes-content p {
margin: 0.75em 0;
}
.notes-content ul, .notes-content ol {
margin: 0.75em 0;
padding-left: 1.5em;
}
.notes-content li {
margin: 0.25em 0;
}
</style>
</head>
<body>
<div class="window" style="max-width: 800px; margin: 20px auto;">
<div class="title-bar">
<div class="title-bar-text">Notes</div>
<div class="title-bar-controls">
<button aria-label="Minimize"></button>
<button aria-label="Maximize"></button>
<button aria-label="Close"></button>
</div>
</div>
<div class="window-body">
<p style="margin-bottom: 16px;"><a href="/">← Back to search</a></p>
<div class="notes-content">
<h2>Welcome</h2>
<p>This is a space for thoughts, observations, and notes related to this project and beyond.</p>
<!-- Add your notes below -->
</div>
</div>
<div class="status-bar">
<p class="status-bar-field">Last updated: January 2026</p>
</div>
</div>
</body>
</html>


@@ -63,7 +63,7 @@ body.dimmed {
}
.field-row input[type="text"],
-.field-row .channel-dropdown {
+.field-row select#channel {
flex: 1 1 100% !important;
min-width: 0 !important;
max-width: 100% !important;
@@ -86,63 +86,73 @@ body.dimmed {
max-width: 100%;
min-width: 100%;
}
+.graph-controls {
+flex-direction: column;
+align-items: stretch;
+}
+.graph-controls .field-group,
+.graph-controls input,
+.graph-controls select {
+width: 100%;
+min-width: 0;
}
-/* Channel dropdown custom styling */
-.channel-dropdown {
-position: relative;
-display: inline-block;
+.toggle-row {
+flex-direction: column;
+align-items: flex-start;
+gap: 4px;
+margin-top: 8px;
}
-.channel-dropdown summary {
-list-style: none;
-cursor: pointer;
-padding: 3px 4px;
-background: ButtonFace;
-border: 1px solid;
-border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
-min-width: 180px;
-text-align: left;
+.toggle-row > * {
+margin-left: 0 !important;
}
-.channel-dropdown summary::-webkit-details-marker {
-display: none;
-}
-.channel-dropdown summary::after {
-content: ' ▼';
-font-size: 8px;
-float: right;
-}
-.channel-dropdown[open] summary::after {
-content: ' ▲';
-}
-.channel-options {
-position: absolute;
-margin-top: 2px;
-padding: 4px;
-background: ButtonFace;
-border: 1px solid;
-border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
-max-height: 300px;
-overflow-y: auto;
-box-shadow: 2px 2px 0 rgba(0, 0, 0, 0.2);
-z-index: 100;
-min-width: 220px;
-}
-.channel-option {
+.toggle-item {
display: flex;
align-items: center;
gap: 6px;
-margin-bottom: 4px;
-font-size: 11px;
+user-select: none;
}
-.channel-option:last-child {
-margin-bottom: 0;
+.toggle-item label {
+cursor: pointer;
+width: auto !important;
+}
+.toggle-item--first {
+margin-left: 0;
+}
+.toggle-item input[type="checkbox"] {
+margin: 0;
+}
+.toggle-item input[type="checkbox"]:disabled + label {
+color: GrayText;
+opacity: 0.7;
+}
+.toggle-item input[type="checkbox"]:disabled {
+cursor: not-allowed;
+}
+.toggle-item input[type="checkbox"]:disabled + label {
+cursor: not-allowed;
+}
+.description-block {
+background: Window;
+border: 1px solid #919b9c;
+padding: 6px 8px;
+margin-top: 6px;
+font-size: 11px;
+white-space: pre-wrap;
+max-height: 6em;
+overflow-y: auto;
}
/* Layout helpers */
@@ -163,15 +173,440 @@ body.dimmed {
min-width: 300px;
}
.graph-window {
width: 95%;
}
.graph-controls {
display: flex;
flex-wrap: wrap;
gap: 12px;
align-items: flex-end;
margin-bottom: 12px;
}
.graph-controls .field-group {
display: flex;
flex-direction: column;
gap: 4px;
}
.graph-controls label {
font-size: 11px;
font-weight: bold;
}
.graph-controls .field-hint {
font-size: 10px;
color: #3c3c3c;
margin: 0;
max-width: 280px;
}
.graph-controls input,
.graph-controls select {
min-width: 160px;
}
.graph-status {
font-size: 11px;
margin-bottom: 8px;
color: #1f1f1f;
}
.graph-status.error {
color: #b00020;
}
.graph-container {
background: Window;
border: 1px solid #919b9c;
box-shadow: inset -1px -1px #0a0a0a, inset 1px 1px #fff;
position: relative;
width: 100%;
min-height: 520px;
height: auto;
overflow: visible;
}
.graph-modal-overlay {
position: fixed;
inset: 0;
display: none;
align-items: center;
justify-content: center;
padding: 24px;
background: rgba(0, 0, 0, 0.35);
z-index: 2000;
}
.graph-modal-overlay.active {
display: flex;
}
.graph-modal-window {
width: min(960px, 100%);
max-height: calc(100vh - 48px);
}
.graph-modal-window .window-body {
max-height: calc(100vh - 180px);
overflow-y: auto;
}
.graph-modal-window .graph-container {
height: 560px;
}
body.modal-open {
overflow: hidden;
}
.result-header {
display: flex;
justify-content: flex-start;
gap: 6px;
flex-wrap: wrap;
align-items: flex-start;
}
.result-header-main {
flex: 1 1 auto;
min-width: 220px;
}
.result-actions {
display: flex;
align-items: flex-start;
gap: 6px;
margin-left: auto;
}
.result-action-btn {
white-space: nowrap;
font-family: "Tahoma", "MS Sans Serif", sans-serif;
font-size: 11px;
padding: 4px 10px;
}
.result-meta {
display: flex;
align-items: center;
flex-wrap: wrap;
gap: 4px;
}
.result-status {
display: inline-flex;
align-items: center;
gap: 4px;
padding: 1px 6px;
border-radius: 3px;
font-size: 10px;
line-height: 1.3;
border: 1px solid #c4a3a3;
background: #fff6f6;
color: #6b1f1f;
}
.result-status::before {
content: "⚠";
font-size: 10px;
line-height: 1;
}
.result-status--deleted {
border-color: #d1a6a6;
background: #fff8f8;
color: #6b1f1f;
}
.graph-launch-btn {
white-space: nowrap;
}
.graph-node-label {
text-shadow: -1px -1px 0 #fff, 1px -1px 0 #fff, -1px 1px 0 #fff, 1px 1px 0 #fff;
}
.graph-nodes circle {
cursor: pointer;
}
.graph-legend {
margin: 12px 0;
font-size: 11px;
background: Window;
border: 1px solid #919b9c;
padding: 8px 10px;
display: inline-flex;
flex-direction: column;
gap: 4px;
box-shadow: inset -1px -1px #0a0a0a, inset 1px 1px #fff;
}
.graph-legend-section {
display: flex;
flex-direction: column;
gap: 4px;
}
.graph-legend-title {
font-weight: bold;
color: #1f1f1f;
}
.graph-legend-row {
display: flex;
align-items: center;
gap: 8px;
}
.graph-legend-swatch {
display: inline-block;
width: 18px;
height: 12px;
border: 1px solid #1f1f1f;
}
.graph-legend-swatch--references {
background: #6c83c7;
}
.graph-legend-swatch--referenced {
background: #c76c6c;
}
.graph-legend-channel-list {
display: flex;
flex-wrap: wrap;
gap: 8px;
}
.graph-legend-channel {
display: flex;
align-items: center;
gap: 6px;
}
.graph-legend-channel-swatch {
width: 14px;
height: 14px;
background-repeat: repeat;
background-position: 0 0;
background-size: 6px 6px;
}
.graph-legend-channel--none .graph-legend-channel-swatch {
background-image: none;
}
.graph-legend-channel--diag-forward .graph-legend-channel-swatch {
background-image: repeating-linear-gradient(
45deg,
rgba(0, 0, 0, 0.35) 0,
rgba(0, 0, 0, 0.35) 2px,
transparent 2px,
transparent 4px
);
background-blend-mode: multiply;
}
.graph-legend-channel--diag-back .graph-legend-channel-swatch {
background-image: repeating-linear-gradient(
-45deg,
rgba(0, 0, 0, 0.35) 0,
rgba(0, 0, 0, 0.35) 2px,
transparent 2px,
transparent 4px
);
background-blend-mode: multiply;
}
.graph-legend-channel--cross .graph-legend-channel-swatch {
background-image:
repeating-linear-gradient(
45deg,
rgba(0, 0, 0, 0.25) 0,
rgba(0, 0, 0, 0.25) 2px,
transparent 2px,
transparent 4px
),
repeating-linear-gradient(
-45deg,
rgba(0, 0, 0, 0.25) 0,
rgba(0, 0, 0, 0.25) 2px,
transparent 2px,
transparent 4px
);
background-blend-mode: multiply;
}
.graph-legend-channel--dots .graph-legend-channel-swatch {
background-image: radial-gradient(rgba(0, 0, 0, 0.35) 30%, transparent 31%);
background-size: 6px 6px;
background-blend-mode: multiply;
}
.graph-legend-note {
font-size: 10px;
color: #555;
font-style: italic;
}
.title-bar-link {
display: inline-block;
color: inherit;
text-decoration: none;
font-size: 11px;
padding: 2px 6px;
border: 1px solid;
border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
background: ButtonFace;
}
.title-bar-controls #aboutBtn {
font-weight: bold;
font-size: 12px;
padding: 0 6px;
margin-right: 6px;
}
.toggle-item {
display: flex;
align-items: center;
gap: 6px;
}
.toggle-help {
font-size: 10px;
color: #555;
}
.about-panel {
position: fixed;
top: 20px;
right: 20px;
width: 280px;
background: Window;
border: 2px solid #919b9c;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.25);
z-index: 2100;
font-size: 11px;
}
.about-panel__header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 6px 8px;
background: #0055aa;
color: #fff;
}
.about-panel__body {
padding: 8px;
background: Window;
color: #000;
}
.about-panel__section {
margin-top: 8px;
padding-top: 6px;
border-top: 1px solid #c0c0c0;
}
.about-panel__label {
font-weight: bold;
margin-bottom: 2px;
}
.about-panel__meta {
font-size: 10px;
color: #555;
}
.about-panel__header button {
border: none;
background: transparent;
color: inherit;
font-weight: bold;
cursor: pointer;
}
/* Results styling */
#results .item {
-border-bottom: 1px solid ButtonShadow;
-padding: 12px 0;
+background: Window;
+border: 2px solid #919b9c;
+padding: 12px;
margin-bottom: 8px;
+max-width: 100%;
+overflow: hidden;
+word-wrap: break-word;
+box-sizing: border-box;
+box-shadow: 2px 2px 0 rgba(0, 0, 0, 0.15);
}
#results .item:last-child {
-border-bottom: none;
+margin-bottom: 0;
+}
+#results .item strong {
+word-break: break-word;
+max-width: 100%;
+display: inline-block;
+}
+.window-body {
+max-width: 100%;
+overflow-x: hidden;
+margin: 0;
+padding: 1rem;
+box-sizing: border-box;
+}
+.window-actions {
+display: flex;
+justify-content: flex-end;
+margin-bottom: 6px;
+}
+.rss-button {
+display: inline-flex;
+align-items: center;
+gap: 4px;
+padding: 2px 6px;
+border: 1px solid;
+border-color: ButtonHighlight ButtonShadow ButtonShadow ButtonHighlight;
+background: ButtonFace;
+color: #000;
+text-decoration: none;
+font-size: 11px;
+cursor: pointer;
+}
+.rss-button:hover {
+background: #f3f3f3;
+}
+.rss-button:active {
+border-color: ButtonShadow ButtonHighlight ButtonHighlight ButtonShadow;
+}
+.rss-button.is-disabled {
+opacity: 0.5;
+cursor: default;
+pointer-events: none;
+}
+.rss-button__icon {
+width: 14px;
+height: 14px;
+fill: #f38b00;
+}
+.rss-button__label {
+font-weight: bold;
}
/* Badges */
@@ -180,6 +615,8 @@ body.dimmed {
display: flex;
gap: 4px;
flex-wrap: wrap;
+max-width: 100%;
+overflow: hidden;
}
.badge {
@@ -189,6 +626,31 @@ body.dimmed {
padding: 2px 6px;
font-size: 10px;
font-weight: bold;
+white-space: nowrap;
+word-break: keep-all;
+}
+.badge--transcript-primary {
+background: #0b6efd;
+}
+.badge--transcript-secondary {
+background: #8f4bff;
+}
+.badge--external {
+background: #f5d08a;
+color: #000;
+border: 1px solid #cfa74f;
+}
+.badge-clickable {
+cursor: pointer;
+}
+.badge-clickable:focus {
+outline: 2px solid rgba(11, 110, 253, 0.6);
+outline-offset: 1px;
}
/* Transcript and highlights */
@@ -212,9 +674,14 @@ body.dimmed {
}
.highlight-row {
-padding: 4px;
+padding: 4px 6px;
cursor: pointer;
border: 1px solid transparent;
+display: flex;
+align-items: flex-start;
+gap: 8px;
+max-width: 100%;
+box-sizing: border-box;
}
.highlight-row:hover {
@@ -223,6 +690,77 @@ body.dimmed {
border: 1px dotted WindowText;
}
.highlight-text {
flex: 1 1 auto;
word-break: break-word;
overflow-wrap: anywhere;
}
.highlight-source-indicator {
width: 10px;
height: 10px;
border-radius: 2px;
border: 1px solid transparent;
margin-left: auto;
flex: 0 0 auto;
}
.highlight-source-indicator--primary {
background: #0b6efd;
border-color: #084bb5;
}
.highlight-source-indicator--secondary {
background: #8f4bff;
border-color: #5d2db3;
}
.vector-chunk {
margin-top: 8px;
padding: 8px;
background: #f3f7ff;
border: 1px solid #c7d0e2;
font-size: 11px;
line-height: 1.5;
word-break: break-word;
}
@media screen and (max-width: 640px) {
.result-header {
flex-direction: column;
gap: 6px;
}
.result-header-main {
flex: 1 1 auto;
min-width: 0;
width: 100%;
}
.result-actions {
width: auto;
align-self: flex-start;
justify-content: flex-start;
flex-wrap: wrap;
gap: 4px;
margin-left: 0;
}
.result-action-btn {
width: 100%;
text-align: left;
}
.highlight-row {
flex-direction: column;
gap: 4px;
}
.highlight-source-indicator {
align-self: flex-end;
}
}
mark {
background: yellow;
color: black;
@@ -237,8 +775,7 @@ mark {
margin-top: 12px;
padding: 8px;
background: Window;
-border: 2px solid;
-border-color: ButtonShadow ButtonHighlight ButtonHighlight ButtonShadow;
+border: 2px solid #919b9c;
max-height: 400px;
overflow-y: auto;
font-size: 11px;
@@ -250,6 +787,10 @@ mark {
border-bottom: 1px solid ButtonShadow;
}
+.transcript-segment--matched {
+background: #fff6cc;
+}
.transcript-segment:last-child {
border-bottom: none;
margin-bottom: 0;
@@ -294,27 +835,9 @@ mark {
line-height: 1.4;
}
-.transcript-header {
-font-weight: bold;
-margin-bottom: 8px;
-display: flex;
-align-items: center;
-justify-content: space-between;
-background: ActiveCaption;
-color: CaptionText;
-padding: 2px 4px;
-}
+.transcript-header,
.transcript-close {
-cursor: pointer;
-font-size: 16px;
-padding: 0 4px;
-font-weight: bold;
-}
-.transcript-close:hover {
-background: Highlight;
-color: HighlightText;
+display: none;
}
/* Chart styling */

188
sync_qdrant_channels.py Normal file

@@ -0,0 +1,188 @@
"""
Utility to backfill channel titles/names inside the Qdrant payloads.
Usage:
python -m python_app.sync_qdrant_channels \
--batch-size 512 \
--max-points 200 \
--dry-run
"""
from __future__ import annotations
import argparse
import logging
from typing import Dict, Iterable, List, Optional, Set, Tuple
import time
import requests
from .config import CONFIG
from .search_app import _ensure_client
LOGGER = logging.getLogger(__name__)
def chunked(iterable: Iterable, size: int):
    """Yield successive lists of at most `size` items from `iterable`."""
    chunk: List = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) >= size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
def resolve_channels(channel_ids: Iterable[str]) -> Dict[str, str]:
    """Map channel IDs to channel names using the Elasticsearch index."""
    client = _ensure_client(CONFIG)
    ids = list(set(channel_ids))
    if not ids:
        return {}
    # Hits are per document, not per channel, so a channel with many
    # documents can crowd others out of the first `size` results.
    body = {
        "size": len(ids) * 2,
        "_source": ["channel_id", "channel_name"],
        "query": {"terms": {"channel_id.keyword": ids}},
    }
    response = client.search(index=CONFIG.elastic.index, body=body)
    resolved: Dict[str, str] = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source") or {}
        cid = source.get("channel_id")
        cname = source.get("channel_name")
        # Keep the first name seen for each channel ID.
        if cid and cname and cid not in resolved:
            resolved[cid] = cname
    return resolved
def upsert_channel_payload(
    qdrant_url: str,
    collection: str,
    channel_id: str,
    channel_name: str,
    *,
    dry_run: bool = False,
) -> bool:
    """Set channel_name/channel_title for all vectors with this channel_id."""
    payload = {"channel_name": channel_name, "channel_title": channel_name}
    body = {
        "payload": payload,
        "filter": {"must": [{"key": "channel_id", "match": {"value": channel_id}}]},
    }
    LOGGER.info("Updating channel_id=%s -> %s", channel_id, channel_name)
    if dry_run:
        return True
    resp = requests.post(
        f"{qdrant_url}/collections/{collection}/points/payload",
        json=body,
        timeout=120,
    )
    if resp.status_code >= 400:
        LOGGER.error("Failed to update %s: %s", channel_id, resp.text)
        return False
    return True
def scroll_missing_payloads(
    qdrant_url: str,
    collection: str,
    batch_size: int,
    *,
    max_points: Optional[int] = None,
) -> Iterable[List[Tuple[str, Dict[str, object]]]]:
    """Yield batches of (point_id, payload) missing channel names."""
    fetched = 0
    next_page = None
    while True:
        current_limit = batch_size
        # Retry the current page: halve the limit after an HTTP error
        # (floor of 5) and retry unchanged after other request errors.
        while True:
            body = {
                "limit": current_limit,
                "with_payload": True,
                "filter": {"must": [{"is_empty": {"key": "channel_name"}}]},
            }
            if next_page:
                body["offset"] = next_page
            try:
                resp = requests.post(
                    f"{qdrant_url}/collections/{collection}/points/scroll",
                    json=body,
                    timeout=120,
                )
                resp.raise_for_status()
                break
            except requests.HTTPError as exc:
                LOGGER.warning(
                    "Scroll request failed at limit=%s: %s", current_limit, exc
                )
                if current_limit <= 5:
                    raise
                current_limit = max(5, current_limit // 2)
                LOGGER.info("Reducing scroll batch size to %s", current_limit)
                time.sleep(2)
            except requests.RequestException as exc:  # type: ignore[attr-defined]
                LOGGER.warning("Transient scroll error: %s", exc)
                time.sleep(2)
        payload = resp.json().get("result", {})
        points = payload.get("points", [])
        if not points:
            break
        batch: List[Tuple[str, Dict[str, object]]] = []
        for point in points:
            pid = point.get("id")
            p_payload = point.get("payload") or {}
            batch.append((pid, p_payload))
        yield batch
        fetched += len(points)
        if max_points and fetched >= max_points:
            break
        next_page = payload.get("next_page_offset")
        if not next_page:
            break
def main() -> None:
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    parser = argparse.ArgumentParser(
        description="Backfill missing channel_name/channel_title in Qdrant payloads"
    )
    parser.add_argument("--batch-size", type=int, default=512)
    parser.add_argument(
        "--max-points",
        type=int,
        default=None,
        help="Limit processing to the first N points for testing",
    )
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()
    q_url = CONFIG.qdrant_url
    collection = CONFIG.qdrant_collection
    total_updates = 0
    for batch in scroll_missing_payloads(
        q_url, collection, args.batch_size, max_points=args.max_points
    ):
        channel_ids: Set[str] = set()
        for _, payload in batch:
            cid = payload.get("channel_id")
            if cid:
                channel_ids.add(str(cid))
        if not channel_ids:
            continue
        resolved = resolve_channels(channel_ids)
        if not resolved:
            LOGGER.warning("No channel names resolved for ids: %s", channel_ids)
            continue
        for cid, name in resolved.items():
            if upsert_channel_payload(
                q_url, collection, cid, name, dry_run=args.dry_run
            ):
                total_updates += 1
        LOGGER.info("Updated %s channel payloads so far", total_updates)
    LOGGER.info("Finished. Total channel updates attempted: %s", total_updates)
if __name__ == "__main__":
    main()
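
The scroll loop in `scroll_missing_payloads` halves its page size after each HTTP error and re-raises once a request at the floor of 5 has failed. The sketch below lists the limits that would be tried for a single page if every request failed that way. `limit_schedule` is a hypothetical helper written for illustration only; it is not part of the script and makes no network calls.

```python
from typing import List


def limit_schedule(batch_size: int, floor: int = 5) -> List[int]:
    """Limits tried for one page when every scroll request hits an HTTP error."""
    limits = [batch_size]
    current = batch_size
    # Mirrors the retry loop: a failure at or below the floor is re-raised,
    # otherwise the limit is halved (never dropping under the floor).
    while current > floor:
        current = max(floor, current // 2)
        limits.append(current)
    return limits


if __name__ == "__main__":
    print(limit_schedule(512))
```

With the default `--batch-size 512` this gives eight attempts, each but the last followed by the loop's 2-second sleep, before the error propagates.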

78
urls.txt Normal file

@@ -0,0 +1,78 @@
https://www.youtube.com/channel/UCCebR16tXbv5Ykk9_WtCCug/videos
https://www.youtube.com/channel/UC6vg0HkKKlgsWk-3HfV-vnw/videos
https://www.youtube.com/channel/UCeWWxwzgLYUbfjWowXhVdYw/videos
https://www.youtube.com/channel/UC952hDf_C4nYJdqwK7VzTxA/videos
https://www.youtube.com/channel/UCU5SNBfTo4umhjYz6M0Jsmg/videos
https://www.youtube.com/channel/UC6Tvr9mBXNaAxLGRA_sUSRA/videos
https://www.youtube.com/channel/UC4Rmxg7saTfwIpvq3QEzylQ/videos
https://www.youtube.com/channel/UCTdH4nh6JTcfKUAWvmnPoIQ/videos
https://www.youtube.com/channel/UCsi_x8c12NW9FR7LL01QXKA/videos
https://www.youtube.com/channel/UCAqTQ5yLHHH44XWwWXLkvHQ/videos
https://www.youtube.com/channel/UCprytROeCztMOMe8plyJRMg/videos
https://www.youtube.com/channel/UCpqDUjTsof-kTNpnyWper_Q/videos
https://www.youtube.com/channel/UCL_f53ZEJxp8TtlOkHwMV9Q/videos
https://www.youtube.com/channel/UCez1fzMRGctojfis2lfRYug/videos
https://www.youtube.com/channel/UC2leFZRD0ZlQDQxpR2Zd8oA/videos
https://www.youtube.com/channel/UC8SErJkYnDsYGh1HxoZkl-g/videos
https://www.youtube.com/channel/UCEPOn4cgvrrerg_-q_Ygw1A/videos
https://www.youtube.com/channel/UC2yCyOMUeem-cYwliC-tLJg/videos
https://www.youtube.com/channel/UCGsDIP_K6J6VSTqlq-9IPlg/videos
https://www.youtube.com/channel/UCEzWTLDYmL8soRdQec9Fsjw/videos
https://www.youtube.com/channel/UC1KgNsMdRoIA_njVmaDdHgA/videos
https://www.youtube.com/channel/UCFQ6Gptuq-sLflbJ4YY3Umw/videos
https://www.youtube.com/channel/UCEY1vGNBPsC3dCatZyK3Jkw/videos
https://www.youtube.com/channel/UCIAtCuzdvgNJvSYILnHtdWA/videos
https://www.youtube.com/channel/UClIDP7_Kzv_7tDQjTv9EhrA/videos
https://www.youtube.com/channel/UC-QiBn6GsM3JZJAeAQpaGAA/videos
https://www.youtube.com/channel/UCiJmdXTb76i8eIPXdJyf8ZQ/videos
https://www.youtube.com/channel/UCM9Z05vuQhMEwsV03u6DrLA/videos
https://www.youtube.com/channel/UCgp_r6WlBwDSJrP43Mz07GQ/videos
https://www.youtube.com/channel/UC5uv-BxzCrN93B_5qbOdRWw/videos
https://www.youtube.com/channel/UCtCTSf3UwRU14nYWr_xm-dQ/videos
https://www.youtube.com/channel/UC1a4VtU_SMSfdRiwMJR33YQ/videos
https://www.youtube.com/channel/UCg7Ed0lecvko58ibuX1XHng/videos
https://www.youtube.com/channel/UCMVG5eqpYFVEB-a9IqAOuHA/videos
https://www.youtube.com/channel/UC8mJqpS_EBbMcyuzZDF0TEw/videos
https://www.youtube.com/channel/UCGHuURJ1XFHzPSeokf6510A/videos
https://www.youtube.com/@chrishoward8473/videos
https://www.youtube.com/channel/UChptV-kf8lnncGh7DA2m8Pw/videos
https://www.youtube.com/channel/UCzX6R3ZLQh5Zma_5AsPcqPA/videos
https://www.youtube.com/channel/UCiukuaNd_qzRDTW9qe2OC1w/videos
https://www.youtube.com/channel/UC5yLuFQCms4nb9K2bGQLqIw/videos
https://www.youtube.com/channel/UCVdSgEf9bLXFMBGSMhn7x4Q/videos
https://www.youtube.com/channel/UC_dnk5D4tFCRYCrKIcQlcfw/videos
https://www.youtube.com/@Freerilian/videos
https://www.youtube.com/@marks.-ry7bm/videos
https://www.youtube.com/@Adams-Fall/videos
https://www.youtube.com/@mcmosav/videos
https://www.youtube.com/@Landbeorht/videos
https://www.youtube.com/@Corner_Citizen/videos
https://www.youtube.com/@ethan.caughey/videos
https://www.youtube.com/@MarcInTbilisi/videos
https://www.youtube.com/@climbingmt.sophia/videos
https://www.youtube.com/@Skankenstein/videos
https://www.youtube.com/@UpCycleClub/videos
https://www.youtube.com/@JessPurviance/videos
https://www.youtube.com/@greyhamilton52/videos
https://www.youtube.com/@paulrenenichols/videos
https://www.youtube.com/@OfficialSecularKoranism/videos
https://www.youtube.com/@FromWhomAllBlessingsFlow/videos
https://www.youtube.com/@FoodTruckEmily/videos
https://www.youtube.com/@O.G.Rose.Michelle.and.Daniel/videos
https://www.youtube.com/@JonathanDumeer/videos
https://www.youtube.com/@JordanGreenhall/videos
https://www.youtube.com/@NechamaGluck/videos
https://www.youtube.com/@justinsmorningcoffee/videos
https://www.youtube.com/@grahampardun/videos
https://www.youtube.com/@michaelmartin8681/videos
https://www.youtube.com/@davidbusuttil9086/videos
https://www.youtube.com/@matthewparlato5626/videos
https://www.youtube.com/@lancecleaver227/videos
https://www.youtube.com/@theplebistocrat/videos
https://www.youtube.com/@rigelwindsongthurston/videos
https://www.youtube.com/@RightInChrist/videos
https://www.youtube.com/@RafeKelley/videos
https://www.youtube.com/@WavesOfObsession/videos
https://www.youtube.com/@LeviathanForPlay/videos
https://www.youtube.com/channel/UCehAungJpAeC-F3R5FwvvCQ/videos
https://www.youtube.com/channel/UC4YwC5zA9S_2EwthE27Xlew/videos
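
The entries in `urls.txt` take two forms: `/channel/<id>/videos` and `/@<handle>/videos`. A minimal sketch of telling them apart follows, assuming a consumer wants the channel ID or handle back out of each line. `classify_channel_url` is a hypothetical helper for illustration and does not appear in the repository.

```python
from typing import Tuple
from urllib.parse import urlparse


def classify_channel_url(url: str) -> Tuple[str, str]:
    """Return ("id", channel_id) or ("handle", handle) for one urls.txt entry."""
    # Split the URL path into its non-empty segments.
    parts = [p for p in urlparse(url.strip()).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return ("id", parts[1])
    if parts and parts[0].startswith("@"):
        return ("handle", parts[0][1:])
    raise ValueError(f"Unrecognized channel URL: {url!r}")


if __name__ == "__main__":
    print(classify_channel_url("https://www.youtube.com/@LeviathanForPlay/videos"))
```

Only ID-form entries carry a value that can be matched directly against the `channel_id` field used elsewhere in this change; handle-form entries would need a separate lookup.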