TLC-Search/AGENTS.md

Repository Guidelines

Project Structure & Module Organization

  • Core modules live under python_app/: config.py centralizes settings, transcript_collector.py gathers transcripts, ingest.py handles Elasticsearch bulk loads, and search_app.py exposes the Flask UI.
  • Static assets belong in static/ (index.html, frequency.html, companion JS/CSS). Keep HTML here and wire it up through Flask routes.
  • Runtime artifacts land in data/ (raw/ for downloads, video_metadata/ for cleaned payloads). Preserve the JSON schema emitted by the collector.
  • When adding utilities, place them in python_app/ and use package-relative imports so scripts continue to run via python -m.

Build, Test, and Development Commands

  • python -m venv .venv && source .venv/bin/activate: bootstrap the virtualenv used by all scripts.
  • pip install -r requirements.txt: install Flask, Elasticsearch tooling, Google API clients, and dotenv support.
  • python -m python_app.transcript_collector --channel UC... --output data/raw: fetch transcript JSON for a channel; rerun to refresh cached data.
  • python -m python_app.ingest --source data/video_metadata --index this_little_corner_py: index prepared metadata and auto-create mappings when needed.
  • python -m python_app.search_app: launch the Flask server on port 8080 for UI smoke tests.
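
The ingest command above can be pictured roughly as follows. This is a minimal sketch, not the repository's actual ingest.py: it assumes one JSON file per video under the source directory and a hypothetical video_id field, neither of which is confirmed by this guide. Only the --source and --index flags come from the command shown above.

```python
import argparse
import json
from pathlib import Path
from typing import Iterator, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two flags used by the ingest command in this guide."""
    parser = argparse.ArgumentParser(description="Index prepared video metadata.")
    parser.add_argument("--source", type=Path, required=True,
                        help="Directory of cleaned JSON payloads")
    parser.add_argument("--index", required=True,
                        help="Target Elasticsearch index name")
    return parser.parse_args(argv)


def build_actions(source: Path, index: str) -> Iterator[dict]:
    """Yield one bulk action per JSON file found in `source`.

    Assumption: each file holds a single JSON object. The `video_id` key is
    hypothetical; the file stem is used as the document id when it is absent.
    """
    for path in sorted(source.glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        yield {
            "_index": index,
            "_id": doc.get("video_id", path.stem),
            "_source": doc,
        }
```

Dictionaries of this shape are what elasticsearch.helpers.bulk(client, actions) accepts, so a real script could pass build_actions(args.source, args.index) straight to it. Building the actions in a generator keeps memory use flat for large metadata directories.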

Coding Style & Naming Conventions

  • Follow PEP 8 with 4-space indentation, snake_case for functions/modules, and CamelCase for classes; reserve UPPER_SNAKE_CASE for configuration constants.
  • Keep Elasticsearch payload keys lower-case with underscores, and centralize shared values in config.py rather than scattering literals.
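
To make the naming rules concrete, here is a small sketch that shows each convention side by side and normalizes payload keys to lower-case with underscores. The names DEFAULT_INDEX, TranscriptSegment, to_snake_case, and normalize_keys are hypothetical illustrations, not identifiers from the codebase.

```python
import re
from dataclasses import dataclass

# UPPER_SNAKE_CASE: configuration constant (hypothetical name; in this
# project such a value would live in config.py).
DEFAULT_INDEX = "this_little_corner_py"


# CamelCase: class name (hypothetical class, for illustration only).
@dataclass
class TranscriptSegment:
    start_seconds: float
    text: str


# snake_case: function names.
def to_snake_case(key: str) -> str:
    """Convert a key such as 'videoId' or 'Channel Title' to 'video_id' style."""
    key = re.sub(r"[\s\-]+", "_", key.strip())         # spaces and hyphens -> underscore
    key = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key)  # split camelCase word boundaries
    return key.lower()


def normalize_keys(payload: dict) -> dict:
    """Return a copy of `payload` with top-level keys in snake_case.

    Nested dictionaries are left untouched in this sketch.
    """
    return {to_snake_case(key): value for key, value in payload.items()}
```

For example, normalize_keys({"videoId": "abc", "Channel Title": "TLC"}) yields keys video_id and channel_title.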

Testing Guidelines

  • No automated suite is committed yet; when adding coverage, put pytest modules under tests/ in files named test_*.py.
  • Focus tests on collector pagination, ingest transformations, and Flask route helpers, and run python -m pytest locally before opening a PR.
  • Manually verify changes by ingesting a small sample into a local Elasticsearch node and checking facets, highlights, and transcript retrieval via the UI.
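
A first test module might look like the sketch below, which could live at a path such as tests/test_ingest.py. The clean_video_payload function is a hypothetical stand-in for an ingest transformation and is defined inline only to keep the example self-contained; a real test would import the function under test from python_app instead. The payload field names are assumptions as well.

```python
def clean_video_payload(raw: dict) -> dict:
    """Hypothetical ingest transformation: flatten a raw collector record."""
    texts = (segment.get("text", "").strip() for segment in raw.get("transcript", []))
    return {
        "video_id": raw["id"],
        "title": raw.get("title", "").strip(),
        # Join non-empty segment texts into one searchable string.
        "transcript_text": " ".join(text for text in texts if text),
    }


def test_clean_video_payload_joins_segments():
    raw = {
        "id": "abc123",
        "title": "  Sample talk ",
        "transcript": [{"text": "Hello "}, {"text": ""}, {"text": "world"}],
    }
    cleaned = clean_video_payload(raw)
    assert cleaned["video_id"] == "abc123"
    assert cleaned["title"] == "Sample talk"
    assert cleaned["transcript_text"] == "Hello world"


def test_clean_video_payload_handles_missing_transcript():
    cleaned = clean_video_payload({"id": "abc123"})
    assert cleaned["transcript_text"] == ""
    assert cleaned["title"] == ""
```

Running python -m pytest from the repository root would collect both functions because the file and function names follow the test_* pattern.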

Commit & Pull Request Guidelines

  • Mirror the existing history: short, imperative commit subjects (e.g. “Fix results overflow”, “Add video reference tracking”).
  • PRs should describe scope, list environment variables or indices touched, link issues, and attach before/after screenshots whenever UI output changes. Highlight Elasticsearch mapping or data migration impacts for both search and frontend reviewers.

Configuration & Security Tips

  • Load credentials through environment variables (ELASTIC_URL, ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_API_KEY, YOUTUBE_API_KEY) or a .env file, and keep secrets out of version control.
  • Adjust ELASTIC_VERIFY_CERTS, ELASTIC_CA_CERT, and ELASTIC_DEBUG only while debugging, and prefer branch-specific indices (this_little_corner_py_<initials>) to avoid clobbering shared data.
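
One way config.py could read these variables is sketched below. The environment variable names and the index naming pattern come from this guide; the ElasticSettings class, the load_settings and env_bool helpers, the accepted boolean spellings, and every default value (including the localhost URL) are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed spellings for boolean flags
BASE_INDEX = "this_little_corner_py"


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag from `env`, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


@dataclass(frozen=True)
class ElasticSettings:
    url: str
    username: Optional[str]
    password: Optional[str]
    api_key: Optional[str]
    verify_certs: bool
    ca_cert: Optional[str]
    debug: bool
    index: str


def load_settings(env: Optional[Mapping[str, str]] = None,
                  initials: Optional[str] = None) -> ElasticSettings:
    """Build settings from environment variables (defaults are assumptions)."""
    env = os.environ if env is None else env
    # Branch-specific index, e.g. this_little_corner_py_ab, when initials are given.
    index = f"{BASE_INDEX}_{initials}" if initials else BASE_INDEX
    return ElasticSettings(
        url=env.get("ELASTIC_URL", "http://localhost:9200"),
        username=env.get("ELASTIC_USERNAME"),
        password=env.get("ELASTIC_PASSWORD"),
        api_key=env.get("ELASTIC_API_KEY"),
        verify_certs=env_bool(env, "ELASTIC_VERIFY_CERTS", True),
        ca_cert=env.get("ELASTIC_CA_CERT"),
        debug=env_bool(env, "ELASTIC_DEBUG", False),
        index=index,
    )
```

Passing the environment as a mapping keeps the loader easy to test without touching real credentials. If a .env file is used, calling load_dotenv() from the python-dotenv package before load_settings() would populate os.environ first.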