# Repository Guidelines

## Project Structure & Module Organization
- Core modules live under `python_app/`: `config.py` centralizes settings, `transcript_collector.py` gathers transcripts, `ingest.py` handles Elasticsearch bulk loads, and `search_app.py` exposes the Flask UI.
- Static assets belong in `static/` (`index.html`, `frequency.html`, companion JS/CSS). Keep HTML here and wire it up through Flask routes.
- Runtime artifacts land in `data/` (`raw/` for downloads, `video_metadata/` for cleaned payloads). Preserve the JSON schema emitted by the collector.
- When adding utilities, place them in `python_app/` and use package-relative imports so scripts continue to run via `python -m`.

## Build, Test, and Development Commands
- `python -m venv .venv && source .venv/bin/activate`: bootstrap the virtualenv used by all scripts.
- `pip install -r requirements.txt`: install Flask, Elasticsearch tooling, Google API clients, and dotenv support.
- `python -m python_app.transcript_collector --channel UC... --output data/raw`: fetch transcript JSON for a channel; rerun to refresh cached data.
- `python -m python_app.ingest --source data/video_metadata --index this_little_corner_py`: index prepared metadata and auto-create mappings when needed.
- `python -m python_app.search_app`: launch the Flask server on port 8080 for UI smoke tests.

## Coding Style & Naming Conventions
- Follow PEP 8 with 4-space indentation, `snake_case` for functions/modules, and `CamelCase` for classes; reserve UPPER_SNAKE_CASE for configuration constants.
- Keep Elasticsearch payload keys lower-case with underscores, and centralize shared values in `config.py` rather than scattering literals.

## Testing Guidelines
- No automated suite is committed yet; when adding coverage, create `tests/` modules using `pytest` with files named `test_*.py`.
- Focus tests on collector pagination, ingest transformations, and Flask route helpers, and run `python -m pytest` locally before opening a PR.
- Manually verify by ingesting a small sample into a local Elasticsearch node and checking facets, highlights, and transcript retrieval via the UI.

## Commit & Pull Request Guidelines
- Mirror the existing history: short, imperative commit subjects (e.g. “Fix results overflow”, “Add video reference tracking”).
- PRs should describe scope, list environment variables or indices touched, link issues, and attach before/after screenshots whenever UI output changes. Highlight Elasticsearch mapping or data migration impacts for both search and frontend reviewers.

## Configuration & Security Tips
- Load credentials through environment variables (`ELASTIC_URL`, `ELASTIC_USERNAME`, `ELASTIC_PASSWORD`, `ELASTIC_API_KEY`, `YOUTUBE_API_KEY`) or a `.env` file, and keep secrets out of version control.
- Adjust `ELASTIC_VERIFY_CERTS`, `ELASTIC_CA_CERT`, and `ELASTIC_DEBUG` only while debugging, and prefer branch-specific indices (`this_little_corner_py_`) to avoid clobbering shared data.
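
As a rough illustration of the configuration tips above, the sketch below reads the listed environment variables into a single settings object using only the standard library. It is not the contents of `config.py`: the names `ElasticSettings`, `load_elastic_settings`, and `branch_index_name`, the default URL `http://localhost:9200`, the accepted truthy strings, and the way a branch name is appended to the index prefix are all assumptions made for the example. If a `.env` file is used, something like `load_dotenv()` from python-dotenv would typically be called before reading the environment.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional

BASE_INDEX = "this_little_corner_py"  # index name used in the ingest command


def _as_bool(value: Optional[str], default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ElasticSettings:
    """Hypothetical container for the Elasticsearch-related variables."""
    url: str
    username: Optional[str]
    password: Optional[str]
    api_key: Optional[str]
    verify_certs: bool
    ca_cert: Optional[str]
    debug: bool


def load_elastic_settings(env: Optional[Mapping[str, str]] = None) -> ElasticSettings:
    """Build settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return ElasticSettings(
        url=env.get("ELASTIC_URL", "http://localhost:9200"),  # assumed default
        username=env.get("ELASTIC_USERNAME"),
        password=env.get("ELASTIC_PASSWORD"),
        api_key=env.get("ELASTIC_API_KEY"),
        verify_certs=_as_bool(env.get("ELASTIC_VERIFY_CERTS"), default=True),
        ca_cert=env.get("ELASTIC_CA_CERT"),
        debug=_as_bool(env.get("ELASTIC_DEBUG"), default=False),
    )


def branch_index_name(branch: str) -> str:
    """Append a sanitized branch name to the base index (assumed scheme)."""
    suffix = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    return f"{BASE_INDEX}_{suffix}" if suffix else BASE_INDEX


if __name__ == "__main__":
    settings = load_elastic_settings()
    # Print only non-secret fields so credentials never reach logs.
    print(settings.url, settings.verify_certs, settings.debug)
```

Passing the environment in as a mapping keeps the loader easy to exercise from `pytest` without touching real credentials, for example `load_elastic_settings({"ELASTIC_VERIFY_CERTS": "false"})`.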