# Repository Guidelines
## Project Structure & Module Organization
- Core modules live under `python_app/`: `config.py` centralizes settings, `transcript_collector.py` gathers transcripts, `ingest.py` handles Elasticsearch bulk loads, and `search_app.py` exposes the Flask UI.
- Static assets belong in `static/` (`index.html`, `frequency.html`, companion JS/CSS). Keep HTML here and wire it up through Flask routes.
- Runtime artifacts land in `data/` (`raw/` for downloads, `video_metadata/` for cleaned payloads). Preserve the JSON schema emitted by the collector.
- When adding utilities, place them in `python_app/` and use package-relative imports so scripts continue to run via `python -m`.
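
To illustrate why package-relative imports and `python -m` go together, the sketch below writes a throwaway copy of a `python_app/` layout to a temporary directory and runs a new utility both ways. `index_utils.py`, `personal_index`, and `DEFAULT_INDEX` are hypothetical names used only for this demonstration, not existing repository code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout mirroring python_app/.
FILES = {
    "python_app/__init__.py": "",
    # Stand-in for config.py, where shared values are centralized.
    "python_app/config.py": 'DEFAULT_INDEX = "this_little_corner_py"\n',
    # A new utility that imports config relative to its own package.
    "python_app/index_utils.py": (
        "from .config import DEFAULT_INDEX\n"
        "\n"
        "\n"
        "def personal_index(initials):\n"
        "    # Elasticsearch index names must be lower-case.\n"
        '    return f"{DEFAULT_INDEX}_{initials.lower()}"\n'
        "\n"
        "\n"
        'if __name__ == "__main__":\n'
        '    print(personal_index("AB"))\n'
    ),
}


def run_in_package(*args: str) -> subprocess.CompletedProcess:
    """Write FILES to a temp dir and run the current interpreter there with `args`."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, source in FILES.items():
            path = Path(root, rel_path)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source, encoding="utf-8")
        # cwd=root plays the role of the repository root.
        return subprocess.run(
            [sys.executable, *args], cwd=root, capture_output=True, text=True
        )


if __name__ == "__main__":
    as_module = run_in_package("-m", "python_app.index_utils")
    print(as_module.stdout.strip())  # this_little_corner_py_ab
    as_script = run_in_package("python_app/index_utils.py")
    print(as_script.returncode != 0)  # True: the relative import has no parent package
```

Run with `-m`, the module knows it belongs to `python_app`, so `from .config import ...` resolves; run as a bare file path, the same import fails with an `ImportError`.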
## Build, Test, and Development Commands
- `python -m venv .venv && source .venv/bin/activate`: bootstrap the virtualenv used by all scripts.
- `pip install -r requirements.txt`: install Flask, Elasticsearch tooling, Google API clients, and dotenv support.
- `python -m python_app.transcript_collector --channel UC... --output data/raw`: fetch transcript JSON for a channel; rerun to refresh cached data.
- `python -m python_app.ingest --source data/video_metadata --index this_little_corner_py`: index prepared metadata and auto-create mappings when needed.
- `python -m python_app.search_app`: launch the Flask server on port 8080 for UI smoke tests.
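
The `ingest` command is described as creating mappings when needed. The sketch below shows one way that step and the bulk payload could be structured; it is not the code in `ingest.py`. It assumes the 8.x `elasticsearch` Python client, whose `indices.create` accepts a `mappings=` keyword, and the field names (including `video_id` as the document id) are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical mapping; keys follow the lower-case, underscore style guide.
MAPPINGS: Dict[str, Any] = {
    "properties": {
        "video_id": {"type": "keyword"},
        "title": {"type": "text"},
        "transcript_text": {"type": "text"},
        "published_at": {"type": "date"},
    }
}


def ensure_index(client: Any, index: str, mappings: Dict[str, Any] = MAPPINGS) -> bool:
    """Create `index` with `mappings` unless it already exists. Returns True if created."""
    if client.indices.exists(index=index):
        return False
    client.indices.create(index=index, mappings=mappings)
    return True


def bulk_actions(docs: Iterable[Dict[str, Any]], index: str) -> Iterator[Dict[str, Any]]:
    """Wrap documents as actions for `elasticsearch.helpers.bulk`.

    Using the video id as `_id` means re-ingesting the same data overwrites
    existing documents (the default "index" op type) instead of adding duplicates.
    """
    for doc in docs:
        yield {"_index": index, "_id": doc["video_id"], "_source": doc}
```

Against a live cluster this might be wired up as `es = Elasticsearch(url)`, `ensure_index(es, "this_little_corner_py")`, then `helpers.bulk(es, bulk_actions(docs, "this_little_corner_py"))`, where `docs` is an iterable of metadata dicts read from `data/video_metadata`. Taking the client as a parameter also lets tests pass in a fake.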
## Coding Style & Naming Conventions
- Follow PEP 8 with 4-space indentation, `snake_case` for functions/modules, and `CamelCase` for classes; reserve UPPER_SNAKE_CASE for configuration constants.
- Keep Elasticsearch payload keys lower-case with underscores, and centralize shared values in `config.py` rather than scattering literals.
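
Source data does not always follow the lower-case, underscore convention: YouTube Data API resources use camelCase keys such as `publishedAt` and `channelId`. A hypothetical helper like the one below could normalize keys before documents reach Elasticsearch. `to_snake_case` and `normalize_keys` are illustrative names, not existing repository functions, and the regex only splits where a lower-case letter or digit is followed by an upper-case letter, so acronym runs such as `HTTPStatus` are lower-cased without being split.

```python
import re
from typing import Any

# Zero-width match between a lower-case letter/digit and an upper-case letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(key: str) -> str:
    """Convert a camelCase or hyphenated key to lower-case snake_case."""
    return _CAMEL_BOUNDARY.sub("_", key).replace("-", "_").lower()


def normalize_keys(value: Any) -> Any:
    """Recursively rename dict keys to snake_case; values are left untouched."""
    if isinstance(value, dict):
        return {to_snake_case(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value
```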
## Testing Guidelines
- No automated test suite is committed yet; when adding coverage, put `pytest` modules under `tests/` and name the files `test_*.py`.
- Focus tests on collector pagination, ingest transformations, and Flask route helpers, and run `python -m pytest` locally before opening a PR.
- Manually verify by ingesting a small sample into a local Elasticsearch node and checking facets, highlights, and transcript retrieval via the UI.
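
As a starting point for collector pagination tests, here is a pytest-style sketch of a hypothetical `tests/test_pagination.py`. `iter_items` is a stand-in for whatever pagination helper `transcript_collector.py` actually uses; in the repository it would be imported from `python_app/`, but it is inlined here so the sketch runs on its own. It assumes the YouTube Data API convention that list responses carry `items` plus an optional `nextPageToken`, which is sent back as `pageToken` on the next request.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield items from every page, following nextPageToken until it is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break


def test_iter_items_follows_next_page_token() -> None:
    # Fake API: three pages keyed by the token used to request them.
    pages = {
        None: {"items": [1, 2], "nextPageToken": "p2"},
        "p2": {"items": [3], "nextPageToken": "p3"},
        "p3": {"items": [4]},  # last page: no nextPageToken
    }
    requested = []

    def fake_fetch(token: Optional[str]) -> Page:
        requested.append(token)
        return pages[token]

    assert list(iter_items(fake_fetch)) == [1, 2, 3, 4]
    assert requested == [None, "p2", "p3"]


def test_iter_items_handles_empty_first_page() -> None:
    assert list(iter_items(lambda token: {"items": []})) == []
```

Because both the file name and the function names start with `test_`, `python -m pytest` discovers these tests without extra configuration, and the fake fetcher keeps them independent of network access and API quotas.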
## Commit & Pull Request Guidelines
- Mirror the existing history: short, imperative commit subjects (e.g. “Fix results overflow”, “Add video reference tracking”).
- PRs should describe scope, list environment variables or indices touched, link related issues, and attach before/after screenshots whenever UI output changes. Call out Elasticsearch mapping changes or data migrations so both search and frontend reviewers can assess the impact.
## Configuration & Security Tips
- Load credentials through environment variables (`ELASTIC_URL`, `ELASTIC_USERNAME`, `ELASTIC_PASSWORD`, `ELASTIC_API_KEY`, `YOUTUBE_API_KEY`) or a `.env` file, and keep secrets out of version control.
- Change `ELASTIC_VERIFY_CERTS`, `ELASTIC_CA_CERT`, and `ELASTIC_DEBUG` only while debugging. Prefer personal or branch-specific indices (e.g. `this_little_corner_py_<initials>`) to avoid clobbering shared data.
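
A minimal sketch of how `config.py` could read these variables, assuming python-dotenv provides the `.env` support (its `load_dotenv()` does not override variables already set in the environment). The `Settings` dataclass, the `load_settings` function, the truthy-string parsing, and the defaults (a local node on port 9200, certificate verification on, debug off) are all assumptions, not the repository's actual configuration code.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

try:
    from dotenv import load_dotenv  # python-dotenv
except ImportError:  # keep the sketch usable without python-dotenv installed
    load_dotenv = None

DEFAULT_INDEX = "this_little_corner_py"


def _as_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret common truthy strings; fall back to `default` when unset."""
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    elastic_url: str
    elastic_username: Optional[str]
    # repr=False keeps secrets out of logs and tracebacks that print Settings.
    elastic_password: Optional[str] = field(repr=False)
    elastic_api_key: Optional[str] = field(repr=False)
    elastic_verify_certs: bool
    elastic_ca_cert: Optional[str]
    elastic_debug: bool
    youtube_api_key: Optional[str] = field(repr=False)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from `env`, or from os.environ after loading a .env file."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # existing environment variables win over .env values
        env = os.environ
    return Settings(
        elastic_url=env.get("ELASTIC_URL", "http://localhost:9200"),
        elastic_username=env.get("ELASTIC_USERNAME"),
        elastic_password=env.get("ELASTIC_PASSWORD"),
        elastic_api_key=env.get("ELASTIC_API_KEY"),
        elastic_verify_certs=_as_bool(env.get("ELASTIC_VERIFY_CERTS"), default=True),
        elastic_ca_cert=env.get("ELASTIC_CA_CERT"),
        elastic_debug=_as_bool(env.get("ELASTIC_DEBUG"), default=False),
        youtube_api_key=env.get("YOUTUBE_API_KEY"),
    )
```

Accepting an explicit mapping makes the function easy to test without touching the real environment, and keeping the debugging toggles at safe defaults means they only change when someone sets them deliberately.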