Repository Guidelines
Project Structure & Module Organization
- Core modules live under `python_app/`: `config.py` centralizes settings, `transcript_collector.py` gathers transcripts, `ingest.py` handles Elasticsearch bulk loads, and `search_app.py` exposes the Flask UI.
- Static assets belong in `static/` (`index.html`, `frequency.html`, companion JS/CSS). Keep HTML here and wire it up through Flask routes.
- Runtime artifacts land in `data/` (`raw/` for downloads, `video_metadata/` for cleaned payloads). Preserve the JSON schema emitted by the collector.
- When adding utilities, place them in `python_app/` and use package-relative imports so scripts continue to run via `python -m`.
Build, Test, and Development Commands
- `python -m venv .venv && source .venv/bin/activate`: bootstrap the virtualenv used by all scripts.
- `pip install -r requirements.txt`: install Flask, Elasticsearch tooling, Google API clients, and dotenv support.
- `python -m python_app.transcript_collector --channel UC... --output data/raw`: fetch transcript JSON for a channel; rerun to refresh cached data.
- `python -m python_app.ingest --source data/video_metadata --index this_little_corner_py`: index prepared metadata and auto-create mappings when needed.
- `python -m python_app.search_app`: launch the Flask server on port 8080 for UI smoke tests.
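For new utilities that should run the same way, the following is a minimal sketch of a command-line entry point. It is not the collector's actual code: the flag names mirror the collector command above, while `build_parser`, `main`, the help text, and the default output directory are assumptions made for illustration.

```python
"""Hypothetical entry-point sketch for a module run via `python -m`."""
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the collector command; the defaults are assumptions.
    parser = argparse.ArgumentParser(
        description="Fetch transcript JSON for a channel."
    )
    parser.add_argument("--channel", required=True, help="YouTube channel ID")
    parser.add_argument(
        "--output",
        type=Path,
        default=Path("data/raw"),
        help="Directory for downloaded JSON",
    )
    return parser


def main(argv=None) -> int:
    # argv=None makes argparse read sys.argv; passing a list eases testing.
    args = build_parser().parse_args(argv)
    # Placeholder for the real collection logic, which this sketch omits.
    print(f"Would collect transcripts for {args.channel} into {args.output}")
    return 0
```

A real module would typically finish by calling `main()` under an `if __name__ == "__main__":` guard, which is what lets `python -m` execute it.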
Coding Style & Naming Conventions
- Follow PEP 8 with 4-space indentation, `snake_case` for functions/modules, and `CamelCase` for classes; reserve UPPER_SNAKE_CASE for configuration constants.
- Keep Elasticsearch payload keys lower-case with underscores, and centralize shared values in `config.py` rather than scattering literals.
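The conventions above might look like this in practice. Every name here (`VideoRecord`, `build_index_payload`, `BULK_CHUNK_SIZE`, and the payload fields) is hypothetical; only the casing rules and the index name come from these guidelines.

```python
# Hypothetical names throughout; only the conventions come from the guidelines.
from dataclasses import dataclass

# UPPER_SNAKE_CASE: configuration constants (these would live in config.py).
DEFAULT_INDEX_NAME = "this_little_corner_py"
BULK_CHUNK_SIZE = 500  # assumed value, for illustration only


@dataclass
class VideoRecord:  # CamelCase for classes
    video_id: str
    channel_id: str
    title: str


def build_index_payload(record: VideoRecord) -> dict:  # snake_case for functions
    # Elasticsearch payload keys stay lower-case with underscores.
    return {
        "video_id": record.video_id,
        "channel_id": record.channel_id,
        "title": record.title,
    }
```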
Testing Guidelines
- No automated suite is committed yet; when adding coverage, create `tests/` modules using `pytest` with files named `test_*.py`.
- Focus tests on collector pagination, ingest transformations, and Flask route helpers, and run `python -m pytest` locally before opening a PR.
- Manually verify by ingesting a small sample into a local Elasticsearch node and checking facets, highlights, and transcript retrieval via the UI.
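As a starting point, a first test module could follow the shape below. `normalize_video` is a stand-in written for this sketch, not a function from `python_app/`, and the field names are assumptions; a real test would import the actual transformation instead of defining one inline.

```python
# tests/test_ingest_transform.py (hypothetical file name and helper)


def normalize_video(raw: dict) -> dict:
    """Stand-in for an ingest transformation; the real one lives in python_app."""
    return {
        "video_id": raw["id"],
        "title": raw.get("title", "").strip(),
        "has_transcript": bool(raw.get("transcript")),
    }


def test_normalize_video_strips_title():
    doc = normalize_video({"id": "abc123", "title": "  Hello  "})
    assert doc["title"] == "Hello"


def test_normalize_video_flags_missing_transcript():
    doc = normalize_video({"id": "abc123"})
    assert doc["has_transcript"] is False
```

Plain `assert` statements are enough here; `pytest` discovers any `test_*` function in a `test_*.py` file without extra registration.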
Commit & Pull Request Guidelines
- Mirror the existing history: short, imperative commit subjects (e.g. “Fix results overflow”, “Add video reference tracking”).
- PRs should describe scope, list environment variables or indices touched, link issues, and attach before/after screenshots whenever UI output changes. Highlight Elasticsearch mapping or data migration impacts for both search and frontend reviewers.
Configuration & Security Tips
- Load credentials through environment variables (`ELASTIC_URL`, `ELASTIC_USERNAME`, `ELASTIC_PASSWORD`, `ELASTIC_API_KEY`, `YOUTUBE_API_KEY`) or a `.env` file, and keep secrets out of version control.
- Adjust `ELASTIC_VERIFY_CERTS`, `ELASTIC_CA_CERT`, and `ELASTIC_DEBUG` only while debugging, and prefer branch-specific indices (`this_little_corner_py_<initials>`) to avoid clobbering shared data.
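One way to read these variables is sketched below using only the standard library. This is not the contents of `config.py`: the `ElasticSettings` class, the `load_settings` and `branch_index` helpers, the default URL, and the rule for parsing `ELASTIC_VERIFY_CERTS` are all assumptions. Loading a `.env` file is left out to keep the example dependency-free.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

BASE_INDEX = "this_little_corner_py"


@dataclass(frozen=True)
class ElasticSettings:  # hypothetical container, not the project's own class
    url: str
    username: Optional[str]
    password: Optional[str]
    api_key: Optional[str]
    verify_certs: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ElasticSettings:
    # Assumption: certificate verification stays on unless explicitly disabled.
    raw_verify = env.get("ELASTIC_VERIFY_CERTS", "true")
    verify = raw_verify.strip().lower() in {"1", "true", "yes"}
    return ElasticSettings(
        url=env.get("ELASTIC_URL", "http://localhost:9200"),  # assumed default
        username=env.get("ELASTIC_USERNAME"),
        password=env.get("ELASTIC_PASSWORD"),
        api_key=env.get("ELASTIC_API_KEY"),
        verify_certs=verify,
    )


def branch_index(initials: str, base: str = BASE_INDEX) -> str:
    # Builds a per-developer index name such as this_little_corner_py_ab.
    return f"{base}_{initials.lower()}"
```

Accepting the environment as a parameter lets tests pass a plain dictionary instead of mutating `os.environ`.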