Platform & infrastructure · Platform Engineer · DAG Factory Architect

28 Apache Airflow Production Pipelines

BCHPR · CHPR_DAGS.py (1,242 lines) · 2023 – present

Production Airflow 3 orchestration for every BCHPR data pipeline: 28 DAGs on cadences from every 10 minutes to daily, with a custom FileMtimeSensor, Slack + email alerting, and a reusable DAG factory pattern.

Highlights

  • 28 production DAGs covering Wave 11, GHIT FujiLAM II, NPOC, Start4All, Viral Load, Xpert, Truenat, Pluslife, Image Quality, Inventory Management, Specimen Transport, TB Treatment, Chest X-ray, Culture, TBRL Lab, Collaborators, UCD DDE Check, Global Outcome, and maintenance jobs.
  • Custom `FileMtimeSensor` — rescheduling sensor that triggers on file / directory mtime changes, with SHA-based baselines stored in Airflow Variables (no false positives on first run).
  • `build_script_dag` factory pattern — parameterised DAG construction keeps 1,242 lines maintainable across 28 DAGs.
  • Cadences — every 10 min (Culture) · every 15 min (cleanup, UCD DDE) · 30 min (user activity, specimen transport, lab PDF) · 45 min (FujiLAM UA) · hourly (most pipelines) · 3-hourly (reports) · daily (DQ scoring, collaborators).
  • Timezone-aware scheduling (Africa/Douala) with 60-min execution timeout (120 min for GHIT) and 180-min dagrun timeout.
  • Failure alerting — Slack via webhook or `SlackWebhookHook` connection, plus email to two addresses, with 2× exponential-backoff retries.
  • Airflow 3.x SDK with Airflow <3 fallback compatibility; cross-platform path normalisation (Windows ↔ WSL) baked into every job.
  • Runtime config resolution: env vars → Airflow Variables → defaults; `ShortCircuitOperator` guards skip runs when no upstream change is detected.
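
The change detection behind `FileMtimeSensor` can be sketched roughly as follows. This is a minimal illustration, not the code in CHPR_DAGS.py: the function names `mtime_fingerprint` and `has_changed` are hypothetical, SHA-256 is an assumed choice of hash, and a plain dict stands in for Airflow Variables as the baseline store.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import MutableMapping


def mtime_fingerprint(path: Path) -> str:
    """Hash the relative names and mtimes of a file, or of every file under a directory."""
    digest = hashlib.sha256()
    if path.is_file():
        files = [path]
        root = path.parent
    else:
        files = sorted(p for p in path.rglob("*") if p.is_file())
        root = path
    for f in files:
        entry = f"{f.relative_to(root).as_posix()}:{f.stat().st_mtime_ns}\n"
        digest.update(entry.encode("utf-8"))
    return digest.hexdigest()


def has_changed(path: Path, store: MutableMapping[str, str], key: str) -> bool:
    """Compare the current fingerprint with the stored baseline, then update the baseline.

    `store` is a stand-in for Airflow Variables (assumption). On the first
    call there is no baseline yet, so the fingerprint is recorded and the
    function reports "no change" instead of firing spuriously.
    """
    current = mtime_fingerprint(path)
    previous = store.get(key)
    store[key] = current
    return previous is not None and previous != current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "export.csv"
        target.write_text("a,b\n1,2\n")
        baselines: dict = {}
        print(has_changed(Path(tmp), baselines, "demo"))  # first run: baseline only
        # Push the mtime forward by one second to simulate a new upstream export.
        st = target.stat()
        os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        print(has_changed(Path(tmp), baselines, "demo"))
```

In a rescheduling sensor, a check like `has_changed` would plausibly sit in the poke method, so the worker slot is released between checks.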
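
The config resolution order (env vars → Airflow Variables → defaults) and the Windows ↔ WSL path normalisation might look something like the sketch below. All names here (`resolve_setting`, `to_wsl_path`, `to_windows_path`, `variable_get`) are hypothetical; `variable_get` is an injected callable that, in a real DAG file, would wrap Airflow's Variable lookup, and the `/mnt/<drive>/` layout assumes the default WSL mount point.

```python
import os
import re
from typing import Callable, Mapping, Optional

_WIN_DRIVE = re.compile(r"^([A-Za-z]):[\\/](.*)$")
_WSL_MOUNT = re.compile(r"^/mnt/([a-z])/(.*)$")


def resolve_setting(
    name: str,
    default: str,
    env: Optional[Mapping[str, str]] = None,
    variable_get: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return the first non-empty value from: environment, variable store, default."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    if variable_get is not None:
        value = variable_get(name)
        if value:
            return value
    return default


def to_wsl_path(path: str) -> str:
    """Convert 'C:\\data\\x' style paths to '/mnt/c/data/x'; leave other paths untouched."""
    match = _WIN_DRIVE.match(path)
    if not match:
        return path
    drive, rest = match.groups()
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")


def to_windows_path(path: str) -> str:
    """Convert '/mnt/c/data/x' style paths to 'C:\\data\\x'; leave other paths untouched."""
    match = _WSL_MOUNT.match(path)
    if not match:
        return path
    drive, rest = match.groups()
    return f"{drive.upper()}:\\" + rest.replace("/", "\\")


if __name__ == "__main__":
    fake_variables = {"SCRIPT_DIR": r"D:\pipelines\scripts"}  # stand-in for Airflow Variables
    script_dir = resolve_setting(
        "SCRIPT_DIR", "/opt/scripts", env={}, variable_get=fake_variables.get
    )
    print(to_wsl_path(script_dir))
```

Passing the lookup in as a callable keeps the helper importable without Airflow installed, which also makes it straightforward to test outside the scheduler.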