Research programme · Lead Data Engineer

GeneXpert MTB/RIF — 16-Lab Pooled Testing Pipeline

BCHPR · 16 GeneXpert laboratories, Cameroon · 2023 – present

4,942-line pipeline ingesting multi-site GeneXpert MTB/RIF CSV exports, harmonising French / English bilingual records, and consolidating pooled + individual TB test results — the data backbone of the 2026 openRxiv preprint (first & corresponding author).

Highlights

  • Four simultaneous GeneXpert CSV export formats handled with encoding auto-detection.
  • French ↔ English translation dictionary with 200+ entries for lab-specific terms.
  • Pool-ID deconvolution with SQLite tracking — maps each positive pool back to individual participant results.
  • Fallback regex for study-ID pattern extraction from malformed Notes fields (INSPIRE · S4A · Rapid TB · FujiLAM IDs all detected).
  • Negative test-duration auto-correction for midnight-rollover timestamps.
  • MTB/RIF grade parsing (high / medium / low / trace / negative).
  • Streaming mode activates above 10,000 files to prevent out-of-memory (OOM) failures.
  • Site / region / lab / project hierarchy extracted from folder paths.
  • Unprocessed-file notification daemon with Teams alerts on import failures.
  • Evidence for the WHO 9 March 2026 sputum-pooling recommendation.
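
Two of the highlights above, the midnight-rollover correction and the grade parsing, can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function names, the assumption that a negative duration means the end timestamp kept the start date after crossing midnight, and the assumption that the grade appears as a plain keyword in the result text are all hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical keyword patterns; real GeneXpert exports may word results differently.
_GRADE = re.compile(r"\b(high|medium|low|trace)\b", re.IGNORECASE)
_NEGATIVE = re.compile(r"\b(not detected|negative)\b", re.IGNORECASE)


def corrected_duration(start: datetime, end: datetime) -> timedelta:
    """Return end - start, adding one day if the result is negative.

    Assumes a negative duration comes from a test that ran past midnight
    while the end timestamp kept the start date.
    """
    duration = end - start
    if duration < timedelta(0):
        duration += timedelta(days=1)
    return duration


def parse_grade(result_text: str) -> Optional[str]:
    """Map a free-text result to high / medium / low / trace / negative.

    A grade keyword is checked first, so a trailing "NOT DETECTED" that
    refers to something else does not override a detected grade.
    Returns None when nothing recognisable is found.
    """
    match = _GRADE.search(result_text)
    if match:
        return match.group(1).lower()
    if _NEGATIVE.search(result_text):
        return "negative"
    return None


if __name__ == "__main__":
    start = datetime(2024, 3, 1, 23, 50)
    end = datetime(2024, 3, 1, 0, 20)  # date not advanced after midnight
    print(corrected_duration(start, end))
    print(parse_grade("MTB DETECTED MEDIUM"))
    print(parse_grade("MTB NOT DETECTED"))
```

Checking the grade keyword before the negative phrase is a deliberate choice in this sketch: it keeps a combined result line from being classified as negative only because a later field reads "NOT DETECTED".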