Core engineering library · Engineer

study_id_patterns.py — Study-ID Regex Registry

BCHPR · 2023 – present

2,611-line centralised registry of 8 study-ID patterns and 14 site-code patterns across Cameroon, Nigeria, and Vietnam projects — with vectorised extraction, validation, classification, and cleaning.

Highlights

  • 8 ID patterns: GHIT Cameroon / Nigeria / Vietnam · Image Quality · Wave 11 screening & testing · RapidTB · Start4All.
  • extract_ids, extract_multi_ids, extract_all_ids for wide / long / multi-column extraction modes.
  • Validation as boolean flags, type assignment, or structured DQA reports with five statuses (VALID / INVALID_FORMAT / WRONG_PROJECT / WRONG_COUNTRY / DUPLICATE).
  • Pre-compiled regexes with @property caching and single-pass combined patterns: O(n) in the number of values scanned, not O(n × p) for p patterns applied one at a time.
  • Negative lookahead for Wave 11 screening-vs-testing mutual exclusivity (IDs ending in X).
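
The last two highlights can be sketched as follows. This is a minimal illustration, not the registry's actual code: the ID formats (`W11-0001`, `W11-0002X`, `CMR-12345`), the `PatternRegistry` class, and the `extract_typed_ids` method are all hypothetical stand-ins, since the real formats and internals are not shown here. It assumes that Wave 11 testing IDs are the ones ending in X, and it combines every pattern into one alternation of named groups so each string is scanned once.

```python
import re

# HYPOTHETICAL ID formats, invented for illustration only.
HYPOTHETICAL_PATTERNS = {
    # Screening: four digits NOT followed by another digit or a trailing X.
    "wave11_screening": r"W11-\d{4}(?![0-9X])",
    # Testing: the same stem, but the trailing X is required (an assumption).
    "wave11_testing": r"W11-\d{4}X",
    "ghit_cameroon": r"CMR-\d{5}",
}


class PatternRegistry:
    """Hypothetical registry that compiles its combined pattern lazily, once."""

    def __init__(self, patterns):
        self._patterns = dict(patterns)
        self._combined = None  # cache slot filled on first access

    @property
    def combined(self):
        # Build one alternation of named groups, then reuse the compiled object.
        if self._combined is None:
            alternation = "|".join(
                f"(?P<{name}>{pattern})"
                for name, pattern in self._patterns.items()
            )
            self._combined = re.compile(alternation)
        return self._combined

    def extract_typed_ids(self, text):
        # Single pass: m.lastgroup names the alternative that matched.
        return [(m.lastgroup, m.group()) for m in self.combined.finditer(text)]


registry = PatternRegistry(HYPOTHETICAL_PATTERNS)
found = registry.extract_typed_ids("Seen W11-0001 and W11-0002X plus CMR-12345")
print(found)
```

Without the lookahead, the screening alternative would match the `W11-0002` prefix of `W11-0002X` and misclassify a testing ID; with it, each ID falls into exactly one type.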