Core engineering library · Architect

data_quality_manager.py — Enterprise DQA Framework

BCHPR · 28+ instruments · 2023 – present

11,007-line data quality platform with a fluent QueryBuilder, persistent query lifecycle tracking, duplicate analysis, and double-data-entry verification across 28+ instruments, plus SQLite persistence and Polars acceleration.
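A fluent builder chains configuration calls, each returning the builder itself, and then runs the accumulated checks in one pass. The sketch below illustrates that pattern only. `QueryBuilder` is the name used above, but the methods `for_instrument`, `required`, `in_range` and `run`, and the plain-dict rows, are hypothetical stand-ins (the real class presumably works on Polars frames).

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]
Check = tuple[str, str, Callable[[Any], bool]]  # (column, message, predicate)


@dataclass
class QueryBuilder:
    """Hypothetical fluent builder: every configuring method returns self."""
    instrument: str = ""
    checks: list[Check] = field(default_factory=list)

    def for_instrument(self, name: str) -> "QueryBuilder":
        self.instrument = name
        return self

    def required(self, column: str) -> "QueryBuilder":
        # A value passes if it is neither None nor an empty string.
        self.checks.append((column, "missing value", lambda v: v not in (None, "")))
        return self

    def in_range(self, column: str, low: float, high: float) -> "QueryBuilder":
        # Blanks pass here (required() handles them); present values must be numeric.
        self.checks.append((column, f"outside [{low}, {high}]",
                            lambda v: v in (None, "") or low <= float(v) <= high))
        return self

    def run(self, rows: list[Row]) -> list[dict[str, str]]:
        # One query per (row, failed check), tagged with the instrument name.
        queries = []
        for row in rows:
            for column, message, passes in self.checks:
                if not passes(row.get(column)):
                    queries.append({"instrument": self.instrument,
                                    "record_id": str(row.get("record_id")),
                                    "field": column, "message": message})
        return queries


rows = [{"record_id": 1, "age": 34}, {"record_id": 2, "age": ""}, {"record_id": 3, "age": 130}]
queries = (QueryBuilder().for_instrument("demographics")
           .required("age").in_range("age", 0, 120).run(rows))
```

Here record 2 is flagged as missing and record 3 as out of range. Keeping each check as a (column, message, predicate) triple means new rule types only need a new chaining method.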

Highlights

  • DataFrameComparator with 7 comparison modes, including date tolerance, numeric tolerance, string normalisation, and blank-matching.
  • AutoQueryTracker: SQLite-backed query lifecycle (open / closed / aged) with hash IDs that stay stable across runs.
  • DuplicateRecordAnalyzer: detects duplicates within / across instruments with concordance % and reconciliation plans.
  • DoubleDataEntryTracker: quality scores (concordance × completion) with persistent discordance detection.
  • DataValidationTracker: keeps a SQLite history of row counts and raises DataValidationError before a file is overwritten with fewer rows than previously recorded.
  • Formal 7-dimension DQ taxonomy (Completeness · Validity · Accuracy · Consistency · Timeliness · Uniqueness · Integrity).
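The four comparison modes named for DataFrameComparator can be sketched at the value level as follows. This is an assumption-laden illustration, not the framework's code: `values_match` and `mismatched_fields` are hypothetical helpers working on plain dicts rather than DataFrames, and the remaining three modes are not covered because they are not named here.

```python
from datetime import date
from typing import Any


def is_blank(value: Any) -> bool:
    return value is None or (isinstance(value, str) and value.strip() == "")


def values_match(a: Any, b: Any, *, numeric_tol: float = 0.0, date_tol_days: int = 0,
                 normalise_strings: bool = True, blanks_match: bool = True) -> bool:
    # Blank-matching: two blanks agree only if enabled; blank vs value never agrees.
    if is_blank(a) or is_blank(b):
        return blanks_match and is_blank(a) and is_blank(b)
    # Date tolerance: compare day ordinals (datetime subclasses date, so it works too).
    if isinstance(a, date) and isinstance(b, date):
        return abs(a.toordinal() - b.toordinal()) <= date_tol_days
    # Numeric tolerance: bool is excluded because it subclasses int.
    numeric = (int, float)
    if (isinstance(a, numeric) and isinstance(b, numeric)
            and not isinstance(a, bool) and not isinstance(b, bool)):
        return abs(a - b) <= numeric_tol
    # String normalisation: collapse whitespace and ignore case.
    if normalise_strings and isinstance(a, str) and isinstance(b, str):
        return " ".join(a.split()).casefold() == " ".join(b.split()).casefold()
    return a == b


def mismatched_fields(left: dict, right: dict, fields: list[str], **options: Any) -> list[str]:
    # Fields whose values disagree under the chosen tolerance options.
    return [f for f in fields if not values_match(left.get(f), right.get(f), **options)]


left = {"visit": date(2023, 3, 1), "weight": 70.0, "site": "Vancouver ", "notes": ""}
right = {"visit": date(2023, 3, 2), "weight": 70.4, "site": "vancouver", "notes": None}
fields = ["visit", "weight", "site", "notes"]
```

With default options, `visit` and `weight` disagree. With `numeric_tol=0.5, date_tol_days=1`, all four fields match.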
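Stable IDs are what let a tracker like AutoQueryTracker recognise "the same query" on the next run. A common way to get them is to hash the fields that identify an issue with a deterministic digest, because Python's built-in `hash()` on strings is salted per process. The sketch below assumes that approach, a hypothetical `queries` table, and a 30-day "aged" threshold. None of these are confirmed details of the framework.

```python
import hashlib
import sqlite3
from datetime import date
from typing import Iterable

Finding = tuple[str, str, str, str]  # (instrument, record_id, field, rule)


def query_id(instrument: str, record_id: str, field: str, rule: str) -> str:
    # sha256 is deterministic across processes, unlike Python's salted hash().
    key = "|".join([instrument, record_id, field, rule])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def sync_run(conn: sqlite3.Connection, findings: Iterable[Finding], run_date: date) -> None:
    """Open new queries, reopen recurring closed ones, close queries no longer raised."""
    conn.execute("""CREATE TABLE IF NOT EXISTS queries (
        id TEXT PRIMARY KEY, instrument TEXT, record_id TEXT, field TEXT, rule TEXT,
        status TEXT NOT NULL, opened_on TEXT NOT NULL, closed_on TEXT)""")
    today = run_date.isoformat()
    seen = set()
    for instrument, record_id, field, rule in findings:
        qid = query_id(instrument, record_id, field, rule)
        seen.add(qid)
        row = conn.execute("SELECT status FROM queries WHERE id = ?", (qid,)).fetchone()
        if row is None:
            conn.execute("INSERT INTO queries VALUES (?, ?, ?, ?, ?, 'open', ?, NULL)",
                         (qid, instrument, record_id, field, rule, today))
        elif row[0] == "closed":
            # A resolved issue came back: reopen it and restart its age clock.
            conn.execute("UPDATE queries SET status = 'open', opened_on = ?, closed_on = NULL "
                         "WHERE id = ?", (today, qid))
    # Materialise open IDs first so we do not update rows while iterating a cursor.
    open_ids = [r[0] for r in conn.execute("SELECT id FROM queries WHERE status = 'open'")]
    for qid in open_ids:
        if qid not in seen:
            conn.execute("UPDATE queries SET status = 'closed', closed_on = ? WHERE id = ?",
                         (today, qid))
    conn.commit()


def aged_queries(conn: sqlite3.Connection, as_of: date, max_age_days: int = 30) -> list[str]:
    # "Aged" here means still open for more than max_age_days (threshold assumed).
    rows = conn.execute("SELECT id, opened_on FROM queries WHERE status = 'open'").fetchall()
    return [qid for qid, opened in rows if (as_of - date.fromisoformat(opened)).days > max_age_days]
```

A query that is no longer raised closes automatically. One that is still raised keeps its original `opened_on`, which is what makes ageing meaningful.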
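Duplicate analysis with a concordance percentage can be approximated by grouping records on an identifying key and comparing every pair within a group. This sketch assumes records are dicts with an `instrument` field. `duplicate_report` and its output keys are hypothetical, and the "reconciliation plan" is reduced to the list of fields that disagree.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any


def duplicate_report(records: list[dict[str, Any]], key_fields: tuple[str, ...],
                     compare_fields: list[str]) -> list[dict[str, Any]]:
    # Group records that share the same key (e.g. participant ID + visit).
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for rec in records:
        groups[tuple(rec.get(k) for k in key_fields)].append(rec)
    report = []
    for key, recs in groups.items():
        # Every pair inside a group with 2+ records is a duplicate candidate.
        for a, b in combinations(recs, 2):
            diffs = [f for f in compare_fields if a.get(f) != b.get(f)]
            agree = len(compare_fields) - len(diffs)
            pct = 100.0 * agree / len(compare_fields) if compare_fields else 100.0
            report.append({
                "key": key,
                "scope": "within" if a.get("instrument") == b.get("instrument") else "across",
                "concordance_pct": round(pct, 1),
                # Fields that disagree are the ones a reconciliation plan must resolve.
                "fields_to_reconcile": diffs,
            })
    return report


records = [
    {"instrument": "intake", "pid": "P1", "dob": "2010-05-01", "sex": "F", "site": "A"},
    {"instrument": "intake", "pid": "P1", "dob": "2010-05-01", "sex": "F", "site": "B"},
    {"instrument": "intake", "pid": "P2", "dob": "2011-01-01", "sex": "M", "site": "A"},
    {"instrument": "followup", "pid": "P2", "dob": "2011-01-01", "sex": "M", "site": "A"},
    {"instrument": "intake", "pid": "P3", "dob": "2012-02-02", "sex": "F", "site": "A"},
]
report = duplicate_report(records, ("pid",), ["dob", "sex", "site"])
```

P1 yields a within-instrument pair at 66.7% concordance (only `site` differs), and P2 yields a fully concordant cross-instrument pair. Pairwise comparison is quadratic per group, which is acceptable when duplicate groups are small.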
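The double-data-entry score is described as concordance × completion. The sketch below adopts one plausible reading, which is an assumption: completion is the share of fields entered in both passes, and concordance is the share of those jointly entered fields where the passes agree. "Persistent" discordance is taken to mean a field that was discordant in each of the last few runs. `dde_quality` and `persistent_discordances` are hypothetical names.

```python
from typing import Any


def filled(value: Any) -> bool:
    return value not in (None, "")


def dde_quality(first: dict[str, Any], second: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    # Completion: share of fields entered in BOTH passes.
    both = [f for f in fields if filled(first.get(f)) and filled(second.get(f))]
    completion = len(both) / len(fields) if fields else 0.0
    # Concordance: share of jointly entered fields where the two passes agree.
    discordant = [f for f in both if first[f] != second[f]]
    concordance = (len(both) - len(discordant)) / len(both) if both else 0.0
    return {"completion": completion, "concordance": concordance,
            "score": concordance * completion, "discordant": discordant}


def persistent_discordances(history: list[set[str]], runs: int = 2) -> set[str]:
    # A field is "persistent" if it was discordant in each of the last `runs` runs.
    if runs < 1 or len(history) < runs:
        return set()
    return set.intersection(*history[-runs:])


first = {"height": 150, "weight": 40, "bp": "", "hr": 80}
second = {"height": 150, "weight": 41, "bp": "110/70", "hr": 80}
quality = dde_quality(first, second, ["height", "weight", "bp", "hr"])
```

In this example completion is 0.75 (`bp` is missing from the first pass), concordance is 2/3 (`weight` disagrees), and the score is about 0.5. Multiplying the two means a form cannot score well by agreeing on only a handful of completed fields.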
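The row-loss guard amounts to comparing a new output's row count against the last count recorded for the same dataset, and refusing to proceed if it shrank. The sketch below assumes a single `row_history` table and an explicit `allow_loss` override. `DataValidationError` is the exception named above, but this class definition is only a stand-in, and `check_row_count` is a hypothetical helper.

```python
import sqlite3
from datetime import datetime, timezone


class DataValidationError(Exception):
    """Stand-in for the framework's exception: a write would lose rows."""


def check_row_count(conn: sqlite3.Connection, dataset: str, new_rows: int,
                    allow_loss: bool = False) -> None:
    """Raise before an overwrite that would shrink `dataset`; otherwise log the new count."""
    conn.execute("CREATE TABLE IF NOT EXISTS row_history "
                 "(dataset TEXT NOT NULL, n_rows INTEGER NOT NULL, recorded_at TEXT NOT NULL)")
    # Most recent count for this dataset (rowid grows with each insert).
    last = conn.execute("SELECT n_rows FROM row_history WHERE dataset = ? "
                        "ORDER BY rowid DESC LIMIT 1", (dataset,)).fetchone()
    if last is not None and new_rows < last[0] and not allow_loss:
        raise DataValidationError(
            f"{dataset}: new output has {new_rows} rows, previous run had {last[0]}")
    conn.execute("INSERT INTO row_history VALUES (?, ?, ?)",
                 (dataset, new_rows, datetime.now(timezone.utc).isoformat()))
    conn.commit()
```

In a Polars pipeline, the call would sit immediately before the write, for example `check_row_count(conn, str(path), df.height)` followed by `df.write_csv(path)`. An unexpected drop then stops the run before the previous file is lost.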