Validation Methodology: Comparing Python QUAL2K with the Legacy Excel Model
The Python QUAL2K engine is validated by running identical inputs through both the legacy EPA QUAL2K Excel/VBA model and the new Python implementation, then comparing every output value. This approach tests numerical equivalence directly, without relying on field data, which would test model calibration rather than implementation correctness.
Validation Pipeline
Test Basins
Validation covers multiple river basins of varying complexity:
| Model | Location | Reaches | Point Sources | Diffuse Sources | Features Tested |
|---|---|---|---|---|---|
| Paraluz | Brazil | 2 | 0 | 0 | Basic hydraulics, headwater-only, BOD/DO decay |
| Boulder | Colorado, USA | 50 | 2 | 1 | Large model, many reaches, full constituent set |
| Rio Cauca | Colombia | 17 | 3 | 2 | Tropical river, nitrification-dominant DO dynamics |
| Rio Meléndez | Colombia | 8 | 1 | 0 | Auto-calibration, urban river, WWTP effluent |
Comparison Metrics
For each output variable at each reach, three metrics are computed:
- Absolute difference: |Python − Excel|
- Relative difference (%): 100 × |Python − Excel| / Excel (when Excel ≠ 0)
- Maximum error: the largest relative % across all reaches
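The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the suite's actual code: the function name `compare_outputs` and the list-based inputs are hypothetical, and the sketch assumes the denominator is taken as an absolute value so that the relative difference stays non-negative. Reaches where the Excel value is zero are reported as `None` and excluded from the maximum.

```python
def compare_outputs(python_vals, excel_vals):
    """Compare one output variable across reaches (hypothetical helper).

    Returns (absolute differences, relative differences in %, maximum error in %).
    """
    if len(python_vals) != len(excel_vals):
        raise ValueError("both models must report the same number of reaches")

    # Absolute difference: |Python - Excel| at each reach.
    abs_diff = [abs(p - e) for p, e in zip(python_vals, excel_vals)]

    # Relative difference (%): undefined where Excel == 0, so store None there.
    # Assumption: abs(e) in the denominator keeps the result non-negative.
    rel_diff = [
        100.0 * d / abs(e) if e != 0 else None
        for d, e in zip(abs_diff, excel_vals)
    ]

    # Maximum error: largest defined relative difference across all reaches.
    defined = [r for r in rel_diff if r is not None]
    max_error = max(defined) if defined else 0.0
    return abs_diff, rel_diff, max_error


# Illustrative values only, not real model output.
abs_diff, rel_diff, max_error = compare_outputs([8.02, 7.5, 0.0], [8.0, 7.5, 0.0])
```

Returning `None` for zero-valued Excel outputs keeps those reaches visible in the per-reach list while preventing a division by zero from distorting the maximum.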
Pass/Fail Criteria
| Variable Category | Pass Threshold | Rationale |
|---|---|---|
| Hydraulics (U, H, B, Ac) | < 0.1% | Deterministic equations, exact match expected |
| DO, CBOD | < 1.0% | ODE integration differences (Euler vs analytical) |
| Nutrients (N, P) | < 2.0% | Cascading rate interactions can amplify small differences |
| Pathogens | < 1.0% | Simple first-order, exact match expected |
| Temperature | < 0.5% | Direct input, minimal computation |
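One way to apply the thresholds in this table is a simple lookup keyed by variable category. The sketch below is illustrative: the dictionary name `THRESHOLDS_PCT`, the category keys, and the function `passes` are assumptions rather than names from the validation suite. The comparison is strict, matching the `<` in the table.

```python
# Pass thresholds in percent, copied from the table above.
# The category keys are hypothetical labels chosen for this sketch.
THRESHOLDS_PCT = {
    "hydraulics": 0.1,   # U, H, B, Ac
    "do_cbod": 1.0,      # DO, CBOD
    "nutrients": 2.0,    # N, P
    "pathogens": 1.0,
    "temperature": 0.5,
}


def passes(category, max_error_pct):
    """Return True if the maximum relative error is strictly below the threshold."""
    return max_error_pct < THRESHOLDS_PCT[category]


# Illustrative checks with made-up error values.
print(passes("hydraulics", 0.05))  # below the 0.1% limit
print(passes("hydraulics", 0.1))   # equal to the limit, so not a pass
```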
Known Differences
Running the Validation Suite
The validation suite can be run locally via the Python CLI:

    python validate_all_models.py

This processes all test basins and produces a summary report with pass/fail status and maximum errors per variable. See the Model Comparisons page for actual results.
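To make the shape of such a summary concrete, the following sketch aggregates per-basin maximum errors into one line per variable. It does not reproduce the output of `validate_all_models.py`; the function `summarize`, the input layout, the basin names, and all numbers are placeholders invented for illustration.

```python
def summarize(max_errors, thresholds):
    """Build one report line per variable (hypothetical helper).

    max_errors: {basin: {variable: maximum relative error in %}}
    thresholds: {variable: pass limit in %}
    """
    # Track the worst error seen for each variable and the basin it came from.
    worst = {}
    for basin, per_variable in max_errors.items():
        for variable, err in per_variable.items():
            if variable not in worst or err > worst[variable][0]:
                worst[variable] = (err, basin)

    report = []
    for variable, (err, basin) in sorted(worst.items()):
        status = "PASS" if err < thresholds[variable] else "FAIL"
        report.append(f"{variable}: max {err:.3f}% ({basin}) {status}")
    return report


# Placeholder inputs, not actual validation results.
example_errors = {
    "basin_a": {"DO": 0.2, "U": 0.01},
    "basin_b": {"DO": 1.5, "U": 0.05},
}
example_thresholds = {"DO": 1.0, "U": 0.1}

for line in summarize(example_errors, example_thresholds):
    print(line)
```

Reporting the basin alongside the worst-case error makes it easier to trace a failing variable back to the model that produced it.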