pypmml RegressionModel normalizationMethod Authority Gap: PoC
Vulnerability class: Structural Invariant / Authority Gap (Score Normalization Function Substitution)
Target library: pypmml (Python PMML consumer, JVM-backed via jpmml)
PMML format: .pmml (PMML 4.4, RegressionModel)
Severity: Medium
Submission platform: huntr MFV
Summary
The normalizationMethod attribute on a PMML RegressionModel element controls how raw regression scores are transformed into output probabilities. pypmml applies whichever method this attribute names in the PMML XML, but does not expose the attribute via its public Model API.
A consumer loading a PMML model via pypmml:
- Cannot detect which normalization function was applied
- Receives no warning when the normalization produces mathematically invalid outputs
- Receives no exception when the normalization produces silent NaN predictions
Two distinct failure modes
| Variant | normalizationMethod | Effect |
|---|---|---|
| mutant_none.pmml | none | Raw regression scores returned as probabilities: values such as 3.5, -2.8, -4.0 are returned for a 3-class model. No exception, no warning. |
| mutant_logit.pmml | logit | All predictions return label=None, prob=NaN for every input. No exception, no warning. |
| baseline.pmml | softmax | Valid probabilities in [0, 1], sum ≈ 1. Correct behavior. |
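The three outcomes in the table can be told apart by a check on the returned probabilities alone. The following is a minimal sketch of such a check, not the logic used in reproduce.py: the function name `classify_probabilities` and the tolerance value are assumptions made for illustration, and the labels reuse the tags shown in the expected output.

```python
import math

def classify_probabilities(probs, tol=1e-3):
    """Label a class-probability mapping as 'valid', 'INVALID_RANGE', or 'SILENT_NAN'.

    `probs` maps class name -> probability. Hypothetical helper for illustration.
    """
    values = list(probs.values())
    # A missing or NaN entry means the prediction carries no usable probability.
    if any(v is None or math.isnan(v) for v in values):
        return "SILENT_NAN"
    # Each probability must lie in [0, 1], and together they must sum to 1
    # within the (assumed) absolute tolerance.
    in_range = all(0.0 <= v <= 1.0 for v in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return "valid" if in_range and sums_to_one else "INVALID_RANGE"

print(classify_probabilities({"A": 0.9626, "B": 0.0083, "C": 0.0291}))
print(classify_probabilities({"A": 3.5, "B": -1.25, "C": -1.25}))
print(classify_probabilities({"A": float("nan"), "B": float("nan"), "C": float("nan")}))
```

Note that the mutant_none scores 3.5, -1.25, -1.25 happen to sum to 1, so a sum-only check would accept them. The per-value range check is what flags them.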
API authority gap
from pypmml import Model
model = Model.fromFile("mutant_none.pmml")
model.normalizationMethod  # raises AttributeError: 'Model' object has no attribute 'normalizationMethod'
The normalizationMethod attribute is not in pypmml's public API surface (algorithmName, classes, dataDictionary, functionName, header, inputFields, inputNames, modelElement, modelName, outputFields, outputNames, predict, targetField). A downstream consumer has no programmatic way to detect which normalization function is being applied.
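Because the attribute is unavailable through pypmml, one workaround is to read it from the PMML XML directly before loading the model. The sketch below uses only the Python standard library. The function name `read_normalization_methods` and the inline sample document are assumptions for illustration, and the fallback to "none" reflects our reading of the PMML schema default rather than anything pypmml reports.

```python
import xml.etree.ElementTree as ET

def read_normalization_methods(pmml_bytes):
    """Return the normalizationMethod of every RegressionModel element found.

    Hypothetical helper: parses the raw PMML XML rather than using pypmml.
    """
    root = ET.fromstring(pmml_bytes)
    methods = []
    for elem in root.iter():
        # Drop any XML namespace prefix, e.g. '{http://www.dmg.org/PMML-4_4}RegressionModel'.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "RegressionModel":
            # Assumption: the PMML schema default applies when the attribute is absent.
            methods.append(elem.get("normalizationMethod", "none"))
    return methods

# Inline stand-in for a PMML file; element attributes mirror mutant_none.pmml.
SAMPLE = b"""<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="classification" normalizationMethod="none" targetFieldName="label"/>
</PMML>"""

print(read_normalization_methods(SAMPLE))
```

For a file on disk, the same function can be called with the file's bytes, for example `read_normalization_methods(open("mutant_none.pmml", "rb").read())`.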
Repository Contents
| File | Description |
|---|---|
| baseline.pmml | 3-class RegressionModel, normalizationMethod=softmax; valid probabilities |
| mutant_none.pmml | Same model, normalizationMethod=none; invalid probability range |
| mutant_logit.pmml | Same model, normalizationMethod=logit; silent NaN predictions |
| reproduce.py | Loads all 3 variants, verifies failure modes, outputs runtime_results.json |
| inspect_artifacts.py | Verifies structural identity (only normalizationMethod differs), outputs hash_matrix.json |
| SHA256SUMS.txt | SHA-256 hashes for all files |
Reproduction
Requirements: Python 3.8+, Java 8+ (for pypmml JVM backend)
pip install pypmml
python reproduce.py
python inspect_artifacts.py
Expected output: reproduce.py
=== baseline (baseline.pmml) ===
{x1:2.0, x2:0.5} -> label=A, A=0.9626, B=0.0083, C=0.0291 [valid]
...
warnings_emitted: []
normalizationMethod API: AttributeError
=== mutant_none (mutant_none.pmml) ===
{x1:2.0, x2:0.5} -> label=A, A=3.5, B=-1.25, C=-1.25 [INVALID_RANGE]
{x1:0.1, x2:3.0} -> label=B, A=-2.8, B=1.9, C=1.9 [INVALID_RANGE]
{x1:-1.0, x2:2.0} -> label=B, A=-4.0, B=2.5, C=2.5 [INVALID_RANGE]
warnings_emitted: []
normalizationMethod API: AttributeError
=== mutant_logit (mutant_logit.pmml) ===
ALL inputs -> label=None, A=nan, B=nan, C=nan [SILENT_NAN]
warnings_emitted: []
normalizationMethod API: AttributeError
[PASS] A1: baseline softmax: all predictions valid
[PASS] A2: mutant_none: at least one prediction has prob out of [0,1]
[PASS] A3: mutant_logit: all predictions silent NaN
[PASS] A4: no warnings emitted for any variant
[PASS] A5: normalizationMethod not accessible via public API
Overall: ALL_PASS
Model Structure
All three PMML files are structurally identical. The only difference is the normalizationMethod attribute:
<!-- baseline.pmml -->
<RegressionModel functionName="classification" normalizationMethod="softmax" targetFieldName="label">
<!-- mutant_none.pmml -->
<RegressionModel functionName="classification" normalizationMethod="none" targetFieldName="label">
<!-- mutant_logit.pmml -->
<RegressionModel functionName="classification" normalizationMethod="logit" targetFieldName="label">
The model uses 3 target classes (A, B, C), 2 numeric predictors (x1, x2), and 3 RegressionTables with fixed coefficients. The 3-class structure is required to reproduce the none → invalid-range and logit → NaN behaviors; pypmml handles binary 2-class normalization differently.
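The claim that only normalizationMethod differs can be checked mechanically. The following is a rough sketch of one way to do so, not the implementation in inspect_artifacts.py: it drops the attribute from both documents and compares their canonical XML. The function name `identical_apart_from_normalization` and the inline sample documents are assumptions, and `ET.canonicalize` requires Python 3.8+.

```python
import xml.etree.ElementTree as ET

def identical_apart_from_normalization(pmml_a, pmml_b):
    """True if two PMML documents match once normalizationMethod is removed.

    Hypothetical helper for illustration; takes XML text, not file paths.
    """
    def canonical_without_attr(pmml_text):
        root = ET.fromstring(pmml_text)
        for elem in root.iter():
            # Remove the one attribute that is allowed to differ.
            elem.attrib.pop("normalizationMethod", None)
        # Canonical XML gives a stable serialization (sorted attributes,
        # whitespace around text stripped) that is safe to compare.
        return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)

    return canonical_without_attr(pmml_a) == canonical_without_attr(pmml_b)

# Inline stand-ins for the PMML files; attributes mirror the snippets above.
BASELINE = '<PMML><RegressionModel functionName="classification" normalizationMethod="softmax" targetFieldName="label"/></PMML>'
MUTANT = '<PMML><RegressionModel functionName="classification" normalizationMethod="none" targetFieldName="label"/></PMML>'

print(identical_apart_from_normalization(BASELINE, MUTANT))
```

Comparing canonical XML rather than raw bytes keeps the check insensitive to attribute order and indentation, which would otherwise produce false differences.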
Impact
An attacker who can supply a malicious PMML model file can set normalizationMethod to cause:
- Invalid probability outputs (none): downstream ML pipelines treating outputs as probabilities receive values outside [0, 1]. Downstream probability-based decisions (thresholding, calibration, ensemble weighting) silently operate on mathematically invalid inputs.
- Silent NaN propagation (logit): all predictions return None/NaN with no exception and no warning. Downstream code checking predicted_label receives None; probability outputs propagate NaN through any further computation.
- Uninspectable attack surface: pypmml's public API does not expose normalizationMethod. A consuming application cannot detect which normalization was applied and cannot validate that probability outputs are mathematically valid.
Dataset Access
This repository is gated (manual review required). To request access, submit a request through the Hugging Face interface.
License
This repository contains proof-of-concept material for a bug bounty submission. All PMML files and scripts are provided solely for security research purposes.