pypmml RegressionModel normalizationMethod Authority Gap — PoC

Vulnerability class: Structural Invariant / Authority Gap — Score Normalization Function Substitution
Target library: pypmml (Python PMML consumer, JVM-backed via PMML4S)
PMML format: .pmml (PMML 4.4, RegressionModel)
Severity: Medium
Submission platform: huntr MFV

Summary

The normalizationMethod attribute on a PMML RegressionModel element controls how raw regression scores are transformed into output probabilities. pypmml applies whatever value the PMML XML specifies, but does not expose that value through its public Model API.

A consumer loading a PMML model via pypmml:

Cannot detect which normalization function was applied
Receives no warning when the normalization produces mathematically invalid outputs
Receives no exception when the normalization produces NaN predictions

Two distinct failure modes

| Variant | `normalizationMethod` | Effect |
| --- | --- | --- |
| `mutant_none.pmml` | `none` | Raw regression scores returned as probabilities: values such as `3.5`, `-2.8`, `-4.0` are returned for a 3-class model. No exception, no warning. |
| `mutant_logit.pmml` | `logit` | All predictions return `label=None`, `prob=NaN` for every input. No exception, no warning. |
| `baseline.pmml` | `softmax` | Valid probabilities in `[0, 1]`, sum ≈ 1. Correct behavior. |
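The difference between the `softmax` and `none` rows can be illustrated with a minimal sketch. It does not reproduce pypmml's internals; the scores for A and B are taken from the `mutant_none` output, while the third score (`0.0`) is an assumption made for illustration, and `is_valid_distribution` is a hypothetical helper.

```python
import math

def softmax(scores):
    """Map raw scores to values that lie in [0, 1] and sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def is_valid_distribution(probs, tol=1e-6):
    """True only if every value is finite, lies in [0, 1], and the sum is ~1."""
    if any(math.isnan(p) or math.isinf(p) for p in probs):
        return False
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    return in_range and abs(sum(probs) - 1.0) <= tol

# Illustrative raw scores for classes A, B, C (the third value is assumed).
raw = [3.5, -1.25, 0.0]

normalized = softmax(raw)
print([round(p, 4) for p in normalized], is_valid_distribution(normalized))
print(raw, is_valid_distribution(raw))
```

The same raw scores pass the check after softmax and fail it when returned unchanged, which is the kind of check a consumer would have to run itself when no warning is raised.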

API authority gap

from pypmml import Model
model = Model.fromFile("mutant_none.pmml")
model.normalizationMethod  # → AttributeError: 'Model' object has no attribute 'normalizationMethod'

The normalizationMethod attribute is not in pypmml's public API surface (algorithmName, classes, dataDictionary, functionName, header, inputFields, inputNames, modelElement, modelName, outputFields, outputNames, predict, targetField). A downstream consumer has no programmatic way to detect which normalization function is being applied.
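One possible workaround, sketched here under assumptions, is to read the attribute straight from the PMML XML before handing the file to pypmml. The helper name `read_normalization_methods` and the inline sample document are hypothetical; only the standard library is used.

```python
import xml.etree.ElementTree as ET

def read_normalization_methods(pmml_text):
    """Return the normalizationMethod of every RegressionModel element found."""
    root = ET.fromstring(pmml_text)
    methods = []
    for elem in root.iter():
        # Strip the XML namespace, e.g. "{http://www.dmg.org/PMML-4_4}RegressionModel".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "RegressionModel":
            # None means the attribute was omitted, so the schema default applies.
            methods.append(elem.get("normalizationMethod"))
    return methods

# Hypothetical minimal document, used only to exercise the helper.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="classification" normalizationMethod="none"/>
</PMML>"""

print(read_normalization_methods(SAMPLE))
```

A consumer could compare the returned values against an allowlist (for example, accepting only `softmax` for a multi-class classifier) and refuse to load anything else. For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.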

Repository Contents

| File | Description |
| --- | --- |
| `baseline.pmml` | 3-class RegressionModel, `normalizationMethod=softmax`: valid probabilities |
| `mutant_none.pmml` | Same model, `normalizationMethod=none`: invalid probability range |
| `mutant_logit.pmml` | Same model, `normalizationMethod=logit`: silent NaN predictions |
| `reproduce.py` | Loads all 3 variants, verifies failure modes, outputs `runtime_results.json` |
| `inspect_artifacts.py` | Verifies structural identity (only `normalizationMethod` differs), outputs `hash_matrix.json` |
| `SHA256SUMS.txt` | SHA-256 hashes for all files |
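As a rough idea of what a structural-identity check could look like, the sketch below hashes each document after removing the normalizationMethod attribute and compares the results. This is not the code of `inspect_artifacts.py`; the function names are hypothetical, and the comparison is sensitive to whitespace and attribute order, so it assumes the files were generated from the same template.

```python
import hashlib
import xml.etree.ElementTree as ET

def fingerprint_without_normalization(pmml_text):
    """SHA-256 of the document after removing every normalizationMethod attribute."""
    root = ET.fromstring(pmml_text)
    for elem in root.iter():
        elem.attrib.pop("normalizationMethod", None)
    canonical = ET.tostring(root, encoding="utf-8")
    return hashlib.sha256(canonical).hexdigest()

def differ_only_in_normalization(pmml_a, pmml_b):
    """True if the inputs differ, but match once the attribute is removed."""
    same_when_stripped = (
        fingerprint_without_normalization(pmml_a)
        == fingerprint_without_normalization(pmml_b)
    )
    return pmml_a != pmml_b and same_when_stripped

# Hypothetical one-element documents standing in for the real files.
base = '<PMML><RegressionModel functionName="classification" normalizationMethod="softmax"/></PMML>'
mutant = base.replace("softmax", "none")
print(differ_only_in_normalization(base, mutant))
```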

Reproduction

Requirements: Python 3.8+, Java 8+ (for pypmml JVM backend)

pip install pypmml
python reproduce.py
python inspect_artifacts.py

Expected output — `reproduce.py`

=== baseline (baseline.pmml) ===
  {x1:2.0, x2:0.5} -> label=A, A=0.9626, B=0.0083, C=0.0291  [valid]
  ...
  warnings_emitted: []
  normalizationMethod API: AttributeError

=== mutant_none (mutant_none.pmml) ===
  {x1:2.0, x2:0.5} -> label=A, A=3.5, B=-1.25, C=-1.25  [INVALID_RANGE]
  {x1:0.1, x2:3.0} -> label=B, A=-2.8, B=1.9, C=1.9    [INVALID_RANGE]
  {x1:-1.0, x2:2.0} -> label=B, A=-4.0, B=2.5, C=2.5   [INVALID_RANGE]
  warnings_emitted: []
  normalizationMethod API: AttributeError

=== mutant_logit (mutant_logit.pmml) ===
  ALL inputs -> label=None, A=nan, B=nan, C=nan  [SILENT_NAN]
  warnings_emitted: []
  normalizationMethod API: AttributeError

[PASS] A1: baseline softmax: all predictions valid
[PASS] A2: mutant_none: at least one prediction has prob out of [0,1]
[PASS] A3: mutant_logit: all predictions silent NaN
[PASS] A4: no warnings emitted for any variant
[PASS] A5: normalizationMethod not accessible via public API

Overall: ALL_PASS
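A check such as `warnings_emitted: []` could be implemented along the following lines. This is a sketch, not the code of `reproduce.py`: `call_and_collect_warnings` is a hypothetical helper, and the two stand-in functions replace a real `model.predict` call so the example runs without pypmml or Java. It only observes Python's `warnings` module, so anything a JVM backend writes to its own logs would not be captured.

```python
import warnings

def call_and_collect_warnings(fn, *args, **kwargs):
    """Run fn and return (result, messages of warnings raised during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide any warning
        result = fn(*args, **kwargs)
    return result, [str(w.message) for w in caught]

# Stand-ins for a prediction call: one silent, one that warns.
def silent_predict(record):
    return {"A": 3.5, "B": -1.25, "C": -1.25}

def warning_predict(record):
    warnings.warn("probabilities outside [0, 1]")
    return {"A": 3.5, "B": -1.25, "C": -1.25}

print(call_and_collect_warnings(silent_predict, {"x1": 2.0, "x2": 0.5}))
print(call_and_collect_warnings(warning_predict, {"x1": 2.0, "x2": 0.5}))
```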

Model Structure

All three PMML files are structurally identical. The only difference is the normalizationMethod attribute:

<!-- baseline.pmml -->
<RegressionModel functionName="classification" normalizationMethod="softmax" targetFieldName="label">

<!-- mutant_none.pmml -->
<RegressionModel functionName="classification" normalizationMethod="none" targetFieldName="label">

<!-- mutant_logit.pmml -->
<RegressionModel functionName="classification" normalizationMethod="logit" targetFieldName="label">

The model uses 3 target classes (A, B, C), 2 numeric predictors (x1, x2), and 3 RegressionTables with fixed coefficients. The 3-class structure is required to reproduce both behaviors (none yields out-of-range values, logit yields NaN); pypmml handles two-class normalization differently.

Impact

An attacker who can supply a malicious PMML model file can set normalizationMethod to cause:

Invalid probability outputs (none): downstream ML pipelines that treat the outputs as probabilities receive values outside [0, 1], so probability-based decisions (thresholding, calibration, ensemble weighting) silently operate on mathematically invalid inputs.
Silent NaN propagation (logit): all predictions return None/NaN with no exception and no warning. Downstream code checking predicted_label receives None; probability outputs propagate NaN through any further computation.
Uninspectable attack surface: pypmml's public API does not expose normalizationMethod. A consuming application cannot determine through the API which normalization was applied, and pypmml itself does not check that the probability outputs are mathematically valid.
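A consumer that cannot rely on the library for validation could wrap prediction calls in a guard of its own. The following is a minimal sketch under assumptions: `guarded_predict`, `InvalidPredictionError`, and the key names `A`, `B`, `C` are hypothetical, and `predict_fn` stands for any callable that returns a dict (for example a bound `model.predict`, whose actual output field names depend on the model).

```python
import math

class InvalidPredictionError(ValueError):
    """Raised when a prediction cannot be read as a probability vector."""

def guarded_predict(predict_fn, record, prob_keys, tol=1e-6):
    """Call predict_fn(record) and reject results that are not valid probabilities.

    prob_keys names the entries expected to hold class probabilities
    (hypothetical names here; adjust them to the model's output fields).
    """
    result = predict_fn(record)
    probs = []
    for key in prob_keys:
        value = result.get(key)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise InvalidPredictionError(f"{key}: missing or non-numeric ({value!r})")
        if math.isnan(value) or not 0.0 <= value <= 1.0:
            raise InvalidPredictionError(f"{key}: {value!r} is outside [0, 1]")
        probs.append(float(value))
    if abs(sum(probs) - 1.0) > tol:
        raise InvalidPredictionError(f"probabilities sum to {sum(probs)!r}, not 1")
    return result

# Stand-in for a model's predict method, returning the mutant_none values above.
def fake_predict(record):
    return {"A": 3.5, "B": -1.25, "C": -1.25}

try:
    guarded_predict(fake_predict, {"x1": 2.0, "x2": 0.5}, ["A", "B", "C"])
except InvalidPredictionError as exc:
    print("rejected:", exc)
```

Raising an exception turns both silent failure modes (out-of-range values and NaN/None outputs) into explicit errors at the boundary, before they reach thresholding or calibration code.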

Repository Access

This repository is gated (manual review required). To request access, submit a request through the Hugging Face interface.

License

This repository contains proof-of-concept material for a bug bounty submission. All PMML files and scripts are provided solely for security research purposes.
