gatohep.models#

class gatohep.models.gato_gmm_model(n_cats, dim, temperature=1.0, mean_norm='softmax', mean_range=(0.0, 1.0), cov_offdiag_damping=0.1, name='gato_gmm_model')#

Bases: Module

A differentiable category model based on a Gaussian mixture.

The model learns, for each of n_cats:
  • Mixture logits (which give the mixing weights),

  • Mean vector (of dimension dim),

  • An unconstrained lower-triangular matrix that is transformed into a positive-definite Cholesky factor for the covariance.

The per-event soft membership is computed by evaluating the log pdf of each Gaussian at the event’s feature vector and adding the log mixture weight. A temperature-scaled softmax is then applied.
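As a rough illustration of the soft membership described above, the following sketch evaluates it for a single event in plain Python. It assumes one feature dimension and diagonal (scalar) widths for brevity; the helper names gaussian_log_pdf and soft_membership are hypothetical and are not part of gatohep.

```python
import math

def gaussian_log_pdf(x, mean, sigma):
    """Log density of a 1-D normal distribution."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def soft_membership(x, weights, means, sigmas, temperature=1.0):
    """gamma_k = softmax_k((log p_k(x) + log pi_k) / T) for one event."""
    scores = [
        (gaussian_log_pdf(x, m, s) + math.log(w)) / temperature
        for w, m, s in zip(weights, means, sigmas)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical components centred at 0.2 and 0.8 (illustrative values).
weights, means, sigmas = [0.5, 0.5], [0.2, 0.8], [0.1, 0.1]
gamma_warm = soft_membership(0.45, weights, means, sigmas, temperature=1.0)
gamma_cold = soft_membership(0.45, weights, means, sigmas, temperature=0.1)
```

Lowering the temperature pushes the membership of the closest component towards one, which is the behaviour expected when the soft assignment approaches a hard arg-max.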

n_cats#

Number of categories (Gaussian components).

Type:

int

dim#

Dimensionality of the feature space.

Type:

int

temperature#

Temperature parameter for the softmax function.

Type:

float

mean_norm#

Strategy that constrains the component means at initialisation time:

  • "softmax" - Raw means are passed through a softmax over dim + 1 logits, so every mean lies on the dim-simplex. Recommended for softmax-classifier outputs.

  • "sigmoid" - Raw means are transformed with a component-wise sigmoid and then linearly scaled into mean_range. Recommended for feature spaces spanned by, for example, multiple 1D discriminants. The range of each component can be customized with the mean_range parameter.

Default is "softmax".

Type:

{"softmax", "sigmoid"}, optional

mean_range#

Lower and upper bounds that define the allowed interval(s) when mean_norm="sigmoid". Accepts either

  • a single (lo, hi) tuple, applied to every dimension, or

  • a list/tuple of dim separate (lo, hi) pairs for per-dimension ranges.

Type:

tuple(float, float) or sequence of tuples, optional
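The two mean_norm strategies and the two accepted forms of mean_range can be sketched as follows. This is a minimal plain-Python illustration under the assumptions stated in the comments; sigmoid_means and softmax_means are hypothetical helper names, and how the library reduces the dim + 1 simplex coordinates to dim features is not shown here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_means(raw_mean, mean_range):
    """Map one raw mean vector into per-dimension (lo, hi) intervals."""
    if isinstance(mean_range[0], (int, float)):
        # Single (lo, hi) pair: apply it to every dimension.
        mean_range = [tuple(mean_range)] * len(raw_mean)
    return [lo + (hi - lo) * sigmoid(r) for r, (lo, hi) in zip(raw_mean, mean_range)]

def softmax_means(raw_logits):
    """Softmax over dim + 1 logits; the outputs are positive and sum to one."""
    top = max(raw_logits)
    exps = [math.exp(v - top) for v in raw_logits]
    total = sum(exps)
    return [e / total for e in exps]

# A raw value of 0 lands in the middle of each interval.
shared = sigmoid_means([0.0, 0.0], (0.0, 1.0))
per_dim = sigmoid_means([0.0, 0.0], [(0.0, 1.0), (200.0, 3000.0)])
on_simplex = softmax_means([0.0, 0.0, 0.0])
```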

mixture_logits#

Trainable logits for the mixture weights.

Type:

tf.Variable

means#

Trainable mean vectors for each Gaussian component.

Type:

tf.Variable

unconstrained_L#

Trainable unconstrained lower-triangular matrices for covariance factors.

Type:

tf.Variable

call(data_dict)#

Placeholder method for computing yields and loss.

Parameters:

data_dict (dict) – A dictionary of input data tensors.

Raises:

NotImplementedError – This method must be overridden in subclasses.

compute_hard_bkg_stats(data_dict, signal_labels=None, eps=1e-08)#

Compute per-bin background yields and their relative statistical uncertainties, then sort the bins by combined signal significance (or in 1D by position).

Parameters:
  • data_dict (Mapping[str, dict]) –

    Dictionary of event collections. Each value must contain

    • "NN_output" - tensor/array with shape (N, dim).

    • "weight" - tensor/array with shape (N,).

  • signal_labels (Sequence[str] or None, optional) – Names of the processes that should be treated as signal. If None (default), every key that starts with "signal" is considered a signal process.

  • eps (float, optional) – Small constant to avoid division by zero when computing relative uncertainties. Default is 1e-8.

Returns:

  • B_sorted (np.ndarray) – Background yields per bin, sorted in descending signal significance (shape (n_cats,)).

  • rel_unc_sorted (np.ndarray) – Relative statistical uncertainties for the same bins, computed as sqrt(sum w^2) / sum w (shape (n_cats,)).

  • order (np.ndarray) – Indices that map the sorted arrays back to the original bin order (dtype np.int32, shape (n_cats,)).
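The quantities returned here can be illustrated with a small plain-Python sketch. It assumes the hard bin index of every background event is already known, that eps is added to the denominator, and that a per-bin significance is available for the sorting step; hard_bkg_stats and per_bin_significance are hypothetical names used only for this example.

```python
import math

def hard_bkg_stats(bin_indices, weights, n_cats, eps=1e-8):
    """Per-bin sum of weights and sqrt(sum w^2) / sum w."""
    sum_w = [0.0] * n_cats
    sum_w2 = [0.0] * n_cats
    for k, w in zip(bin_indices, weights):
        sum_w[k] += w
        sum_w2[k] += w * w
    # eps in the denominator is an assumption about where the guard is applied.
    rel_unc = [math.sqrt(w2) / (w + eps) for w, w2 in zip(sum_w, sum_w2)]
    return sum_w, rel_unc

bkg, rel_unc = hard_bkg_stats([0, 0, 1, 1, 1], [1.0, 1.0, 2.0, 2.0, 2.0], n_cats=2)

# Hypothetical per-bin significances, used only to demonstrate the ordering.
per_bin_significance = [0.5, 1.5]
order = sorted(range(2), key=lambda k: per_bin_significance[k], reverse=True)
bkg_sorted = [bkg[k] for k in order]
rel_unc_sorted = [rel_unc[k] for k in order]
```

With this convention, order[i] is the original index of the bin that ends up at position i of the sorted arrays.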

get_bias(data_dict, temperature=None, eps=1e-08)#

Quantify the per-bin bias introduced when the discrete arg-max assignment is approximated by a softmax with finite temperature.

The bias for bin k is defined as

\[\text{bias}_k \;=\; \frac{B^{\text{hard}}_k \, - \, B^{\text{soft}}_k} {B^{\text{hard}}_k}\]

where

  • \(B^{\text{hard}}_k\) is the sum of event weights that fall into the bin when events are assigned by argmax,

  • \(B^{\text{soft}}_k\) is the sum of the same weights multiplied by their soft-assignment probability \(\gamma_{ik}\).

Parameters:
  • data_dict (Mapping[str, dict]) – Dictionary of event collections. Each inner dict must contain the keys "NN_output" and "weight" exactly as in the training loop.

  • temperature (float or None, optional) – Softmax temperature. If None, the instance attribute self.temperature is used.

  • eps (float, optional) – Tiny constant to protect against division by zero. Default is 1e-8.

Returns:

One-dimensional array of length n_cats with the bias for every bin.

Return type:

np.ndarray
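The bias definition above can be checked by hand on a few events. The sketch below is plain Python and takes an already computed soft-assignment matrix as input; bin_bias is a hypothetical helper name, and placing eps in the denominator is an assumption.

```python
def bin_bias(gamma, weights, eps=1e-8):
    """(B_hard - B_soft) / B_hard per bin, from soft assignments gamma[i][k]."""
    n_cats = len(gamma[0])
    hard = [0.0] * n_cats
    soft = [0.0] * n_cats
    for probs, w in zip(gamma, weights):
        k_hard = max(range(n_cats), key=lambda k: probs[k])  # arg-max assignment
        hard[k_hard] += w
        for k in range(n_cats):
            soft[k] += w * probs[k]
    return [(h - s) / (h + eps) for h, s in zip(hard, soft)]

# Three events, two bins (illustrative numbers).
gamma = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
weights = [1.0, 1.0, 2.0]
bias = bin_bias(gamma, weights)
```

In this toy case the hard yields are 2 and 2, the soft yields are 2.3 and 1.7, and the biases come out close to -0.15 and +0.15.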

get_bin_indices(data, temperature=None)#

Convert input events into hard bin indices.

This is a convenience wrapper that first calls get_probs() to obtain the soft-assignment matrix \(\gamma_{ik}\) and then selects the bin with the largest probability for each event.

Parameters:
  • data (Union[tf.Tensor, np.ndarray, Mapping[str, Any]]) – Input data describing one or more event collections.

    • Tensor / array - shape (N, dim), where N is the number of events and dim is the feature dimension.

    • Mapping - a dictionary whose values are tensors/arrays or nested dicts that contain a key "NN_output" holding the data tensor (mimicking the structure used in the gato-hep training examples).

  • temperature (float, optional) – Temperature factor for the softmax used inside get_probs(). If None (default), the instance attribute self.temperature is used.

Returns:

Hard bin indices (dtype tf.int32). The shape is (N,) when the input is a single tensor/array. If data is a mapping, the function returns a dictionary with the same keys and (N,) vectors as values.

Return type:

Union[tf.Tensor, Mapping[str, tf.Tensor]]

get_differentiable_significance(data_dict, *, signal_labels, background_reweight=None, reweight_processes=None, return_details=False)#

Compute differentiable Asimov significances for arbitrary signal sets.

Parameters:
  • data_dict (Mapping[str, dict]) – Input tensors with at least "NN_output" and "weight" fields.

  • signal_labels (Sequence[str]) – Names of the processes to treat as signal. The order is preserved in the returned mapping.

  • background_reweight (array_like, optional) – Per-category scale factors (length = n_cats) applied to the accumulated background yield. Defaults to None (no scaling).

  • reweight_processes (Sequence[str], optional) – Subset of background process names that should receive the reweighting factors. If provided, only these processes are scaled; otherwise all background processes are.

  • return_details (bool, optional) – If True, also return a dictionary with the per-bin yields used to compute the significances.

Returns:

  • OrderedDict – Mapping signal_label -> tf.Tensor with the differentiable significance for each signal.

  • tf.Tensor, optional – Per-bin background yields (only if return_details is True).

  • tf.Tensor, optional – Per-bin background sum of squared weights (if return_details is True).
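For orientation, the sketch below evaluates the standard Asimov significance from per-bin yields in plain Python. It assumes that the bins are combined in quadrature and that background_reweight multiplies the background yield bin by bin; the actual method works on soft, differentiable yields, which this float-based example does not reproduce. asimov_significance is a hypothetical helper name.

```python
import math

def asimov_significance(signal, background, background_reweight=None):
    """Z = sqrt(sum_k 2 * ((s_k + b_k) * ln(1 + s_k / b_k) - s_k))."""
    if background_reweight is not None:
        # Per-category scale factors applied to the background yield.
        background = [b * f for b, f in zip(background, background_reweight)]
    z2 = 0.0
    for s, b in zip(signal, background):
        z2 += 2.0 * ((s + b) * math.log(1.0 + s / b) - s)
    return math.sqrt(z2)

z_single = asimov_significance([10.0], [100.0])
z_two_bins = asimov_significance([10.0, 5.0], [100.0, 10.0])
z_scaled = asimov_significance([10.0, 5.0], [100.0, 10.0], background_reweight=[2.0, 2.0])
```

For a single bin with s = 10 and b = 100 the result is close to the familiar s / sqrt(b) = 1, and scaling the background up lowers the significance.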

get_effective_boundaries_1d(*, n_points=100000, return_mapping=False)#

Find the 1-D decision boundaries implied by the current GMM.

The method probes the physical data range, converts those probe points to hard bin indices via get_bin_indices(), and records where the index changes.

Parameters:
  • n_points (int, optional) – Number of evenly spaced probe points. Default is 100 000.

  • return_mapping (bool, optional) – If True, also return a permutation that orders categories from left to right.

Returns:

  • boundaries (tf.Tensor, shape (n_cats - 1,)) – Boundary locations in the same scale as the input data.

  • order (tf.Tensor, shape (n_cats,), optional) – Category permutation, only if return_mapping is True.
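The probing procedure can be sketched in a few lines of plain Python. The example assumes that a boundary is placed at the midpoint between the two probe points where the hard index changes, which may differ from the library's choice; effective_boundaries_1d and assign are hypothetical names, with assign standing in for a call to get_bin_indices().

```python
def effective_boundaries_1d(assign, lo, hi, n_points=1001):
    """Scan [lo, hi] and record where the hard bin index changes."""
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    idx = [assign(x) for x in xs]
    boundaries = []
    order = [idx[0]]  # categories in left-to-right order
    for i in range(1, n_points):
        if idx[i] != idx[i - 1]:
            boundaries.append(0.5 * (xs[i - 1] + xs[i]))
            order.append(idx[i])
    return boundaries, order

# Stand-in for the model: category 1 on the left, 0 in the middle, 2 on the right.
def assign(x):
    if x < 0.3:
        return 1
    return 0 if x < 0.7 else 2

boundaries, order = effective_boundaries_1d(assign, 0.0, 1.0)
```

The resolution of the recovered boundaries is limited by the probe spacing, so a larger n_points gives a finer estimate.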

get_effective_means()#

Return the means already mapped into the user’s requested space.

Shape: (n_cats, dim)

get_effective_parameters()#

Retrieve the learned mixture weights, means, and covariance factors.

Returns:

A dictionary containing the mixture weights, means, and scale factors.

Return type:

dict

get_mixture_pdf()#

Full Gaussian-mixture distribution for the current parameters.

Returns:

A distribution object on which log_prob or sample can be called directly.

Return type:

tfd.MixtureSameFamily

get_mixture_weight()#

Log-space mixture weights log pi_k obtained via log-softmax.

Returns:

Shape (n_cats,); tf.exp(result) sums to 1.

Return type:

tf.Tensor
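A log-softmax over the mixture logits can be written out as below. This is a plain-Python sketch of the operation, not the library code; log_softmax is a local helper defined here.

```python
import math

def log_softmax(logits):
    """log pi_k = logit_k - logsumexp(logits), computed stably."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_norm for v in logits]

log_pi = log_softmax([0.0, 1.0, 2.0])   # illustrative logits
pi = [math.exp(v) for v in log_pi]      # mixture weights, summing to one
```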

get_probs(data, temperature=None)#

Return the soft assignment matrix gamma_ik for any input form.

The method evaluates the log-pdf of every GMM component via the model’s get_mixture_pdf() helper, so it automatically uses the current means, covariances, and the softmax-normalised mixture weights derived from self.mixture_logits. The per-component log-probabilities log p_k(x) are combined with the log-weights and converted to probabilities through a temperature-scaled softmax:

gamma_ik = softmax((log p_k(x_i) + log pi_k) / T)

Parameters:
  • data (Union[tf.Tensor, np.ndarray, dict]) – Input events in one of the following forms:

    • tf.Tensor of shape (N, dim)

    • np.ndarray with the same shape

    • dict mapping process names to either tensors/arrays or nested dicts that contain a key "NN_output", exactly like the training loop uses.

  • temperature (float, optional) – Soft-max temperature T. If None, the instance attribute self.temperature is used. Smaller values make the weights approach a hard arg-max; larger values smooth them out.

Returns:

  • If data is a tensor/array: a tensor of shape (N, n_cats) containing the soft weights for each event.

  • If data is a mapping: a dict with the same keys and weight tensors as values.

Return type:

Union[tf.Tensor, dict]

get_scale_tril()#

Compute the lower-triangular scale factors for the covariance matrices.

Returns:

A tensor of shape (n_cats, dim, dim) representing the lower-triangular scale factors for each Gaussian component.

Return type:

tf.Tensor
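One common way to turn an unconstrained matrix into a valid Cholesky factor is sketched below in plain Python. It assumes a softplus on the diagonal and a damping factor on the off-diagonal entries (suggested by the cov_offdiag_damping constructor argument); the exact transformation used by the library may differ, and to_scale_tril, covariance, and min_diag are hypothetical names.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def to_scale_tril(raw, offdiag_damping=0.1, min_diag=1e-6):
    """Lower-triangular factor with a strictly positive diagonal."""
    dim = len(raw)
    L = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):  # entries above the diagonal stay zero
            if i == j:
                L[i][j] = softplus(raw[i][j]) + min_diag
            else:
                L[i][j] = offdiag_damping * raw[i][j]
    return L

def covariance(L):
    """Sigma = L @ L^T, positive definite when diag(L) > 0."""
    dim = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]

raw = [[0.0, 9.9], [2.0, -1.0]]  # the upper-right entry is ignored
L = to_scale_tril(raw)
cov = covariance(L)
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
```

Because the diagonal of L is strictly positive, L L^T is symmetric and positive definite regardless of the raw values.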

restore(path)#

Restore the model’s trainable variables from a checkpoint.

Parameters:

path (str) – Directory path to load the checkpoint from.

save(path)#

Save the model’s trainable variables to a checkpoint.

Parameters:

path (str) – Directory path to save the checkpoint.

class gatohep.models.gato_sigmoid_model(variables_config, *, global_steepness=5.0, name='gato_sigmoid_model')#

Bases: Module

Gato model for optimisation of cuts based on sigmoid-approximated boundaries. Can be applied to multiple discriminants, each with its own number of bins and steepness.

Each discriminant j is split into n_j bins by (n_j - 1) trainable cut points b_{j,i}. The full event bin is the Cartesian product of the one-dimensional bins.
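A minimal sketch of this construction in plain Python follows. It assumes that the soft weight of a one-dimensional bin is the difference of two neighbouring sigmoids and that the joint bins are enumerated with the last discriminant varying fastest; both are illustrative choices rather than confirmed library behaviour, and soft_bins_1d and soft_bins_joint are hypothetical names.

```python
import math
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_bins_1d(x, boundaries, steepness):
    """Soft weights for n bins defined by n - 1 ordered cut points."""
    above = [sigmoid(steepness * (x - b)) for b in boundaries]
    edges = [1.0] + above + [0.0]
    # Telescoping differences, so the weights sum to one.
    return [edges[i] - edges[i + 1] for i in range(len(edges) - 1)]

def soft_bins_joint(xs, boundaries_per_var, steepness_per_var):
    """Cartesian product of the 1-D bins; weights multiply."""
    per_var = [soft_bins_1d(x, b, k)
               for x, b, k in zip(xs, boundaries_per_var, steepness_per_var)]
    return [math.prod(combo) for combo in product(*per_var)]

# Two discriminants with 3 and 2 bins give 6 joint bins (illustrative cuts).
joint = soft_bins_joint((0.9, 0.2), [[0.33, 0.66], [0.5]], [50.0, 50.0])
```

Here the event lies in the last bin of the first discriminant and the first bin of the second, so almost all of its weight ends up in joint bin 4.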

Parameters:
  • variables_config (list of dict) –

    One entry per discriminant, for example:

    [
        {"name": "disc", "bins": 3, "range": (0.0, 1.0)},
        {"name": "mjj",  "bins": 2, "range": (200.0, 3000.0)},
    ]
    
    Recognised keys:

    • bins (int) – Number of bins (> 1) for this variable.

    • range (tuple(float, float)) – Inclusive lower and upper bound of the variable.

    • name (str, optional) – Plain-text label used only in logs.

    • steepness (float, optional) – Individual initial slope k_j. If omitted, global_steepness is used.

  • global_steepness (float, optional) – Default initial steepness k for variables that do not override it. Can be annealed during training.

  • name (str, optional) – TensorFlow name scope.

Public helpers:

  • get_probs(x) – Soft assignment gamma_ik (shape N x n_cats).

  • get_bin_indices(x) – Hard bin index per event (shape N,).

  • get_bias(data) – Per-bin bias (hard minus soft) divided by hard yield.

calculate_boundaries(j=0)#

Return the ordered physical boundaries for variable j.

Parameters:

j (int, optional) – Index of the discriminant whose boundaries are requested.

Returns:

Tensor of shape (n_bins - 1,) with boundary locations expressed in the variable’s original scale.

Return type:

tf.Tensor
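How the boundaries are kept ordered is not documented here. One possible parametrisation, shown purely as an assumption, maps n_bins raw parameters through a softmax to positive bin widths and accumulates them across the variable's range; ordered_boundaries is a hypothetical helper name.

```python
import math

def ordered_boundaries(raw, lo, hi):
    """n_bins raw values -> n_bins - 1 increasing cut points in (lo, hi)."""
    top = max(raw)
    exps = [math.exp(v - top) for v in raw]
    total = sum(exps)
    widths = [e / total for e in exps]  # positive fractions that sum to one
    cuts, acc = [], 0.0
    for w in widths[:-1]:
        acc += w
        cuts.append(lo + (hi - lo) * acc)
    return cuts

cuts_disc = ordered_boundaries([0.0, 0.0, 0.0], 0.0, 1.0)   # equal-width bins
cuts_mjj = ordered_boundaries([0.0, 0.0], 200.0, 3000.0)
```

With this construction the cut points are increasing for any raw values, so no explicit sorting is needed during training.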

compute_hard_bkg_stats(data_dict, signal_labels=None, eps=1e-08)#

Compute per-bin background yields and their relative statistical uncertainties.

Parameters:
  • data_dict (Mapping[str, dict]) –

    Dictionary of event collections. Each value must contain

    • "NN_output" - tensor/array with shape (N, dim).

    • "weight" - tensor/array with shape (N,).

  • signal_labels (Sequence[str] or None, optional) – Names of the processes that should be treated as signal. If None (default), every key that starts with "signal" is considered a signal process.

  • eps (float, optional) – Small constant to avoid division by zero when computing relative uncertainties. Default is 1e-8.

Returns:

  • B_sorted (np.ndarray) – Background yields per bin (shape (n_cats,)).

  • rel_unc_sorted (np.ndarray) – Relative statistical uncertainties for the same bins, computed as sqrt(sum w^2) / sum w (shape (n_cats,)).

get_bias(data_dict, *, steepness_scale=None, eps=1e-08)#

Estimate the per-bin bias from using soft assignments.

Parameters:
  • data_dict (Mapping[str, dict]) – Event collections with "NN_output" and "weight" entries, identical to what get_probs() expects.

  • steepness_scale (float, optional) – Additional factor applied to every steepness value before computing probabilities. Defaults to None for no rescaling.

  • eps (float, optional) – Small positive constant that guards against division by zero.

Returns:

One-dimensional array of length n_cats containing (hard - soft) / hard for each bin.

Return type:

np.ndarray

get_bin_indices(data, *, steepness_scale=None)#

Convert input data into hard bin indices.

Parameters:
  • data (Union[tf.Tensor, np.ndarray, Mapping[str, Any]]) – Input events as a tensor/array of shape (N, n_disc) or a mapping that mirrors the training loop structure with "NN_output" keys.

  • steepness_scale (float, optional) – Multiplicative factor applied to every sigmoid steepness. None (default) keeps the model’s stored values.

Returns:

Hard bin indices with dtype tf.int32. When data is a mapping, the result is a dict with the same keys and (N,) tensors. Otherwise a single (N,) tensor is returned.

Return type:

Union[tf.Tensor, dict]

get_differentiable_significance(data_dict, *, signal_labels, background_reweight=None, reweight_processes=None, return_details=False)#

Compute differentiable Asimov significances for sigmoid-based models.

Parameters:
  • data_dict (Mapping[str, dict]) – Input tensors with at least "NN_output" and "weight" fields.

  • signal_labels (Sequence[str]) – Names of the processes treated as signal.

  • background_reweight (array_like, optional) – Per-category scale factors applied to the accumulated background yield (length = n_cats). None disables reweighting.

  • reweight_processes (Sequence[str], optional) – Background process names that should be scaled by the provided factors. If omitted, all background processes receive the reweighting.

  • return_details (bool, optional) – If True, also return the per-bin yield tensors used internally.

Returns:

  • OrderedDict – Map from signal label to differentiable significance.

  • tf.Tensor, optional – Background yield per bin (if return_details is True).

  • tf.Tensor, optional – Background sum of squared weights per bin (if return_details is True).

get_probs(data, *, steepness_scale=None)#

Soft weights gamma_ik for arbitrary input structure.

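Accepts the same tensor and dict input structures as gato_gmm_model.get_probs(). steepness_scale is an optional factor applied to every sigmoid steepness; None (default) keeps the model’s stored values.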

restore(path)#

Restore the model’s trainable variables from a checkpoint.

Parameters:

path (str) – Directory path to load the checkpoint from.

save(path)#

Save the model’s trainable variables to a checkpoint.

Parameters:

path (str) – Directory path to save the checkpoint.