gatohep.models
- class gatohep.models.gato_gmm_model(n_cats, dim, temperature=1.0, mean_norm='softmax', mean_range=(0.0, 1.0), cov_offdiag_damping=0.1, name='gato_gmm_model')
Bases: Module
A differentiable category model based on a Gaussian mixture.
- The model learns, for each of n_cats:
Mixture logits (which give the mixing weights),
Mean vector (of dimension dim),
An unconstrained lower-triangular matrix that is transformed into a positive-definite Cholesky factor for the covariance.
The per-event soft membership is computed by evaluating the log pdf of each Gaussian at the event’s feature vector and adding the log mixture weight. A temperature-scaled softmax is then applied across the components.
- n_cats
Number of categories (Gaussian components).
- Type:
int
- dim
Dimensionality of the feature space.
- Type:
int
- temperature
Temperature parameter for the softmax function.
- Type:
float
- mean_norm
Strategy that constrains the component means at initialisation time:
"softmax" - Raw means are passed through a softmax over dim + 1 logits, so every mean lies on the dim-simplex. Recommended for softmax-classifier outputs.
"sigmoid" - Raw means are transformed with a component-wise sigmoid and then linearly scaled into mean_range. Recommended for a feature space spanned, for example, by multiple 1D discriminants. The range of each component can be customized with the mean_range parameter.
Default is "softmax".
- Type:
{"softmax", "sigmoid"}, optional
- mean_range
Lower and upper bounds that define the allowed interval(s) when mean_norm="sigmoid". Accepts
* a single (lo, hi) tuple, applied to every dimension, or
* a list/tuple of dim separate (lo, hi) pairs for per-dimension ranges.
- Type:
tuple(float, float) or sequence of tuples, optional
- mixture_logits
Trainable logits for the mixture weights.
- Type:
tf.Variable
- means
Trainable mean vectors for each Gaussian component.
- Type:
tf.Variable
- unconstrained_L
Trainable unconstrained lower-triangular matrices for covariance factors.
- Type:
tf.Variable
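The two mean_norm strategies described above can be illustrated with a small NumPy sketch. It is not the library's code: the helper names softmax_means and sigmoid_means are hypothetical, and whether the model keeps all dim + 1 simplex coordinates or drops one is not stated here.

```python
import numpy as np

def softmax_means(raw):
    """Map raw logits of shape (n_cats, dim + 1) onto the dim-simplex."""
    z = raw - raw.max(axis=-1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid_means(raw, mean_range):
    """Map raw values of shape (n_cats, dim) into per-dimension intervals."""
    bounds = np.asarray(mean_range, dtype=float)
    if bounds.ndim == 1:                        # one (lo, hi) pair for all dims
        bounds = np.tile(bounds, (raw.shape[1], 1))
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + (hi - lo) / (1.0 + np.exp(-raw))

# mean_norm="softmax": dim = 2, so each mean is built from 3 logits.
raw3 = np.array([[0.0, 0.0, 0.0], [2.0, -1.0, 0.5]])
simplex = softmax_means(raw3)

# mean_norm="sigmoid" with one (lo, hi) pair per dimension.
raw2 = np.zeros((2, 2))
scaled = sigmoid_means(raw2, [(0.0, 1.0), (200.0, 3000.0)])
```

In both cases the trainable values stay unconstrained, while the mapped means remain inside the allowed region.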
- call(data_dict)
Placeholder method for computing yields and loss.
- Parameters:
data_dict (dict) – A dictionary of input data tensors.
- Raises:
NotImplementedError – This method must be overridden in subclasses.
- compute_hard_bkg_stats(data_dict, signal_labels=None, eps=1e-08)
Compute per-bin background yields and their relative statistical uncertainties, then sort the bins by combined signal significance (or in 1D by position).
- Parameters:
data_dict (Mapping[str, dict]) –
Dictionary of event collections. Each value must contain
"NN_output" - tensor/array with shape (N, dim).
"weight" - tensor/array with shape (N,).
signal_labels (Sequence[str] or None, optional) – Names of the processes that should be treated as signal. If None (default), every key that starts with "signal" is considered a signal process.
eps (float, optional) – Small constant to avoid division by zero when computing relative uncertainties. Default is 1e-8.
- Returns:
B_sorted (np.ndarray) – Background yields per bin, sorted in descending signal significance (shape (n_cats,)).
rel_unc_sorted (np.ndarray) – Relative statistical uncertainties for the same bins, sqrt(sum w^2) / sum w (shape (n_cats,)).
order (np.ndarray) – Indices that map the sorted arrays back to the original bin order (dtype np.int32, shape (n_cats,)).
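The quantities returned by compute_hard_bkg_stats can be sketched in NumPy as follows. This is an illustration only: the helper name hard_bkg_stats, the placement of eps in the denominator, and the per-bin significance array sig are assumptions, not taken from the library.

```python
import numpy as np

def hard_bkg_stats(bin_idx, weights, n_cats, eps=1e-8):
    """Per-bin background yield and relative statistical uncertainty."""
    B = np.bincount(bin_idx, weights=weights, minlength=n_cats)
    sumw2 = np.bincount(bin_idx, weights=weights**2, minlength=n_cats)
    rel_unc = np.sqrt(sumw2) / (B + eps)        # sqrt(sum w^2) / sum w
    return B, rel_unc

bin_idx = np.array([0, 0, 1, 2, 2, 2])          # hard bin index per event
weights = np.array([1.0, 1.0, 0.5, 2.0, 2.0, 2.0])
B, rel_unc = hard_bkg_stats(bin_idx, weights, n_cats=3)

# Sort bins by descending significance (hypothetical per-bin values).
sig = np.array([0.1, 0.9, 0.4])
order = np.argsort(-sig).astype(np.int32)
B_sorted, rel_unc_sorted = B[order], rel_unc[order]
```

Here order[i] is the original index of the bin that ends up at sorted position i.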
- get_bias(data_dict, temperature=None, eps=1e-08)
Quantify the per-bin bias introduced when the discrete arg-max assignment is approximated by a softmax with finite temperature.
The bias for bin k is defined as
\[\text{bias}_k = \frac{B^{\text{hard}}_k - B^{\text{soft}}_k}{B^{\text{hard}}_k}\]
where
\(B^{\text{hard}}_k\) is the sum of event weights that fall into the bin when events are assigned by argmax,
\(B^{\text{soft}}_k\) is the sum of the same weights multiplied by their soft-assignment probability \(\gamma_{ik}\).
- Parameters:
data_dict (Mapping[str, dict]) – Dictionary of event collections. Each inner dict must contain the keys "NN_output" and "weight" exactly as in the training loop.
temperature (float or None, optional) – Softmax temperature. If None, the instance attribute self.temperature is used.
eps (float, optional) – Tiny constant to protect against division by zero. Default is 1e-8.
- Returns:
One-dimensional array of length n_cats with the bias for every bin.
- Return type:
np.ndarray
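The bias definition above can be checked by hand on a toy soft-assignment matrix. The sketch below is a NumPy illustration with a hypothetical helper bin_bias. Adding eps to the denominator is an assumption about how the division is protected.

```python
import numpy as np

def bin_bias(gamma, weights, eps=1e-8):
    """(B_hard - B_soft) / B_hard per bin, from soft assignments gamma."""
    n_cats = gamma.shape[1]
    hard_idx = gamma.argmax(axis=1)                       # argmax assignment
    B_hard = np.bincount(hard_idx, weights=weights, minlength=n_cats)
    B_soft = (gamma * weights[:, None]).sum(axis=0)       # weighted by gamma_ik
    return (B_hard - B_soft) / (B_hard + eps)

gamma = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.3, 0.7]])
weights = np.array([1.0, 1.0, 2.0])
bias = bin_bias(gamma, weights)
```

A negative entry means the soft yield overestimates the hard yield in that bin. Lowering the temperature pushes gamma towards one-hot rows and the bias towards zero.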
- get_bin_indices(data, temperature=None)
Convert input events into hard bin indices.
This is a convenience wrapper that first calls get_probs() to obtain the soft-assignment matrix \(\gamma_{ik}\) and then selects the bin with the largest probability for each event.
- Parameters:
data (Union[tf.Tensor, np.ndarray, Mapping[str, Any]]) – Input data describing one or more event collections.
* Tensor / array - shape (N, dim) where N is the number of events and dim is the feature dimension.
* Mapping - a dictionary whose values are tensors/arrays or nested dicts that contain a key "NN_output" holding the data tensor (mimicking the structure used in gato-hep training examples).
temperature (float, optional) – Temperature factor for the softmax used inside get_probs(). If None (default), the instance attribute self.temperature is used.
- Returns:
Hard bin indices (dtype tf.int32). The shape is (N,) when the input is a single tensor/array. If data is a mapping, the function returns a dictionary with the same keys and (N,) vectors as values.
- Return type:
Union[tf.Tensor, Mapping[str, tf.Tensor]]
- get_differentiable_significance(data_dict, *, signal_labels, background_reweight=None, reweight_processes=None, return_details=False)
Compute differentiable Asimov significances for arbitrary signal sets.
- Parameters:
data_dict (Mapping[str, dict]) – Input tensors with at least "NN_output" and "weight" fields.
signal_labels (Sequence[str]) – Names of the processes to treat as signal. The order is preserved in the returned mapping.
background_reweight (array_like, optional) – Per-category scale factors (length = n_cats) applied to the accumulated background yield. Defaults to None (no scaling).
reweight_processes (Sequence[str], optional) – Subset of background process names that should receive the reweighting factors. If provided, only these processes are scaled; otherwise all background processes are.
return_details (bool, optional) – If True, also return a dictionary with the per-bin yields used to compute the significances.
- Returns:
OrderedDict – Mapping signal_label -> tf.Tensor with the differentiable significance for each signal.
tf.Tensor, optional – Per-bin background yields (only if return_details is True).
tf.Tensor, optional – Per-bin background sum of squared weights (only if return_details is True).
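The exact expression used by the library is not given in this reference. As an assumption, the sketch below uses the standard Asimov formula Z = sqrt(2((s + b) ln(1 + s/b) - s)) per bin and combines the bins in quadrature. It works on plain floats, whereas the method operates on soft per-bin yields so that gradients can flow. The function names are hypothetical.

```python
import math

def asimov_significance(s, b):
    """Asimov significance for one bin with signal s and background b > 0."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def combined_significance(signal, background, reweight=None):
    """Quadrature sum over bins; reweight scales the background per bin."""
    if reweight is not None:
        background = [b * f for b, f in zip(background, reweight)]
    z2 = sum(asimov_significance(s, b) ** 2
             for s, b in zip(signal, background))
    return math.sqrt(z2)

signal = [1.0, 5.0, 10.0]          # per-bin signal yields
background = [100.0, 50.0, 10.0]   # per-bin background yields
z_total = combined_significance(signal, background)
z_scaled = combined_significance(signal, background, reweight=[2.0, 2.0, 2.0])
```

For s much smaller than b the per-bin value approaches s / sqrt(b), and scaling the background up lowers the combined significance.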
- get_effective_boundaries_1d(*, n_points=100000, return_mapping=False)
Find the 1-D decision boundaries implied by the current GMM.
The method probes the physical data range, converts those probe points to hard bin indices via get_bin_indices(), and records where the index changes.
- Parameters:
n_points (int, optional) – Number of evenly spaced probe points. Default is 100000.
return_mapping (bool, optional) – If True, also return a permutation that orders categories from left to right.
- Returns:
boundaries (tf.Tensor, shape (n_cats - 1,)) – Boundary locations in the same scale as the input data.
order (tf.Tensor, shape (n_cats,), optional) – Category permutation, only if return_mapping is True.
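The probing procedure can be sketched in NumPy as follows. A toy np.digitize call stands in for get_bin_indices(), and the helper name and the midpoint convention for the boundary location are assumptions.

```python
import numpy as np

def boundaries_from_indices(x, idx):
    """Locate index changes along sorted probe points x."""
    change = np.nonzero(np.diff(idx))[0]              # last point before a jump
    boundaries = 0.5 * (x[change] + x[change + 1])    # midpoint of the jump
    order = np.concatenate([idx[:1], idx[change + 1]])  # categories left to right
    return boundaries, order

# Toy stand-in for the model: three bins with edges at 0.3 and 0.7.
x = np.linspace(0.0, 1.0, 100_001)
idx = np.digitize(x, [0.3, 0.7])
boundaries, order = boundaries_from_indices(x, idx)
```

The resolution of each boundary is limited by the probe spacing, which is why a large n_points is useful.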
- get_effective_means()
Return the means already mapped into the user’s requested space.
Shape: (n_cats, dim)
- get_effective_parameters()
Retrieve the learned mixture weights, means, and covariance factors.
- Returns:
A dictionary containing the mixture weights, means, and scale factors.
- Return type:
dict
- get_mixture_pdf()
Full Gaussian-mixture distribution for the current parameters.
- Returns:
Ready to call log_prob or sample.
- Return type:
tfd.MixtureSameFamily
- get_mixture_weight()
Log-space mixture weights log pi_k obtained via log-softmax.
- Returns:
Shape (n_cats,); tf.exp(result) sums to 1.
- Return type:
tf.Tensor
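A log-softmax over the mixture logits can be written in a few lines of plain Python. The function below is a hypothetical, numerically stable illustration, not the library's implementation.

```python
import math

def log_softmax(logits):
    """log pi_k = logit_k - log(sum_j exp(logit_j)), computed stably."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_norm for v in logits]

log_pi = log_softmax([0.0, 1.0, -2.0])
weights = [math.exp(v) for v in log_pi]   # mixture weights pi_k
```

Working in log space avoids underflow when a component's weight becomes very small during training.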
- get_probs(data, temperature=None)
Return the soft assignment matrix gamma_ik for any input form.
The method evaluates the log-pdf of every GMM component via the model’s get_mixture_pdf() helper, so it automatically uses the current means, covariances, and the mixture weights obtained by softmax-normalising self.mixture_logits. The per-component log-probabilities log p_k(x) are combined with the log-weights and converted to probabilities through a temperature-scaled softmax:
gamma_ik = softmax((log p_k(x_i) + log pi_k) / T)
- Parameters:
data –
tf.Tensor of shape (N, dim)
np.ndarray with the same shape
dict mapping process names to either tensors/arrays or nested dicts that contain a key "NN_output", exactly like the training loop uses.
temperature (float, optional) – Softmax temperature T. If None, the instance attribute self.temperature is used. Smaller values make the weights approach a hard arg-max; larger values smooth them out.
- Returns:
If data is a tensor/array: a tensor of shape (N, n_cats) containing the soft weights for each event.
If data is a mapping: a dict with the same keys and weight tensors as values.
- Return type:
Union[tf.Tensor, dict]
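The formula gamma_ik = softmax((log p_k(x_i) + log pi_k) / T) can be reproduced in NumPy for a small mixture. The sketch below is an assumption-laden illustration. The function soft_assignments is hypothetical, and the real method delegates the log-pdf evaluation to TensorFlow Probability.

```python
import numpy as np

def soft_assignments(x, means, scale_tril, log_pi, temperature=1.0):
    """Temperature-scaled softmax over per-component log joint densities."""
    n, dim = x.shape
    scores = np.empty((n, means.shape[0]))
    for k, (mu, L) in enumerate(zip(means, scale_tril)):
        z = np.linalg.solve(L, (x - mu).T)        # whitened residuals, (dim, N)
        log_det = np.log(np.diag(L)).sum()        # log|L| = 0.5 * log|Sigma|
        log_pdf = (-0.5 * (z**2).sum(axis=0) - log_det
                   - 0.5 * dim * np.log(2.0 * np.pi))
        scores[:, k] = (log_pdf + log_pi[k]) / temperature
    scores -= scores.max(axis=1, keepdims=True)   # stabilise the softmax
    gamma = np.exp(scores)
    return gamma / gamma.sum(axis=1, keepdims=True)

means = np.array([[0.2], [0.8]])                  # two 1D components
scale_tril = np.array([[[0.1]], [[0.1]]])         # Cholesky factors
log_pi = np.log([0.5, 0.5])
x = np.array([[0.1], [0.5], [0.9]])
gamma = soft_assignments(x, means, scale_tril, log_pi)
gamma_sharp = soft_assignments(x, means, scale_tril, log_pi, temperature=0.1)
```

The event at 0.5 is equidistant from both means and receives equal weights. Lowering the temperature sharpens the other rows towards one-hot vectors.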
- get_scale_tril()
Compute the lower-triangular scale factors for the covariance matrices.
- Returns:
A tensor of shape (n_cats, dim, dim) representing the lower-triangular scale factors for each Gaussian component.
- Return type:
tf.Tensor
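How the unconstrained matrices become valid Cholesky factors is not spelled out in this reference. One common construction, shown here purely as an assumption, applies a softplus to the diagonal and scales the strictly lower part by a damping factor. The function to_scale_tril and its arguments are hypothetical, and the way cov_offdiag_damping is actually used may differ.

```python
import numpy as np

def to_scale_tril(raw, offdiag_damping=0.1, min_diag=1e-6):
    """Turn unconstrained (n_cats, dim, dim) arrays into Cholesky factors."""
    lower = np.tril(raw, k=-1) * offdiag_damping       # damped off-diagonal part
    diag = np.diagonal(raw, axis1=-2, axis2=-1)
    pos_diag = np.logaddexp(0.0, diag) + min_diag      # softplus keeps it > 0
    return lower + pos_diag[..., None] * np.eye(raw.shape[-1])

rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 2, 2))                       # n_cats = 3, dim = 2
L = to_scale_tril(raw)
cov = L @ np.swapaxes(L, -1, -2)                       # Sigma_k = L_k L_k^T
```

Because every diagonal entry of L is strictly positive, each L_k L_k^T is symmetric positive definite regardless of the raw values.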
- restore(path)
Restore the model’s trainable variables from a checkpoint.
- Parameters:
path (str) – Directory path to load the checkpoint from.
- save(path)
Save the model’s trainable variables to a checkpoint.
- Parameters:
path (str) – Directory path to save the checkpoint.
- class gatohep.models.gato_sigmoid_model(variables_config, *, global_steepness=5.0, name='gato_sigmoid_model')
Bases: Module
Gato model for optimisation of cuts based on sigmoid-approximated boundaries. Can be applied to multiple discriminants, each with its own number of bins and steepness.
Each discriminant j is split into n_j bins by (n_j - 1) trainable cut points b_{j,i}. The full event bin is the Cartesian product of the one-dimensional bins.
- Parameters:
variables_config (list of dict) –
One entry per discriminant, for example:
[
    {"name": "disc", "bins": 3, "range": (0.0, 1.0)},
    {"name": "mjj", "bins": 2, "range": (200.0, 3000.0)},
]
- bins (int)
Number of bins (> 1) for this variable.
- range (tuple(float, float))
Inclusive lower and upper bound of the variable.
- name (str, optional)
Plain-text label used only in logs.
- steepness (float, optional)
Individual initial slope k_j. If omitted, the value of global_steepness is used.
global_steepness (float, optional) – Default initial steepness k for variables that do not override it. Can be annealed during training.
name (str, optional) – TensorFlow name scope.
Public helpers
--------------
get_probs(x) – Soft assignment gamma_ik (shape N x n_cats).
get_bin_indices(x) – Hard bin index per event (shape N,).
get_bias(data) – Per-bin bias (hard minus soft) divided by hard yield.
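The Cartesian-product binning described for this class can be illustrated with hard cuts in NumPy. The cut points below are made up, and the row-major flattening order is an assumption, since the reference does not state how the per-variable bins are combined into a single index.

```python
import numpy as np

bins_per_variable = (3, 2)              # "disc" with 3 bins, "mjj" with 2 bins
edges = [np.array([0.4, 0.8]),          # hypothetical cut points for "disc"
         np.array([1000.0])]            # hypothetical cut point for "mjj"

events = np.array([[0.1, 500.0],
                   [0.5, 2500.0],
                   [0.9, 300.0]])       # shape (N, n_disc)

# 1D bin index per variable, then one flat index over the product grid.
per_var = tuple(np.digitize(events[:, j], edges[j])
                for j in range(len(edges)))
flat_idx = np.ravel_multi_index(per_var, bins_per_variable)
n_cats = int(np.prod(bins_per_variable))
```

With 3 x 2 bins the model has n_cats = 6 categories in total, one for every combination of one-dimensional bins.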
- calculate_boundaries(j=0)
Return the ordered physical boundaries for variable j.
- Parameters:
j (int, optional) – Index of the discriminant whose boundaries are requested.
- Returns:
Tensor of shape (n_bins - 1,) with boundary locations expressed in the variable’s original scale.
- Return type:
tf.Tensor
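The reference does not say how the trainable cut points are kept ordered. One possible parameterisation, given here only as an assumption, turns n_bins unconstrained values into positive bin widths with a softmax and accumulates them. The helper ordered_boundaries is hypothetical.

```python
import numpy as np

def ordered_boundaries(raw, lo, hi):
    """Map n_bins unconstrained values to n_bins - 1 increasing cut points."""
    z = raw - raw.max()
    widths = np.exp(z) / np.exp(z).sum()     # positive widths that sum to 1
    cuts = np.cumsum(widths)[:-1]            # interior cumulative fractions
    return lo + (hi - lo) * cuts             # rescale to the physical range

b_equal = ordered_boundaries(np.zeros(3), 200.0, 3000.0)
b_skewed = ordered_boundaries(np.array([2.0, -1.0, 0.5]), 200.0, 3000.0)
```

Any choice of raw values yields strictly increasing boundaries inside (lo, hi), so gradient updates cannot make the cuts cross.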
- compute_hard_bkg_stats(data_dict, signal_labels=None, eps=1e-08)
Compute per-bin background yields and their relative statistical uncertainties.
- Parameters:
data_dict (Mapping[str, dict]) –
Dictionary of event collections. Each value must contain
"NN_output" - tensor/array with shape (N, dim).
"weight" - tensor/array with shape (N,).
signal_labels (Sequence[str] or None, optional) – Names of the processes that should be treated as signal. If None (default), every key that starts with "signal" is considered a signal process.
eps (float, optional) – Small constant to avoid division by zero when computing relative uncertainties. Default is 1e-8.
- Returns:
B_sorted (np.ndarray) – Background yields per bin (shape (n_cats,)).
rel_unc_sorted (np.ndarray) – Relative statistical uncertainties for the same bins, sqrt(sum w^2) / sum w (shape (n_cats,)).
- get_bias(data_dict, *, steepness_scale=None, eps=1e-08)
Estimate the per-bin bias from using soft assignments.
- Parameters:
data_dict (Mapping[str, dict]) – Event collections with "NN_output" and "weight" entries, identical to what get_probs() expects.
steepness_scale (float, optional) – Additional factor applied to every steepness value before computing probabilities. Defaults to None for no rescaling.
eps (float, optional) – Small positive constant that guards against division by zero.
- Returns:
One-dimensional array of length n_cats containing (hard - soft) / hard for each bin.
- Return type:
np.ndarray
- get_bin_indices(data, *, steepness_scale=None)
Convert input data into hard bin indices.
- Parameters:
data (Union[tf.Tensor, np.ndarray, Mapping[str, Any]]) – Input events as a tensor/array of shape (N, n_disc) or a mapping that mirrors the training loop structure with "NN_output" keys.
steepness_scale (float, optional) – Multiplicative factor applied to every sigmoid steepness. None (default) keeps the model’s stored values.
- Returns:
Hard bin indices with dtype tf.int32. When data is a mapping, the result is a dict with the same keys and (N,) tensors. Otherwise a single (N,) tensor is returned.
- Return type:
Union[tf.Tensor, dict]
- get_differentiable_significance(data_dict, *, signal_labels, background_reweight=None, reweight_processes=None, return_details=False)
Compute differentiable Asimov significances for sigmoid-based models.
- Parameters:
data_dict (Mapping[str, dict]) – Input tensors with at least "NN_output" and "weight" fields.
signal_labels (Sequence[str]) – Names of the processes treated as signal.
background_reweight (array_like, optional) – Per-category scale factors applied to the accumulated background yield (length = n_cats). None disables reweighting.
reweight_processes (Sequence[str], optional) – Background process names that should be scaled by the provided factors. If omitted, all background processes receive the reweighting.
return_details (bool, optional) – If True, also return the per-bin yield tensors used internally.
- Returns:
OrderedDict – Map from signal label to differentiable significance.
tf.Tensor, optional – Background yield per bin (if return_details is True).
tf.Tensor, optional – Background sum of squared weights per bin (if return_details is True).
- get_probs(data, *, steepness_scale=None)
Soft weights gamma_ik for arbitrary input structure.
Accepts the same tensor / dict shapes used in the GMM example.
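For a single discriminant, sigmoid-approximated boundaries can be turned into soft bin memberships by differencing neighbouring sigmoids. The sketch below is a NumPy illustration under that assumption. The function soft_bins_1d is hypothetical, and the model's exact functional form is not given in this reference.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def soft_bins_1d(x, boundaries, steepness):
    """Soft membership of each x in the len(boundaries) + 1 ordered bins."""
    # s[:, i] approximates the step function "x > b_i".
    s = sigmoid(steepness * (x[:, None] - boundaries[None, :]))
    ones = np.ones((x.shape[0], 1))
    zeros = np.zeros((x.shape[0], 1))
    cdf = np.concatenate([ones, s, zeros], axis=1)
    return cdf[:, :-1] - cdf[:, 1:]          # telescoping sum, rows add up to 1

x = np.array([0.1, 0.5, 0.9])
boundaries = np.array([0.3, 0.7])
gamma = soft_bins_1d(x, boundaries, steepness=50.0)
gamma_soft = soft_bins_1d(x, boundaries, steepness=5.0)
```

A larger steepness brings the memberships closer to hard cuts, which is the effect steepness_scale would have under this assumption. For several discriminants, per-variable memberships could be combined by an outer product over the bin grid.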
- restore(path)
Restore the model’s trainable variables from a checkpoint.
- Parameters:
path (str) – Directory path to load the checkpoint from.
- save(path)
Save the model’s trainable variables to a checkpoint.
- Parameters:
path (str) – Directory path to save the checkpoint.