gatohep.utils#
- class gatohep.utils.LearningRateScheduler(optimizer, lr_initial=0.5, lr_final=0.001, *, total_epochs=100, mode='cosine', verbose=False)#
Cosine or exponential annealing for an optimizer’s learning rate.
- update(epoch)#
Update optimizer.learning_rate based on the current epoch.
- Return type:
float
- class gatohep.utils.SteepnessScheduler(model, t_initial=1.0, t_final=0.01, *, total_epochs=100, mode='exponential', verbose=False)#
Anneal every cfg["k"] in a gato_sigmoid_model.
Inherits all arguments from TemperatureScheduler but updates the steepness parameters stored in model.var_cfg[j]["k"].
Notes
Call update(epoch) once per epoch, exactly like the TemperatureScheduler. Works whether each k is a tf.Variable or a plain float.
- update(epoch)#
Update the steepness parameters model.var_cfg[j]["k"] for the given epoch index.
- class gatohep.utils.TemperatureScheduler(model, t_initial=1.0, t_final=0.01, *, total_epochs=100, mode='exponential', verbose=False)#
Anneal a GATO model’s temperature variable during training.
- Parameters:
model (gato_gmm_model) – The model whose temperature (tf.Variable) is updated in-place.
t_initial (float) – Temperature at epoch 0.
t_final (float) – Temperature at total_epochs.
total_epochs (int) – Number of epochs that constitute one full annealing cycle.
mode ({"exponential", "cosine"}, optional) –
"exponential" - geometric decay \(T_e = T_0 (T_f/T_0)^{e/E}\)
"cosine" - half-cosine schedule \(T_e = T_f + 0.5\,(T_0 - T_f)\,[1+\cos(\pi e/E)]\)
Here \(e\) is the epoch index, \(E\) is total_epochs, \(T_0\) is t_initial and \(T_f\) is t_final.
verbose (bool, optional) – If True, prints the new temperature each epoch.
Notes
Call update() once per epoch (or more often, if desired).
- update(epoch)#
Update model.temperature for the given epoch index.
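The two annealing modes documented above can be evaluated by hand to check what a scheduler should produce at a given epoch. The following is a minimal sketch in plain Python, not the gatohep implementation: the helper name annealed_value is hypothetical, and clamping the progress fraction to [0, 1] is an assumption of this sketch.

```python
import math


def annealed_value(epoch, t_initial=1.0, t_final=0.01, total_epochs=100,
                   mode="exponential"):
    """Evaluate the documented schedule formulas (hypothetical helper)."""
    # Progress e/E, clamped to [0, 1]; the clamping is an assumption here.
    frac = min(max(epoch / total_epochs, 0.0), 1.0)
    if mode == "exponential":
        # Geometric decay: T_e = T_0 * (T_f / T_0) ** (e / E)
        return t_initial * (t_final / t_initial) ** frac
    if mode == "cosine":
        # Half-cosine: T_e = T_f + 0.5 * (T_0 - T_f) * (1 + cos(pi * e / E))
        return t_final + 0.5 * (t_initial - t_final) * (1.0 + math.cos(math.pi * frac))
    raise ValueError(f"unknown mode: {mode!r}")


# Print a few points of each schedule with the default arguments.
for e in (0, 50, 100):
    print(e, annealed_value(e), annealed_value(e, mode="cosine"))
```

Both modes start at t_initial and end at t_final; they differ only in how quickly the value falls in between.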
- gatohep.utils.align_boundary_tracks(history, dist_tol=0.02, gap_max=20)#
Align boundary tracks across epochs.
- Parameters:
history (list of lists) – Each inner list contains boundary values at a specific epoch.
dist_tol (float, optional) – Maximum distance tolerance for matching boundaries. Default is 0.02.
gap_max (int, optional) – Maximum gap in epochs for considering a track inactive. Default is 20.
- Returns:
A 2D array of shape (n_epochs, n_tracks) with NaNs where no boundary exists.
- Return type:
ndarray
- gatohep.utils.asymptotic_significance(S, B, eps=1e-09)#
Compute the asymptotic significance using the Asimov formula.
- Parameters:
S (tf.Tensor) – Signal counts.
B (tf.Tensor) – Background counts.
eps (float, optional) – Small value to avoid division by zero. Default is 1e-9.
- Returns:
Asymptotic significance values.
- Return type:
tf.Tensor
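For orientation, the standard Asimov expression for a single counting bin is \(Z = \sqrt{2\,[(S+B)\ln(1+S/B) - S]}\). The sketch below evaluates it for scalar inputs in plain Python rather than TensorFlow. The helper name asimov_significance is hypothetical, and adding eps to B is an assumption about how the regularisation is applied, not a statement about the library's code.

```python
import math


def asimov_significance(s, b, eps=1e-9):
    """Scalar Asimov significance for one bin (hypothetical helper)."""
    # Assumption: eps regularises the background count to avoid division by zero.
    b = b + eps
    z2 = 2.0 * ((s + b) * math.log(1.0 + s / b) - s)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(z2, 0.0))


print(asimov_significance(10.0, 10.0))
print(asimov_significance(10.0, 10000.0))  # close to s / sqrt(b) when s << b
```

In the limit of small S/B the expression reduces to the familiar \(S/\sqrt{B}\).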
- gatohep.utils.build_category_mass_maps(assignments, data_dict, n_cats, *, bins=40, mass_range=(100.0, 180.0), axis_name='mass')#
Build per-category diphoton-mass histograms for each process.
- Parameters:
assignments (dict[str, np.ndarray]) – Hard bin assignments produced by gato_gmm_model.get_bin_indices().
data_dict (dict[str, pandas.DataFrame]) – Input frames containing "mass" and "weight".
n_cats (int) – Total number of GMM categories.
bins – Passed through to create_hist().
mass_range – Passed through to create_hist().
axis_name – Passed through to create_hist().
- Returns:
One entry per category with per-process histograms.
- Return type:
list[dict[str, hist.Hist]]
- gatohep.utils.build_mass_histograms(data_dict, *, bins=60, mass_range=(100.0, 180.0), axis_name='mass')#
Create diphoton-mass histograms for every process dataframe.
- Parameters:
data_dict (dict[str, pandas.DataFrame]) – Mapping from process name to a dataframe with "mass" and "weight" columns.
bins (int, optional) – Number of uniform bins in the specified mass range.
mass_range (tuple[float, float], optional) – Inclusive histogram range in GeV.
axis_name (str, optional) – Name assigned to the histogram axis (for plotting labels).
- Returns:
One histogram per process.
- Return type:
dict[str, hist.Hist]
- gatohep.utils.compute_mass_reweight_factors(model, data_dict, *, signal_labels=None, feature_key='NN_output', mass_column='mass', weight_column='weight', mass_sb_low=100.0, mass_sb_high=180.0, mass_sig_low=123.5, mass_sig_high=126.5, nbins=10)#
Fit an exponential to each category’s diphoton-mass spectrum and return per-bin factors that map the continuum yield in the full sideband (100-180 GeV by default) to the yield expected in the signal window (123.5-126.5 GeV by default, nominally 125 +/- 1 sigma).
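Once an exponential slope has been fitted, the factor described above is the ratio of two integrals of the same exponential: one over the signal window, one over the full sideband range. The sketch below shows only that ratio for a given slope. The helper name window_fraction and the sign convention exp(-slope * m) are assumptions of this sketch; the fitting step itself is not shown.

```python
import math


def window_fraction(slope, sb_low=100.0, sb_high=180.0,
                    sig_low=123.5, sig_high=126.5):
    """Fraction of an exp(-slope * m) continuum inside the signal window.

    Hypothetical helper; requires slope != 0.
    """
    def integral(a, b):
        # Integral of exp(-slope * m) from a to b, omitting the common
        # 1/slope factor, which cancels in the ratio below.
        return math.exp(-slope * a) - math.exp(-slope * b)

    return integral(sig_low, sig_high) / integral(sb_low, sb_high)


print(window_fraction(0.02))
```

For a nearly flat spectrum the factor approaches the ratio of the interval widths, 3 / 80 with the default ranges.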
- gatohep.utils.compute_significance_from_hists(h_signal, h_bkg_list)#
Compute the significance from signal and background histograms.
- Parameters:
h_signal (hist.Hist) – Histogram of signal events.
h_bkg_list (list of hist.Hist) – List of histograms for background events.
- Returns:
Combined significance value.
- Return type:
float
- gatohep.utils.convert_mass_data_to_tensors(data_dict)#
Convert the dataframe-based storage into TensorFlow tensors.
- Parameters:
data_dict (dict[str, pandas.DataFrame]) – Mapping whose dataframes contain NN_output, weight and mass.
- Returns:
Dictionary mirroring the input keys with tensor-valued payload.
- Return type:
dict[str, dict[str, tf.Tensor]]
- gatohep.utils.create_hist(data, weights=None, bins=50, low=0.0, high=1.0, name='NN_output')#
Create a histogram from data and weights.
- Parameters:
data (array_like) – Data to be binned.
weights (array_like, optional) – Weights for the data. Default is None.
bins (int or array_like, optional) – Number of bins or bin edges. Default is 50.
low (float, optional) – Lower bound of the histogram range. Default is 0.0.
high (float, optional) – Upper bound of the histogram range. Default is 1.0.
name (str, optional) – Name of the histogram axis. Default is “NN_output”.
- Returns:
A histogram object.
- Return type:
hist.Hist
- gatohep.utils.df_dict_to_tensors(data_dict)#
Convert a dictionary of DataFrames to a dictionary of tensors.
- Parameters:
data_dict (dict) – A dictionary where keys are process names and values are pandas.DataFrames with columns “NN_output” and “weight”.
- Returns:
A dictionary where keys are process names and values are dictionaries containing tensors with keys “x” and “w”.
- Return type:
dict
- gatohep.utils.generate_resonance_toy_data(n_signal1=60000, n_signal2=60000, n_bkg=400000, *, noise_scale=0.2, mass_sigma=1.5, seed=7, background_slopes=None)#
Extend the 3-class toy dataset with Higgs-like diphoton masses.
- Parameters:
n_signal1 (int) – Event count passed to generate_toy_data_3class_3D().
n_signal2 (int) – Event count passed to generate_toy_data_3class_3D().
n_bkg (int) – Event count passed to generate_toy_data_3class_3D().
noise_scale (float, optional) – Multiplicative feature noise forwarded to the base generator.
mass_sigma (float, optional) – Gaussian width of the resonant signal peak.
seed (int, optional) – Seed for deterministic feature and mass sampling.
background_slopes (sequence of float, optional) – Exponential slopes for the continuum components. If omitted, a default tuple is used and cycled over all background processes.
- Returns:
The original dataframes augmented with a "mass" column.
- Return type:
dict[str, pandas.DataFrame]
- gatohep.utils.safe_sigmoid(z, steepness)#
Compute a numerically stable sigmoid function.
- Parameters:
z (tf.Tensor) – Input tensor.
steepness (float) – Steepness of the sigmoid function.
- Returns:
Output tensor after applying the sigmoid function.
- Return type:
tf.Tensor
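A naive sigmoid overflows when the exponent is large, which happens easily once the steepness is annealed to high values. One common way to avoid this is to branch on the sign of the argument so that exp is only ever called on a non-positive number. The scalar sketch below illustrates that technique; it is not necessarily how safe_sigmoid is implemented, and applying the steepness as a multiplicative factor on z is an assumption.

```python
import math


def stable_sigmoid(z, steepness):
    """Overflow-free scalar sigmoid of steepness * z (hypothetical helper)."""
    x = steepness * z  # assumption: steepness scales the input
    if x >= 0:
        # exp(-x) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) < 1, so this form is safe as well.
    ex = math.exp(x)
    return ex / (1.0 + ex)


print(stable_sigmoid(0.0, 5.0), stable_sigmoid(100.0, 50.0), stable_sigmoid(-100.0, 50.0))
```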
- gatohep.utils.sample_truncated_exponential(rng, slope, size, *, low, high)#
Draw samples from a truncated exponential distribution.
- Parameters:
rng (np.random.Generator) – Random-number generator used for sampling.
slope (float) – Positive exponential slope λ in exp(-λ·x).
size (int) – Number of samples to draw.
low (float) – Lower bound of the truncation interval.
high (float) – Upper bound of the truncation interval (must exceed low).
- Returns:
Array of shape (size,) with samples in [low, high].
- Return type:
np.ndarray
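A truncated exponential can be sampled exactly by inverting its cumulative distribution function. The sketch below shows that technique with the standard-library random.Random in place of np.random.Generator and a list in place of an array. The name sample_truncated_exp is hypothetical, and whether the library uses inverse-CDF sampling is not stated in this reference.

```python
import math
import random


def sample_truncated_exp(rng, slope, size, *, low, high):
    """Inverse-CDF sampling of exp(-slope * x) on [low, high] (hypothetical helper)."""
    # Probability mass of the untruncated exponential that lies within the interval.
    span = 1.0 - math.exp(-slope * (high - low))
    # For u in [0, 1): x = low - ln(1 - u * span) / slope runs from low up to high.
    return [low - math.log(1.0 - rng.random() * span) / slope for _ in range(size)]


rng = random.Random(7)
masses = sample_truncated_exp(rng, 0.03, 10000, low=100.0, high=180.0)
print(min(masses), max(masses), sum(masses) / len(masses))
```

Because the density falls with x, the sample mean sits well below the midpoint of the interval.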
- gatohep.utils.slice_to_2d_features(data_dict)#
Drop the background node of the pseudo-softmax feature vector.
- Parameters:
data_dict (dict[str, pandas.DataFrame]) – Input dictionary produced by the toy generator.
- Returns:
Shallow copies where "NN_output" only retains the first two components per event.
- Return type:
dict[str, pandas.DataFrame]