gatohep.data_generation

gatohep.data_generation.generate_toy_data_1D(n_signal=100000, n_bkg=100000, xs_signal=0.5, xs_bkg1=100, xs_bkg2=80, xs_bkg3=50, xs_bkg4=20, xs_bkg5=10, lumi=100.0, noise_scale=0.3, seed=None)

Generate 1D toy data for signal and background events.

Parameters:
  • n_signal (int, optional) – Number of signal events to generate. Default is 100000.

  • n_bkg (int, optional) – Number of background events to generate. Default is 100000.

  • xs_signal (float, optional) – Cross-section for signal events. Default is 0.5.

  • xs_bkg1 (float, optional) – Cross-section for the first background component. Default is 100.

  • xs_bkg2 (float, optional) – Cross-section for the second background component. Default is 80.

  • xs_bkg3 (float, optional) – Cross-section for the third background component. Default is 50.

  • xs_bkg4 (float, optional) – Cross-section for the fourth background component. Default is 20.

  • xs_bkg5 (float, optional) – Cross-section for the fifth background component. Default is 10.

  • lumi (float, optional) – Luminosity for scaling event weights. Default is 100.0.

  • noise_scale (float, optional) – Scale of the noise applied to the data. Default is 0.3.

  • seed (int or None, optional) – Seed for the random number generator. Default is None.

Returns:

A dictionary of DataFrames, each containing the generated toy data with columns “NN_output” and “weight”.

Return type:

dict of pandas.DataFrame
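
The parameters above pair a per-process cross-section with a luminosity used "for scaling event weights". This page does not state the exact normalization, so the following is only a minimal sketch of one common convention, in which every generated event of a process carries the weight xs * lumi / n_generated and the weights of that process therefore sum to its expected yield. The helper name per_event_weight is hypothetical and is not part of gatohep.

```python
def per_event_weight(xs, lumi, n_generated):
    """Return a weight such that n_generated events sum to the yield xs * lumi.

    Assumption: this is a common normalization convention, not necessarily
    the one gatohep applies internally.
    """
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return xs * lumi / n_generated


# Defaults taken from the signature above: xs_signal=0.5, lumi=100.0, n_signal=100000.
w_signal = per_event_weight(xs=0.5, lumi=100.0, n_generated=100_000)

# Summing the weights of all generated signal events recovers the expected yield.
expected_signal_yield = w_signal * 100_000

print(f"per-event weight: {w_signal:.6g}")
print(f"expected signal yield: {expected_signal_yield:.6g}")
```

Under this convention, changing n_signal alters only the statistical precision of the toy sample, while xs_signal and lumi control the expected yield.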

gatohep.data_generation.generate_toy_data_3class_3D(n_signal1=100000, n_signal2=100000, n_bkg=500000, xs_signal1=0.5, xs_signal2=0.1, xs_bkg1=100, xs_bkg2=80, xs_bkg3=50, xs_bkg4=20, xs_bkg5=10, lumi=100.0, noise_scale=0.3, seed=None)

Generate 3D Gaussian data for 2 signal and 5 background classes.

For each point, compute likelihood-ratio-based 3-class scores: [score_signal1, score_signal2, score_background].

Parameters:
  • n_signal1 (int, optional) – Number of events for signal1. Default is 100000.

  • n_signal2 (int, optional) – Number of events for signal2. Default is 100000.

  • n_bkg (int, optional) – Total number of background events. Default is 500000.

  • xs_signal1 (float, optional) – Cross-section for signal1. Default is 0.5.

  • xs_signal2 (float, optional) – Cross-section for signal2. Default is 0.1.

  • xs_bkg1 (float, optional) – Cross-section for background1. Default is 100.

  • xs_bkg2 (float, optional) – Cross-section for background2. Default is 80.

  • xs_bkg3 (float, optional) – Cross-section for background3. Default is 50.

  • xs_bkg4 (float, optional) – Cross-section for background4. Default is 20.

  • xs_bkg5 (float, optional) – Cross-section for background5. Default is 10.

  • lumi (float, optional) – Luminosity for scaling event weights. Default is 100.0.

  • noise_scale (float, optional) – Scale of multiplicative noise applied to the data. Default is 0.3.

  • seed (int or None, optional) – Seed for the random number generator. Default is None.

Returns:

A dictionary of DataFrames, each containing the generated toy data with columns:
  • “NN_output” – 3-vector of scores [score_signal1, score_signal2, score_background].

  • “weight” – Event weight.

Return type:

dict of pandas.DataFrame
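
To illustrate what "likelihood-ratio-based 3-class scores" can look like, here is a hedged sketch that normalizes three class densities at a point so that the scores sum to one. Everything specific in it is an assumption made for illustration: the isotropic unit-width Gaussians, the means, the cross-section-weighted background mixture, and the helper names gaussian_pdf_3d and three_class_scores. None of these come from gatohep, whose internal construction (including the effect of noise_scale) is not described on this page.

```python
import math


def gaussian_pdf_3d(x, mean, sigma=1.0):
    """Density of an isotropic 3D Gaussian with standard deviation sigma."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    norm = (2.0 * math.pi * sigma**2) ** 1.5
    return math.exp(-0.5 * sq_dist / sigma**2) / norm


def three_class_scores(x, mean_s1, mean_s2, bkg_means, bkg_xs):
    """Return [score_signal1, score_signal2, score_background] at point x.

    Each score is the class density divided by the sum of the three class
    densities. The background density is a mixture of its components,
    weighted by their share of the total background cross-section.
    Note: far from all means the densities can underflow to zero.
    """
    p_s1 = gaussian_pdf_3d(x, mean_s1)
    p_s2 = gaussian_pdf_3d(x, mean_s2)
    total_xs = sum(bkg_xs)
    p_bkg = sum(
        xs / total_xs * gaussian_pdf_3d(x, m) for m, xs in zip(bkg_means, bkg_xs)
    )
    total = p_s1 + p_s2 + p_bkg
    return [p_s1 / total, p_s2 / total, p_bkg / total]


# Hypothetical class means, chosen only for this illustration.
MEAN_S1 = (1.5, 0.0, 0.0)
MEAN_S2 = (0.0, 1.5, 0.0)
BKG_MEANS = [
    (-1.0, -1.0, -1.0),
    (-2.0, 0.0, 0.0),
    (0.0, -2.0, 0.0),
    (0.0, 0.0, -2.0),
    (-1.5, -1.5, 0.0),
]
# Background cross-section defaults from the signature above.
BKG_XS = [100, 80, 50, 20, 10]

# Evaluate the scores at the signal1 mean, where signal1 should dominate.
scores = three_class_scores(MEAN_S1, MEAN_S1, MEAN_S2, BKG_MEANS, BKG_XS)
print([round(s, 3) for s in scores])
```

Because the three scores sum to one, each event lies on a two-dimensional simplex, so any two of the scores determine the third.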