Calculate signal efficiency in training and prediction #58

Open
aribrill opened this issue Sep 20, 2018 · 2 comments

@aribrill
Collaborator

Signal efficiency (gamma efficiency / sqrt(proton efficiency)) is a useful metric for gamma/hadron classification. In run_model.py, add a custom metric to display this quantity for the validation set with Tensorboard for each of a configurable set of classification thresholds. Also add a script to plot signal efficiency vs. classification threshold given a prediction output file. Note that efficiency should take into account the initial cuts placed on the data, which will require the changes to the metadata in point 3(d) of #57.
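As a rough illustration of the quantity described above, here is a minimal NumPy sketch that evaluates gamma efficiency / sqrt(proton efficiency) at several thresholds. The function name, its arguments, and the convention that an event passes when its gamma score is at or above the threshold are assumptions made for this example, not part of run_model.py. The optional totals stand in for the event counts before the initial cuts, which would have to come from the metadata.

```python
import numpy as np


def signal_efficiency(gamma_scores, proton_scores, thresholds,
                      n_gamma_total=None, n_proton_total=None):
    """Return (gamma_eff, proton_eff, q) arrays, one entry per threshold.

    Hypothetical helper: scores are the predicted gamma-ness of true gammas
    and true protons. If the totals are given, efficiencies are computed
    relative to the event counts before the initial cuts.
    """
    gamma_scores = np.asarray(gamma_scores, dtype=float)
    proton_scores = np.asarray(proton_scores, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    n_gamma = len(gamma_scores) if n_gamma_total is None else n_gamma_total
    n_proton = len(proton_scores) if n_proton_total is None else n_proton_total

    # Count events passing each threshold; result has shape (n_thresholds,).
    gamma_pass = (gamma_scores[None, :] >= thresholds[:, None]).sum(axis=1)
    proton_pass = (proton_scores[None, :] >= thresholds[:, None]).sum(axis=1)

    gamma_eff = gamma_pass / n_gamma
    proton_eff = proton_pass / n_proton

    # The ratio is undefined when no protons pass; report NaN there.
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(proton_eff > 0, gamma_eff / np.sqrt(proton_eff), np.nan)
    return gamma_eff, proton_eff, q
```

The same arrays could be logged per threshold during validation or plotted against the threshold from a prediction output file.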

@aribrill aribrill added this to the v0.2.1 milestone Sep 21, 2018
@aribrill aribrill self-assigned this Sep 21, 2018
@nietootein
Member

It would be useful to see how the signal efficiency evolves with energy, since that may help us optimize the classification thresholds as a function of an event's estimated energy. Our estimated energy will suffer from resolution and bias effects. Even so, if the threshold that optimizes the signal efficiency at a given energy does not change substantially in the vicinity of the estimated energy (with the vicinity defined by bias + resolution), dynamic classification thresholds may approximate an optimal signal efficiency. Since our data contain the true (MC) energy of each event rather than a reconstructed energy, we can start by optimizing the signal efficiency within a given (true) energy bin. Since we would like to cover the entire energy range of the instrument, it would be desirable to add an option to the plotting script (perhaps for run_model.py as well?) that takes a binning definition (e.g. e_min, e_max, num_bins, scale={normal, log, ...}) so that the signal efficiency can be computed for each bin defined there.
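A possible shape for the binning option, sketched with NumPy. The helper names are hypothetical, "normal" is read here as linearly spaced bins, and the true (MC) energies are assumed to be available as an array alongside the scores and labels. Efficiencies in this sketch are relative to the events present in each bin and do not account for the initial cuts.

```python
import numpy as np


def make_bin_edges(e_min, e_max, num_bins, scale="log"):
    """Build num_bins + 1 bin edges between e_min and e_max."""
    if scale == "log":
        return np.logspace(np.log10(e_min), np.log10(e_max), num_bins + 1)
    if scale == "normal":  # assumption: "normal" means linear spacing
        return np.linspace(e_min, e_max, num_bins + 1)
    raise ValueError(f"unknown scale: {scale!r}")


def binned_signal_efficiency(scores, is_gamma, energies, threshold, edges):
    """Return (gamma_eff, proton_eff, q) per true-energy bin at one threshold."""
    scores = np.asarray(scores, dtype=float)
    is_gamma = np.asarray(is_gamma, dtype=bool)
    energies = np.asarray(energies, dtype=float)
    passed = scores >= threshold

    def efficiency(mask):
        # Events in each bin before and after the classification cut.
        total, _ = np.histogram(energies[mask], bins=edges)
        kept, _ = np.histogram(energies[mask & passed], bins=edges)
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(total > 0, kept / total, np.nan)

    gamma_eff = efficiency(is_gamma)
    proton_eff = efficiency(~is_gamma)
    # Bins with no surviving protons (or no protons at all) yield NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(proton_eff > 0, gamma_eff / np.sqrt(proton_eff), np.nan)
    return gamma_eff, proton_eff, q
```

Empty bins and bins where no protons survive are reported as NaN rather than dropped, so the output always lines up with the bin edges.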

Visualizing a configurable set of classification thresholds may be useful, as Ari suggests, but we should also consider optimizing the signal efficiency as a function of the threshold, in case the optimal threshold is not close to any of the values in the predefined set.
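One way to search beyond a predefined set, sketched under the assumption that an event passes when its score is at or above the threshold: the pass counts only change at observed score values, so scanning every unique score finds the best threshold on the sample exactly. The function name and the min_protons_passing guard are hypothetical. The guard is there because the ratio becomes very noisy (and undefined at zero) when few protons survive.

```python
import numpy as np


def best_threshold(gamma_scores, proton_scores, min_protons_passing=1):
    """Return (threshold, q) maximizing gamma_eff / sqrt(proton_eff).

    Hypothetical helper. Candidates are all observed scores; thresholds that
    let fewer than min_protons_passing (>= 1) protons through are skipped.
    """
    if min_protons_passing < 1:
        raise ValueError("min_protons_passing must be at least 1")
    g = np.sort(np.asarray(gamma_scores, dtype=float))
    p = np.sort(np.asarray(proton_scores, dtype=float))
    candidates = np.unique(np.concatenate([g, p]))

    # Number of events with score >= t, for every candidate threshold t.
    g_pass = len(g) - np.searchsorted(g, candidates, side="left")
    p_pass = len(p) - np.searchsorted(p, candidates, side="left")

    valid = p_pass >= min_protons_passing
    if not valid.any():
        raise ValueError("no threshold keeps enough protons")

    q = np.full(candidates.shape, -np.inf)
    q[valid] = (g_pass[valid] / len(g)) / np.sqrt(p_pass[valid] / len(p))
    best = int(np.argmax(q))
    return candidates[best], q[best]
```

A threshold found this way is tuned to the sample it was computed on, so it would be safer to choose it on one set of events and evaluate it on another.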

@aribrill
Collaborator Author

Right now there's no easy way to get the MC energy of a given event. I think the natural way to do it is to include the MC energy in the auxiliary info returned by DataLoader.get_example() in array mode. Which auxiliary info to include can be a config option.
