Preprocessing

#Preprocessing Pandas Tables -> numpy Training Data

Each ObjectProfile takes the following arguments:

name- The name of the data type (e.g. Electron, Photon, EFlowTrack, etc.)
max_size- The maximum number of objects to use in training
sort_columns- Which columns to sort on (see pandas.DataFrame.sort; called pandas.DataFrame.sort_values in newer pandas versions)
sort_ascending- Whether each column will be sorted ascending or descending (see pandas.DataFrame.sort)
query- A selection query string to apply before truncating the data (see pandas.DataFrame.query)
shuffle- Whether or not to shuffle the data
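
The options above can be illustrated with a minimal sketch on a toy pandas table. The helper `apply_profile`, its `seed` parameter, and the toy values are hypothetical and not part of this package. The order of steps (query, then sort, then truncate to max_size, then shuffle) is an assumption based only on the descriptions above, and `sort_values` is used in place of the older `sort`.

```python
import pandas as pd

def apply_profile(df, max_size, sort_columns=None, sort_ascending=True,
                  query=None, shuffle=False, seed=0):
    """Hypothetical helper: select up to max_size objects from one table."""
    if query is not None:
        df = df.query(query)                 # selection happens before truncation
    if sort_columns is not None:
        df = df.sort_values(sort_columns, ascending=sort_ascending)
    df = df.head(max_size)                   # keep at most max_size objects
    if shuffle:
        # Assumption: shuffling reorders the rows that were kept.
        df = df.sample(frac=1, random_state=seed)
    return df.reset_index(drop=True)

# Toy table of four objects (illustrative values only).
objects = pd.DataFrame({"PT_ET": [5.0, 40.0, 12.5, 0.3],
                        "Eta":   [0.1, -1.2, 2.0, 0.7]})

selected = apply_profile(objects, max_size=2, sort_columns=["PT_ET"],
                         sort_ascending=False, query="PT_ET > 1.0")
print(selected)
```

Here the query drops the softest object, the sort puts the highest PT_ET first, and the truncation keeps the two leading rows.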

##preprocessFromPandas_label_dir_pairs

Gets training data from folders of pandas tables.

Arguments:

label_dir_pairs- A list of tuples of the form (label, directory), where each directory contains tables whose data all belong to the same event type.
start- Where to start reading (as if all of the files are part of one long list)
num_samples- The number of samples to read
object_profiles- A list of ObjectProfile(s), one for each type of observable object, each describing that object's preprocessing steps. The order of the ObjectProfiles in this list dictates the order of the input X list.
observ_types- The column headers for the data to be read from the pandas table
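
To make the meaning of start and num_samples concrete, here is a minimal sketch of reading a window "as if all of the files are part of one long list". The helper `plan_reads` and the file lengths are hypothetical; the package's actual bookkeeping, and whether the window applies per label directory or across all of them, is not stated here.

```python
def plan_reads(file_lengths, start, num_samples):
    """Hypothetical helper: map a global (start, num_samples) window onto files.

    file_lengths: number of samples in each file, in reading order.
    Returns a list of (file_index, first_row, last_row_exclusive) tuples.
    """
    plan = []
    offset = 0                    # global index of the current file's first row
    stop = start + num_samples    # global index one past the last wanted row
    for i, length in enumerate(file_lengths):
        lo = max(start, offset)
        hi = min(stop, offset + length)
        if lo < hi:               # this file overlaps the requested window
            plan.append((i, lo - offset, hi - offset))
        offset += length
        if offset >= stop:        # nothing further is needed
            break
    return plan

# Three files holding 100, 50 and 200 samples; read 60 samples from index 120.
plan = plan_reads([100, 50, 200], start=120, num_samples=60)
print(plan)  # [(1, 20, 50), (2, 0, 30)]
```

The first file is skipped entirely, and the window is split between the second and third files.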

Returns: Training data with its corresponding labels (X_train, Y_train)
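
As a rough picture of what one entry of the X list might look like, the sketch below fits a variable number of objects per event into a fixed-size numpy array. The helper `pad_objects`, the use of zero padding, and the resulting (events, max_size, features) layout are all assumptions for illustration. This page does not specify how events with fewer than max_size objects are handled.

```python
import numpy as np

def pad_objects(rows, max_size, num_features):
    """Hypothetical helper: fit one event's objects into (max_size, num_features)."""
    out = np.zeros((max_size, num_features), dtype=np.float32)  # assumed zero padding
    rows = np.asarray(rows, dtype=np.float32).reshape(-1, num_features)[:max_size]
    out[:len(rows)] = rows        # extra objects beyond max_size are dropped
    return out

# Two toy events with 1 and 4 objects, 2 features each (illustrative values only).
events = [
    [[1.0, 2.0]],
    [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
]
x = np.stack([pad_objects(ev, max_size=3, num_features=2) for ev in events])
print(x.shape)  # (2, 3, 2)
```

Under these assumptions, X_train would hold one such array per ObjectProfile, in the same order as object_profiles.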

##Examples

observ_types = ["E/c", "Px", "Py", "Pz", "Charge", "PT_ET", "Eta", "Phi", "Dxy_Ehad_Eem"]  # columns to read from each table
sample_start = 0  # where to start reading
num_samples = 10000  # number of samples to read


object_profiles = [ObjectProfile("Electron", 5),
                   ObjectProfile("MuonTight", 5),
                   ObjectProfile("Photon", 25),
                   ObjectProfile("MissingET", 1),
                   ObjectProfile("EFlowPhoton", 1000, sort_columns=["PT_ET"], sort_ascending=False),  #1300
                   ObjectProfile("EFlowNeutralHadron", 1000, sort_columns=["PT_ET"], sort_ascending=False),  #1000
                   ObjectProfile("EFlowTrack", 1000, sort_columns=["PT_ET"], sort_ascending=False)]  #1050


label_dir_pairs = \
            [   ("ttbar", "/data/shared/Delphes/ttbar_lepFilter_13TeV/pandas_unjoined/"),
                ("wjet", "/data/shared/Delphes/wjets_lepFilter_13TeV/pandas_unjoined/"),
                ("qcd", "/data/shared/Delphes/qcd_lepFilter_13TeV/pandas_unjoined/")
            ]
X_train, y_train = preprocessFromPandas_label_dir_pairs(label_dir_pairs, sample_start, num_samples, object_profiles, observ_types)
