Skip to content

Latest commit

 

History

History
525 lines (386 loc) · 17.4 KB

functions.md

File metadata and controls

525 lines (386 loc) · 17.4 KB

Interacting with the framework

There are two primary methods of interacting with this framework:

  1. Apply a function and manipulate models within a q process
  2. With command-line arguments and customized configuration

Run within a q process

The top-level functions in the repository are:

.automl Top-level functions

Generate, retrieve, delete models

  • fit Apply AutoML to provided features and associated targets
  • getModel Retrieve a previously fit AutoML model
  • deleteModels Delete model/s

Generate configuration

  • newConfig Generate a new JSON parameter file for use with .automl.fit

Updates

You can call .automl.fit with arguments to suit a specific use case. The functions listed above cover a wide range of options. You can also extend them.

👉 Advanced section

The examples following outline the most basic applications of AutoML: non-timeseries-specific machine-learning examples, and timeseries examples which use the FRESH algorithm and NLP Library.

Model prediction

The AutoML library contains no explicit predict function callable as a standalone entity. Instead, predictions are made based on the output of a previously fit model. As for .automl.fit and .automl.getModel, there are two methods by which such models can be made available to a user.

  1. As the output of an in process run of the AutoML framework.
  2. By retrieving the model information and its associated prediction function from disk.

In each case the output is a dictionary containing the predict function required to make predictions based on newly-retrieved data. Below are example invocations. For simplicity, any unnecessary text which would normally be printed to screen is ignored.

q)trainingFeatures:([]1000?1f;asc 1000?1f)
q)trainingTargets:desc 1000?1f
q)testingFeatures:([]100?1f;100?1f)

q)// Fit a regression model within the current process
q)fitModel:.automl.fit[trainingFeatures;trainingTargets;`normal;`reg;::]
q)fitModel
modelInfo| `startDate`startTime`featureExtractionType`problemType`saveO..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

q)// Predict targets for the testing features
q)show fitPredictions:fitModel.predict[testingFeatures]
0.7963151 0.734172 0.9847206 0.9817364 0.9709857 0.2008781 0.9781675 0...

q)// Retrieve the same model from disk (latest fit model)
q)retrievedModel:.automl.getModel[`startDate`startTime!(.z.D;.z.t)]
q)retrievedModel
modelInfo| `startDate`startTime`featureExtractionType`problemType`saveO..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

q)// Predict targets for the testing features
q)show retrievedPredictions:retrievedModel.predict[testingFeatures]
0.7963151 0.734172 0.9847206 0.9817364 0.9709857 0.2008781 0.9781675 0...

q)// Show that both methods are the same
q)fitPredictions~retrievedPredictions
1b

.automl.deleteModels

Delete a model/set of models from disk

.automl.deleteModels modelDetails

Where modelDetails is a dictionary containing information related to previously fit models to facilitate models being deleted from disk, returns null on successful invocation, otherwise errors with an appropriate response

Options for modelDetails:

  • A startDate and startTime to denote the dates/times to be deleted: either an exact match or a regex string matching appropriate saved model dates/times
  • In the case of a model saved according to a specified name models can be deleted individually by passing in an exact match denoting the model name or a regex string where multiple models are to be deleted.
q)// Delete a single dated/timed model
q)modelDetails:`startDate`startTime!(2020.08.01;14:10:10.100)
q).automl.deleteModels[modelDetails]

q)// Delete all models on a specific date any time between 4pm and 5pm
q)modelDetails:`startDate`startTime!(2020.08.01;"16:*")
q).automl.deleteModels[modelDetails]

q)// Delete all models for dates within a certain range
q)modelDetails:`startDate`startTime!("2020.08.0[1-9]";"*")
q).automl.deleteModels[modelDetails]

q)// Attempt to delete a model that does not exist
q)modelDetails:`startDate`startTime!(2000.01.01;10:10:10.100)
q).automl.deleteModels[modelDetails]
'startDate provided was not present within the list of available dates

q)// Delete a model based on its exact name
q)modelDetails:enlist[`savedModelName]!enlist "testModel"
q).automl.deleteModels[modelDetails]

q)// Delete a set of models matching an appropriate regex string
q)modelDetails:enlist[`savedModelName]!enlist "test*"
q).automl.deleteModels[modelDetails]

q)// Attempt to delete a named model that does not exist
q)modelDetails:enlist[`savedModelName]!enlist "myModel"
q).automl.deleteModels[modelDetails]
'No files matching the user provided savedModelName were found for deletion

.automl.fit

Apply AutoML to provided features and associated targets

.automl.fit[features;target;ftype;ptype;params]

Where

  • features is an unkeyed tabular feature data or a dictionary outlining how to retrieve the data in accordance with .ml.i.loadDataset
  • target is target vector of any type or a dictionary outlining how to retrieve the target vector in accordance with .ml.i.loadDataset
  • ftype is the feature-extraction type as a symbol ( `nlp, `normal, or `fresh)
  • ptype is the problem type as a symbol ( `reg or `class)
  • params is one of
  • Path to a JSON configuration file, either relative to the working directory or in code/customization/configuration/customConfig
  • Dictionary of non-default behaviors
  • Generic null (::) – run AutoML with default parameters

returns the configuration produced within the current run of AutoML along with a prediction function which can be used to make predictions using the best model produced.

The default setup saves the following items from an individual run:

  1. The best model, saved as a HDF5 file, or ‘pickled’ byte object.
  2. A saved report indicating the procedure taken and scores achieved.
  3. A saved binary-encoded dictionary denoting the procedure to be taken for reproducing results, running on new data and outlining all important information relating to a run.
  4. Results from each step of the pipeline saved to the generated report.
  5. On application NLP techniques a word2vec model is saved outlining the text to numerical mapping for a specific run.

The following examples demonstrate how to apply data in various use cases to .automl.fit. Note that while only one example is shown for each feature-extraction type, datasets with binary-classification, multi-classification and regression targets can all be used in each case. The terminal output is shown here only for the last example.

// Non-time series (normal) regression example table
features:([]asc 100?0t;100?1f;desc 100?0b;100?1f;asc 100?1f)
// Regression target
target:asc 100?1f
// Feature extraction type
featExtractType:`normal
// Problem type
problemType:`reg
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;problemType;params]

// Non-time series (normal) multi-classification example table
features:([]100?1f;100?1f)
// Multi-classification target
target:100?5
// Feature extraction type
featExtractType:`normal
// Problem type
problemType:`class
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;problemType;params]

// NLP binary-classification example table
features:([]100?1f;asc 100?("Testing the application of nlp";"With different characters"))
// Binary-classification target
target:asc 100?0b
// Feature extraction type
featExtractType:`nlp
// Problem type
ptype:`class
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;ptype;params]

// FRESH regression example table
features:([]5000?100?0p;asc 5000?1f;5000?1f;desc 5000?10f;5000?0b)
// Regression target
target:desc 100?1f
// Feature extraction type
featExtractType:`fresh
// Problem type
problemType:`reg
// Use default system parameters
params:(::)

// Run example
.automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
Executing node: featureDescription

The following is a breakdown of information for each of the relevant columns in the dataset

  | count unique mean      std       min          max       type
--| ---------------------------------------------------------------
x1| 5000  5000   0.5004232 0.2908372 0.0001313207 0.999641  numeric
x2| 5000  5000   0.4967023 0.2897377 0.0007908894 0.9998165 numeric
x3| 5000  5000   5.036043  2.904289  0.002741043  9.998293  numeric
x | 5000  100    ::        ::        ::           ::        time
x4| 5000  2      ::        ::        ::           ::        boolean

Executing node: dataPreprocessing

Data preprocessing complete, starting feature creation

Executing node: featureCreation
Executing node: labelEncode
Executing node: featureSignificance

Total number of significant features being passed to the models = 214

Executing node: trainTestSplit
Executing node: modelGeneration
Executing node: selectModels

Starting initial model selection - allow ample time for large datasets

Executing node: runModels

Scores for all models using .ml.mse

RandomForestRegressor    | 0.04202918
GradientBoostingRegressor| 0.04534999
Lasso                    | 0.04583557
KNeighborsRegressor      | 0.04822146
AdaBoostRegressor        | 0.05129247
LinearRegression         | 0.4422226
MLPRegressor             | 848.683

Best scoring model = RandomForestRegressor

Executing node: optimizeModels

Continuing to hyperparameter search and final model fitting on testing set

Best model fitting now complete - final score on testing set = 0.2106325

Executing node: predictParams
Executing node: preprocParams
Executing node: pathConstruct
Executing node: saveGraph

Saving down graphs to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/images/

Executing node: saveReport

Saving down procedure report to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/report/

Executing node: saveMeta

Saving down model parameters to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/config/

Executing node: saveModels

Saving down model to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/models/

modelInfo| `startDate`startTime`featureExtractionType`problemType`saveOption`..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

.automl.getModel

Retrieve a previously fit AutoML model to use for prediction

.automl.getModel modelDetails

Where modelDetails is a dictionary containing information related to a previously fit model to facilitate model retrieval from disk, returns relevant model metadata and the prediction function associated with the model.

Options for modelDetails:

  • Provide a startDate and startTime to retrieve the closest prevailing model i.e. nearest model before this time
  • In the case of a model saved according to a specified name, retrieve this by providing a savedModelName
q)// Persisted model at a specific date/time
q)modelDetails:`startDate`startTime!(2020.12.17;14:57:20.206)
q)// Retrieve model
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

q)// Retrieve the most recent saved model
q)modelDetails:`startDate`startTime(.z.D;.z.t)
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

q)// Retrieve the earliest model saved
q)modelDetails:`startDate`startTime("d"$0;"t"$0)
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

q)// Retrieve a model based on a name associated with the model
q)modelDetails:enlist[`savedModelName]!enlist "testModel"
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict  | {[config;features]
  original_print:utils.printing;
  utils.printi..

.automl.newConfig

Generate a new JSON parameter file for use with .automl.fit

.automl.newConfig fileName

Where fileName is the name of a new JSON configuration file as a string, symbol or symbolic file handle, in code/customization/configuration saves a copy of default.json to customConfig/fileName and returns generic null.

q)// Path where new JSON configuration file will be saved
q)configPath:hsym`$.automl.path,"/code/customization/configuration/customConfig/"
q)// Check files present in directory at present
q)key configPath
`symbol$()
q)// Generate new configuration file called "newConfigFile"
q).automl.newConfig[`newConfigFile]
q)// Check files present in directory - new configuration file has been generated
q)key configPath
,`newConfigFile

.automl.updateIgnoreWarnings

Update print warning severity level

.automl.updateIgnoreWarnings warningLevel

Where warningLevel is 0j, 1j or 2j, updates .automl.utils.ignoreWarnings and returns null.

Warning levels:

0   ignore warnings completely and continue evaluation
1   alert user a warning was flagged and continue
2   exit evaluation of AutoML, telling the user why 
q)// Exit pipeline on error
q).automl.updateIgnoreWarnings 2
q)// Fit AutoML
q).automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
Error: The savePath chosen already exists, this run will be exited

q)// Highlight warnings
q).automl.updateIgnoreWarnings 1
q)// Fit AutoML
q).automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck

The savePath chosen already exists and will be overwritten

Executing node: featureDescription
..

q)// Ignore warnings
q).automl.updateIgnoreWarnings 0
q)// Fit AutoML
q).automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
Executing node: featureDescription
..

.automl.updateLogging

Toggle logging state

.automl.updateLogging[]

Toggles the flag .automl.utils.logging and returns null.

.automl.utils.logging is a boolean: whether to print statements from .automl.fit to a log file. Its default value is 0b.

.automl.updatePrinting

Toggle printing state

.automl.updatePrinting[]

Toggles the flag .automl.utils.printing and returns null.

.automl.utils.printing is a boolean: whether to print statements to the console. Its default value is 1b.

Run from the command line

You may wish to run the AutoML framework from the command line:

  • to overwrite the default parameters of a process running AutoML such that each run uses these parameters
  • when running the entirety of the framework in a ‘one-shot’ manner, fitting a model and saving it to disk and exiting the process immediately

Both of the above require custom JSON files, in particular a customized version of default.json. Use .automl.newConfig to generate a named custom version of the default.json file. When editing it follow these instructions.

In the examples the custom JSON files used can be in either of two locations:

  • Within folder code/customization/configuration/customConfig relative to .automl.path
  • Relative to the working directory

Overwriting default parameters

Command to run with a custom configuration:

q automl.q -config newConfig.json

In the example following, a custom JSON file myConfig.json in folder code/customization/configuration/customConfig sets the testing set size to 0.3 and modifies the target limit to 1000.

First, start AutoML in a q process and display defaults.

$ q automl.q
q).automl.loadfile`:init.q
q).automl.paramDict[`general;`testingSize`targetLimit]
0.2
10000
q)\\

Next, start AutoML using the new configuration file

$ q automl.q -config myConfig.json
q).automl.loadfile`:init.q
q).automl.paramDict[`general;`testingSize`targetLimit]
0.3
1000

Full run from command line

The following is the command line input used when running the entirety of .automl.fit from command line.

q automl.q -config newConfig.json -run