There are two primary methods of interacting with this framework:
- Calling functions and manipulating models within a q process
- Using command-line arguments and a customized configuration
The top-level functions in the repository are:

.automl top-level functions

Generate, retrieve, delete models
fit                   Apply AutoML to provided features and associated targets
getModel              Retrieve a previously fit AutoML model
deleteModels          Delete model/s

Generate configuration
newConfig             Generate a new JSON parameter file for use with .automl.fit

Updates
updateIgnoreWarnings  Update print warning severity level
updateLogging         Update logging state
updatePrinting        Update printing state
You can call .automl.fit with arguments to suit a specific use case. The functions listed above cover a wide range of options, and you can also extend them.
The following examples outline the most basic applications of AutoML: non-timeseries-specific machine-learning examples, time-series examples which use the FRESH algorithm, and text examples which use the NLP library.
The AutoML library has no standalone predict function. Instead, predictions are made using the output of a previously fit model. There are two ways such a model can be made available to a user, corresponding to .automl.fit and .automl.getModel:
- As the output of an in-process run of the AutoML framework (.automl.fit)
- By retrieving the model information and its associated prediction function from disk (.automl.getModel)
In each case the output is a dictionary containing the predict function required to make predictions on new data. Example invocations follow; for simplicity, text that would normally be printed to the screen is omitted.
q)trainingFeatures:([]1000?1f;asc 1000?1f)
q)trainingTargets:desc 1000?1f
q)testingFeatures:([]100?1f;100?1f)
q)// Fit a regression model within the current process
q)fitModel:.automl.fit[trainingFeatures;trainingTargets;`normal;`reg;::]
q)fitModel
modelInfo| `startDate`startTime`featureExtractionType`problemType`saveO..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
q)// Predict targets for the testing features
q)show fitPredictions:fitModel.predict[testingFeatures]
0.7963151 0.734172 0.9847206 0.9817364 0.9709857 0.2008781 0.9781675 0...
q)// Retrieve the same model from disk (latest fit model)
q)retrievedModel:.automl.getModel[`startDate`startTime!(.z.D;.z.t)]
q)retrievedModel
modelInfo| `startDate`startTime`featureExtractionType`problemType`saveO..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
q)// Predict targets for the testing features
q)show retrievedPredictions:retrievedModel.predict[testingFeatures]
0.7963151 0.734172 0.9847206 0.9817364 0.9709857 0.2008781 0.9781675 0...
q)// Show that both methods are the same
q)fitPredictions~retrievedPredictions
1b
Delete a model or set of models from disk
.automl.deleteModels modelDetails
Where modelDetails is a dictionary containing information about previously fit models, used to identify the models to be deleted from disk, returns null on successful invocation; otherwise it signals an appropriate error.
Options for modelDetails:
- A startDate and startTime denoting the dates/times to be deleted: either an exact match or a regex-style wildcard string matching the appropriate saved model dates/times
- For a model saved under a specified name, a savedModelName: either an exact match denoting the model name, to delete an individual model, or a regex-style wildcard string, to delete multiple models
q)// Delete a single dated/timed model
q)modelDetails:`startDate`startTime!(2020.08.01;14:10:10.100)
q).automl.deleteModels[modelDetails]
q)// Delete all models on a specific date at any time between 4pm and 5pm
q)modelDetails:`startDate`startTime!(2020.08.01;"16:*")
q).automl.deleteModels[modelDetails]
q)// Delete all models for dates within a certain range
q)modelDetails:`startDate`startTime!("2020.08.0[1-9]";"*")
q).automl.deleteModels[modelDetails]
q)// Attempt to delete a model that does not exist
q)modelDetails:`startDate`startTime!(2000.01.01;10:10:10.100)
q).automl.deleteModels[modelDetails]
'startDate provided was not present within the list of available dates
q)// Delete a model based on its exact name
q)modelDetails:enlist[`savedModelName]!enlist "testModel"
q).automl.deleteModels[modelDetails]
q)// Delete a set of models matching an appropriate regex string
q)modelDetails:enlist[`savedModelName]!enlist "test*"
q).automl.deleteModels[modelDetails]
q)// Attempt to delete a named model that does not exist
q)modelDetails:enlist[`savedModelName]!enlist "myModel"
q).automl.deleteModels[modelDetails]
'No files matching the user provided savedModelName were found for deletion
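The date/time and name strings in the examples above use wildcards such as * and [1-9]. As a minimal sketch, assuming the matching behaves like q's built-in like operator (the AutoML internals are not shown here), you can preview which saved-model dates or times a pattern would select before deleting anything. The date strings below are hypothetical values for illustration only.

```q
// Hypothetical saved-model dates, as strings (illustration only)
dates:("2020.08.01";"2020.08.09";"2020.08.15";"2020.09.01")
// Which dates would the date pattern from the examples above select?
show dates like "2020.08.0[1-9]"
// Keep only the matching dates
show dates where dates like "2020.08.0[1-9]"
// The time pattern "16:*" selects any time in the 4pm hour
show (string 16:10:10.100 17:05:00.000) like "16:*"
```

Here the date pattern selects only the first two dates, and the time pattern selects 16:10:10.100 but not 17:05:00.000.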
Apply AutoML to provided features and associated targets
.automl.fit[features;target;ftype;ptype;params]
Where
- features is an unkeyed table of feature data, or a dictionary outlining how to retrieve the data in accordance with .ml.i.loadDataset
- target is a target vector of any type, or a dictionary outlining how to retrieve the target vector in accordance with .ml.i.loadDataset
- ftype is the feature-extraction type as a symbol (`nlp, `normal or `fresh)
- ptype is the problem type as a symbol (`reg or `class)
- params is one of:
  - a path to a JSON configuration file, either relative to the working directory or in code/customization/configuration/customConfig
  - a dictionary of non-default behaviors
  - generic null (::), to run AutoML with default parameters

returns the configuration produced within the current run of AutoML, along with a prediction function that can be used to make predictions with the best model produced.
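As a sketch of the dictionary form of params, the call below overrides two settings for a single fit. The key names and value types are assumptions: testingSize is the parameter displayed from .automl.paramDict in the command-line section, and savedModelName is the name used with .automl.getModel and .automl.deleteModels. Check default.json for the exact names and types your version accepts.

```q
// Small example data set (regression)
features:([]100?1f;100?1f)
target:100?1f
// Assumed parameter names: testingSize (see .automl.paramDict) and
// savedModelName (as used by .automl.getModel); check default.json for your version
params:`testingSize`savedModelName!(0.3;"testModel")
// Fit with the non-default parameters
model:.automl.fit[features;target;`normal;`reg;params]
// The returned dictionary holds a prediction function
preds:model[`predict] features
```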
The default setup saves the following items from an individual run:
- The best model, saved as an HDF5 file or a ‘pickled’ byte object.
- A saved report indicating the procedure taken and scores achieved.
- A saved binary-encoded dictionary denoting the procedure to be taken for reproducing results, running on new data and outlining all important information relating to a run.
- Results from each step of the pipeline saved to the generated report.
- When NLP techniques are applied, a word2vec model is saved outlining the text-to-numerical mapping for that run.
The following examples demonstrate how to apply .automl.fit to data in various use cases. Note that although each example pairs a feature-extraction type with a single problem type, datasets with binary-classification, multi-classification and regression targets can be used with every feature-extraction type.
Terminal output is shown only for the last example.
// Non-time series (normal) regression example table
features:([]asc 100?0t;100?1f;desc 100?0b;100?1f;asc 100?1f)
// Regression target
target:asc 100?1f
// Feature extraction type
featExtractType:`normal
// Problem type
problemType:`reg
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;problemType;params]
// Non-time series (normal) multi-classification example table
features:([]100?1f;100?1f)
// Multi-classification target
target:100?5
// Feature extraction type
featExtractType:`normal
// Problem type
problemType:`class
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;problemType;params]
// NLP binary-classification example table
features:([]100?1f;asc 100?("Testing the application of nlp";"With different characters"))
// Binary-classification target
target:asc 100?0b
// Feature extraction type
featExtractType:`nlp
// Problem type
problemType:`class
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;problemType;params]
// FRESH regression example table
features:([]5000?100?0p;asc 5000?1f;5000?1f;desc 5000?10f;5000?0b)
// Regression target
target:desc 100?1f
// Feature extraction type
featExtractType:`fresh
// Problem type
problemType:`reg
// Use default system parameters
params:(::)
// Run example
.automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
Executing node: featureDescription
The following is a breakdown of information for each of the relevant columns in the dataset
| count unique mean std min max type
--| ---------------------------------------------------------------
x1| 5000 5000 0.5004232 0.2908372 0.0001313207 0.999641 numeric
x2| 5000 5000 0.4967023 0.2897377 0.0007908894 0.9998165 numeric
x3| 5000 5000 5.036043 2.904289 0.002741043 9.998293 numeric
x | 5000 100 :: :: :: :: time
x4| 5000 2 :: :: :: :: boolean
Executing node: dataPreprocessing
Data preprocessing complete, starting feature creation
Executing node: featureCreation
Executing node: labelEncode
Executing node: featureSignificance
Total number of significant features being passed to the models = 214
Executing node: trainTestSplit
Executing node: modelGeneration
Executing node: selectModels
Starting initial model selection - allow ample time for large datasets
Executing node: runModels
Scores for all models using .ml.mse
RandomForestRegressor | 0.04202918
GradientBoostingRegressor| 0.04534999
Lasso | 0.04583557
KNeighborsRegressor | 0.04822146
AdaBoostRegressor | 0.05129247
LinearRegression | 0.4422226
MLPRegressor | 848.683
Best scoring model = RandomForestRegressor
Executing node: optimizeModels
Continuing to hyperparameter search and final model fitting on testing set
Best model fitting now complete - final score on testing set = 0.2106325
Executing node: predictParams
Executing node: preprocParams
Executing node: pathConstruct
Executing node: saveGraph
Saving down graphs to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/images/
Executing node: saveReport
Saving down procedure report to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/report/
Executing node: saveMeta
Saving down model parameters to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/config/
Executing node: saveModels
Saving down model to automl/outputs/dateTimeModels/2020.12.17/run_14.57.20.206/models/
modelInfo| `startDate`startTime`featureExtractionType`problemType`saveOption`..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
Retrieve a previously fit AutoML model to use for prediction
.automl.getModel modelDetails
Where modelDetails is a dictionary containing information about a previously fit model, used to retrieve that model from disk, returns the relevant model metadata and the prediction function associated with the model.
Options for modelDetails:
- Provide a startDate and startTime to retrieve the closest prevailing model, i.e. the nearest model saved before this time
- For a model saved under a specified name, provide its savedModelName
q)// Persisted model at a specific date/time
q)modelDetails:`startDate`startTime!(2020.12.17;14:57:20.206)
q)// Retrieve model
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
q)// Retrieve the most recent saved model
q)modelDetails:`startDate`startTime!(.z.D;.z.t)
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
q)// Retrieve the earliest model saved
q)modelDetails:`startDate`startTime!("d"$0;"t"$0)
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
q)// Retrieve a model based on a name associated with the model
q)modelDetails:enlist[`savedModelName]!enlist "testModel"
q).automl.getModel[modelDetails]
modelInfo| `modelLib`modelFunc`startDate`startTime`featureExt..
predict | {[config;features]
original_print:utils.printing;
utils.printi..
Generate a new JSON parameter file for use with .automl.fit
.automl.newConfig fileName
Where fileName is the name of a new JSON configuration file as a string, symbol or symbolic file handle, saves a copy of default.json to code/customization/configuration/customConfig/fileName and returns generic null.
q)// Path where new JSON configuration file will be saved
q)configPath:hsym`$.automl.path,"/code/customization/configuration/customConfig/"
q)// Check files present in directory at present
q)key configPath
`symbol$()
q)// Generate new configuration file called "newConfigFile"
q).automl.newConfig[`newConfigFile]
q)// Check files present in directory - new configuration file has been generated
q)key configPath
,`newConfigFile
Update print warning severity level
.automl.updateIgnoreWarnings warningLevel
Where warningLevel is 0j, 1j or 2j, updates .automl.utils.ignoreWarnings and returns null.
Warning levels:
0  ignore warnings completely and continue evaluation
1  alert the user that a warning was flagged and continue
2  exit evaluation of AutoML, telling the user why
q)// Exit the pipeline when a warning is flagged
q).automl.updateIgnoreWarnings 2
q)// Fit AutoML
q).automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
Error: The savePath chosen already exists, this run will be exited
q)// Highlight warnings
q).automl.updateIgnoreWarnings 1
q)// Fit AutoML
q).automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
The savePath chosen already exists and will be overwritten
Executing node: featureDescription
..
q)// Ignore warnings
q).automl.updateIgnoreWarnings 0
q)// Fit AutoML
q).automl.fit[features;target;featExtractType;problemType;params]
Executing node: automlConfig
Executing node: configuration
Executing node: targetDataConfig
Executing node: targetData
Executing node: featureDataConfig
Executing node: featureData
Executing node: dataCheck
Executing node: featureDescription
..
Toggle logging state
.automl.updateLogging[]
Toggles the flag .automl.utils.logging and returns null.
.automl.utils.logging is a boolean indicating whether statements printed by .automl.fit are also written to a log file. Its default value is 0b.
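A short sketch of the toggle, assuming a fresh session in which the flag still has its documented default value:

```q
// Default in a fresh session: 0b (statements are not written to a log file)
.automl.utils.logging
// Toggle the flag
.automl.updateLogging[]
// The flag is now 1b, so subsequent runs of .automl.fit are logged
.automl.utils.logging
```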
Toggle printing state
.automl.updatePrinting[]
Toggles the flag .automl.utils.printing and returns null.
.automl.utils.printing is a boolean indicating whether statements are printed to the console. Its default value is 1b.
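A sketch of silencing console output for a quiet run and then restoring it, assuming a fresh session in which the flag still has its documented default value. Because the function toggles rather than sets the flag, calling it twice returns to the original state.

```q
// Default in a fresh session: 1b (statements are printed to the console)
.automl.utils.printing
// Turn console printing off, e.g. for a quiet run
.automl.updatePrinting[]
.automl.utils.printing
// ... run .automl.fit here without console output ...
// Toggle again to restore console printing
.automl.updatePrinting[]
.automl.utils.printing
```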
You may wish to run the AutoML framework from the command line:
- to override the default parameters of a process running AutoML, so that each run uses the new parameters
- to run the entire framework in a ‘one-shot’ manner: fitting a model, saving it to disk and exiting the process immediately
Both of the above require a custom JSON file, specifically a customized version of default.json. Use .automl.newConfig to generate a named custom version of the default.json file. When editing it, follow these instructions.
The custom JSON files used in the examples can be in either of two locations:
- Within the folder code/customization/configuration/customConfig, relative to .automl.path
- Relative to the working directory
Command to run with a custom configuration:
q automl.q -config newConfig.json
In the following example, a custom JSON file myConfig.json in the folder code/customization/configuration/customConfig sets the testing-set size to 0.3 and the target limit to 1000.
First, start AutoML in a q process and display the defaults.
$ q automl.q
q).automl.loadfile`:init.q
q).automl.paramDict[`general;`testingSize`targetLimit]
0.2
10000
q)\\
Next, start AutoML using the new configuration file.
$ q automl.q -config myConfig.json
q).automl.loadfile`:init.q
q).automl.paramDict[`general;`testingSize`targetLimit]
0.3
1000
The following command runs the entirety of .automl.fit from the command line in a one-shot manner, using a custom configuration:
q automl.q -config newConfig.json -run