The drum
tool has a verb called push
which requires the usage of a metadata file in your
code directory to configure the creation of a DataRobot model.
- name (required): a string used as the custom model title, try to make this unique so you can search for it later.
- type (required): a string with the value either
training
orinference
. Inference models are meant for solely deployment, while tasks will be able to be trained on the leaderboard. - environmentID (required): a hash of the execution environment to use while running your custom model.
You can find a list of available execution environments here.
Click on the
Environment Info
tab of the environment and copy the ID to your file. - targetType (required): a string indicating the type of target. Must be one of
binary
regression
anomaly
unstructured
multiclass
transform
- modelID (optional): Once you have created a model for the first time, it is best practice to use custom model versions when adding code while iterating on your model. To only create a new version instead of a whole new top level model, please include a hash here for the custom model you created.
- description (optional): A searchable note to your future self about the contents of this model. This is ignored if modelID is set.
- majorVersion (optional, default: True): Whether the model version you are creating should be a major version update or a minor version update. If the previous model version is 2.3, a major version update would create the version 3.3, and a minor version update would create the version 2.4.
NOTE: All options specific to inference models or tasks are ignored if modelID is set- they
configure the base custom model
entity only. However, they are still required to keep in the
metadata file.
- targetName (required): a string with the column of your data that your model tries to predict.
- positiveClassLabel / negativeClassLabel: Required for binary models. If your model predicts the number 0, the negativeClassLabel dictates of your prediction that corresponds to.
- predictionThreshold: Optional for binary models. The cutoff point between 0 and 1 that represents which label will be chosen as the predicted label.
- trainOnProject (optional): A hash with the pid of a project you would like to train your new model or version on. If this is supplied, the code you supplied will start to run against this pid automagically.
The validation schema is used to define input and output requirements for the task (See detailed documentation in VALIDATION-SCHEMA.md). The validation is used to
communicate the acceptable inputs for the model along with the expected output. This will be verified when running drum fit
- typeSchema (optional): Top level dictionary that contains the input and output schema definitions
- input_requirements (optional): Specifications that apply to the models input. The specifications provided as a list.
- output_requirements (optional): Specifications that define the expected output of the model. The specifications provided as a list.
All specifications contain the following fields:
- field: which specification is being defined, one of
data_types
,sparse
,number_of_columns
- condition: defines how the values in the
value
field are used - value: A list or single value, depending upon the condition used
- condition: "EQUALS", "IN", "NOT_EQUALS", "NOT_IN"
- value: "NUM", "TXT", "CAT", "IMG", "DATE", "DATE_DURATION", "COUNT_DICT", "GEO",
- condition: "EQUALS"
- value: "FORBIDDEN", "SUPPORTED", "REQUIRED"
- condition: "EQUALS"
- value: "NEVER", "DYNAMIC", "ALWAYS", "IDENTITY"
- condition: "EQUALS", "IN", "NOT_EQUALS", "NOT_IN", "GREATER_THAN", "LESS_THAN", "NOT_GREATER_THAN", "NOT_LESS_THAN"
- value: Integer value >= 0 (for value of 0 only GREATER_THAN is allowed)