A wrapper for Rubix ML that makes it very approachable.
Example:
$report = RubixService::train($data, 'column_with_label');
Where column_with_label is the key in each row of the multi-dimensional array $data that contains the value you want to predict.
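To make the role of the label column concrete, here is a minimal sketch of how rows could be separated into feature samples and labels. The helper splitSamplesAndLabels is hypothetical and not part of RubixService; how the wrapper does this internally is an assumption.

```php
<?php
// Hypothetical helper, not part of RubixService: splits rows into
// feature samples and labels based on the label column key.
function splitSamplesAndLabels(array $data, string $labelColumn): array
{
    $samples = [];
    $labels = [];
    foreach ($data as $row) {
        $labels[] = $row[$labelColumn];
        unset($row[$labelColumn]);       // $row is a copy, $data is untouched
        $samples[] = array_values($row); // remaining columns are the features
    }
    return [$samples, $labels];
}

[$samples, $labels] = splitSamplesAndLabels([
    ['space_m2' => 10, 'price' => 100],
    ['space_m2' => 20, 'price' => 200],
], 'price');
// $samples is [[10], [20]] and $labels is [100, 200]
```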
Let's make a simple example:
$apartment_data = [
['space_m2' => 10, 'price' => 100],
['space_m2' => 20, 'price' => 200],
['space_m2' => 30, 'price' => 300],
['space_m2' => 40, 'price' => 400],
//...
['space_m2' => 280, 'price' => 2800],
['space_m2' => 290, 'price' => 2900],
['space_m2' => 300, 'price' => 3000],
];
$report = RubixService::train($apartment_data, 'price');
var_export($report);
/*
array (
'mean absolute error' => 68.88888888888889,
...
'r squared' => 0.9796739130434783,
...
)
*/
$prediction = RubixService::predict(['space_m2' => 250]);
// $prediction ~2440
See full example of above code here
Mean absolute error is the error you can expect on average. So when predicting the price of an apartment from its space, you would be off by about $68.89 on average.
r squared, on the other hand, gives a sense of how well the algorithm fits the data, roughly as a percentage. A value close to 1 (here about 0.98, or 98%) means it works well. For categorical labels like cat or dog, a different report is returned.
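As a rough illustration of what the two numbers mean, both metrics can be computed by hand using their standard definitions. The values below are made up for the sketch and are not the ones from the report; this is not how Rubix ML implements its metrics internally.

```php
<?php
// Illustrative values only; these are not the numbers from the report above.
$actual    = [100.0, 200.0, 300.0, 400.0];
$predicted = [110.0, 190.0, 310.0, 380.0];

$n = count($actual);
$mean = array_sum($actual) / $n;

$absoluteErrorSum = 0.0;
$residualSquares = 0.0; // sum of (actual - predicted)^2
$totalSquares = 0.0;    // sum of (actual - mean)^2
foreach ($actual as $i => $y) {
    $error = $y - $predicted[$i];
    $absoluteErrorSum += abs($error);
    $residualSquares += $error ** 2;
    $totalSquares += ($y - $mean) ** 2;
}

$meanAbsoluteError = $absoluteErrorSum / $n;
$rSquared = 1.0 - $residualSquares / $totalSquares;

printf("MAE: %.2f, R squared: %.4f\n", $meanAbsoluteError, $rSquared);
// prints "MAE: 12.50, R squared: 0.9860"
```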
RubixService::train() will use a default estimator (machine learning algorithm) depending on the data. If you want to choose a different estimator, I recommend reading the Rubix ML guide on choosing an estimator.
Note: Neural network is called Multilayer Perceptron in Rubix. Linear regression is called Ridge.
By default it uses K-d Neighbors for categorical labels or K-d Neighbors Regressor for numeric labels.
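For intuition about the default regressor: K-d Neighbors Regressor is a k-nearest-neighbors regressor that uses a k-d tree to find neighbors quickly. The sketch below shows only the underlying idea with a brute-force search on one feature. The function knnPredict and the choice of k = 3 are assumptions for illustration, not Rubix ML's implementation or defaults.

```php
<?php
// Brute-force k-nearest-neighbors regression on a single feature.
// A sketch of the idea only; Rubix ML's K-d Neighbors Regressor uses a
// k-d tree for the neighbor search and has its own defaults.
function knnPredict(array $samples, array $labels, float $query, int $k): float
{
    $distances = [];
    foreach ($samples as $i => $x) {
        $distances[$i] = abs($x - $query);
    }
    asort($distances); // nearest first, keys preserved
    $nearest = array_slice(array_keys($distances), 0, $k);

    $sum = 0.0;
    foreach ($nearest as $i) {
        $sum += $labels[$i];
    }
    return $sum / count($nearest);
}

$space = [10, 20, 30, 40, 50];
$price = [100, 200, 300, 400, 500];
$prediction = knnPredict($space, $price, 33.0, 3);
// averages the prices of the three closest apartments (20, 30 and 40 m2)
```

Because the prediction is an average over nearby training rows, it will usually land close to, but not exactly on, the underlying trend.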
RubixService::train() also accepts transformers.
In detail, RubixService::train() does the following:
- shuffles $data
- trains against 70% of $data
- tests against the remaining 30% of $data
You can change that behaviour with the train_part_size argument, e.g. if you want to train on 80% and test on 20%, you would do RubixService::train(... train_part_size: 0.8).
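The shuffle and split steps can be sketched in plain PHP as follows. The helper shuffleAndSplit is hypothetical and only mirrors the behaviour described above; it is not the actual RubixService code, and the rounding of the split point is an assumption.

```php
<?php
// Hypothetical helper, not the actual RubixService code: shuffles rows and
// splits them into a training part and a testing part.
function shuffleAndSplit(array $data, float $trainPartSize = 0.7): array
{
    shuffle($data); // shuffles the local copy, the caller's array is untouched
    $trainCount = (int) round(count($data) * $trainPartSize);
    return [
        array_slice($data, 0, $trainCount),
        array_slice($data, $trainCount),
    ];
}

$rows = [];
foreach (range(10, 300, 10) as $space) {
    $rows[] = ['space_m2' => $space, 'price' => $space * 10];
}

[$train, $test] = shuffleAndSplit($rows, 0.8);
// 30 rows with a 0.8 split: 24 for training, 6 for testing
```

Shuffling before splitting matters here because the example rows are sorted by size; without it, the test part would contain only the largest apartments.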