In a regression task, the model learns to predict numeric scores. An example is predicting the price of a stock on future days given past price history and other information about the company and the market.
This section will deal with two ways of measuring regression performance:
The most commonly used metric for regression tasks is RMSE (Root Mean Square Error). This is defined as the square root of the average squared distance between the actual score and the predicted score:
Here,
import graphlab as gl
y = gl.SArray([3.1, 2.4, 7.6, 1.9])
yhat = gl.SArray([4.1, 2.3, 7.4, 1.7])
print gl.evaluation.rmse(y, yhat)
0.522015325446
While RMSE is the most common metric, it can be hard to interpret. One alternative is to look at quantiles of the distribution of the absolute percentage errors. The Max-Error metric is the worst case error between the predicted value and the true value.
import graphlab as gl
y = gl.SArray([3.1, 2.4, 7.6, 1.9])
yhat = gl.SArray([4.1, 2.3, 7.4, 1.7])
print gl.evaluation.max_error(y, yhat)
1.0