To predict:
- Make a random matrix $\mathbf W_1$.
- Return the least squares estimate with feature vector $\sigma\left(\mathbf W_1 \mathbf x\right)$, where $\sigma$ is an activation function.
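The two steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation discussed in this post: the function names, the Gaussian draw for $\mathbf W_1$, the choice of `tanh` for $\sigma$, and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_random_features(X, y, n_features=50, activation=np.tanh, seed=0):
    """Draw a fixed random W1, then solve least squares on sigma(W1 x).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(n_features, X.shape[1]))  # random, never trained
    H = activation(X @ W1.T)                        # row i is sigma(W1 x_i)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # least squares estimate
    return W1, beta

def predict_random_features(X, W1, beta, activation=np.tanh):
    """Map inputs through the same random features and apply the weights."""
    return activation(X @ W1.T) @ beta

# Toy regression problem (made-up data, 200 samples, 2 inputs).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]

W1, beta = fit_random_features(X, y)
y_hat = predict_random_features(X, W1, beta)
```

Only `beta` is fitted; `W1` stays at its random draw, which is what keeps the training step a plain linear least squares problem.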
It turns out this has already been done here. That version seems more compliant with sklearn conventions, but it does not seem to have a regularisation option. Regularisation is particularly interesting because it keeps the least squares problem well-posed when the model is overparameterised (more features than training samples).
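To make the overparameterised case concrete, here is a hedged sketch using a ridge (L2) penalty, which is one common form of regularisation and an assumption on my part about what is meant. The function name, the penalty strength `lam`, and the toy data are hypothetical. With 200 random features and only 20 samples, $\mathbf H^\top \mathbf H$ is singular, but adding `lam` times the identity makes the system solvable.

```python
import numpy as np

def ridge_random_features(X, y, n_features, lam=1e-2, activation=np.tanh, seed=0):
    """Random features followed by ridge regression (illustrative helper)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(n_features, X.shape[1]))  # random, never trained
    H = activation(X @ W1.T)                        # shape (n_samples, n_features)
    # Ridge normal equations: (H^T H + lam * I) beta = H^T y.
    # The lam * I term makes the matrix positive definite even when
    # n_features > n_samples.
    A = H.T @ H + lam * np.eye(n_features)
    beta = np.linalg.solve(A, H.T @ y)
    return W1, beta

# Toy problem with far more features than samples (made-up data).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 2))                # 20 training samples
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]

W1, beta = ridge_random_features(X, y, n_features=200)  # 200 features
train_pred = np.tanh(X @ W1.T) @ beta
```

Larger values of `lam` shrink `beta` more strongly; how to choose it (for example by cross-validation) is outside the scope of this sketch.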