vw hyperopt plans
We have vw-hypersearch, but it can handle only one hyperparameter and the golden-section search works only for unimodal (e.g. convex) functions.
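To illustrate the limitation, here is a minimal Python sketch of a generic golden-section search. It is not the vw-hypersearch code. The function name `golden_section_min` and its parameters are made up for this example. The search shrinks a single interval around one minimum, so it only finds the global minimum when the function is unimodal on that interval.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_min(f, lo, hi, tol=1e-6):
    """Return an x in [lo, hi] near a minimum of f.

    The result is the global minimum only if f is unimodal on [lo, hi];
    otherwise the search may settle in a local minimum.
    """
    a, b = lo, hi
    c = b - INV_PHI * (b - a)  # left probe
    d = a + INV_PHI * (b - a)  # right probe
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old left probe becomes the right probe.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old right probe becomes the left probe.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


# One-dimensional toy objective standing in for "loss as a function of one hyperparameter".
best = golden_section_min(lambda x: (x - 2) ** 2, 0.0, 5.0)
```

Each iteration reuses one of the two previous evaluations, which matters when every evaluation is a full vw training run. The same property is why the method does not extend to several hyperparameters at once.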
vw-experiment is a simple script that computes test and train loss. It will be called from vw-hyperopt, but it is also useful on its own.
vw-experiment \
  --train=train.dat \
  --test=test.dat \
  --vw=../vw \
  --train_loss_examples=1e5
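A rough sketch of what one vw-experiment run could boil down to, written in Python for illustration. The function `experiment_commands`, its parameter names and the `model.vw` file name are assumptions, not part of any existing script. The vw flags used (`-d`, `-c`, `-f`, `-i`, `-t`, `--examples`) are standard vw options. The function only builds the command lines; nothing is executed.

```python
def experiment_commands(train, test, vw="vw", train_loss_examples=100_000,
                        model="model.vw", vw_args=()):
    """Build the vw command lines for one train/evaluate cycle.

    `model` is a hypothetical temporary file name chosen for this sketch.
    """
    cmds = {
        # Train on the full training set (with cache) and store the model.
        "train": [vw, "-d", train, "-c", "-f", model] + list(vw_args),
        # Evaluate on the test data without learning (-t), loading the model (-i).
        "test_loss": [vw, "-t", "-i", model, "-d", test],
    }
    if train_loss_examples:  # 0 means: do not compute train loss
        cmd = [vw, "-t", "-i", model, "-d", train]
        if train_loss_examples != "all":
            # Accept values such as 1e5 and pass an integer to vw.
            cmd += ["--examples", str(int(float(train_loss_examples)))]
        cmds["train_loss"] = cmd
    return cmds


cmds = experiment_commands("train.dat", "test.dat", vw="../vw",
                           train_loss_examples="1e5")
```

The resulting lists can be handed to `subprocess.run` one after another; the loss would then have to be read from vw's output, which this sketch does not attempt.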
Example usage:
vw-hyperopt --train=train.dat --test=test.dat \
vw --loss_function=[hinge,logistic,squared] \
--l1=[1e-10..0.005]L -q=[ff]O -b=[18..23]IO --passes=[2,4,8]O
Semantics:
- [a,b,c] ... try the listed values (numbers or strings) for a given parameter
- [a,b,c]O ... try also omitting the parameter
- [min..max] ... range of real values
- [min..max]I ... range of integer values
- [min..max]L ... range of real values with logarithmic scale
- [min..max]O ... try also omitting the parameter
- modifiers I, L and O can be combined
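The semantics above could be parsed and sampled roughly as follows. This is only a sketch in Python: the names `parse_spec` and `sample`, the dictionary layout and the 50% chance of omitting an optional parameter are all assumptions made for illustration.

```python
import math
import random
import re

# "[...]" followed by any combination of the modifiers I, L and O.
SPEC = re.compile(r"^\[(.*)\]([ILO]*)$")


def parse_spec(text):
    """Parse e.g. '[2,4,8]O' or '[1e-10..0.005]L' into a dictionary."""
    m = SPEC.match(text)
    if m is None:
        raise ValueError(f"not a search spec: {text!r}")
    body, mods = m.group(1), set(m.group(2))
    if ".." in body:
        lo, hi = (float(x) for x in body.split("..", 1))
        return {"kind": "range", "min": lo, "max": hi, "integer": "I" in mods,
                "log": "L" in mods, "optional": "O" in mods}
    return {"kind": "choice", "values": body.split(","), "optional": "O" in mods}


def sample(spec, rng):
    """Draw one value; None means 'omit the parameter'."""
    if spec["optional"] and rng.random() < 0.5:  # assumed omit probability
        return None
    if spec["kind"] == "choice":
        return rng.choice(spec["values"])
    lo, hi = spec["min"], spec["max"]
    if spec["log"]:
        # Uniform in log space; requires lo > 0.
        value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        value = min(max(value, lo), hi)  # guard against rounding error
        return round(value) if spec["integer"] else value
    if spec["integer"]:
        return rng.randint(math.ceil(lo), math.floor(hi))
    return rng.uniform(lo, hi)


rng = random.Random(0)
bits = sample(parse_spec("[18..23]IO"), rng)      # an int in 18..23, or None
l1 = sample(parse_spec("[1e-10..0.005]L"), rng)   # a float, log-uniform
```

Sampling in log space for the L modifier means that values such as 1e-9 and 1e-4 are about equally likely, which is usually what is wanted for regularization strengths.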
VW parameters with special handling:
ALWAYS:
- -c --cache is always added for speedup
FORBIDDEN:
- -k --kill_cache is not forwarded to vw (but the cache file is deleted)
- -d --data is overridden by --train and --test
- -t --testonly
- -f --final_regressor
- -a --audit
- --readable_model arg
- --invert_hash arg
QUESTIONABLE:
- -i --initial_regressor
- --holdout_off
- --save_resume
- --cache_file
vw-hyperopt parameters:
- --train training data [required]
- --test development test data [recommended]
- --train_loss_examples=N number of examples for computing train loss (via vw --examples -t -d train.dat). 0 means do not compute train loss. "all" means use the whole train.dat. Default is 100,000.
- --save_models all/only the best
- --save_logs
- --jobs=N ... N parallel jobs, default=autodetect based on number of cores
- --noise compute also the irreducible error (loss) via vw-overfit
- --plot tikz,png
- --search exhaustive, random,... We could also have --randseed, --timeout, --rounds (of hill-climbing)
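As a sketch of how the exhaustive and random strategies might differ, assume each parameter has already been reduced to a list of candidate values, with None meaning "omit the parameter". The Python function names `exhaustive`, `random_search` and `to_vw_args` are hypothetical, and `randseed` and `rounds` merely mirror the options proposed above.

```python
import itertools
import random


def exhaustive(space):
    """Yield every combination of the candidate values in `space`."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_search(space, rounds, randseed=0):
    """Yield `rounds` random configurations, reproducible for a given seed."""
    rng = random.Random(randseed)
    for _ in range(rounds):
        yield {name: rng.choice(values) for name, values in space.items()}


def to_vw_args(config):
    """Render a configuration as vw arguments; None values are omitted."""
    args = []
    for name, value in config.items():
        if value is not None:
            args += [name, str(value)]
    return args


space = {
    "--loss_function": ["hinge", "logistic", "squared"],
    "--passes": [None, 2, 4, 8],   # None: try also omitting the parameter
    "-q": [None, "ff"],
}
grid = list(exhaustive(space))                        # 3 * 4 * 2 combinations
trials = list(random_search(space, rounds=5, randseed=42))
```

The exhaustive grid grows multiplicatively with every added parameter, while random search keeps the number of vw runs fixed at `rounds`, which is the main reason to offer both.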
- http://fastml.com/optimizing-hyperparams-with-hyperopt/
- http://www.eng.uwaterloo.ca/~jbergstr/research.html#modelsearch
- http://nlpers.blogspot.cz/2014/10/hyperparameter-search-bayesian.html
- https://github.com/HIPS/Spearmint
- http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
It would be nice if vw-hyperopt could produce plots (e.g. png) with:
- test loss if --test
- train loss if --train_loss_examples
- irreducible error (loss) if --noise
- progressive validation error if --pve
- time_train if --time_train
- time_test if --time_test
(Note that "corresponds to" here means "is an estimate of".) Train loss corresponds to Bias^2 + noise. Test loss corresponds to Bias^2 + noise + Variance. The difference between train loss and test loss corresponds to the Variance. The amount of Variance corresponds to the amount of over-training. (http://scott.fortmann-roe.com/docs/BiasVariance.html)
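The relations in this paragraph amount to simple arithmetic on three measured numbers. A small Python sketch follows; the function name `decompose` and the loss values are made up for illustration, and the results are estimates in the same loose sense as "corresponds to" above.

```python
def decompose(train_loss, test_loss, noise):
    """Estimate the bias/variance/noise split from three measured losses.

    train_loss ~ Bias^2 + noise
    test_loss  ~ Bias^2 + noise + Variance
    """
    return {
        "noise": noise,
        "bias2": train_loss - noise,
        "variance": test_loss - train_loss,
    }


# Made-up numbers: train loss 0.30, test loss 0.45, irreducible error 0.20.
parts = decompose(train_loss=0.30, test_loss=0.45, noise=0.20)
```

With these example numbers the variance estimate is larger than the bias estimate, so over-training would be the first thing to address.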
If the test loss curve is close to the noise curve, no further hyperparameter tuning can help. You must add new features to the training data.
If over-training is the problem, there are several things you can do about it:
- get more training data
- apply (higher) regularization (--l1 or --l2)
- try bagging with -B
- restrain the options below for fighting high Bias (except the first one or two)
If high Bias is the problem (i.e. underfitting):
- make sure the training data is shuffled
- higher -b (--bit_precision)
- lower/no regularization
- more --passes or higher --learning_rate
- get more features, either truly new features or nonlinear combinations via --quadratic, --cubic, --stage_poly, --lrq, --ngram, --nn etc.