Martin Popel edited this page Oct 28, 2015 · 2 revisions

Plans for a new vw-hyperopt script

We have vw-hypersearch, but it can tune only one hyperparameter at a time, and its golden-section search works only for unimodal (e.g. convex) functions.
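
To illustrate the unimodality limitation, here is a minimal sketch of a textbook golden-section search in Python. It is not the vw-hypersearch code; the function name golden_section_min and both test functions are made up for this illustration.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-6):
    """Shrink [lo, hi] around a minimum of f; correct only if f is unimodal there."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # keep the left part [a, d]
        else:
            a = c  # keep the right part [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

def convex(x):
    """Unimodal toy loss with its minimum at x = 1."""
    return (x - 1.0) ** 2

def bimodal(x):
    """Toy loss with the global minimum at x = -2 and a worse local one at x = 2."""
    return min((x + 2.0) ** 2, (x - 2.0) ** 2 + 0.5)

x_convex = golden_section_min(convex, -3.0, 5.0)    # converges to the true minimum
x_bimodal = golden_section_min(bimodal, -3.0, 5.0)  # converges to the local minimum
```

On the bimodal function the first comparison already discards the part of the interval that contains the global minimum, so the search ends at the worse local minimum near x = 2.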

vw-experiment

vw-experiment is a simple script that computes test and train loss. vw-hyperopt will call it, but it is also useful on its own.

vw-experiment \
  --train=train.dat \
  --test=test.dat \
  --vw=../vw \
  --train_loss_examples=1e5
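
As a rough sketch of what vw-experiment might do internally, the Python function below only builds the vw command lines for training, test loss and train loss. The function name experiment_commands, the model file name model.vw and the dictionary layout are assumptions made for illustration. The vw flags used (-d, -f, -i, -t, --examples) are existing ones.

```python
def experiment_commands(train, test, vw="vw", train_loss_examples="1e5",
                        model="model.vw", vw_args=()):
    """Build (but do not run) the vw invocations for one experiment."""
    commands = {
        # Train and save the model (-f) so both losses use the same weights.
        "train": [vw, "-d", train, "-f", model, *vw_args],
        # Test loss: load the model (-i) in test-only mode (-t).
        "test_loss": [vw, "-t", "-i", model, "-d", test],
    }
    if train_loss_examples == "all":
        # Evaluate on the whole training file.
        commands["train_loss"] = [vw, "-t", "-i", model, "-d", train]
    else:
        n = int(float(train_loss_examples))  # accepts "1e5" as well as 100000
        if n > 0:  # 0 means: do not compute train loss
            commands["train_loss"] = [vw, "-t", "-i", model, "-d", train,
                                      "--examples", str(n)]
    return commands

cmds = experiment_commands("train.dat", "test.dat", vw="../vw")
```

Each list could be passed to subprocess.run. Parsing the reported loss from the vw output is deliberately left out of this sketch.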

vw-hyperopt

Example usage:

vw-hyperopt --train=train.dat --test=test.dat \
  vw --loss_function=[hinge,logistic,squared] \
  --l1=[1e-10..0.005]L -q=[ff]O -b=[18..23]IO --passes=[2,4,8]O

Semantics:

  • [a,b,c] ... try the listed values (numbers or strings) for a given parameter
  • [a,b,c]O ... try also omitting the parameter
  • [min..max] ... range of real values
  • [min..max]I ... range of integer values
  • [min..max]L ... range of real values with logarithmic scale
  • [min..max]O ... try also omitting the parameter
  • modifiers I, L and O can be combined
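
The notation above could be parsed roughly as follows. This is a Python sketch under assumptions: the names ParamSpec and parse_spec and the regular expression are hypothetical, and the planned script may parse arguments differently.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# name=[body]MODS, where MODS is any combination of I, L and O.
SPEC_RE = re.compile(r"^(?P<name>--?[\w-]+)=\[(?P<body>[^\]]*)\](?P<mods>[ILO]*)$")

@dataclass
class ParamSpec:
    name: str
    values: Optional[List[str]]  # listed values, or None for a range
    low: Optional[float]         # range bounds, or None for a list
    high: Optional[float]
    integer: bool                # modifier I
    log_scale: bool              # modifier L
    optional: bool               # modifier O: also try omitting the parameter

def parse_spec(arg: str) -> Optional[ParamSpec]:
    """Parse one argument; return None if it is not a search specification."""
    m = SPEC_RE.match(arg)
    if m is None:
        return None  # ordinary argument, to be passed to vw unchanged
    body, mods = m.group("body"), m.group("mods")
    if ".." in body:
        low_s, high_s = body.split("..", 1)
        values, low, high = None, float(low_s), float(high_s)
    else:
        values, low, high = body.split(","), None, None
    return ParamSpec(m.group("name"), values, low, high,
                     "I" in mods, "L" in mods, "O" in mods)

bits = parse_spec("-b=[18..23]IO")
l1 = parse_spec("--l1=[1e-10..0.005]L")
passes = parse_spec("--passes=[2,4,8]O")
```

Listed values are kept as strings because they may be numbers or names such as hinge. Converting them is left to the caller.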

VW parameters with special handling:

ALWAYS:

  • -c --cache is always added for speedup

FORBIDDEN:

  • -k --kill_cache is not forwarded to vw (but the cache file is deleted)
  • -d --data is overridden by --train and --test
  • -t --testonly
  • -f --final_regressor
  • -a --audit
  • --readable_model arg
  • --invert_hash arg

QUESTIONABLE:

  • -i --initial_regressor
  • --holdout_off
  • --save_resume
  • --cache_file

vw-hyperopt parameters:

  • --train training data [required]
  • --test development test data [recommended]
  • --train_loss_examples=N number of examples for computing train loss (via vw --examples N -t -d train.dat). 0 means do not compute train loss; "all" means use the whole train.dat. Default is 100,000.
  • --save_models all/only the best
  • --save_logs
  • --jobs=N ... N parallel jobs, default=autodetect based on number of cores
  • --noise compute also the irreducible error (loss) via vw-overfit
  • --plot tikz,png
  • --search exhaustive, random, ... We could also have --randseed, --timeout and --rounds (of hill-climbing)
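
The two search strategies named for --search could look roughly like this in Python. It is a sketch under assumptions: the function names exhaustive and sample_range and the dictionary representation of the search space are hypothetical, and None stands for "omit the parameter" (the O modifier).

```python
import itertools
import math
import random

def exhaustive(choices):
    """Yield every combination of listed values; None means 'omit the parameter'."""
    names = list(choices)
    for combo in itertools.product(*(choices[n] for n in names)):
        yield {n: v for n, v in zip(names, combo) if v is not None}

def sample_range(rng, low, high, integer=False, log_scale=False):
    """Draw one value from [low, high], uniformly or uniformly in log space."""
    if integer and not log_scale:
        return rng.randint(int(low), int(high))  # both ends inclusive
    if log_scale:
        x = math.exp(rng.uniform(math.log(low), math.log(high)))
    else:
        x = rng.uniform(low, high)
    x = min(max(x, low), high)  # guard against floating-point overshoot
    return int(round(x)) if integer else x

rng = random.Random(42)  # fixed seed, in the spirit of the proposed --randseed
grid = list(exhaustive({
    "--loss_function": ["hinge", "logistic", "squared"],
    "--passes": [None, 2, 4, 8],
}))
l1 = sample_range(rng, 1e-10, 0.005, log_scale=True)
bits = sample_range(rng, 18, 23, integer=True)
```

Exhaustive search only makes sense for listed values and small integer ranges. Real-valued ranges need sampling or some discretization, and the logarithmic scale matters for parameters such as --l1 whose useful values span many orders of magnitude.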

Related links:

Drawing plots

It would be nice if vw-hyperopt could produce (e.g. png) plots with:

  • test loss if --test
  • train loss if --train_loss_examples
  • irreducible error (loss) if --noise
  • progressive validation error if --pve
  • time_train if --time_train
  • time_test if --time_test

My understanding of Variance-Bias Tradeoff

(Note that "corresponds to" here means "is an estimate of".) Train loss corresponds to Bias^2 + noise. Test loss corresponds to Bias^2 + noise + Variance. The difference between train loss and test loss corresponds to the Variance. The amount of Variance corresponds to the amount of over-training. (http://scott.fortmann-roe.com/docs/BiasVariance.html)
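
Under the correspondences stated above, the three components can be estimated from measured losses by simple subtraction. The Python sketch below assumes that all losses are on the same scale. The function name decompose and the example numbers are made up for illustration.

```python
def decompose(train_loss, test_loss, noise=0.0):
    """Estimate Bias^2 and Variance from the relations stated above.

    train_loss ~ Bias^2 + noise
    test_loss  ~ Bias^2 + noise + Variance
    """
    return {
        "noise": noise,
        "bias_sq": train_loss - noise,      # what remains of train loss above the noise
        "variance": test_loss - train_loss, # the gap between test and train loss
    }

# Hypothetical losses: a large gap between test and train loss suggests over-training.
parts = decompose(train_loss=0.30, test_loss=0.45, noise=0.10)
```

These are only estimates: with a finite sample, the differences can even come out slightly negative.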

Rationale:

If the test loss curve is close to the noise curve, no further hyperparameter tuning can help. You must add new features to the training data.

If over-training is the problem, there are several things you can do about it:

  • get more training data
  • apply (higher) regularization (--l1 or --l2)
  • try bagging with -B
  • scale back the options listed below for fighting high Bias (except the first one or two)

If high Bias is the problem (i.e. underfitting):

  • make sure the training data is shuffled
  • higher -b (--bit_precision)
  • lower/no regularization
  • more --passes or higher --learning_rate
  • get more features, either truly new features or nonlinear combinations via --quadratic, --cubic, --stage_poly, --lrq, --ngram, --nn etc.

TODO: use https://metacpan.org/pod/Parallel::ForkManager
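
For comparison, the same idea (run N experiments at a time, with N defaulting to the number of cores, as proposed for --jobs) can be prototyped in Python with the standard library. This is an analogue for illustration, not the Perl module named in the TODO. The name run_parallel and the stand-in experiment function are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def run_parallel(run_one, configs, jobs=None):
    """Run run_one(config) for every config, at most `jobs` at a time.

    Threads are enough here because each job would mostly wait
    for an external vw process to finish.
    """
    workers = jobs or os.cpu_count() or 1  # autodetect when jobs is not given
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as configs.
        return list(pool.map(run_one, configs))

def fake_experiment(config):
    """Stand-in for one vw-experiment run; returns a made-up 'loss'."""
    return config["passes"] * 0.1

losses = run_parallel(fake_experiment, [{"passes": 2}, {"passes": 4}], jobs=2)
```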