v 0.4 beta
Note that this is a major update: almost all code has been rewritten. Because of the scale of the changes, we have not yet been able to port all functionality from previous versions. The main features that are currently missing are:
- multiseq task type
- seq2seq task type
- pearson correlation metric
- specifying layers
- predict-more.py, which loads the model once and predicts on multiple files
- --resume, to resume training after an interruption
- --raw, to run a model on raw text
- label balancing
New functionality and other changes:
- Much easier debugging and extension with new functionality
- Regression task type
- Better topn output support
- No need to install AllenNLP
- Prints graphs of scores after each epoch
- Renamed validation_data_set to dev_data_set (the only difference in usage)
- Counts the number of UNKs and prints dataset statistics
- Can now also use autoregressive language models (at least those with a special token in position 0)
- Automatically detects the size of the language model
- Prints the MaChAmp ASCII art only once
- Fixed a bug in macro-F1, which previously included the padding label with a score of 0
- Almost all code is now documented
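To illustrate why the macro-F1 fix matters: macro-F1 is the unweighted mean of the per-label F1 scores, so an extra label that always scores 0 pulls the average down. The sketch below is a minimal Python illustration, not MaChAmp's actual implementation; the padding label name `[PAD]`, the example labels, and the functions `f1_for_label` and `macro_f1` are all hypothetical.

```python
from typing import Optional, Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score for a single label, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str],
             ignore: Optional[str] = "[PAD]") -> float:
    """Unweighted mean of per-label F1, skipping the (hypothetical) padding label."""
    kept = [label for label in labels if label != ignore]
    if not kept:
        return 0.0
    return sum(f1_for_label(gold, pred, label) for label in kept) / len(kept)


gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "VERB", "ADJ", "ADJ"]
vocab = ["[PAD]", "NOUN", "VERB", "ADJ"]  # label vocabulary incl. padding

print(round(macro_f1(gold, pred, vocab), 4))               # 0.7778, padding skipped
print(round(macro_f1(gold, pred, vocab, ignore=None), 4))  # 0.5833, padding counted as 0
```

Here NOUN, VERB and ADJ score 2/3, 1 and 2/3, giving a macro-F1 of 7/9. Counting the padding label with a score of 0 divides the same sum by 4 instead of 3, which lowers the result to 7/12.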
Note that this version has been tested less thoroughly than our previous versions, which were mostly incremental updates of each other and were used in countless experiments.