Filter Expressions Testing for Train Dataset or Eval Dataset
Hu Zhanghao edited this page Jul 3, 2019 · 4 revisions
In Shifu, filter expressions can be used to filter the training dataset and the eval datasets. The expressions follow the Apache Commons JEXL syntax: http://commons.apache.org/proper/commons-jexl/reference/syntax.html. However, an expression cannot be verified until the user runs a step such as stats, norm, or eval. If the expression is malformed, or a variable in the expression does not exist, the step may produce unexpected results. For example, the user may find logs like the following:
Output(s):
Successfully stored 0 records (2180 bytes) in: "hdfs://.../..."
Counters:
Total records written : 0
Total bytes written : 2180
...
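A symptom like this can be checked for mechanically. The sketch below assumes the job output was saved to a file (the name stats.log is hypothetical) and that the counter line has exactly the form shown in the log above. It is only a convenience check, not part of Shifu.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the log reports that zero records were written,
# which may mean the filter expression rejected every row.
wrote_zero_records() {
  grep -Eq 'Total records written[[:space:]]*:[[:space:]]*0[[:space:]]*$' "$1"
}

# Hypothetical log file name; the check is skipped if the file does not exist.
log="stats.log"
if [ -f "$log" ] && wrote_zero_records "$log"; then
  echo "No records written; check the filter expression" >&2
fi
```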
Since shifu-0.12.x, a test command has been available for testing the filters on the training dataset and the eval datasets. The command looks like this:
$ shifu test -filter [EvalSetNames] [-n numOfRecords]
- If no EvalSetNames is specified, the filter for the training dataset is tested.
- To test the filters of multiple eval sets, give the eval set names separated by commas, for example EvalTest1,EvalTest2,EvalTest3.
- By default, the test command tests the filter expression against 100 records. To test against more records, use -n to change the number.
- * can be used as EvalSetNames. In that case, Shifu tests all filters in ModelConfig.json.
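The options can be combined as in the following sketch. The eval set names and the record count 500 are placeholders, and the flag is written as -filter here. Quoting * is an assumption about running from a POSIX shell, where an unquoted * would be expanded to file names before Shifu sees it. The run helper is hypothetical: it echoes each command and executes it only when shifu is found on PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the command, then run it only if it is installed.
run() {
  echo "+ $*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  fi
}

# Test the training dataset filter against the default 100 records.
run shifu test -filter

# Test the filters of two eval sets (placeholder names) against 500 records.
run shifu test -filter EvalTest1,EvalTest2 -n 500

# Test all filters in ModelConfig.json; the quotes keep the shell from globbing *.
run shifu test -filter '*'
```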
With the shifu test command, filter expressions can be validated at a very early stage.