
Shifu 0.2.5 Eval Step Scalability Improvement

Zhang Pengshan (David) edited this page Apr 3, 2015 · 1 revision

In Shifu 0.2.4, running the eval step on 20 million records takes about 20 minutes. We found that the eval step first writes the model scores to HDFS and then reads them back to generate the eval report. In some cases, there is no need to store that much data in HDFS.

The PR https://github.com/ShifuML/shifu/pull/74 from zhang7575 fixes this issue. By default, score data is no longer persisted to HDFS in Shifu 0.2.5. This cuts the running time from about 20 minutes to about 12 minutes on the same eval data set.
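To put the numbers in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: the function name `eval_step_gain` is made up for illustration and is not part of Shifu, and the inputs are the approximate figures quoted above (20 million records, about 20 minutes before, about 12 minutes after), so the results are rough as well.

```python
def eval_step_gain(records, minutes_before, minutes_after):
    """Summarize a running-time improvement.

    Hypothetical helper for illustration only; not a Shifu API.
    Returns (time reduction fraction, speedup factor,
             records/second before, records/second after).
    """
    reduction = (minutes_before - minutes_after) / minutes_before
    speedup = minutes_before / minutes_after
    rate_before = records / (minutes_before * 60)
    rate_after = records / (minutes_after * 60)
    return reduction, speedup, rate_before, rate_after


# Approximate figures taken from the text above.
reduction, speedup, rate_before, rate_after = eval_step_gain(20_000_000, 20, 12)

print(f"time reduction: {reduction:.0%}")
print(f"speedup: {speedup:.2f}x")
print(f"throughput before: {rate_before:,.0f} records/s")
print(f"throughput after: {rate_after:,.0f} records/s")
```

With these inputs the eval step takes roughly 40% less time, which is about a 1.67x speedup, and throughput goes from roughly 16,700 to roughly 27,800 records per second.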
