Shifu 0.2.5 Stats Step Scalability Improvement
Zhang Pengshan (David) edited this page Apr 3, 2015
The default stats algorithm in Shifu 0.2.4 is 'SPDT'. On big data sets, such as 100MM records with 1800 variables, the stats job fails because the last Hadoop job does not scale out well.
In Shifu 0.2.5, the new stats algorithm 'SPDTI' scales much better. For 22MM records, the running time drops from 50 minutes in Shifu 0.2.4 to 20 minutes in Shifu 0.2.5. A data set of 100MM records with 1800 variables has also been tested, and its running time is only 30 minutes.
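
To put the quoted timings side by side, here is a small sketch of the arithmetic. It uses only the figures given on this page; the helper names `throughput` and `speedup` are made up for illustration and are not part of Shifu. The page does not say how many variables the 22MM data set has or what cluster was used, so the throughput comparison across data set sizes is indicative only.

```python
# Figures quoted on this page: (records in millions, running time in minutes).
RUNS = {
    "0.2.4 SPDT, 22MM": (22, 50),
    "0.2.5 SPDTI, 22MM": (22, 20),
    "0.2.5 SPDTI, 100MM": (100, 30),
}


def throughput(records_mm, minutes):
    """Millions of records processed per minute (hypothetical helper)."""
    return records_mm / minutes


def speedup(old_minutes, new_minutes):
    """How many times faster the new run is than the old one."""
    return old_minutes / new_minutes


if __name__ == "__main__":
    for name, (records_mm, minutes) in RUNS.items():
        print(f"{name}: {throughput(records_mm, minutes):.2f}MM records/minute")
    # Same 22MM data set, 0.2.4 versus 0.2.5.
    print(f"22MM speedup: {speedup(50, 20):.1f}x")
```

On the 22MM data set this works out to a 2.5x reduction in running time.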