-
Notifications
You must be signed in to change notification settings - Fork 108
Shifu 0.2.4 SPDT Binning Algorithm
Zhang Pengshan (David) edited this page Apr 8, 2015
·
4 revisions
SPDT is a new method in 'stats' step to improve scalability.
Set binningMethod in ModelConfig.json to 'SPDT'.
In each mapper, all columns are split into thousands of histograms. Then get local bins in each mapper and local bins will be merged into global bins to get global bin boundaries. Using new bin boundaries to join with column data to compute statistics.
TODO paper url