Skip to content

Shifu 0.2.4 SPDT Binning Algorithm

Zhang Pengshan (David) edited this page Apr 8, 2015 · 4 revisions

What is SPDT?

SPDT is a new method in 'stats' step to improve scalability.

How to Enable?

Set binningMethod in ModelConfig.json to 'SPDT'.

How does SPDT do 'stats'?

In each mapper, all columns are split into thousands of histograms. Then get local bins in each mapper and local bins will be merged into global bins to get global bin boundaries. Using new bin boundaries to join with column data to compute statistics.

References

TODO paper url

Clone this wiki locally