
# Scaling up the split-apply-combine paradigm

The split-apply-combine philosophy for data tables scales up to much larger problems. Sometimes the tables themselves grow too big for one computer (or CPU); other times the tables are reasonably sized, but the tasks are numerous. Split-apply-combine at this scale goes by names such as MapReduce, Hadoop, and high throughput computing. The idea is to take a big task; split it into many (thousands or even millions of) small tasks; send each of those to a separate CPU; and combine (reduce) the results once all are completed. Such projects typically consume thousands of compute hours, and in some cases millions.
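
A minimal sketch of the split-apply-combine (map-reduce) idea in miniature, using Python's standard `multiprocessing` module; the data, chunking scheme, and summary function here are illustrative stand-ins, not part of the original material.

```python
# Split-apply-combine on one machine: split the data into chunks,
# apply a summary to each chunk on a separate CPU, then combine.
from multiprocessing import Pool


def split(values, n_chunks):
    """Split: cut the full list of values into roughly equal chunks."""
    size = max(1, len(values) // n_chunks)
    return [values[i:i + size] for i in range(0, len(values), size)]


def apply_chunk(chunk):
    """Apply (map): each worker summarizes its own chunk independently."""
    return sum(chunk)


def combine(partials):
    """Combine (reduce): merge the per-chunk results into one answer."""
    return sum(partials)


if __name__ == "__main__":
    values = list(range(1_000_000))        # stand-in for one column of a big table
    chunks = split(values, n_chunks=8)     # split into many small tasks
    with Pool(processes=8) as pool:        # send each task to a separate CPU
        partials = pool.map(apply_chunk, chunks)
    total = combine(partials)              # reduce once all tasks are completed
    print(total)                           # same answer as sum(values)
```

At cluster scale the same pattern holds; only the machinery changes, with a scheduler dispatching each chunk to a different node instead of a different local process.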

This works best when tasks are "parallel" -- that is, they can be done independently of each other, such as separate simulations from the same model system (see the sketch below). Unfortunately, some loosely coupled tasks depend on one another and require more complicated arrangements, perhaps even high performance computing hardware such as graphics processing units (GPUs).
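
A minimal sketch of such independent tasks, assuming many replicates of the same simulation that differ only in their random seed; the random-walk model is a made-up example chosen purely for illustration.

```python
# Independent simulation replicates: each run depends only on its own seed,
# so every replicate can be dispatched to its own CPU with no coordination.
import random
from concurrent.futures import ProcessPoolExecutor


def simulate(seed, n_steps=10_000):
    """One replicate: a simple random walk driven by its own RNG seed."""
    rng = random.Random(seed)
    position = 0
    for _ in range(n_steps):
        position += rng.choice((-1, 1))
    return position


if __name__ == "__main__":
    seeds = range(100)  # 100 independent replicates; could just as well be millions
    with ProcessPoolExecutor() as pool:
        endpoints = list(pool.map(simulate, seeds))  # each replicate runs separately
    print(sum(endpoints) / len(endpoints))           # combine: average final position
```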

## High Throughput & High Performance Computing