Latency

Rohit edited this page Mar 16, 2017 · 7 revisions

In the Parallel Programming course we learned about:

  • Data Parallelism in the single-machine, multi-core, multiprocessor world.
  • Parallel Collections as an implementation of this paradigm.
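The single-machine flavor of data parallelism can be sketched in a few lines: the same operation is applied to elements of a collection, with the work spread across several cores. This is a minimal illustrative sketch (the names `square` and the pool size are my own, not from the course), not Scala's Parallel Collections API itself:

```python
# Minimal sketch of single-machine data parallelism: apply the same
# function to chunks of a collection on multiple cores in parallel.
from multiprocessing import Pool

def square(x: int) -> int:
    return x * x

if __name__ == "__main__":
    data = list(range(1_000))
    with Pool(processes=4) as pool:       # 4 worker processes, one per core
        squares = pool.map(square, data)  # data-parallel map over chunks
    # The parallel result is identical to the sequential one; only the
    # execution strategy changed.
    print(sum(squares))
```

The key property, which carries over to the distributed setting below, is that the programming model (a `map` over a collection) hides how the work is scheduled onto cores.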

Here we will learn:

  • Data Parallelism in a distributed (multi-node) setting.
  • Distributed collections abstraction from Apache Spark as an implementation of this paradigm.

Because of the distribution, we face two new issues:

  1. Partial Failure: crash failures on a subset of the machines in the cluster.
  2. Latency: network communication makes certain operations much slower. Unlike partial failure, latency cannot be masked and is always present; it affects not only the programming model but also the code we write, since we must structure programs to reduce network communication.

Apache Spark stands out in the way it handles these issues.

Latency Numbers Every Programmer Should Know

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet US -> Europe -> US     150,000,000   ns  150,000 us  150 ms
  • Note that reading 1 MB sequentially from disk is 80x more expensive than reading 1 MB sequentially from memory.
  • Also, sending a packet over the network from the US to Europe and back is about 1.5 million times more expensive than a main memory reference.
  • In general, most of the time:
    • memory operations = fastest
    • disk operations = slow
    • network operations = slowest
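The ratios quoted in the bullets above follow directly from the table; a quick arithmetic check (all constants below are the table's values in nanoseconds):

```python
# Ratios implied by the latency table (values in nanoseconds).
MAIN_MEMORY_REF    = 100
READ_1MB_MEMORY    = 250_000
READ_1MB_SSD       = 1_000_000
READ_1MB_DISK      = 20_000_000
US_EU_US_ROUNDTRIP = 150_000_000

print(READ_1MB_DISK // READ_1MB_MEMORY)       # disk vs memory: 80x
print(READ_1MB_DISK // READ_1MB_SSD)          # disk vs SSD: 20x
print(US_EU_US_ROUNDTRIP // MAIN_MEMORY_REF)  # transatlantic packet vs memory ref: 1,500,000x
```

These orders of magnitude are why Spark's design pushes so hard to keep data in memory and to minimize network shuffles.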