Latency
In the Parallel Programming course, we learned about:
- Data Parallelism in the single-machine, multi-core, multiprocessor world.
- Parallel Collections as an implementation of this paradigm.
Here we will learn about:
- Data Parallelism in a distributed (multi-node) setting.
- The distributed collections abstraction from Apache Spark as an implementation of this paradigm (a sketch contrasting the two follows this list).
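To make the shift concrete, here is a minimal sketch (not from the lecture; the object name and the `local[*]` master are illustrative) that computes the same result first with a parallel collection and then with a Spark RDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordLengths extends App {
  val data = Vector("spark", "latency", "network")

  // Data parallelism on one machine: a parallel collection
  // splits the work across the cores of a single node.
  val localSum = data.par.map(_.length).sum

  // Data parallelism on a cluster: the same logic on an RDD,
  // where the work is split across the nodes of the cluster.
  val sc = new SparkContext(
    new SparkConf().setAppName("word-lengths").setMaster("local[*]"))
  val distributedSum = sc.parallelize(data).map(_.length).sum()

  println(s"local: $localSum, distributed: $distributedSum")
  sc.stop()
}
```

The programming interface looks almost identical in the two cases; the latency characteristics, as the numbers below show, do not.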
Because of the distribution, we face two new issues:
- Partial Failure: crash failures on a subset of the machines in the cluster.
- Latency: network communication makes certain operations much slower. Unlike partial failure, latency cannot be masked; it is always present, and it affects both the programming model and the code we write directly, because we must work to reduce network communication (illustrated below).
Apache Spark stands out in the way it handles these two issues.
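As a hedged illustration of how the drive to reduce network communication shows up directly in code (this assumes a `SparkContext` named `sc`; pair-RDD operations are covered properly in Week 2): both lines below compute per-key sums, but the second moves far less data over the network.

```scala
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// groupByKey ships every value across the network before summing.
val sums1 = pairs.groupByKey().mapValues(_.sum)

// reduceByKey sums locally on each node first, so far less data
// has to cross the (slow) network during the shuffle.
val sums2 = pairs.reduceByKey(_ + _)
```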
```
Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet US -> Europe -> US     150,000,000   ns  150,000 us  150 ms
```
- Note that reading 1 MB sequentially from disk is 80x more expensive than reading 1 MB sequentially from memory (20 ms vs. 250 us).
- Also, sending a packet over the network from the US to Europe and back is over a million times more expensive than a main memory reference (150 ms vs. 100 ns).
- The general trend (which Spark exploits, as sketched after this list) is:
    - memory operations = fastest
    - disk operations = slow
    - network operations = slowest
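Spark's answer to this trend is to keep data in memory whenever possible. A minimal sketch, assuming a `SparkContext` named `sc` and a placeholder HDFS path:

```scala
// Read the file from disk once, then cache the RDD in memory.
val lines = sc.textFile("hdfs:///path/to/data.txt").cache()

val first  = lines.count() // first action reads from disk (and fills the cache)
val second = lines.count() // later actions are served from memory
```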
Let's multiply all these durations by a billion to map them onto human timescales:
```
# Minute:
L1 cache reference                  0.5 s         One heart beat (0.5 s)
Branch mispredict                   5 s           Yawn
L2 cache reference                  7 s           Long yawn
Mutex lock/unlock                   25 s          Making a coffee

# Hour:
Main memory reference               100 s         Brushing your teeth
Compress 1K bytes with Zippy        50 min        One episode of a TV show (including ad breaks)

# Day:
Send 2K bytes over 1 Gbps network   5.5 hr        From lunch to end of work day

# Week:
SSD random read                     1.7 days      A normal weekend
Read 1 MB sequentially from memory  2.9 days      A long weekend
Round trip within same datacenter   5.8 days      A medium vacation
Read 1 MB sequentially from SSD     11.6 days     Waiting for almost 2 weeks for a delivery

# Year:
Disk seek                           16.5 weeks    A semester in university
Read 1 MB sequentially from disk    7.8 months    Almost producing a new human being
The above 2 together                1 year

# Decade:
Send packet CA->Netherlands->CA     4.8 years     Average time it takes to complete a bachelor's degree
```
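The mapping is plain arithmetic: n nanoseconds multiplied by a billion is n seconds, so 500,000 ns (a datacenter round trip) becomes 500,000 s ≈ 5.8 days. A throwaway sketch of the conversion (the `humanize` helper is hypothetical, not from the course):

```scala
// n ns scaled by a billion = n s; render those seconds in human units.
def humanize(seconds: Double): String = {
  val minute = 60.0
  val hour   = 60 * minute
  val day    = 24 * hour
  val year   = 365.25 * day
  if      (seconds < minute) f"$seconds%.1f s"
  else if (seconds < hour)   f"${seconds / minute}%.1f min"
  else if (seconds < day)    f"${seconds / hour}%.1f hr"
  else if (seconds < year)   f"${seconds / day}%.1f days"
  else                       f"${seconds / year}%.1f years"
}

humanize(0.5)          // "0.5 s"      -> L1 cache reference
humanize(500000.0)     // "5.8 days"   -> datacenter round trip
humanize(150000000.0)  // "4.8 years"  -> US -> Europe -> US packet
```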