Skip to content
This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Commit

Permalink
Add notes chapter 8
Browse files Browse the repository at this point in the history
  • Loading branch information
lealceldeiro committed Nov 6, 2023
1 parent 95ce5ac commit 07b075c
Showing 1 changed file with 27 additions and 0 deletions.
27 changes: 27 additions & 0 deletions DesigningDataIntensiveApplications/Chapter8/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Chapter 8: The Trouble with Distributed Systems

### Problems that can occur in distributed systems

- Whenever you try to send a packet over the network, it may be lost or arbitrarily delayed. Likewise, the reply may be
lost or delayed, so if you don’t get a reply, you have no idea whether the message got through.
- A node's clock may be significantly out of sync with other nodes (despite your best efforts to set up NTP), it may
suddenly jump forward or back in time, and relying on it is dangerous because you most likely don’t have a good
measure of your clock's error interval.
- A process may pause for a substantial amount of time at any point in its execution (perhaps due to a stop-the-world
garbage collector), be declared dead by other nodes, and then come back to life again without realizing that it was
paused.

Whenever software tries to do anything involving other nodes, there is the possibility that it may occasionally fail,
or randomly go slow, or not respond at all (and eventually time out).

To tolerate faults, the first step is to _detect_ them.

Major decisions cannot be safely made by a single node, so we require protocols that enlist help from other
nodes and try to get a quorum to agree.

It is possible to give hard real-time response guarantees and bounded delays in networks, but doing so is very
expensive and results in lower utilization of hardware resources.

Supercomputers assume reliable components and thus have to be stopped and restarted entirely when a component fails.
By contrast, distributed systems can run forever without being interrupted at the service level, because all faults and
maintenance can be handled at the node level (at least in theory).

0 comments on commit 07b075c

Please sign in to comment.