This repository has been archived by the owner on Feb 21, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
a867642
commit 95ce5ac
Showing
1 changed file
with
140 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
# Chapter 7: Transactions | ||
|
||
Transactions are an abstraction layer that allows an application to pretend that certain concurrency problems and | ||
certain kinds of hardware and software faults don't exist. | ||
|
||
## ACID: Atomicity, Consistency, Isolation, and Durability | ||
|
||
Systems that do not meet the ACID criteria are sometimes called BASE, which stands for Basically Available, Soft state, | ||
and Eventual consistency. | ||
|
||
The ability to abort a transaction on error and have all writes from that transaction discarded is the defining feature | ||
of ACID _atomicity_. In other words, if an error occurs halfway through a sequence of writes, the transaction should | ||
be aborted, and the writes made up to that point should be discarded. This means the database saves developers from | ||
having to worry about partial failure, by giving an all-or-nothing guarantee. | ||
|
||
In the context of ACID, _consistency_ refers to an application-specific notion of the database being in a “good state”. | ||
|
||
_Isolation_ in the sense of ACID means that concurrently executing transactions are isolated from each other: they | ||
cannot step on each other's toes. In other words, concurrently running transactions shouldn’t interfere with each other. | ||
For example, if one transaction makes several writes, then another transaction should see either all or none of those | ||
writes, but not some subset. | ||
|
||
_Durability_ is the promise that once a transaction has committed successfully, any data it has written will not be | ||
forgotten, even if there is a hardware fault or the database crashes. | ||
|
||
## Race conditions | ||
|
||
- Dirty reads: One client reads another client's writes before they have been committed. The read committed isolation | ||
level and stronger levels prevent dirty reads. | ||
- Dirty writes: One client overwrites data that another client has written, but not yet committed.Almost all transaction | ||
implementations prevent dirty writes. | ||
- Read skew (nonrepeatable reads): A client sees different parts of the database at different points in time. This issue | ||
is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at | ||
one point in time. It is usually implemented with multi-version concurrency control (MVCC). | ||
- Lost updates: Two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without | ||
incorporating its changes, so data is lost. Some implementations of snapshot isolation prevent this anomaly | ||
automatically, while others require a manual lock (`SELECT FOR UPDATE`). | ||
- Write skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the | ||
database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable | ||
isolation prevents this anomaly. | ||
- Phantom reads: A transaction reads objects that match some search condition. Another client makes a write that affects | ||
the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of | ||
write skew require special treatment, such as index-range locks. | ||
|
||
## Approaches to implementing serializable transactions | ||
|
||
- Literally executing transactions in a serial order: If you can make each transaction very fast to execute, and the | ||
transaction throughput is low enough to process on a single CPU core, this is a simple and effective option. | ||
- Two-phase locking: For decades this has been the standard way of implementing serializability, but many applications | ||
avoid using it because of its performance characteristics. | ||
- Serializable snapshot isolation (SSI): Algorithm that avoids most of the downsides of the previous approaches. It uses | ||
an optimistic approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is | ||
checked, and it is aborted if the execution was not serializable. | ||
|
||
## Weak Isolation Levels | ||
|
||
By providing transaction isolation, databases have tried to hide concurrency issues from application developers. | ||
|
||
Serializable isolation means that the database guarantees that transactions have the same effect as if they ran | ||
serially (i.e., one at a time, without any concurrency). | ||
|
||
Weaker levels of isolation protects against some concurrency issues, but not all. | ||
|
||
### Read Committed | ||
|
||
#### Guarantees: | ||
|
||
1. When reading from the database, you will only see data that has been committed (no dirty reads). | ||
2. When writing to the database, you will only overwrite data that has been committed (no dirty writes). | ||
|
||
It is the default setting in some databases such as Oracle 11g, PostgreSQL, SQL Server 2012 and MemSQL. | ||
|
||
### Snapshot Isolation and Repeatable Read | ||
|
||
_Snapshot isolation_ is called _serializable_ in Oracle and _repeatable read_ in PostgreSQL and MySQL. | ||
|
||
Nonrepeatable read or read skew: anomaly that happens when the same read to the database at different points in time | ||
returns different values within the same transaction (without being committed yet). This is considered acceptable | ||
under read committed isolation. | ||
|
||
_Snapshot isolation_ is the most common solution to the previously described anomaly. The idea is that each transaction | ||
reads from a _consistent snapshot_ of the database. | ||
|
||
From a performance point of view, a key principle of snapshot isolation is _readers never block writers_, and | ||
_writers never block readers_. | ||
|
||
To implement snapshot isolation, databases must potentially keep several different committed versions of an object, | ||
because various in-progress transactions may need to see the state of the database at different points in time. This | ||
technique is known as _multi-version concurrency control_ (MVCC). | ||
|
||
Storage engines that support snapshot isolation typically use MVCC for their read committed isolation level as well. | ||
|
||
### Preventing Lost Updates | ||
|
||
The lost update problem can occur when two transactions, concurrently, read some value from the database, modifies it, | ||
and writes back the modified value -- one of the modifications can be lost, because the second write does not include | ||
the first modification. Examples: | ||
|
||
- Incrementing a counter or updating an account balance | ||
- Making a local change to a complex value, e.g., adding an element to a list within a JSON document | ||
- Two users editing a wiki page at the same time, where each user saves their changes by sending the entire page | ||
contents to the server, overwriting whatever is currently in the database | ||
|
||
#### Possible solutions | ||
|
||
- Atomic write operations | ||
- Explicit locking | ||
- Automatically detecting lost updates (PostgreSQL's _repeatable read_, Oracle's _serializable_, and SQL Server's | ||
_snapshot isolation_ levels automatically detect when a lost update has occurred and abort the offending transaction) | ||
- Compare-and-set (in some databases) | ||
- Conflict resolution and replication | ||
|
||
In replicated databases some additional steps need to be taken to prevent lost updates. | ||
|
||
### Write Skew and Phantoms | ||
|
||
A write skew can be seen as a generalization of the lost update problem. It can occur if two transactions read the same | ||
objects, and then update some of those objects (different transactions may update different objects). | ||
|
||
## Serial execution | ||
|
||
Serial execution of transactions has become a viable way of achieving serializable isolation within certain constraints: | ||
|
||
- Every transaction must be small and fast, because it takes only one slow transaction to stall all transaction | ||
processing. | ||
- It is limited to use cases where the active dataset can fit in memory. Rarely accessed data could potentially be moved | ||
to disk, but if it needed to be accessed in a single-threaded transaction, the system would get very slow. | ||
- Write throughput must be low enough to be handled on a single CPU core, or else transactions need to be partitioned | ||
without requiring cross-partition coordination. | ||
- Cross-partition transactions are possible, but there is a hard limit to the extent to which they can be used. | ||
|
||
## Two-Phase Locking (2PL) | ||
|
||
Several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon | ||
as anyone wants to write (modify or delete) an object, exclusive access is required: | ||
|
||
- If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or | ||
aborts before it can continue. This ensures that B can't change the object unexpectedly behind A's back. | ||
- If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or | ||
aborts before it can continue. Reading an old version of the object, is not acceptable under 2PL. |