-
Notifications
You must be signed in to change notification settings - Fork 1
README
GOAL: This project will familiarize yourself with page replacement algorithms by implementing a cache layer on top of the file-based key-value store.
In this project, you're given a file-based key-value store (KVS) that supports two operations: SET and GET.
The SET operation takes two strings for its arguments: key
and value
. It creates a file named key
and writes value
for its content. If the file already exists, it overwrites the content.
The GET operation takes key
for its argument. It reads the file key
and returns its content. If the file does not exist, it returns the empty string.
kvs_base.c
implements the base version of the KVS, where every operation interacts with the file system. It provides kvs_base_get
and kvs_base_set
functions.
int kvs_base_set(kvs_base_t* kvs, const char* key, const char* value)
performs the SET operation. It opens key
in the directory (or creates it if doesn't exist) and writes value
. It returns SUCCESS
or FAILURE
(defined in constants.h
).
int kvs_base_get(kvs_base_t* kvs, const char* key, char* value)
performs the GET operation. It opens key
in the directory and reads its content into value
. It writes the empty string if the file does not exist. It returns SUCCESS
or FAILURE
.
The base version talks directly to the file system, which can be slow. We can implement an in-memory cache layer to handle some operations without I/O.
Because memory (RAM) is typically smaller than the disk, we cannot store everything in memory. In this project, the cached versions are initialized with the parameter capacity
limiting the number of key-value pairs stored in memory. For example, if capacity = 10, ten key-value pairs can be stored in memory.
If there are more key-value pairs than capacity
, we need to decide which pair to keep in memory and which to evict to disk.
The goal of this project is to implement cached versions of the KVS with three different cache replacement policies: FIFO, Clock, and LRU.
The cached versions should support the following operations:
- SET: The SET operation updates the entry associated with the key in memory. If there is an entry in the cache, update the value. If no entry is in memory, create a new entry in the cache with the new value. The value does not have to be persisted until FLUSH is called or the entry is evicted. Here, persisted means that the value stored on disk is updated or the new key is written to the disk.
- GET: The GET operation reads the value associated with the key. If the entry is cached, return the value. If the entry is not in the cache, use the base KVS to read the value from the disk.
- FLUSH: The FLUSH operation persists (i.e., updates or stores) the modified entries in memory to the disk.
You will need to implement the following files and functions
-
kvs_{fifo, clock, lru}.c
kvs_{fifo, clock, lru}_new
kvs_{fifo, clock, lru}_free
kvs_{fifo, clock, lru}_set
kvs_{fifo, clock, lru}_get
kvs_{fifo, clock, lru}_flush
The rest of this section describes the role of each function. The "fifo", "clock", or "lru" part specifies the replacement policy the key-value store should employ.
kvs_{fifo, clock, lru}_new
is the constructor of the cached key-value store (kvs_{fifo, clock, lru}_t
). It takes the pointer to the base kvs (kvs_base_t*
) and the capacity. To talk to the disk using kvs_base_set
and kvs_base_get
, use the provided instance of kvs_base_t
.
The constructor should allocate the necessary memory to implement the cache replacement strategy and return the pointer to the kvs_{fifo, clock, lru}_t
.
kvs_{fifo, clock, lru}_free
is the destructor. It should free dynamically allocated memory.
kvs_{fifo, clock, lru}_set
performs the SET operation. It takes the key and the value.
If the key is in memory (i.e., the cache), it should update the value in memory. If not, it should create a new pair in memory with the given key and value, so when the cache is flushed, it is persisted to the disk.
It should return SUCCESS
or FAILURE
.
kvs_{fifo, clock, lru}_get
performs the GET operation. It takes the key and pointer to the char array that it writes the value to.
If the key is in memory, it should write the value from memory to the given pointer.
If the key is not in memory, it calls kvs_base_get
to read the value from the disk. In addition to writing the value to the given pointer, it should cache the key-value pair (employing the specified replacement strategy if need be).
It should return SUCCESS
or FAILURE
.
kvs_{fifo, clock, lru}_flush
performs the FLUSH operation and saves all modified key-value pairs to the disk using kvs_base_set
.
It should return SUCCESS
or FAILURE
.
This section gives brief explanations of the three replacement policies.
The First-In First-Out replacement policy means that once the cache reaches capacity, the entry that was first added to the cache is removed and replaced by the new entry. So for example, suppose my cache size is 2 and consists of the following key-value pairs in the order they were added: ("file1.txt", "hey"), ("file2.txt", "this is another file")
. Then if I wanted to add another key-value entry ("file3.txt", "Another one")
, I would first need to evict the entry added first, which in this case was ("file1.txt", "hey")
, and replace it with the new entry, so the cache now reads: ("file3.txt", "Another one"), ("file2.txt", "this is another file")
.
To understand the clock replacement policy, consider a circular list (where moving past the tail wraps around to the head) with a cursor that points to an entry in the list, beginning at the head. Each entry has a "reference" bit associated with it, and when the cache reaches capacity, the following occurs:
- If the entry under the cursor has the "reference" bit set, it's cleared (set to 0) and the cursor is moved one place to the right
- If the entry under the cursor does not have the "reference" bit set (its value is 0), then the entry is replaced with the new entry and its reference bit is set.
This strategy takes into account the fact that, although an entry might be old, that doesn't mean it wasn't recently referenced (temporal locality). Note that this replacement policy means that all new entries in the cache begin with having their "reference" bits set. Also note that this policy means that if all the entries have their reference bits set, then they will all be unset and the entry that the cursor started on is ultimately the one replaced.
As an illustration, consider the following diagram for the cache where ^
represents the cursor:
("file1.txt", "hey") [1], ("file2.txt", "this is another file") [1]
^---------------------
Upon replacing an entry, the first one has its reference bit set, so it's then unset and the cursor moves over one place
("file1.txt", "hey") [0], ("file2.txt", "this is another file") [1]
^---------------------
The second entry also has its reference bit set, it's also set to 0 and the cursor wraps back around to the head
("file1.txt", "hey") [0], ("file2.txt", "this is another file") [0]
^---------------------
Since the cursor is now under an entry with a 0
reference bit, it's replaced with the new one that has its reference bit set to 1
and the cursor remains in place.
("file3.txt", "Another one") [1], ("file2.txt", "this is another file") [0]
^---------------------
See Chapter 3.4.5 on the textbook for more information. This video also provides an explanation of the policy.
The Least-Recenty Used replacement policy means that once the cache is full, the entry that was least recently accessed is the one that's replaced with the new entry. So for example, let the cache be the following with the numbers in brackets representing how recently the entry was accessed, with a larger number meaning more recent access:
("file1.txt", "hey") [1], ("file2.txt", "this is another file") [3], ("file3.txt", "Another one") [2]
In this case, the file1.txt
entry would be replaced by a new entry since it's the one that was least recently accessed with the lowest access value.
See Chapter 3.4.6 on the textbook for more information.
The starter code contains client.c
that provides a command-line interface to the key-value store.
./client DIRECTORY POLICY CAPACITY
-
DIRECTORY
specifies the directory used by the key-value store.client
creates the directory if it doesn't exist. -
POLICY
specifies the cache replacement policy. It can beNONE
,FIFO
,CLOCK
orLRU
. IfNONE
is used, it uses the base version of the KVS (with no cache) -
CAPACITY
specifies the size of the cache. IfPOLICY
isNONE
, this value is ignored.
The client reads commands from the standard input. The commands are given in the following format:
SET {KEY} {VALUE}
GET {KEY}
-
SET {KEY} {VALUE}
performs the SET operation. -
GET {KEY}
performs the GET operation and prints the value to the standard output.
When the standard output hits EOF (e.g., by hitting ctrl+D), client
performs the FLUSH operation to persist in-memory changes. Then, it prints statistics.
To illustrate, suppose that there are file1
and file2
from the earlier examples are in the directory "data".
.
├── client
├── data
│ ├── file1.txt (hey)
│ ├── file2.txt (this is another file)
│ └── file3.txt (Another one)
We launch the client with the FIFO policy and capacity = 2.
GET file1.txt # NOTE: Loads file1 to cache
hey
GET file2.txt # NOTE: Loads file2 to cache
this is another file
GET file3.txt # NOTE: Cache full. Evict file1 and load file3, replacing file1 in rhe cache
Another one
GET file3.txt # NOTE: file3 is in cache
Another one
GET file1.txt # NOTE: Evict file2 and load file1, replacing file2 in the cache
hey
GET COUNT (CACHE): 5
GET COUNT (DISK): 4
GET CACHE HIT RATE: 20.00%
SET COUNT (CACHE): 0
SET COUNT (DISK): 0
GET CACHE HIT RATE: nan%
-
GET COUNT (CACHE)
is the total number of GET operations. -
GET COUNT (DISK)
is the number of operations that used the disk. In this case, the 4th operation is done in memory, and the other 4 operations talked to the disk. -
GET CACHE HIT RATE
is the percentage of GET operations completed only in memory. In this example, only one operation was done in memory, so it is 1/5 = 20%. -
SET COUNT (CACHE)
,SET COUNT (DISK)
andGET CACHE HIT RATE
are the same for SET operations.- The hit rate shows
nan
when no operations are performed.
- The hit rate shows
- The maximum length for keys and values are
KVS_KEY_MAX
andKVS_VALUE_MAX
, respectively, defined inconstants.h
. You can assume that keys are valid as filenames and values consist of ASCII characters. - You can assume the key-value store is called on a single thread.
-
make
must createclient
. - All source files must be formatted using
clang-format
. Runmake format
to format.c
and.h
files. - Your implementation should not leak memory. Use
valgrind
to check memory errors.