
First version of turbo cache #2

Open

wants to merge 165 commits into main

Conversation


@andectionsharechat commented Oct 18, 2023

Context

This is a fast cache fork. It reduces lock contention by shortening the time spent under the mutex during writes.

Fast cache implementation

The fast cache implementation is fairly simple. The key idea is that all data is stored in an array of fixed-size chunks. Items are written sequentially using the layout {encoded key & value lengths, key, value}.
There is an index (just a map) that resolves a key hash to the item's position in the chunks. All operations are executed under a read-write mutex. Keys are spread across 512 buckets, selected by hash(key) % bucket count.

[diagram: fast cache chunk and index layout]
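
To make the layout concrete, here is a minimal, self-contained sketch of that write/read path. It is an illustration under assumptions, not the actual fastcache code: a flat byte slice stands in for the array of 64KB chunks, and FNV stands in for the real hash function.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const bucketCount = 512

// bucket sketches one fast cache bucket: items are appended sequentially
// as {key len, value len, key, value}, and a map resolves the key hash to
// the item's offset.
type bucket struct {
	mu    sync.RWMutex
	data  []byte         // stands in for the array of 64KB chunks
	index map[uint64]int // key hash -> offset of the item in data
}

func (b *bucket) set(h uint64, k, v []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	off := len(b.data)
	// encoded key & value lengths (2 bytes each), then key, then value
	b.data = append(b.data, byte(len(k)>>8), byte(len(k)), byte(len(v)>>8), byte(len(v)))
	b.data = append(b.data, k...)
	b.data = append(b.data, v...)
	b.index[h] = off
}

func (b *bucket) get(h uint64) ([]byte, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	off, ok := b.index[h]
	if !ok {
		return nil, false
	}
	kLen := int(b.data[off])<<8 | int(b.data[off+1])
	vLen := int(b.data[off+2])<<8 | int(b.data[off+3])
	start := off + 4 + kLen
	return b.data[start : start+vLen], true
}

func hashKey(k []byte) uint64 {
	h := fnv.New64a()
	h.Write(k)
	return h.Sum64()
}

func main() {
	buckets := make([]bucket, bucketCount)
	for i := range buckets {
		buckets[i].index = map[uint64]int{}
	}
	k, v := []byte("answer"), []byte("42")
	h := hashKey(k)
	b := &buckets[h%bucketCount] // hash(key) % bucket count
	b.set(h, k, v)
	val, _ := b.get(h)
	fmt.Println(string(val)) // 42
}
```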

Turbo cache: overview

Turbo cache extends fast cache with several changes (a sketch follows the list):

  1. On write, all new items are put into a channel and processed by a background goroutine. The number of goroutines equals the number of buckets.
  2. The goroutine writes new items sequentially to dedicated flush chunks. It needs no mutex, as all changes happen in the same goroutine. Periodically, the turbo cache flushes all items to the main chunks.
  3. Turbo cache also uses a dedicated index for reading items from the flush chunks.
    [diagram: turbo cache write path through the flush chunks]
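
The sketch below illustrates this write path under assumptions: all names are made up for the example, and the flush itself is elided; the real code is linked from the pointers further down.

```go
package turbocache

import "time"

// kv is one pending write travelling through a bucket's channel.
type kv struct {
	hash uint64
	key  []byte
	val  []byte
}

// turboBucket sketches one bucket: a channel of pending writes and a
// background goroutine that owns the flush chunk, so it can append
// without taking any mutex.
type turboBucket struct {
	in         chan kv
	flushChunk []byte
}

func newTurboBucket(flushInterval time.Duration) *turboBucket {
	b := &turboBucket{in: make(chan kv, 128)}
	go func() {
		ticker := time.NewTicker(flushInterval)
		defer ticker.Stop()
		for {
			select {
			case item := <-b.in:
				// single-writer append into the flush chunk: no lock taken
				b.flushChunk = append(b.flushChunk, item.key...)
				b.flushChunk = append(b.flushChunk, item.val...)
			case <-ticker.C:
				b.flushToMainChunks()
			}
		}
	}()
	return b
}

// flushToMainChunks would move accumulated items into the main chunks;
// elided in this sketch.
func (b *turboBucket) flushToMainChunks() {}

// Set routes an item to its bucket's channel and returns immediately.
func Set(buckets []*turboBucket, h uint64, k, v []byte) {
	buckets[h%uint64(len(buckets))].in <- kv{hash: h, key: k, val: v}
}
```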

Turbo cache: configuration

```go
type Config struct {
	// max bytes for storing keys and values in the chunks
	maxBytes int
	// interval in milliseconds for flushing pending keys to the chunks
	flushIntervalMillis int64
	// max batch size for writing to the chunks; a batch size of 1
	// makes turbo cache behave like a synchronous cache
	maxWriteBatch int
	// number of accumulating buffers (chunks) before a flush;
	// every flush chunk is 64KB
	flushChunkCount int
}

func NewConfig(maxBytes int, flushInterval int64, maxWriteBatch int, flushChunks int) *Config {
	.....
}
```
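
For illustration only, a hypothetical call; these values are examples, not defaults taken from this PR:

```go
// Example values, not defaults from this PR.
cfg := NewConfig(
	512*1024*1024, // maxBytes: 512MB budget for the chunks
	100,           // flushIntervalMillis: flush pending items every 100ms
	128,           // maxWriteBatch: 1 here would make writes synchronous
	1,             // flushChunkCount: one 64KB flush chunk per bucket
)
```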

Principles of thread synchronisation during reads from the index

  1. All shared structures are fixed-size arrays or slices
  2. When processing new items, data is only appended
  3. All mutations happen in a single goroutine
  4. Cleaning the flush chunks and index happens under a mutex-free lock

On a new key

  1. Search the index for a duplicate. Code pointer
  2. Write the key/value into the flush chunk. This is an append-only write into an already allocated array, with no memory copying. Pointer
  3. Publish the value to the chunk with a memory barrier. Code pointer
  4. Add the new value to the index. Code pointer. On write, we first update flushChunkIndex and the position inside the array, and only then the hash; on read, everything is checked in the opposite order (see the sketch below).
  5. Issue a memory barrier for the index. Code pointer
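
A minimal sketch of that publication order, assuming Go's sync/atomic as the memory barrier; the type and field names are illustrative, not the PR's actual identifiers:

```go
package turbocache

import "sync/atomic"

// indexEntry sketches one slot of the flush-chunk index. The writer fills
// the payload fields first and publishes the hash last; a reader that
// observes the hash is then guaranteed to see the payload fields too.
type indexEntry struct {
	flushChunkIndex uint32        // which flush chunk holds the item
	itemOffset      uint32        // item position inside that chunk
	hash            atomic.Uint64 // published last; 0 means "empty"
}

// publish runs in the bucket's single writer goroutine.
func publish(e *indexEntry, chunkIdx, off uint32, h uint64) {
	e.flushChunkIndex = chunkIdx // 1. plain stores of the payload...
	e.itemOffset = off
	e.hash.Store(h) // 2. ...then the release-store of the hash (the barrier)
}

// lookup checks the fields in the opposite order: hash first.
func lookup(e *indexEntry, h uint64) (chunkIdx, off uint32, ok bool) {
	if e.hash.Load() != h {
		return 0, 0, false
	}
	return e.flushChunkIndex, e.itemOffset, true
}
```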

On batching and cleaning the flush chunks and index

This operation must be synchronised with reads from the index. The key idea is two-step locking, sketched after this list.

  1. Set the flushing flag to prevent new reads from the index. Code pointer
  2. Take the spinlock used for reading. Code pointer
  3. Clean the index and flush chunks. Code pointer
  4. Release the flushing flag.
  5. Release the spinlock.
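
A sketch of the flusher side of this two-step locking, assuming a CAS-based spinlock; all names are illustrative:

```go
package turbocache

import (
	"runtime"
	"sync/atomic"
)

// flushLock sketches the two locks: the flushing flag turns away new
// readers, and the spinlock waits out a reader already inside the index.
type flushLock struct {
	flushing atomic.Bool
	spin     atomic.Bool // simple CAS spinlock shared with readers
}

func (l *flushLock) lockSpin() {
	for !l.spin.CompareAndSwap(false, true) {
		runtime.Gosched()
	}
}

func (l *flushLock) unlockSpin() { l.spin.Store(false) }

// flushAndClean runs in the bucket's writer goroutine, following steps 1-5.
func (l *flushLock) flushAndClean(clean func()) {
	l.flushing.Store(true)  // 1. prevent new reads from the index
	l.lockSpin()            // 2. take the spinlock used for reading
	clean()                 // 3. clean the index and flush chunks
	l.flushing.Store(false) // 4. release the flushing flag
	l.unlockSpin()          // 5. release the spinlock
}
```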

On index reading

Reading from the index happens only on a cache miss (the reader side of the spinlock sketch follows this list).

  1. Check whether the flushing flag is set. Code pointer
  2. Read the index with a memory barrier. Code pointer
  3. Search for the key hash in the index. Code pointer
  4. On success, try to acquire the spinlock. Code pointer
  5. On success, find the item's position in the flush chunk and read the data from it. Code pointer
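
Continuing the flushLock sketch above, the reader side might look like this; find and read stand in for the index search and the chunk read:

```go
// tryRead returns false on any contention; the caller then treats the
// lookup as a miss in the flush chunks.
func (l *flushLock) tryRead(find func() bool, read func()) bool {
	if l.flushing.Load() { // 1. a flush is in progress: skip the index
		return false
	}
	if !find() { // 2-3. read the index (memory barrier) and search the hash
		return false
	}
	if !l.spin.CompareAndSwap(false, true) { // 4. try to take the spinlock
		return false
	}
	read() // 5. locate the item in the flush chunk and read the data
	l.unlockSpin()
	return true
}
```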


Turbo cache: writing implementation

  1. Determine the bucket and write to its channel. Code pointer
  2. How a background goroutine processes new keys. Code pointer
  3. How the chunks are updated periodically. Code pointer

Turbo cache: reading implementation

  1. Reading from the index before acquiring the mutex. Code pointer

Turbo cache: testing

Tests

  1. Tests over the public API, plus tests of key updates without the goroutine-based processing. All tests run with different batch sizes.
    All tests are run with the -race flag (see below).
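
For example, the usual way to run the suite with the race detector enabled:

```
go test -race ./...
```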

Turbo cache: benchmarking

  1. During development, benchmarks compared CPU time spent under the mutex.
  2. A comparison between the synchronous fast cache and the async turbo cache is still missing.

Turbo cache: QA

What's the memory overhead?

  1. It's the overhead of the flush chunks plus the index: 512 (bucket count) * 64KB * flushChunkCount + maxWriteBatch * 84B. For flushChunkCount=1 and maxWriteBatch=128 that is 512 * 64KB + 128 * 84B ≈ 32MB + 10.5KB, i.e. about 33MB.
