implement cache testing tool #253

Open
romange opened this issue Aug 23, 2022 · 7 comments

@romange
Collaborator

romange commented Aug 23, 2022

The tool should be able to read traces from https://github.com/twitter/cache-trace
and send them to a redis endpoint.

The code should preferably be structured in such a way that we can easily add another trace format in the future.

The tool can probably be implemented in Python, since I guess we must send requests sequentially from a single connection anyway. Actually, I am not sure: the traces contain namespaces, and if there are many of them, we could parallelize the flows, in which case golang would be a better choice. Some preliminary investigation is needed. These traces are pretty large, so I would appreciate it if we could reduce the test run time.

The tool should provide hit/miss statistics by periodically checking the INFO response, and print a final report at the end.

  • If we end up implementing the tool in golang, we should figure out where to place it, for example by looking at where other multi-language projects put their golang code.
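
A minimal sketch of the INFO-based reporting described above, assuming the server exposes `keyspace_hits` and `keyspace_misses` in its INFO reply the way Redis does. The helper names `parse_info` and `hit_ratio` are hypothetical, and the sample replies are hand-written rather than captured from a server.

```python
def parse_info(text: str) -> dict:
    """Parse the text body of an INFO reply into a flat key -> value dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_ratio(before: dict, after: dict) -> float:
    """Hit ratio over the interval between two INFO snapshots."""
    hits = int(after["keyspace_hits"]) - int(before["keyspace_hits"])
    misses = int(after["keyspace_misses"]) - int(before["keyspace_misses"])
    total = hits + misses
    return hits / total if total else 0.0


# Hand-written sample replies (assumption: not taken from a real server).
SAMPLE_START = "# Stats\r\nkeyspace_hits:100\r\nkeyspace_misses:50\r\n"
SAMPLE_END = "# Stats\r\nkeyspace_hits:400\r\nkeyspace_misses:150\r\n"

if __name__ == "__main__":
    ratio = hit_ratio(parse_info(SAMPLE_START), parse_info(SAMPLE_END))
    print(f"hit ratio: {ratio:.2%}")
```

Taking the difference between two snapshots, rather than reading the counters once, keeps traffic from before the test run out of the report.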
@romange
Collaborator Author

romange commented Aug 26, 2022

Another thing: it would be nice if the tool could also send synthetic traffic without any trace files, probably using the incrby command, which allows sending write-only traffic while still measuring the hit rate.
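
The idea can be sketched client-side as follows. `FakeStore` is a hypothetical in-memory stand-in with LRU eviction so the example runs without a server; `run_incrby` only assumes the client has an `incrby(key, amount)` method, which redis-py's `Redis` client also provides.

```python
from collections import OrderedDict


class FakeStore:
    """Hypothetical stand-in for a cache server with LRU eviction."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.data = OrderedDict()

    def incrby(self, key: str, amount: int = 1) -> int:
        value = self.data.pop(key, 0) + amount
        self.data[key] = value  # re-insert as most recently used
        if len(self.data) > self.max_keys:
            self.data.popitem(last=False)  # evict least recently used
        return value


def run_incrby(client, keys) -> tuple:
    """Send INCRBY 1 for every key and return (hits, misses)."""
    hits = misses = 0
    for key in keys:
        # A reply of 1 means the key did not exist before this write,
        # either because it is new or because it was evicted.
        if client.incrby(key, 1) == 1:
            misses += 1
        else:
            hits += 1
    return hits, misses


if __name__ == "__main__":
    print(run_incrby(FakeStore(max_keys=2), ["a", "b", "a", "c", "a"]))
```

Because every request is a write with an increment of 1, the reply alone tells the tool whether the key was still cached, so no server-side counters are needed for this mode.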

@romange
Collaborator Author

romange commented Sep 15, 2022

Let's start with the following tasks:

  1. Implement a tool in Python that sends traffic following a Zipfian distribution. I am not an expert in statistics, but I know many papers use Zipf with alpha < 1 to model skewed traffic when testing caches. For some reason, the default Python libs do not seem to provide a Zipf generator that fits these requirements.
    See https://stackoverflow.com/questions/1366984/generate-random-numbers-distributed-by-zipf/8788662#8788662
    and https://stackoverflow.com/questions/31027739/python-custom-zipf-number-generator-performing-poorly on how to work around this.
  2. The tool should accept alpha and N and send N incrby requests to a redis-like memory store.
    (If the response is 1 then you know it's a new key (miss), otherwise it's a hit).
  3. The tool should provide a hit/miss summary after the run is completed. Bonus points for providing intermediate hit-ratio stats during the run using terminal control sequences 💯
  4. Once we know the tool works, we can implement hits/misses tracking in Dragonfly. Check out the keyspace_hits and keyspace_misses metrics in server_family.cc (similar to Redis). As you can see, these are not implemented yet.
    I would guess that the right place to insert this tracking is inside the DbSlice::FindExt function, which is called by all other find functions. Obviously, the hits/misses metrics should be equal to those that the tool counts.
  5. Once we have hits/misses tracking working, we can add support to the tool for the Twitter cache traces mentioned above. (Those do not necessarily use incrby, which is why we must have server-side stats.)
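
For task 1, one straightforward way to get Zipf-distributed ranks with alpha < 1 is to invert the cumulative distribution with a binary search. This is a sketch under that assumption, not necessarily the approach the linked answers take; the class name `ZipfSampler` is hypothetical, and it uses O(N) memory for the cumulative table.

```python
import bisect
import itertools
import random


class ZipfSampler:
    """Draw ranks in [0, n) with P(rank = k) proportional to 1 / (k + 1) ** alpha."""

    def __init__(self, n: int, alpha: float, seed=None) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        weights = [1.0 / (k + 1) ** alpha for k in range(n)]
        # Cumulative weights let us invert the CDF with a binary search.
        self.cumulative = list(itertools.accumulate(weights))
        self.rng = random.Random(seed)

    def sample(self) -> int:
        u = self.rng.random() * self.cumulative[-1]
        # The clamp guards against rounding pushing u onto the final boundary.
        return min(bisect.bisect_right(self.cumulative, u), len(self.cumulative) - 1)


if __name__ == "__main__":
    sampler = ZipfSampler(n=1000, alpha=0.9, seed=1)
    print([f"key:{sampler.sample()}" for _ in range(5)])
```

Each draw costs O(log N) after an O(N) setup, and any non-negative alpha works, including the alpha < 1 range asked for here.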

Eventually, we will be able to run zipf/real-world traces against DF and Redis and compare their caching performance for the same memory usage.
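
For the "bonus points" part of task 3, a minimal sketch of an in-place status line. It assumes an ANSI-capable terminal; the function names are hypothetical.

```python
import sys


def format_status(sent: int, total: int, hits: int, misses: int) -> str:
    """Build a one-line status string without a trailing newline."""
    seen = hits + misses
    ratio = hits / seen if seen else 0.0
    return f"{sent}/{total} sent  hits={hits} misses={misses} hit-ratio={ratio:.1%}"


def show_status(line: str, stream=sys.stdout) -> None:
    # "\r" returns the cursor to column 0 and "\x1b[2K" (ANSI "erase line")
    # clears the line, so each update overwrites the previous one.
    stream.write("\r\x1b[2K" + line)
    stream.flush()


if __name__ == "__main__":
    show_status(format_status(10, 100, 3, 7))
    show_status(format_status(20, 100, 9, 11))
    print()  # move to a fresh line before the final summary
```

Refreshing the line every few thousand requests rather than on every request keeps terminal output from dominating the run time.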

@devangjhabakh
Contributor

@romange I can take a stab at this!

@romange
Collaborator Author

romange commented Dec 26, 2022

Thanks, we welcome contributions to the project! 🙏 Please implement items 1-3. We are interested in sending a Zipfian distribution of keys [key:0 - key:N], as I mentioned in the issue. Here is a Java reference: https://github.com/apavlo/h-store/blob/e49885293bf32dad701cb08a3394719d4f844a64/src/benchmarks/edu/brown/benchmark/ycsb/distributions/ZipfianGenerator.java#L41 but I am sure it's possible to find/copy Python-based implementations as well. And please ignore the cache-trace task.
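
A rough Python sketch of the table-free approximation used by YCSB-style Zipfian generators, emitting keys in the `key:<rank>` form mentioned above. This follows my recollection of that method and is not a verified port of the linked Java class, so the constants should be checked against it. The class name `ApproxZipfian` and the restriction to n >= 3 and 0 < theta < 1 are assumptions of this sketch.

```python
import random


class ApproxZipfian:
    """Approximate Zipfian ranks in [0, n) without storing a per-item table."""

    def __init__(self, n: int, theta: float, seed=None) -> None:
        if n < 3 or not 0.0 < theta < 1.0:
            raise ValueError("this sketch assumes n >= 3 and 0 < theta < 1")
        self.n = n
        self.theta = theta
        # zetan is the normalization constant: sum of 1 / i ** theta for i in 1..n.
        self.zetan = sum(1.0 / i ** theta for i in range(1, n + 1))
        zeta2 = 1.0 + 0.5 ** theta
        self.alpha = 1.0 / (1.0 - theta)
        self.eta = (1.0 - (2.0 / n) ** (1.0 - theta)) / (1.0 - zeta2 / self.zetan)
        self.rng = random.Random(seed)

    def next_rank(self) -> int:
        u = self.rng.random()
        uz = u * self.zetan
        # The two most popular ranks are handled exactly.
        if uz < 1.0:
            return 0
        if uz < 1.0 + 0.5 ** self.theta:
            return 1
        # The remaining ranks come from a closed-form approximation.
        rank = int(self.n * (self.eta * u - self.eta + 1.0) ** self.alpha)
        return min(rank, self.n - 1)

    def next_key(self) -> str:
        return f"key:{self.next_rank()}"


if __name__ == "__main__":
    gen = ApproxZipfian(n=1000, theta=0.9, seed=1)
    print([gen.next_key() for _ in range(5)])
```

Setup is still O(N) for the normalization sum, but each draw is O(1) and no table is kept, which matters when N is large.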

@devangjhabakh
Contributor

@romange I looked through some papers using Zipf for cache-related work. Did you mean to say alpha < 1, not alpha < 0?

@romange
Collaborator Author

romange commented Dec 27, 2022

Yes, alpha less than 1

@devangjhabakh
Contributor

Hi @romange, I have created a PR (#640). I don't know why I can't seem to link it to this issue, perhaps because I'm not an assignee. Feel free to take a look whenever you get the chance!
