Replies: 1 comment 1 reply
There's a work in progress (#218) to provide a better cache layer on the client side while enforcing consistency at the same time. Right now there's no way to hook in anything external without implementing a new xlator that does the necessary translations. However, communicating with an external entity like elasticsearch will probably add latency that I think should be avoided. It's true that for worst-case scenarios, like some 'ls' cases, it will work better, but regular lookups could actually be slower when served by an external entity instead of directly from a caching xlator on the client side (or even from the kernel cache, if consistency could be enforced).

Based on the article, it seems that the only advantage of a server-side cache is that it doesn't need locks. However, it still requires a network round trip, and probably to more than one brick, which will need synchronization on the client side anyway in some cases. I think that a consistent client-side cache can provide many more benefits and performance improvements overall.

Note that Gluster cannot rely on the kernel cache mostly because it cannot guarantee consistency. If consistency could be enforced on the client side, we could allow the kernel to cache data and metadata for a longer time, with probably a huge performance improvement. Right now we only allow the kernel to cache data for 1 second and, in some cases, we even need to set it to 0, which gets no benefit at all from the kernel cache.
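The 1-second limit mentioned above corresponds to the FUSE metadata timeouts that the Gluster native client passes to the kernel. As a rough illustration (volume name and mount point are made up; the option names are the documented `mount.glusterfs` ones):

```shell
# Kernel-side metadata caching on a Gluster native mount is bounded by
# the FUSE attribute/entry timeouts. 1 second is the usual default;
# setting them to 0 disables kernel caching of that metadata entirely.
mount -t glusterfs -o attribute-timeout=1,entry-timeout=1 \
    server1:/myvol /mnt/gluster
```

With enforced client-side consistency, these values could safely be raised well above 1 second, letting the kernel serve repeated lookups without any round trip to the Gluster client process.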
-
Hi, I want to know how to use another DB or system to cache metadata. For example, can we use elasticsearch, leveldb, or other systems to cache the metadata of some directories or files and improve read speed? However, I can't find any POSIX API to do that, so do you have plans to support this in the future?
By the way, a Chinese company has built a dcache xlator.
The metadata cache is added on the server side of glusterfs, and each server process only caches the directory entries and metadata of its own brick, without considering consistency across multiple clients. The scheme is implemented on top of glusterfs: a translator is added on the server side to cache directory entries and metadata. When a user's readdir(p) request reaches this translator, the request is processed and answered directly from the cache, without being forwarded further to the local file system. In this way, the cache takes the place of accesses to the slow disk, and one round of user-mode/kernel-mode interaction is removed, reducing the latency of ls.
If you want to know more details, you can read this article: TaoCloud Glusterfs current cache optimization.
Thanks.