-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Use RW locks in the VMA tree #1795
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
libos/src/bookkeep/libos_vma.c
line 106 at r1 (raw file):
static struct avl_tree vma_tree = {.cmp = vma_tree_cmp}; static struct libos_rwlock vma_tree_lock; static bool vma_tree_lock_created = false;
Technically all these variables must have the prefix g_
. But I didn't want to add this unrelated change in this PR.
libos/src/bookkeep/libos_vma.c
line 108 at r1 (raw file):
static bool vma_tree_lock_created = false; static inline void vma_rwlock_read_lock(struct libos_rwlock* l) {
It's important to use these wrappers because at the very startup, we don't have the lock because it wasn't yet created. But at startup we have only one thread, so the lock would be redundant anyway.
Note that we can't create the lock as the very first step, because creating the lock itself requires the memory subsystem (VMA) to be fully initialized. So we disable the locking first, init the VMA subsystem, then create the lock and only then the VMA can be used in thread-safe manner.
libos/src/bookkeep/libos_vma.c
line 146 at r1 (raw file):
#endif /* VMA code is supposed to use the vma_* wrappers of RW lock; hide the actual RW lock funcs */
Not sure if these define tricks are needed -- I wanted to make sure that future developers won't accidentally use rwlock_
functions but only wrappers.
libos/src/bookkeep/libos_vma.c
line 1267 at r1 (raw file):
vma_rwlock_read_lock(&vma_tree_lock); bool is_continuous = _traverse_vmas_in_range(begin, end, adj_visitor, &ctx);
FYI: This is the main perf optimization (hopefully).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
a discussion (no related file):
We don't see any problems with the PR and it improves perf for some workloads (MySQL and MariaDB) and doesn't degrade perf for other workloads like NginX, Tensorflow, SpecPower, Tensorflow Serving and Openvino(Latency).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @vasanth-intel from a discussion.
Reviewable status: 0 of 1 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
Previously, vasanth-intel wrote…
We don't see any problems with the PR and it improves perf for some workloads (MySQL and MariaDB) and doesn't degrade perf for other workloads like NginX, Tensorflow, SpecPower, Tensorflow Serving and Openvino(Latency).
Thank you @vasanth-intel for the performance evaluation!
We're done with internal testing (on the Intel side); this PR is moved from Draft to Ready for Review.
Jenkins, test this please (just for sanity) |
Multi-threaded workloads with many syscalls stress the VMA subsystem of LibOS, because almost all syscalls verify their buffers for read/write access using the functions `is_user_memory_readable()`, `is_user_memory_writable()`, etc. All these functions end up in VMA-specific `is_in_adjacent_user_vmas()` that grabs a global VMA lock. On some multi-threaded apps like MongoDB, this lock contention becomes the performance bottleneck. This commit tries to remove this bottleneck by switching from a spinlock to the Read-Write (RW) lock. The intuition is that most of the time, a read-only `is_in_adjacent_user_vmas()` func is called, which now uses the read lock. Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
c208c3c
to
b27f24e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
libos/src/bookkeep/libos_vma.c
line 108 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
It's important to use these wrappers because at the very startup, we don't have the lock because it wasn't yet created. But at startup we have only one thread, so the lock would be redundant anyway.
Note that we can't create the lock as the very first step, because creating the lock itself requires the memory subsystem (VMA) to be fully initialized. So we disable the locking first, init the VMA subsystem, then create the lock and only then the VMA can be used in thread-safe manner.
can we add these explanations into the comments?
Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
libos/src/bookkeep/libos_vma.c
line 108 at r1 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
can we add these explanations into the comments?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
a discussion (no related file):
@jkr0103 Could you point to the instructions on how to run those MongoDB and/or MySQL experiments that you did?
Instructions for MySQL:sudo chown -R $USER:$USER /var/lib/mysql-files sudo ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/ sudo vim /etc/security/limits.conf
gramine-sgx mysqld --skip-log-bin --datadir /var/run/mysql-data numactl -N 0,1 -l gramine-sgx mysqld --skip-log-bin --datadir /var/run/mysql-data Sysbench Run:sudo apt install -y sysbench sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-db=sbtest --time=90 sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-db=sbtest --time=300 ======================================================================
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkr0103 Where can we take the MySQL and MongoDB makefiles/manifests for Gramine?
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
You may find MySQL manifest amd makefiles here https://github.com/gramineproject/examples/pull/28/files#diff-3280189625504cf8039a0453946ea1b585f74319278ffd7abb72e4e71d019f57 For MongoDB: @vasanth-intel Please update here if it differ in your setup. |
Description of the changes
Multi-threaded workloads with many syscalls stress the VMA subsystem of LibOS, because almost all syscalls verify their buffers for read/write access using the functions
is_user_memory_readable()
,is_user_memory_writable()
, etc. All these functions end up in VMA-specificis_in_adjacent_user_vmas()
that grabs a global VMA lock. On some multi-threaded apps like MongoDB, this lock contention becomes the performance bottleneck.This commit tries to remove this bottleneck by switching from a spinlock to the Read-Write (RW) lock. The intuition is that most of the time, a read-only
is_in_adjacent_user_vmas()
func is called, which now uses the read lock.Fixes #1794.
How to test this PR?
CI for functionality testing. Manual benchmarks for perf testing.
My quick tests on Memcached, Blender and iperf show no visible change in performance. This makes sense: these workloads use no more than 4 threads, so almost no contention.
WIP: I asked to run big workloads.UPDATE 1: @jkr0103 reported these results (thanks!):
check_invalid_pointers = false
: 104,694 ops/seccheck_invalid_pointers = false
: 258,343 QPS (latency 4.95ms)This change is