Please acknowledge the following before creating a ticket
[ x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
When running a large fio workload that involves multiple jobs with verify and serialize_overlap=1,
fio aborts at some point with the error in the title.
After looking into the code I see 4 problems:
1. All mutex operation errors report errno instead of the pthread_mutex_X return code, which is wrong: pthread_mutex functions return the error code and do not set errno, at least on Debian x86_64.
2. The same error is reported from several functions, which makes it hard to identify which one really fired.
3. After adding a macro wrapper that logs __func__, __FILE__, and __LINE__, I found that td_io_queue in ioengines.c emits the message. It releases the lock BEFORE the request is finally enqueued, so it can return FIO_Q_BUSY, after which io_workqueue_fn in rate-submit.c submits the request again. By then the lock has already been released.
4. The issue is not only the lock. When FIO_Q_BUSY is returned, the io_u is cleared from its td, so another worker can allocate the same LBA. check_overlap should therefore be executed again before td_io_queue is called again.
Items 1 and 2 are really simple; I can create a PR tomorrow.
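To make items 1 and 2 concrete, here is a minimal sketch of the kind of wrapper I mean. It is not fio code: `report_mutex_err`, `MUTEX_LOCK`, `MUTEX_UNLOCK`, and `demo_unlock_unowned` are hypothetical names used only for illustration. The wrapper prints the pthread return code (not errno) together with the call site, and the demo uses an error-checking mutex so that an unlock of a mutex that is not held returns an error instead of being undefined behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of fio): report the value returned by the
 * pthread call itself, plus the function, file, and line of the caller.
 */
static int report_mutex_err(int ret, const char *op, const char *func,
                            const char *file, int line)
{
	if (ret)
		fprintf(stderr, "%s failed: %s (%d) in %s() at %s:%d\n",
			op, strerror(ret), ret, func, file, line);
	return ret;
}

#define MUTEX_LOCK(m) \
	report_mutex_err(pthread_mutex_lock(m), "pthread_mutex_lock", \
			 __func__, __FILE__, __LINE__)
#define MUTEX_UNLOCK(m) \
	report_mutex_err(pthread_mutex_unlock(m), "pthread_mutex_unlock", \
			 __func__, __FILE__, __LINE__)

/*
 * Demo: lock once, unlock twice. The second unlock hits a mutex that is
 * no longer held, and the return value of that call is handed back.
 */
static int demo_unlock_unowned(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int ret;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	MUTEX_LOCK(&m);
	MUTEX_UNLOCK(&m);
	ret = MUTEX_UNLOCK(&m);	/* mutex is not held any more */

	pthread_mutex_destroy(&m);
	return ret;
}
```

Calling `demo_unlock_unowned()` should log a single failure line that names the calling function and line, and the returned value is the error code of the second unlock, which is the information the current messages lose.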
Items 3 and 4 require more attention. Fixing the lock/unlock scheme alone is not enough: I need a good test that proves an overlap conflict can really happen during the requeue, and if I move check_overlap into the requeue loop, I need to measure the performance impact.
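The lock part of item 3 can be modelled in a few lines. This is a single-threaded toy, not fio's code: `toy_queue`, `toy_submit`, `TOY_Q_QUEUED`, and `TOY_Q_BUSY` are hypothetical stand-ins for td_io_queue, the requeue loop in io_workqueue_fn, and the FIO_Q_* codes. It assumes the lock is taken once before the queue call (as the overlap check would do) and that the queue function releases it before the enqueue outcome is known.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical stand-ins for FIO_Q_QUEUED / FIO_Q_BUSY. */
enum { TOY_Q_QUEUED, TOY_Q_BUSY };

static pthread_mutex_t overlap_lock;
static int busy_left;		/* how many more times the toy engine is busy */
static int unlock_errors;	/* failed pthread_mutex_unlock() calls */

/* Toy queue function: drops the lock first, then may still report busy. */
static int toy_queue(void)
{
	if (pthread_mutex_unlock(&overlap_lock))
		unlock_errors++;

	if (busy_left > 0) {
		busy_left--;
		return TOY_Q_BUSY;
	}
	return TOY_Q_QUEUED;
}

/*
 * Toy submit path: the lock is taken once, then the request is requeued
 * while the engine is busy, without re-taking the lock or re-checking
 * for overlap. Returns the number of failed unlocks.
 */
static int toy_submit(int busy_count)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&overlap_lock, &attr);
	pthread_mutexattr_destroy(&attr);

	busy_left = busy_count;
	unlock_errors = 0;

	pthread_mutex_lock(&overlap_lock);	/* assumed: held after the overlap check */
	while (toy_queue() == TOY_Q_BUSY)
		;				/* requeue, lock already released */

	pthread_mutex_destroy(&overlap_lock);
	return unlock_errors;
}
```

In this model `toy_submit(0)` sees no failed unlock, while every busy retry adds one. The model only covers the lock; it does not show the overlap conflict from item 4, which still needs a real multi-worker test.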
I'll open a PR soon and we can discuss it there.
Environment: Debian x86_64
fio version: fio-3.37-86-g7bc1
Reproduction steps
[write-and-verify]
rw=randwrite
bs=4k
direct=1
ioengine=libaio
iodepth=128
verify=crc32c
verify_backlog=100000
verify_dump=1
verify_fatal=1
verify_async=4
serialize_overlap=1
io_submit_mode=offload
blocksize_range=4k-8k
runtime=6000
size=512m
numjobs=10
filename=/dev/nvme0n8:/dev/nvme0n7:/dev/nvme0n6:/dev/nvme0n5:/dev/nvme0n4:/dev/nvme0n3:/dev/nvme0n2:/dev/nvme0n1