-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle interaction between gang blocks, copies, and FDT. #17123
base: master
Are you sure you want to change the base?
Conversation
4388003
to
1e63fa6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not deeply familiar with ganging code, but I have doubts that zp_must_gang
is viable. As I understand, to provide full redundancy we must not just provide some ganged pointer, but allocate all the gang children of exactly the same sizes and extend all existing gang header copies with those DVAs, which we can not do without rewriting them also. I think the only thing we can realistically do is prevent addition of copies writing and so dedup in general in that case.
As we discussed in the issue, the goal here isn't really to provide 100% of the redundancy the user asked for. That's challenging when you need to read in the entire gang tree to even determine what redundancy level is currently provided. The goal of this PR is just to prevent the creation of mixed gang/non-gang BPs, which are illegal in ZFS. That can be done even if the two gang trees have different depths/layouts. |
No, I think it can not. You can not have in one block pointer 3 DVAs with non-identical content. |
Hm, I see what you mean. It partially works for reads; if first level of gang headers is corrupt on the earlier DVAs, it'll read in the ones for the later DVAs, but not if anything else is wrong. And frees will free all the gang headers at the first level (because it has to free all the DVAs of each BP it knows about) but it won't know about the child BPs in any gang headers but the first one. I think we could probably write additional copies of the gang header we already have in the BP? But I don't know if that's actually better than just not deduplicating the write. |
I don't think anything not overwriting the existing gang headers can be correct. And I don't think we can safely overwrite them. Not mentioning we'd have to allocate new block in exactly the same chunks, which would be quite a pain.
That is why I proposed not deduplicating it. Not perfect, but a simple choice for a pretty rare scenario. |
1e63fa6
to
a6bb723
Compare
The redundant_metadata setting in ZFS allows users to trade resilience for performance and space savings. This applies to all data and metadata blocks in zfs, with one exception: gang blocks. Gang blocks currently just take the copies property of the IO being ganged and, if it's 1, sets it to 2. This means that we always make at least two copies of a gang header, which is good for resilience. However, if the users care more about performance than resilience, their gang blocks will be even more of a penalty than usual. We add logic to calculate the number of gang headers copies directly, and store it as a separate IO property. This is stored in the IO properties and not calculated when we decide to gang because by that point we may not have easy access to the relevant information about what kind of block is being stored. We also check the redundant_metadata property when doing so, and use that to decide whether to store an extra copy of the gang headers, compared to the underlying blocks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Paul Dagnelie <[email protected]>
With the advent of fast dedup, there are no longer separate dedup tables for different copies values. There is now logic that will add DVAs to the dedup table entry if more copies are needed for new writes. However, this interacts poorly with ganging. There are two different cases that can result in mixed gang/non-gang BPs, which are illegal in ZFS. This change forces updates of existing FDT entries to only add more DVAs of the same type; if there are already gang DVAs in the FDT, we require that the new allocations also be gangs. if there are non-gang DVAs in the FDT, then this allocation may not be gangs. If it would gang, we have to redo the whole write as a non-dedup write. Sponsored by: iXsystems, Inc. Sponsored-by: Klara, Inc. Signed-off-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
ce38507
to
2747abe
Compare
I ran into an exciting one with this last commit. The bug presents as follows:
I'm pretty sure this bug dates back to the start of FDT, and there may be more refcount-loss bugs like this lurking; I'm going to look into that next. The fix in this case is to add the references to |
This change is built on top of #17073, because the changes to add the gang_copies property interact tightly with this change, in addition to the gang blocks test group.
Motivation and Context
See #17070. tl;dr: If you have ganging and multiple different values of the copies property while you use dedup, you can sometimes end up with BPs with a mix of gang and non-gang DVAs. That is illegal in ZFS: in debug mode, it'll trip an assertion when you're syncing the write out. In non-debug mode, it will make it to disk, but cause silent and confusing errors later when BP_IS_GANG is only partially correct.
Note that this change doesn't address cases 2 and 4 from #17070; those are trickier to deal with, and are also less problematic for users. Occasionally not getting as many copies as you asked for is a much less severe bug than silently invalid blkptrs. It also doesn't address the nopwrite case.
Description
This change forces updates of existing FDT entries to only add more DVAs of the same type; if there are already gang DVAs in the FDT, we require that the new allocations also be gangs. if there are non-gang DVAs in the FDT, then this allocation may not be gangs. If it would gang, we have to redo the whole write as a non-dedup write.
This change also contains a new test that covers the relevant cases. I verified that without the patch, it kernel panics when running the test.
How Has This Been Tested?
The added test, plus manual tests/
Types of changes
Checklist:
Signed-off-by
.