Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kernel] First pass at adding media change support to kernel #1742

Merged
merged 4 commits into from
Oct 6, 2023

Conversation

ghaerr
Copy link
Owner

@ghaerr ghaerr commented Oct 5, 2023

This PR adds initial support for media change to the kernel, posted early for discussion. The support requires the use of the direct floppy driver, which uses the hardware DIR register to know whether floppy media has changed during operation.

While this first version is operational, it has quickly run into problems, detailed below. The support adds/renames two configuration settings in directfd.c: CHECK_DIR_REG and CHECK_DISK_CHANGE. The former turns on support to check the hardware DIR register for media change, along with the FDC extra code to move the head etc to turn the DISKCHG bit in the DIR register off. The latter option CHECK_DISK_CHANGE adds support for the kernel to call check_disk_change(dev) at various times to do something after the media has been noticed changed.

Because the direct floppy driver is interrupt-driven, and the various routines called to invalidate buffers, etc may sleep, the driver itself is limited in what it can do at interrupt time - thus the "upper level" requirement that the kernel call check_disk_change at various points in filesystem activity. This first pass has the kernel call before mount and block device open, but testing shows the need for calls before file read, write and probably directory operations as well (namei?).

Because all of the initial development is being done on QEMU for speed, a CTRL-P debug callback has been added to simulate the DIR register media change, since QEMU cannot be told of a media switch. The testing setup is booting to 1.2M direct floppy in drive A, and then mounting a floppy in drive B, then running various commands, issuing CTRL-P to fake a media change, then seeing the results.

After taking a hard look at the minimal BLOAT_FS support for media change in ELKS, along with support for media change in Linux 2.0, I'm finding a number of problems, which need to be discussed a bit in order to understand what kind of support can actually be made to work, given the complications.

Here's the initial list of issues:

  • The kernel can't support media change for the root device - instead, it will have to panic. This means the entire effort is only for the second floppy if booted off floppy.
  • Both Linux 2.0 and ELKS currently call invalidate_inodes and invalidate_buffers on disk change, which can become very problematic: if an inode is "in use", it can't be cleared, which means it may become intertwined with the new media inodes if they have the same number. And invalidate_buffers waits on any in-progress I/O to complete, which is no good, since the media has been removed.
  • Both the invalidate_ routines could be rewritten, but if I/O is in progress on a buffer, the direct floppy drive may go ahead and process an I/O request at the next interrupt to the new media anyway, causing data loss. Inodes could be force-cleared, but this could cause system instability if quite a few internal variables (like current directory and running processes with open inodes) aren't somehow handled.
  • Removed media that have mounted filesystems on top of them will cause big issues not easily solved. (Basic mounted media should be able to be handled by force-unmounting without too many ill effects).

If we take the approach of only handling basic cases, like trying to only handle media changes when I/O is not in progress, things become simpler, but I'm not sure they can be guaranteed to work well. Thus, the entire idea of kernel media change handling working reliably is probably not possible.

Here's what might possible to be made to work (most of the time). On media change:

  • Force unmount any filesystem on the block device. All buffers would be invalidated without waiting for I/O, and I/O cancelled if not started. All filesystem inodes could be force-cleared, but current directories or open inodes held by running processes may not be guaranteed to return an error on read/write, and instead the system could hang, depending on the exact state of the running or sleeping processes and the kernel's handling of invalid or duplicate inode pointers/numbers.
  • Somehow cancel all outstanding I/O on just the changed device. This would likely result in a bunch of kernel I/O error messages scrolling quickly on the console.
  • The block device would stay open; the current media type could either be reused, or probably better a new probe sequence started for the new media on the next I/O request.
  • Lots more code is added to the kernel: DIR register handling, reset/recalibrate head movement to reset it, disk change handling, rewriting both invalidate buffer and inode routines specially, etc.

@Mellvik, I know this sort of thing is probably interesting to you. What do you think about the kind of support that should be included for media change? Can you think of anything I haven't mentioned? Thank you!

NOTE: Much of the original kernel (not driver) code for media change is being deleted in this PR, since it can't be used: the old code tries to write out the prior super block (bad idea), then tries to read the superblock (what for, the filesystem can't be auto-mounted as the in-core superblock info won't match), as well as attempts a probe by reading the superblock (not needed, as the new direct floppy driver auto-probes on any sector read/write request).

This PR also contains a fix for the direct floppy driver access counting: previously, a separate access counter was used for each floppy drive. This caused problems with unregistering the IRQ etc when a sub-floppy drive was opened/closed, so an additional global access_count was added which tracks the total access count for the whole driver.

@Mellvik
Copy link
Contributor

Mellvik commented Oct 5, 2023

Wow @ghaerr - you've really taken a deep dive into this. Which got me thinking about what makes sense given the challenge, the complexity, the resources (code/ram) and the potential benefits (usefulness).

Some observations - from the TLVC perspective, which may or may not be different from ELKS:

  • We're talking accidental media change on a mounted drive - a catastrophe from the OS point of view. It cannot be prevented, so what we´re discussing is how to handle it.
  • IOW handling media change is an exercise in damage control.
  • How useful is media change detection? Is there anything to prevent, save, is it worth saving? Setting the metric for how much effort/complexity this is worth.
    • Root on floppy is the exception, not the rule, typically development and testing. A situation in which it's reasonable to assume that the user knows what s/he's doing. And is able to handle accidents. Your comments about root mounts apply (a panic on root media change is appropriate).
    • Also, the systems that are likely to run 'production mode' off of floppies are the earliest models (pre-AT) which don't have the DIR register in the first place - lout of reach for any damage control.
    • A competent user could enable root-mount protection by mounting root RO.
    • What are non-root floppy file systems used for? Except for HD-less systems - which again most likely don't have change detection anyway - for data transfer. Mount floppy - transfer files - umount, or 'sh.t, forgot the umount, just took out the floppy' (the DOS reflex). This is the easy and no doubt most common case. Easy because it's unlikely that there is any outstanding IO.

To me, these are the 'actions' that make sense:

  1. Shorten the SYNC interval for removable media to reduce the chances of (meta)data loss (5s, 10s ??).
  2. Possibly attempt recovery: Immediately set a RO flag on the device, trigger some sort of interactive procedure to allow the user to put the ejected media back in and continue. Given that some damage (lost metadata, lost data, corrupted data, crashed or terminated applications), this may not make any sense or just have too little benefit.
  3. Treat media removal as a 'hard error' and handle it just like other hard errors (sector not found, seek error etc.), but skip retrys. The driver returns ENXIO or ENODEV on all IO requests and keeps the error state until the (block) device has been (forced) closed (umount). The media is left in an unknown, possibly inconsistent state just like in the case of a power failure (or the mentioned hard errors). fsck will fix it later if possible, otherwise, that's the way it is. Nothing needs to be done for applications, they should handle read/write failures already. This protects new media from becoming part of the catastrophe.

I'm not sure 2) is possible with reasonable effort (unless made into an application level thing which tastes really bad), but if it is I think it would handle 90+% of the cases - inadvertently ejecting a floppy in use. If it happened during a write (no one can be that absentminded?), the media is most likely partly unreadable and must be formatted. Nothing to do about that. If during read, the app should have crashed or exited 'sanely', and there isn't much to recover.

There are some cleanup challenges - well covered in your discussion, @ghaerr. Purging the request queue is the easy part, just an extra line in the driver. The buffer level is more complicated - you got that covered.

What I plan to do short term for TLVC is to return ENODEV, purge the request queue and set an error state.

BTW - floppy change is supported in QEMU, I've used it quite a bit - I'm assuming it does trigger the DIR media change flag. Use the Machine pull down menu.

@ghaerr
Copy link
Owner Author

ghaerr commented Oct 5, 2023

Hello @Mellvik,

Thank you very much for your comments, they are very useful and exactly what I was looking for!

You've mentioned lots of information to consider in more detail. I'm thinking of taking your thirdly described approach first, of treating media removal as a hard error similar to other hard errors. Skipping the retries is a good idea and can allow for the idea that the request queue can be emptied normally by just calling end_request(0) but not displaying any I/O error messages.

I was initially thinking of keeping the block device open, but cancelling an active mount should act the same as a normal umount and close the block driver. A forced block device close (tricky at interrupt time but easy otherwise) might also allow the use of the zero reference count to immediately reject any queued remaining I/O requests without adding too much other code. Force-closing the device could also simplify the invalidate_buffers problem: rather than writing special code to purge buffers, an fsync_dev(dev) call could be made which would flush the buffer cache normally, except the I/O would all be silently ignored, as described above.

This "one day" project quickly turned into lots more work, and it got pretty late... I'll push more changes that attempt to meet a minimal goal of kernel damage control to allow it to keep running as reliably as possible returning errors to applications, and then deal with more complex scenarios individually, like zombie inodes, bad current directories, etc.

Thanks for the tip on QEMU floppy change - I didn't know that. I'll check to see whether/how it triggers the DIR media change bit.

@ghaerr
Copy link
Owner Author

ghaerr commented Oct 5, 2023

Thanks to some ideas from @Mellvik, the check_disk_change routine was completely rewritten, and now works quite well (on QEMU).

Instead of writing special versions of invalidate_inodes, invalidate_buffers and zeroing super-blocks, the global changed_floppies bit vector is used to (optionally silently) discard I/O within redo_fd_request, followed by calling the kernel do_umount function to properly unmount a filesystem and then the device is fully synced, with inodes and buffers invalidated, with the driver all the while discarding all I/O. The device is then closed and the changed_floppies bit vector is then cleared. This proved to be a far better approach and is essentially equivalent to "flushing" all the I/O to nowhere, allowed by an unmount and close.

Tested on QEMU and seems to work well. There may be an issue with the startup state of the DIR DSKCHG bit - the current code assumes the FDC starts with the bit OFF, and only sets it when media has been changed post-reset. This needs to be tested on real hardware to know for sure.

Calls to check_disk_changed are now done when reading or writing the filesystem, but testing showed this isn't always good enough to trigger the a media change, as buffered I/O won't cause actual FDC I/O. A more frequent check needs to be made, which requires an I/O request on the changed media.

When the FDC notices media changed, it now says df1: Disk media change detected, suspending I/O. Currently, discarded I/O are shown as I/O errors but can be easily turned off after testing. After the check_disk_change call is made (since it can't take place at interrupt time) and the filesystem unmounted, the message VFS: Unmounting <mount_dir>, media changed is displayed. Then the buffers are flushed and device closed, and the message VFS: Disk media change completed on dev 0401 displayed.

Thanks again for your comments @Mellvik!

@Mellvik
Copy link
Contributor

Mellvik commented Oct 6, 2023

A great piece of work, @ghaerr - and some interesting choices. I believe this goes a long way towards whatever damage control is actually possible in this setting.

A few questions:

  1. The introduction of another HW version related constant seems unnecessary. fdc_version >= FDC_TYPE_8272PC_AT && (inb(FD_DIR) & 0x80)) does the same as good old sys_caps & CAP_PC_AT. In addition, the former indicates that the FDC has anything to do with it, which it doesn't. (Yes, I know, we've been down that road before :-) ).

Calls to check_disk_changed are now done when reading or writing the filesystem, but testing showed this isn't always good enough to trigger the a media change, as buffered I/O won't cause actual FDC I/O. A more frequent check needs to be made, which requires an I/O request on the changed media.

I'm not following the logic with the check_disk_changed calls at various places. As you point out, a change can only be detected when there is an IO operation on the actual device. At that point, if media change is detected, it should trigger damage control mode immediately - signalled by the error returned via end_request() and the subsequent wakeup.
There may be substantial buildup of output blocks in the buffer system in the meanwhile, but there is no way to avoid that - except more 'risk prevention' via more frequent syncs - as has been discussed before.

@ghaerr
Copy link
Owner Author

ghaerr commented Oct 6, 2023

I'm not following the logic with the check_disk_changed calls at various places.

there is an IO operation on the actual device. At that point, if media change is detected, it should trigger damage control mode immediately

Yes, all I/O will be discarded by setting the disk bit in changed_floppies, but not actually discarded until the next call to redo_fd_request. That starts the "chained" draining of the request queue, which will then indicate to each of the waiting kernel processes that the I/O failed.

However - all of this is occuring during interrupt time - remember that even floppy_ready is running as an interrupt handler, along with all calls to redo_fd_request (except the first call from do_fd_request. We can't call general kernel procedures, like invalidate_buffers or do_umount from an interrupt handler, and that's why check_disk_change has to be called from normal kernel code, to clean things up. Thus there is a delay seen between the starting of I/O discarding and the final "media change completed" operation finish.

The problem I'm running into when trying to shorten the media change operation is that it may be awhile before the next I/O request on the changed media. I had stuck a call to check_disk_change in tty_intcheck (which processes signal-producing keystrokes) which worked well checking immediately on the next keystroke, until I realized I can't do that since tty_intcheck is called from interrupt handlers. (It seemed to work fine, but would scramble kernel buffers if for instance a keystroke occurred just when a buffer LRU list was being changed).

The check_disk_change also returns 1 if a media change has occurred, so that the pending I/O error is reported immediately to the process rather than having it also continue to issue I/O requests. By the time check_disk_change has returned, the I/O cleanup and inode/buffer invalidation are guaranteed complete, because of the wait_on_buffer calls that wait for the I/O request queue to be drained.

The introduction of another HW version related constant seems unnecessary.

Actually that code has been in the driver for weeks. Yes, this particular case is equivalent to sys_caps & CAP_PC_AT, but I am using a trick from the Linux 2.0 DF driver where they allow for >= comparisons between FDC chip types for features. Instead of using multiple compares, I essentially coded fdc_version as a virtual chip type, as at the time it wasn't certain that all PC/AT's would have a certain chip type. (We have a similar problem now, as ELKS is ported to certain non-PC compatible systems and sys_caps has to be specially coded in config.h - something I'd like to get rid of).

@Mellvik
Copy link
Contributor

Mellvik commented Oct 6, 2023

Interesting discussion @ghaerr!

However - all of this is occuring during interrupt time - remember that even floppy_ready is running as an interrupt handler, along with all calls to redo_fd_request (except the first call from do_fd_request. We can't call general kernel procedures, like invalidate_buffers or do_umount from an interrupt handler, and that's why check_disk_change has to be called from normal kernel code, to clean things up. Thus there is a delay seen between the starting of I/O discarding and the final "media change completed" operation finish.

That was actually my point:

At that point, if media change is detected, it should trigger damage control mode immediately - signalled by the error returned via end_request() and the subsequent wakeup.

The problem I'm running into when trying to shorten the media change operation is that it may be awhile before the next I/O request on the changed media. I had stuck a call to check_disk_change in tty_intcheck (which processes signal-producing keystrokes) which worked well checking immediately on the next keystroke, until I realized I can't do that since tty_intcheck is called from interrupt handlers. (It seemed to work fine, but would scramble kernel buffers if for instance a keystroke occurred just when a buffer LRU list was being changed).

The check_disk_change also returns 1 if a media change has occurred, so that the pending I/O error is reported immediately to the process rather than having it also continue to issue I/O requests. By the time check_disk_change has returned, the I/O cleanup and inode/buffer invalidation are guaranteed complete, because of the wait_on_buffer calls that wait for the I/O request queue to be drained.

It seems to me - like I quoted above, that the fast thing to do is to do the cleanup stuff immediately after the error return and avoid all the polling. At the time the end_request() wakes up the wait_on_buffer which immediately has to check for errors anyway, we're running in the kernel and can trigger the cleanup from there.

Another way is to use a timer and a callback from floppy_ready() - I haven't put much thought into what the downsides of that may be, but I'm thinking if the auto kernel sync mechanism can do it, it can be done here too. What do you think?

The introduction of another HW version related constant seems unnecessary.

Actually that code has been in the driver for weeks. Yes, this particular case is equivalent to sys_caps & CAP_PC_AT, but I am using a trick from the Linux 2.0 DF driver where they allow for >= comparisons between FDC chip types for features. Instead of using multiple compares, I essentially coded fdc_version as a virtual chip type, as at the time it wasn't certain that all PC/AT's would have a certain chip type. (We have a similar problem now, as ELKS is ported to certain non-PC compatible systems and sys_caps has to be specially coded in config.h - something I'd like to get rid of).

OK, from my (TLVC) point of view, chip variant testing beyond 765 vs 82077 is worthless for old clunkers. Even supporting 2880k format is stretching it - as we've discussed before. Many - maybe most - of the new features in later FDC generations are for tape support, ACPI stuff etc. In the name of simplicity, when the fun with testing out the 82077 is fading, I intend to revert to plain 765/8272 support, which is always safe for any PC - from the original 8088 IBM PC to the most modern variant with a floppy controller.

That said, having a better (more reliable) indicator as to the type of machine we're running on would be useful.

@ghaerr
Copy link
Owner Author

ghaerr commented Oct 6, 2023

At the time the end_request() wakes up the wait_on_buffer

I see - you're saying call check_disk_change on every buffer wait... Not a bad idea, except that's a lot of calls, especially since that would have to be called on all HD I/O as well. However, as you point out with error checking, if the call were only made when b_uptodate is false, that could be very cool. Let me try that out, perhaps all other calls could be eliminated. Thanks!!

Another way is to use a timer and a callback from floppy_ready() - I haven't put much thought into what the downsides of that may be, but I'm thinking if the auto kernel sync mechanism can do it, it can be done here too. What do you think?

The floppy_ready call already has the 6 second timer running on it, I ran into that yesterday when I cancelled all I/O and the timer ran out, which panic'd the kernel over a bad request queue... I think your above idea of cleaning up through unlock_buffer and the subsequent wait_on_buffer would be much simpler.

when the fun with testing out the 82077 is fading, I intend to revert to plain 765/8272 support, which is always safe for any PC

Agreed - I like the idea of keeping the code paths the same regardless of chip, even though I'm still taking advantage of CONFIGURE/IMPLIED SEEK on 82077 for the QEMU hack. Don't let the fun die too soon though - I'm learning lots about FDC controllers through all of this :)

@Mellvik
Copy link
Contributor

Mellvik commented Oct 6, 2023

At the time the end_request() wakes up the wait_on_buffer

I see - you're saying call check_disk_change on every buffer wait... Not a bad idea, except that's a lot of calls, especially since that would have to be called on all HD I/O as well. However, as you point out with error checking, if the call were only made when b_uptodate is false, that could be very cool. Let me try that out, perhaps all other calls could be eliminated. Thanks!!

That's the suggestion, use the error return to flag the error and take it from there. Actually I have countless times wanted a better resolution to the error return from the driver level than the b_uptodate. Maybe passing and error code back via the new b_nr_sectors/rq_nr_sectors will do it?

Another way is to use a timer and a callback from floppy_ready() - I haven't put much thought into what the downsides of that may be, but I'm thinking if the auto kernel sync mechanism can do it, it can be done here too. What do you think?

The floppy_ready call already has the 6 second timer running on it, I ran into that yesterday when I cancelled all I/O and the timer ran out, which panic'd the kernel over a bad request queue... I think your above idea of cleaning up through unlock_buffer and the subsequent wait_on_buffer would be much simpler.

when the fun with testing out the 82077 is fading, I intend to revert to plain 765/8272 support, which is always safe for any PC

Agreed - I like the idea of keeping the code paths the same regardless of chip, even though I'm still taking advantage of CONFIGURE/IMPLIED SEEK on 82077 for the QEMU hack. Don't let the fun die too soon though - I'm learning lots about FDC controllers through all of this :)

Me too - who would have thought some hardware design work in the late 70s would become this useful 40+ years later? :-)

@ghaerr
Copy link
Owner Author

ghaerr commented Oct 6, 2023

That's the suggestion, use the error return to flag the error and take it from there.

Adding an additional error flag doesn't need be involved, end_request(updtodate) indicates pass/fail. I was just suggesting adding the equivalent of if (!bh->b_uptodate) check_disk_change(bh->b_dev) to clean things up, since it would be rarely called. There's no guarantee that there are any waiting processes when the media is changed, but this will probably help.

Actually I have countless times wanted a better resolution to the error return from the driver level than the b_uptodate. Maybe passing and error code back via the new b_nr_sectors/rq_nr_sectors will do it?

I don't think we want to repurpose rq_nr_sectors to become an error code! What are you thinking would be an improvement to send to an application, a device-specific error code? None of those are listed in errno.h. If device-specific error codes were added that varied between read/write calls to devices, all applications would have to be recoded to deal with something other than testing for errno != EIO for I/O failure. If these are for very specific applications, perhaps an ioctl might be a way to get that information.

@ghaerr
Copy link
Owner Author

ghaerr commented Oct 6, 2023

@Mellvik:

I was just suggesting adding the equivalent of if (!bh->b_uptodate) check_disk_change(bh->b_dev) to clean things up, since it would be rarely called.

Just tried this - and we get recursive calls to check_disk_change then finally panic. After thinking about it, the reason is that wait_on_buffer calls check_disk_change which calls invalidate_buffers which calls wait_on_buffer - circular loop and stack overflow.

We'll need to have a fix in kernel code that is upstairs from the buffer management code... but thanks for the suggestion anyways, it was worth a try! :)

@ghaerr ghaerr merged commit d68c81b into master Oct 6, 2023
@ghaerr ghaerr deleted the diskchange branch October 6, 2023 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants