[kernel] First pass at adding media change support to kernel #1742
Wow @ghaerr - you've really taken a deep dive into this. Which got me thinking about what makes sense given the challenge, the complexity, the resources (code/ram) and the potential benefits (usefulness). Some observations - from the TLVC perspective, which may or may not be different from ELKS:
To me, these are the 'actions' that make sense:
I'm not sure 2) is possible with reasonable effort (unless made into an application level thing which tastes really bad), but if it is I think it would handle 90+% of the cases - inadvertently ejecting a floppy in use. If it happened during a write (no one can be that absentminded?), the media is most likely partly unreadable and must be formatted. Nothing to do about that. If during read, the app should have crashed or exited 'sanely', and there isn't much to recover. There are some cleanup challenges - well covered in your discussion, @ghaerr. Purging the request queue is the easy part, just an extra line in the driver. The buffer level is more complicated - you got that covered. What I plan to do short term for TLVC is to return ENODEV, purge the request queue and set an error state. BTW - floppy change is supported in QEMU, I've used it quite a bit - I'm assuming it does trigger the DIR media change flag. Use the
Hello @Mellvik, Thank you very much for your comments, they are very useful and exactly what I was looking for! You've mentioned lots of information to consider in more detail. I'm thinking of taking your thirdly described approach first, of treating media removal as a hard error similar to other hard errors. Skipping the retries is a good idea and can allow for the idea that the request queue can be emptied normally by just calling I was initially thinking of keeping the block device open, but cancelling an active mount should act the same as a normal This "one day" project quickly turned into lots more work, and it got pretty late... I'll push more changes that attempt to meet a minimal goal of kernel damage control to allow it to keep running as reliably as possible returning errors to applications, and then deal with more complex scenarios individually, like zombie inodes, bad current directories, etc. Thanks for the tip on QEMU floppy change - I didn't know that. I'll check to see whether/how it triggers the DIR media change bit.
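The hard-error approach discussed in the two comments above (no retries, fail everything queued for the drive with ENODEV, and leave the drive in an error state) could look roughly like the sketch below. This is only an illustration under assumptions: struct fd_request, fd_submit, fd_media_removed and media_gone are hypothetical names, and the real ELKS/TLVC request structures and queue handling are different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NDRIVES 2

/* Hypothetical request record; the real kernel request structure differs. */
struct fd_request {
    int drive;                  /* target drive number */
    int error;                  /* 0, or a negative errno once failed */
    int done;                   /* set when the request has been completed */
    struct fd_request *next;
};

static struct fd_request *queue_head;       /* pending requests, FIFO order */
static unsigned char media_gone[NDRIVES];   /* per-drive error state */

/* Queue a request, unless the drive is already in the error state. */
static int fd_submit(struct fd_request *req)
{
    struct fd_request **pp = &queue_head;

    assert(req->drive >= 0 && req->drive < NDRIVES);
    if (media_gone[req->drive])
        return -ENODEV;
    req->error = 0;
    req->done = 0;
    req->next = NULL;
    while (*pp)                 /* append at the tail */
        pp = &(*pp)->next;
    *pp = req;
    return 0;
}

/*
 * Media was removed: skip retries, mark the drive as failed and complete
 * every queued request for it with -ENODEV. Requests for other drives stay
 * queued. Returns the number of requests that were failed.
 */
static int fd_media_removed(int drive)
{
    struct fd_request **pp = &queue_head;
    int failed = 0;

    assert(drive >= 0 && drive < NDRIVES);
    media_gone[drive] = 1;
    while (*pp) {
        struct fd_request *req = *pp;
        if (req->drive == drive) {
            *pp = req->next;    /* unlink from the queue */
            req->next = NULL;
            req->error = -ENODEV;
            req->done = 1;
            failed++;
        } else {
            pp = &req->next;
        }
    }
    return failed;
}
```

In this sketch the error state is only ever set; when and how it gets cleared again (for example on the next mount or open) is left open, as it is in the discussion.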
Thanks to some ideas from @Mellvik, the Instead of writing special versions of Tested on QEMU and seems to work well. There may be an issue with the startup state of the DIR DSKCHG bit - the current code assumes the FDC starts with the bit OFF, and only sets it when media has been changed post-reset. This needs to be tested on real hardware to know for sure. Calls to When the FDC notices media changed, it now says Thanks again for your comments @Mellvik!
A great piece of work, @ghaerr - and some interesting choices. I believe this goes a long way towards whatever damage control is actually possible in this setting. A few questions:
I'm not following the logic with the
Yes, all I/O will be discarded by setting the disk bit in However - all of this is occurring during interrupt time - remember that even The problem I'm running into when trying to shorten the media change operation is that it may be a while before the next I/O request on the changed media. I had stuck a call to The
Actually that code has been in the driver for weeks. Yes, this particular case is equivalent to
Interesting discussion @ghaerr!
That was actually my point:
It seems to me - like I quoted above, that the fast thing to do is to do the cleanup stuff immediately after the error return and avoid all the polling. At the time the Another way is to use a timer and a callback from
OK, from my (TLVC) point of view, chip variant testing beyond 765 vs 82077 is worthless for old clunkers. Even supporting 2880k format is stretching it - as we've discussed before. Many - maybe most - of the new features in later FDC generations are for tape support, ACPI stuff etc. In the name of simplicity, when the fun with testing out the 82077 is fading, I intend to revert to plain 765/8272 support, which is always safe for any PC - from the original 8088 IBM PC to the most modern variant with a floppy controller. That said, having a better (more reliable) indicator as to the type of machine we're running on would be useful.
I see - you're saying call
The
Agreed - I like the idea of keeping the code paths the same regardless of chip, even though I'm still taking advantage of CONFIGURE/IMPLIED SEEK on 82077 for the QEMU hack. Don't let the fun die too soon though - I'm learning lots about FDC controllers through all of this :)
That's the suggestion, use the error return to flag the error and take it from there. Actually I have countless times wanted a better resolution to the error return from the driver level than the b_uptodate. Maybe passing an error code back via the new
Me too - who would have thought some hardware design work in the late 70s would become this useful 40+ years later? :-)
Adding an additional error flag doesn't need be involved,
I don't think we want to repurpose rq_nr_sectors to become an error code! What are you thinking would be an improvement to send to an application, a device-specific error code? None of those are listed in errno.h. If device-specific error codes were added that varied between read/write calls to devices, all applications would have to be recoded to deal with something other than testing for errno != EIO for I/O failure. If these are for very specific applications, perhaps an ioctl might be a way to get that information.
Just tried this - and we get recursive calls to We'll need to have a fix in kernel code that is upstairs from the buffer management code... but thanks for the suggestion anyways, it was worth a try! :)
This PR adds initial support for media change to the kernel, posted early for discussion. The support requires the use of the direct floppy driver, which uses the hardware DIR register to know whether floppy media has changed during operation.
While this first version is operational, it has quickly run into problems, detailed below. The support adds/renames two configuration settings in directfd.c: CHECK_DIR_REG and CHECK_DISK_CHANGE. The former turns on support to check the hardware DIR register for media change, along with the FDC extra code to move the head etc to turn the DISKCHG bit in the DIR register off. The latter option CHECK_DISK_CHANGE adds support for the kernel to call check_disk_change(dev) at various times to do something after the media has been noticed changed.
Because the direct floppy driver is interrupt-driven, and the various routines called to invalidate buffers, etc may sleep, the driver itself is limited in what it can do at interrupt time - thus the "upper level" requirement that the kernel call check_disk_change at various points in filesystem activity. This first pass has the kernel call before mount and block device open, but testing shows the need for calls before file read, write and probably directory operations as well (namei?).
Because all of the initial development is being done on QEMU for speed, a CTRL-P debug callback has been added to simulate the DIR register media change, since QEMU cannot be told of a media switch. The testing setup is booting to 1.2M direct floppy in drive A, and then mounting a floppy in drive B, then running various commands, issuing CTRL-P to fake a media change, then seeing the results.
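The split described above, where the interrupt-time driver code only records that media changed and the kernel later acts on it from a context that is allowed to sleep, might be sketched like this. It is a simplified illustration, not the PR's code: fd_note_dir, check_disk_change_sketch, changed_mask and invalidate_count are hypothetical names, and the sketch takes a drive number rather than a real dev value. The only hardware fact assumed is that the disk change flag is bit 7 of the FDC Digital Input Register.

```c
#include <assert.h>

#define NDRIVES    2
#define DIR_DSKCHG 0x80     /* bit 7 of the FDC Digital Input Register (DIR) */

/* One bit per drive, set at interrupt time when the DIR shows a change. */
static volatile unsigned char changed_mask;

/* Stand-in for the buffer/inode invalidation the real kernel would perform. */
static int invalidate_count[NDRIVES];

/*
 * Interrupt-time side: only record the change. Nothing here may sleep.
 * dir_value is whatever the driver just read from the DIR for this drive.
 */
static void fd_note_dir(int drive, unsigned char dir_value)
{
    assert(drive >= 0 && drive < NDRIVES);
    if (dir_value & DIR_DSKCHG)
        changed_mask |= (unsigned char)(1 << drive);
}

/*
 * Upper-level side: called before mount, open, read, write and so on.
 * Returns 1 if a media change was pending and has now been handled.
 * A real kernel would disable interrupts around the test-and-clear.
 */
static int check_disk_change_sketch(int drive)
{
    unsigned char bit;

    assert(drive >= 0 && drive < NDRIVES);
    bit = (unsigned char)(1 << drive);
    if (!(changed_mask & bit))
        return 0;
    changed_mask &= (unsigned char)~bit;
    invalidate_count[drive]++;  /* the real work here may sleep */
    return 1;
}
```

The point of the shape is that the more places the kernel calls the check from, the shorter the window between the driver noticing the change and the kernel reacting to it.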
After taking a hard look at the minimal BLOAT_FS support for media change in ELKS, along with support for media change in Linux 2.0, I'm finding a number of problems, which need to be discussed a bit in order to understand what kind of support can actually be made to work, given the complications.
Here's the initial list of issues:
panic. This means the entire effort is only for the second floppy if booted off floppy.
invalidate_inodes and invalidate_buffers on disk change, which can become very problematic: if an inode is "in use", it can't be cleared, which means it may become intertwined with the new media inodes if they have the same number. And invalidate_buffers waits on any in-progress I/O to complete, which is no good, since the media has been removed.
If we take the approach of only handling basic cases, like trying to only handle media changes when I/O is not in progress, things become simpler, but I'm not sure they can be guaranteed to work well. Thus, the entire idea of kernel media change handling working reliably is probably not possible.
Here's what might be possible to make work (most of the time). On media change:
@Mellvik, I know this sort of thing is probably interesting to you. What do you think about the kind of support that should be included for media change? Can you think of anything I haven't mentioned? Thank you!
NOTE: Much of the original kernel (not driver) code for media change is being deleted in this PR, since it can't be used: the old code tries to write out the prior super block (bad idea), then tries to read the superblock (what for, the filesystem can't be auto-mounted as the in-core superblock info won't match), as well as attempts a probe by reading the superblock (not needed, as the new direct floppy driver auto-probes on any sector read/write request).
This PR also contains a fix for the direct floppy driver access counting: previously, a separate access counter was used for each floppy drive. This caused problems with unregistering the IRQ etc when a sub-floppy drive was opened/closed, so an additional global access_count was added which tracks the total access count for the whole driver.
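The access counting fix can be pictured with a small sketch: per-drive counters track each drive's opens, while a driver-wide access_count decides when shared resources such as the IRQ are claimed and released. Apart from access_count, every name below (drive_access, irq_claimed, fd_open_sketch, fd_release_sketch) is made up for illustration, and irq_claimed merely stands in for the real IRQ registration.

```c
#include <assert.h>

#define NDRIVES 2

static int drive_access[NDRIVES];   /* per-drive open count */
static int access_count;            /* total opens across the whole driver */
static int irq_claimed;             /* stand-in for IRQ registration state */

static void fd_open_sketch(int drive)
{
    assert(drive >= 0 && drive < NDRIVES);
    if (access_count++ == 0)
        irq_claimed = 1;            /* first open of any drive claims the IRQ */
    drive_access[drive]++;
}

static void fd_release_sketch(int drive)
{
    assert(drive >= 0 && drive < NDRIVES);
    assert(drive_access[drive] > 0);
    drive_access[drive]--;
    if (--access_count == 0)
        irq_claimed = 0;            /* last close of any drive releases it */
}
```

With only per-drive counters, closing drive B while drive A was still open could release the IRQ too early; keying the release on the global count avoids that.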