Is everything fine with the interrupt level based locking #6

Open
rakslice opened this issue Jun 17, 2024 · 1 comment

Comments

@rakslice
Owner

rakslice commented Jun 17, 2024

Running AIX 1.3 in VirtualBox, you often get hangs with an hdintr message, even on a stock install (hd is the stock AT-style hard disk controller driver and hdintr is its interrupt handler hook). Running with an Adaptec (AHA-154x) SCSI controller instead, there are still disk hangs of a sort, but they don't hang the whole system, just whatever process was doing the disk I/O operation.

I always assumed these had something to do with how quickly disk accesses complete on a fast modern processor, as in a VM: a timing case the drivers were not designed for because it could not occur on a period-correct machine. However, at least for the AT driver, VirtualBox has an option that enforces a mandatory minimum response delay to deal with exactly this kind of problem, and it doesn't fix the hangs.

But that made me wonder what leaix is doing that might be affecting timing, if anything.

I haven't used AIX 1.3 much without leaix since I discovered that the ad driver works with an emulated AHA-154x, so on the SCSI side I can't rule out leaix as a factor by pointing to hangs that occur without it, the way we can for hd.

To protect data structures I'm using the spl* functions, which block interrupts from the corresponding interrupt group. This is the same approach the original NetBSD driver used. Since this is a uniprocessor OS, only one thing can be in kernel code at a time, so the only concern is an interrupt jumping us into the interrupt handler, and blocking that is sufficient to lock critical sections. We are using splimp, which is the customary one for network interfaces in the BSDs, so it should be correct.

  • I've gone through all the calls to splimp, and as far as I can tell they all restore the previous level with splx on every subsequent code path, as required.
  • Some of the saved-level variables are spl_t and some are int. Need to double-check whether that has consequences (where are we even getting spl_t from?).
  • There is a commented-out spl_t declaration; consider whether some locking is missing there, maybe in a special case.
  • On a BSD system, the config line for the device sets which interrupt group the device is in, which determines when its interrupt handler is enabled, and the system automatically sets the interrupt level during calls to the handler. If there is any corresponding setup to do here, e.g. in the instal script, make sure we are doing it.
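
For reference while auditing, here is a minimal user-space sketch of the pattern the first two bullets are checking: save the level with one consistent type, and restore it on every exit path. Everything in it is an assumption made for illustration. fake_splimp, fake_splx, the level numbers, the spl_t typedef and struct le_model are stand-ins, not the AIX kernel routines or the actual leaix code.

```c
#include <assert.h>

/* Stand-ins so the pattern compiles outside the kernel (all hypothetical). */
typedef int spl_t;                       /* assumption: one type for every saved level */
enum { FAKE_SPL_0 = 0, FAKE_SPL_IMP = 5 };   /* made-up level numbers */

static spl_t fake_level = FAKE_SPL_0;    /* current simulated priority level */

/* Raise to the network level (never lower) and return the previous level. */
static spl_t fake_splimp(void)
{
    spl_t old = fake_level;
    if (fake_level < FAKE_SPL_IMP)
        fake_level = FAKE_SPL_IMP;
    return old;
}

/* Restore a level previously returned by fake_splimp. */
static void fake_splx(spl_t s)
{
    fake_level = s;
}

/* Hypothetical driver state that the interrupt handler would also touch. */
struct le_model {
    int tx_pending;
    int tx_limit;
};

/* Returns 0 on success, -1 if the queue is full.
 * Both outcomes leave through the single splx at "out". */
static int le_model_enqueue(struct le_model *sc)
{
    spl_t s = fake_splimp();             /* same type as the return value */
    int rc = 0;

    assert(fake_level >= FAKE_SPL_IMP);  /* we are inside the critical section */

    if (sc->tx_pending >= sc->tx_limit) {
        rc = -1;                         /* early failure still restores below */
        goto out;
    }
    sc->tx_pending++;

out:
    fake_splx(s);
    return rc;
}
```

Funnelling every exit through one label makes a missed splx easy to spot in review, and declaring every saved level as the same type removes the spl_t versus int question from the audit.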
@rakslice
Owner Author

Unlike SVR4 and friends, the system priority level for a device interrupt handler isn't configured in config files such as the kaf (keyword attribute files) set up by instal here. It's actually set in the code, in the leattach routine, which sets up the interrupt handler:

leaix/if_le.c

Line 492 in f0af939

intrattach(leintredge, ia->ia_irq, SPL_IMP);
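
To make the expected interaction concrete, here is a small user-space model of what I'd assume from the BSD side: a handler attached at a given level is held off while the current level is at or above it, and runs once the level is dropped again. This is only a sketch. The model_* names, the level numbers and the deferred-delivery behaviour are assumptions for illustration; only the argument order (handler, irq, level) is taken from the call above, and none of it is confirmed AIX behaviour.

```c
#include <assert.h>

/* User-space model only; none of these are the real AIX kernel routines. */
enum { MODEL_SPL_0 = 0, MODEL_SPL_IMP = 5 };    /* made-up level numbers */
#define MODEL_NIRQ 16

typedef void (*model_handler_t)(void);

static struct {
    model_handler_t fn;   /* handler registered for this irq, or 0 */
    int level;            /* level given at attach time */
    int pending;          /* set when the irq fired while masked */
} model_irq[MODEL_NIRQ];

static int model_cur_level = MODEL_SPL_0;
static int model_handled;                        /* counts handler runs */

static void model_leintr(void) { model_handled++; }

/* Same argument order as the call above: handler, irq, level. */
static void model_intrattach(model_handler_t fn, int irq, int level)
{
    assert(irq >= 0 && irq < MODEL_NIRQ);
    model_irq[irq].fn = fn;
    model_irq[irq].level = level;
    model_irq[irq].pending = 0;
}

/* Run the handler at its attached level, then drop back. */
static void model_dispatch(int irq)
{
    int saved = model_cur_level;
    model_cur_level = model_irq[irq].level;
    model_irq[irq].pending = 0;
    model_irq[irq].fn();
    model_cur_level = saved;
}

/* "Hardware" raises an interrupt: deliver it now, or hold it if masked. */
static void model_raise_irq(int irq)
{
    if (model_irq[irq].fn == 0)
        return;
    if (model_cur_level >= model_irq[irq].level)
        model_irq[irq].pending = 1;
    else
        model_dispatch(irq);
}

/* Raise to the network level and return the previous level. */
static int model_splimp(void)
{
    int old = model_cur_level;
    if (model_cur_level < MODEL_SPL_IMP)
        model_cur_level = MODEL_SPL_IMP;
    return old;
}

/* Restore the level and deliver anything that became unmasked. */
static void model_splx(int s)
{
    int irq;
    model_cur_level = s;
    for (irq = 0; irq < MODEL_NIRQ; irq++)
        if (model_irq[irq].fn && model_irq[irq].pending &&
            model_cur_level < model_irq[irq].level)
            model_dispatch(irq);
}
```

In this model, if the handler were attached at a level below the one splimp raises to, nothing would change for the critical sections; if it were attached above it, the handler could still run inside them, which is the mismatch worth ruling out.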

As for use of spl:

The docs (aix ps/2 tech ref c.6.5-1 ~p.1760) describe splimp a little differently than what I'm getting from the BSD world:

"The splimp kernel subroutine is used to protect manipulation of mbuf chains for network device drivers in process or interrupt context."

And indeed, in the sample code in the docs there, it's used to protect the mbuf access.
