Minutes 07 Sep 2023
Host: Paul Albertella
Participants: Igor Stoppa, Sebastian Hetze, Dana Vede, Daniel Krippner, Peter Brink
Agenda:
- Confirm our intended focus on identifying and documenting failure modes associated with memory operations in the Linux kernel, and existing mechanisms and tools for detecting and preventing them
- Next steps / execution approach
Clarify what we mean by focussing on memory
- For Linux, even if you statically allocate memory up front, there are still potential issues
What are the problems we are trying to deal with?
- Some problems are well-known and have established strategies:
- e.g. a memory allocation request when no memory remains (see the sketch after this list)
- Other problems that are specific to Linux cannot be solved by these strategies
- If we do not have an approach for these, the other problems are irrelevant
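As a minimal illustration of one such established strategy (a sketch for these minutes, not something presented in the meeting; the `xmalloc` wrapper name is invented): detect the allocation failure at the call site and fall back to a defined behaviour rather than dereferencing a null pointer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper illustrating the classic defensive pattern for
 * the well-known failure mode "allocation fails when no memory remains". */
void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        /* Established strategy: detect the failure where it occurs and
         * transition to a defined behaviour (here: report and stop;
         * a real system might enter a degraded or safe state instead). */
        fprintf(stderr, "allocation of %zu bytes failed\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}

int main(void)
{
    char *buf = xmalloc(64);
    strcpy(buf, "allocation succeeded");
    puts(buf);
    free(buf);
    return 0;
}
```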
Proposal: focus on these problems
Pete: There are some strategies that could be used to deal with them, for specific use cases
- e.g. using external safety mechanisms to verify required behaviour
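A hedged sketch of what such an external safety mechanism might look like in practice, assuming a hardware watchdog exposed through the standard Linux `/dev/watchdog` interface; the verification step itself is elided, and this is an illustration only, not a design discussed in the meeting:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Open the (external) watchdog device. If the application stops
     * petting it within the configured timeout, the watchdog forces a
     * reset independently of whether Linux itself is still healthy. */
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd < 0) {
        perror("open /dev/watchdog");
        return 1;
    }
    for (;;) {
        /* In a real design, only pet the watchdog after verifying that
         * the required behaviour was actually produced (check elided). */
        if (write(fd, "\0", 1) != 1)
            perror("watchdog pet failed");
        sleep(1); /* must be shorter than the watchdog timeout */
    }
}
```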
Igor: Yes, but solving specific classes of problem repeatedly is not a desirable approach
Sebastian: But rather than requiring a design that we can formally prove is safe, can we not use quantitative evidence to show that it is sufficiently safe?
Pete: Kernel architecture is inherently unsafe
- Using e.g. userspace drivers that have been developed under a safety engineering regime might help with this
Igor: But even if a userspace application never makes a syscall, it still has a chance of being corrupted by the kernel
Sebastian: But Linux is so widely used and this corruption does not happen, so can we not use that as a basis for saying that it will not?
Pete: The problem is that this ‘proven in use’ argument can only be used for a specific version of a component, that has been used in an equivalent system context and for an equivalent type of use case
The 'proven in use' approach (deliberately) requires huge quantities of evidence, because it is based on extrapolating probabilities relating to risk:
- Specific numbers are given in IEC 61508 (Part 7, Annex D) and ISO 26262 (Part 8, 14.4.5.2)
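As a rough illustration of why the required quantities are so large (a back-of-envelope sketch, not the standards' own derivation): assuming a constant-failure-rate (exponential) model and zero observed failures over T operating hours, the one-sided upper confidence bound on the failure rate at confidence level C is:

```math
\lambda_{\mathrm{upper}} = \frac{-\ln(1 - C)}{T}
```

So supporting a claim of, say, λ < 10⁻⁷ failures per hour at 95% confidence already needs roughly T ≈ 3 × 10⁷ failure-free operating hours, all accumulated in an equivalent system context.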
Pete: Using Linux in an environment other than a server room also introduces many possible sources of external environmental interference, which are not necessarily reflected in the typical use cases where Linux is deployed
Also: we want to be able to update our revision of Linux (and other software) during the lifetime of the product, as the ability to develop and iterate quickly is one of the key values of using this kind of software in this way.
Sebastian: Can’t we say that the open source approach (especially for Linux) is a new ‘state of the art’, accept the risks that are entailed in that, and devise ways for us to reduce that risk?
Igor: I would prefer to remove the risk
Pete: The applicable safety integrity level doesn't change the overall process applied to the software, but it does change the level of rigour expected in the measures (methods) applied at each stage.
Things to do next time:
- Draw up some simple models describing ways in which Linux may be involved in a safety-critical system e.g.
- Nominal function only - no safety responsibilities at all
- Single safety function - no nominal function
- One or more nominal functions and one safety function
- One or more nominal functions and more than one safety function
- Summarise our conclusions regarding quantitative and probabilistic approaches to achieving confidence for safety
- These approaches require a single 'thing' that is 'proven'
- They always require a huge amount of time, which precludes regular/rapid updates
- We still need a comprehensible design that we can reason about