
Minutes 07 Sep 2023


Host: Paul Albertella

Participants: Igor Stoppa, Sebastian Hetze, Dana Vede, Daniel Krippner, Peter Brink

Agenda:

  • Confirm our intended focus on identifying and documenting failure modes associated with memory operations in the Linux kernel, and existing mechanisms and tools for detecting and preventing them
  • Next steps / execution approach

Clarify what we mean by focussing on memory

  • For Linux, even if you statically allocate memory up front, there are still potential issues
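
Not discussed in the meeting, but as a minimal illustration of the point above: on Linux, statically reserved memory is normally only backed by physical pages when first touched, so page faults and memory pressure can still affect it at runtime. One common mitigation is to lock and pre-fault the pages; this does not remove kernel-side failure modes, which is the point being made. A sketch (assuming the process has sufficient privileges / RLIMIT_MEMLOCK):

```c
/* Minimal sketch: a statically allocated buffer is still demand-paged on
 * Linux; locking and pre-touching it is one common mitigation, but it does
 * not remove kernel-side failure modes. Requires adequate RLIMIT_MEMLOCK. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static unsigned char buffer[16u * 1024u * 1024u]; /* reserved in .bss, not yet backed by RAM */

int main(void)
{
    /* Ask the kernel to keep current and future pages resident. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    /* Touch every page now, so faults do not occur in a critical path later. */
    memset(buffer, 0, sizeof(buffer));
    puts("buffer locked and pre-faulted");
    return 0;
}
```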

What are the problems we are trying to deal with?

  • Some problems are well-known and have established strategies:
    • e.g. a memory allocation operation failing when there is no remaining memory (see the sketch after this list)
  • Other problems that are specific to Linux cannot be solved by these strategies
    • If we do not have an approach for these, the other problems are irrelevant
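
As an illustration of the "well-known strategy" category above (an example sketch only; the structure and names are made up for this note): check the result of every allocation and fall back to a defined degraded behaviour, or allocate everything at initialisation time.

```c
/* Sketch: handle allocation failure explicitly instead of assuming success.
 * The struct and sizes are illustrative only. */
#include <stdio.h>
#include <stdlib.h>

struct sample_buffer {
    size_t capacity;
    double *samples;
};

/* Returns NULL on failure so the caller can enter a defined degraded mode. */
static struct sample_buffer *sample_buffer_create(size_t capacity)
{
    struct sample_buffer *buf = calloc(1, sizeof(*buf));
    if (buf == NULL)
        return NULL;
    buf->samples = calloc(capacity, sizeof(*buf->samples));
    if (buf->samples == NULL) {
        free(buf);
        return NULL;
    }
    buf->capacity = capacity;
    return buf;
}

int main(void)
{
    struct sample_buffer *buf = sample_buffer_create(1024);
    if (buf == NULL) {
        fputs("allocation failed: entering degraded mode\n", stderr);
        return 1;
    }
    puts("allocation succeeded");
    free(buf->samples);
    free(buf);
    return 0;
}
```

Note that on a default Linux configuration, memory overcommit means an allocation can appear to succeed and still fail on first use, which is part of why the Linux-specific problems above are not covered by this pattern alone.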

Proposal: focus on these problems

Pete: There are some strategies that could be used to deal with them, for specific use cases

  • e.g. using external safety mechanisms to verify required behaviour
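
A very rough sketch of what such an external mechanism might look like (illustrative only and not specified in the meeting; the stub functions stand in for real hardware or IPC): a safety element outside Linux expects a correct and timely answer to a changing challenge, and forces a safe state otherwise.

```c
/* Sketch of the external-monitor pattern (illustrative only): a safety
 * element outside Linux checks that a changing challenge is answered
 * correctly and on time, and otherwise forces a safe state. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RESPONSE_DEADLINE_MS 100

/* Stubs standing in for what a real monitor would read from hardware or shared memory. */
static uint32_t read_response_from_linux(uint32_t challenge) { return challenge ^ 0xA5A5A5A5u; }
static uint64_t elapsed_ms_since_challenge(void) { return 20; }
static void enter_safe_state(void) { puts("safe state requested"); }

static uint32_t expected_response(uint32_t challenge)
{
    return challenge ^ 0xA5A5A5A5u; /* trivial stand-in for a real computed check */
}

int main(void)
{
    uint32_t challenge = 0x12345678u;
    uint32_t response = read_response_from_linux(challenge);
    bool on_time = elapsed_ms_since_challenge() <= RESPONSE_DEADLINE_MS;

    if (!on_time || response != expected_response(challenge))
        enter_safe_state(); /* required behaviour not demonstrated */
    else
        puts("challenge answered correctly and on time");
    return 0;
}
```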

Igor: Yes, but solving specific classes of problem repeatedly is not a desirable approach

Sebastian: But rather than requiring a design that we can formally prove is safe, can we not use quantitative evidence to show that it is sufficiently safe?

Pete: Kernel architecture is inherently unsafe

  • Using e.g. userspace drivers that have been developed under a safety engineering regime might help with this

Igor: But even if a userspace application never makes a syscall, it still has a chance of being corrupted by the kernel

Sebastian: But Linux is so widely used and this corruption is not observed in practice, so can we not use that as a basis for saying that it will not happen?

Pete: The problem is that this ‘proven in use’ argument can only be used for a specific version of a component, that has been used in an equivalent system context and for an equivalent type of use case

The ‘proven in use’ approach requires huge quantities of evidence (deliberately) because it is based on the extrapolation of probabilities relating to risk:

  • Specific numbers on this are given in IEC 61508 (part 7 annex D) and ISO 26262 (Part 8, 14.4.5.2)
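
As a rough illustration of why these quantities are so large (a generic zero-failure reliability bound, not the specific figures from either standard): if a component has accumulated $T$ failure-free operating hours in an equivalent context, then the one-sided upper confidence bound on its failure rate $\lambda$ at confidence level $C$ is

$$\lambda \le \frac{-\ln(1 - C)}{T}$$

so demonstrating, say, $\lambda \le 10^{-8}$ per hour at 70% confidence already requires roughly $T \ge 1.2 \times 10^{8}$ failure-free hours.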

Pete: Using Linux in an environment other than a server room also introduces a lot of different possible sources of external environmental interference, which are not necessarily reflected in the typical use cases where Linux is deployed

Also: we want to be able to update our revision of Linux (and other software) in the lifetime of the product, as this ability to develop and iterate quickly is one of the key values in using this kind of software in this way.

Sebastian: Can’t we say that the open source approach (especially for Linux) is a new ‘state of the art’, accept the risks that are entailed in that, and devise ways for us to reduce that risk?

Igor: I would prefer to remove the risk

Pete: For safety, the safety integrity level that applies doesn’t change the overall process applied to software, but it does change the level of rigour expected in the measures (methods) applied at each stage.

Things to do next time:

  • Draw up some simple models describing ways in which Linux may be involved in a safety-critical system e.g.
    • Nominal function only - no safety responsibilities at all
    • Single safety function - no nominal function
    • One or more nominal functions and one safety function
    • One or more nominal functions and more than one safety function
  • Summarise our conclusions regarding quantitative and probabilistic approaches to achieving confidence for safety
    • They require a single ‘thing’ that is ‘proven’
    • They always require a huge amount of time, which precludes regular/rapid updates
    • We still need a comprehensible design that we can reason about