Proposed approach for OSEP WG #3

Open: wants to merge 4 commits into main
60 changes: 60 additions & 0 deletions approach.md
@@ -0,0 +1,60 @@
# OSEP proposed approach

a) Document scope of analysis using STPA

*In collaboration with other WGs, e.g. Automotive?*

* Assumed system context
  - Hardware features, role of the OS, boundaries of analysis, etc.
  - Doesn't have to be concrete / complex
  - Specific to topic: start simple and elaborate later!
* Losses
  - OS-related outcomes that *may* violate a system's safety goals
  - i.e. lead to harm in a safety-related system
* Hazards
  - OS-level system conditions that *may* lead to these losses
* System-level constraints
Review comment (Contributor):

Feedback from the Safety Arch WG:

  1. The STPA Handbook says that safety requirements for the system (in our case, I guess, for the OS) should be derived from the system-level analysis; this flow does not mention that anywhere.
  2. Throughout the hierarchical STPA iterations, the STPA flow (from the Handbook) effectively uses "system constraints" in place of "safety requirements"; this continues until the very last iteration, and only at that stage can we define "safety requirements". This nomenclature does not really fit a hierarchical breakdown of a SW element, so it may be better to rename "system-level constraints" to "safety requirements".

Reply (Collaborator Author):

  1. Agreed. The system-level constraints represent the top-level safety requirements / safety goals. I will add this.
  2. The intended purpose of using STPA is to derive more detailed safety requirements from these high-level constraints, by defining Controller Constraints (which specify the criteria for avoiding unsafe control actions, or UCAs) and Loss Scenarios (which can be used in combination with Controller Constraints to define test cases, including fault injection test cases).

Review comment (Contributor):

Regarding point 2: I understand the goal of STPA as described in the Handbook; however, if we want to use an iterative approach, we need to define new constraints and new controllers at each iteration inside the kernel. In doing so, I find it a bit confusing to use the term "constraints" instead of "SW safety requirements and AoUs (Assumptions of Use)".

  - Criteria that must be satisfied to *prevent* or *mitigate* hazards
  - May be a simple inversion of the hazard
Review comment (Contributor):

Feedback from the Safety Arch WG: here we are missing product constraints imposed by other, non-FuSa requirements. It would be good to mention them for the sake of completeness.

Reply (Collaborator Author):

Agreed. I will add this.
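To make the traceability between the step a) artifacts concrete, here is a minimal Python sketch. It assumes each loss, hazard and system-level constraint is recorded with an ID, and that hazards and constraints list the IDs they trace to. The class names, fields and example entries (a hypothetical scheduling hazard, with a constraint written as a simple inversion of it) are illustrative only, not an agreed OSEP template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loss:
    id: str
    description: str


@dataclass(frozen=True)
class Hazard:
    id: str
    description: str
    losses: tuple[str, ...]  # IDs of the losses this hazard may lead to


@dataclass(frozen=True)
class SystemConstraint:
    id: str
    description: str
    hazards: tuple[str, ...]  # IDs of the hazards this constraint prevents or mitigates


def check_traceability(losses, hazards, constraints):
    """Return a list of traceability problems; an empty list means none were found."""
    problems = []
    loss_ids = {loss.id for loss in losses}
    hazard_ids = {hazard.id for hazard in hazards}
    for hazard in hazards:
        if not hazard.losses:
            problems.append(f"{hazard.id} is not linked to any loss")
        for loss_id in hazard.losses:
            if loss_id not in loss_ids:
                problems.append(f"{hazard.id} refers to unknown loss {loss_id}")
    covered = set()
    for constraint in constraints:
        for hazard_id in constraint.hazards:
            if hazard_id in hazard_ids:
                covered.add(hazard_id)
            else:
                problems.append(f"{constraint.id} refers to unknown hazard {hazard_id}")
    for hazard in hazards:
        if hazard.id not in covered:
            problems.append(f"{hazard.id} has no system-level constraint")
    return problems


# Hypothetical example entries
losses = [
    Loss("L-1", "Safety-related application misses a timing requirement"),
    Loss("L-2", "Safety-related application acts on corrupted data"),
]
hazards = [
    Hazard("H-1", "Safety-related task is not scheduled within its required time window", ("L-1",)),
    Hazard("H-2", "Memory of a safety-related task is modified by another task", ("L-2",)),
]
constraints = [
    # A simple inversion of H-1
    SystemConstraint("SC-1", "Safety-related tasks must be scheduled within their required time window", ("H-1",)),
]

print(check_traceability(losses, hazards, constraints))  # prints ['H-2 has no system-level constraint']
```

The check only covers structural links; whether a constraint actually prevents or mitigates its hazard remains an engineering judgement.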


b) Identify and document system measures and mitigations for Linux

*Out of scope for OSEP - LFSCS group?*
Review comment (Contributor):

From Gab: on this bullet too, instead of "out of scope" I'd say "In collaboration with LFSCS and Safety Architecture WGs?"

Reply (Collaborator Author):

Agreed.


* Provided by kernel features, compiler, hardware, etc.
* Find supporting evidence (design, tests, processes) for these
* Identify responsibilities of components (kernel, compiler, etc)
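A sketch of how the outputs of step b) might be recorded, assuming a simple Python data model: each measure names the component responsible for it, the hazards it is claimed to address, and references to supporting evidence. `Measure`, `find_gaps` and `responsibilities` are hypothetical names, and the example measures and evidence reference are placeholders rather than real Linux claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    id: str
    description: str
    provider: str                   # responsible component, e.g. "kernel", "compiler", "hardware"
    hazards: tuple[str, ...]        # hazard IDs the measure is claimed to prevent or mitigate
    evidence: tuple[str, ...] = ()  # references to design, test or process evidence


def find_gaps(hazard_ids, measures):
    """Return (hazards not addressed by any measure, measures with no evidence)."""
    addressed = {h for m in measures for h in m.hazards}
    uncovered = sorted(h for h in hazard_ids if h not in addressed)
    unsupported = sorted(m.id for m in measures if not m.evidence)
    return uncovered, unsupported


def responsibilities(measures):
    """Group measure IDs by the component responsible for providing them."""
    by_provider = {}
    for m in measures:
        by_provider.setdefault(m.provider, []).append(m.id)
    return by_provider


# Hypothetical example entries
measures = [
    Measure("M-1", "Per-process address-space isolation", "kernel", ("H-2",),
            evidence=("hypothetical design review record",)),
    Measure("M-2", "External watchdog detects a stalled safety-related task",
            "hardware", ("H-1",)),
]

print(find_gaps(["H-1", "H-2", "H-3"], measures))  # prints (['H-3'], ['M-2'])
print(responsibilities(measures))  # prints {'kernel': ['M-1'], 'hardware': ['M-2']}
```

Running the gap check once without measures that Linux is believed to provide, and again with them, would fit a first pass / second pass approach to analysis.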

c) Perform hazard and risk analysis using STPA
Review comment (Contributor):

Feedback from the Safety Arch WG: b) and c) should be swapped, because first we provide an architectural description, then we evaluate hazards, and finally we look for possible countermeasures for each hazard.

Reply (Collaborator Author):

That might make more sense, but this will be an iterative process.

For b), I was thinking that we might include measures or mitigations that we believe Linux already provides, but it may be clearer to omit these in the first pass of analysis and then add them in a second pass to address the gaps identified.


*In collaboration with LFSCS and Safety Architecture WGs?*

* Based on inputs from (a) and (b)
* Document control structure(s) for topic
  - Identify controllers and responsibilities
  - Functional abstraction of system elements
    - e.g. Kernel or subsystems, compiler, other tools, etc.
  - Define interactions as control actions and feedback
  - Identify Unsafe Control Actions and Loss Scenarios
Review comment (Contributor):

From Gab: I guess unsafe control actions are those associated with hazards, hence leading to loss scenarios, correct?

Reply (Collaborator Author):

Yes, UCAs are control actions that, for a particular set of system conditions, can lead to one or more hazards. Loss scenarios typically describe the causal factors that lead to UCAs, but may also describe other scenarios (not associated with a Control Action) that can result in hazards.

  - Define Controller Constraints

Review comment (Contributor, paolonig, Jun 21, 2022):

From the Safety Arch WG: at this stage I think we would have some controllers with unsafe control actions for which we have no architectural mitigation. There are then two options:
i) provide a more detailed architectural description of the controller by partitioning it into multiple controllers, allocate safety requirements (constraints, in STPA terminology) to each of them, and then iterate back to step b)
ii) the complexity and/or architectural description of the single controller is "acceptable" for considering it an elementary SW design element, and hence we can move to step d) (TBD: discuss acceptable complexity and/or architectural criteria)

Reply (Collaborator Author):

Breaking a controller (or a controlled process) down into sub-components, or performing a new analysis at a more detailed level of abstraction, should certainly be an option. What I'd like to discuss in the WG at some point is when we need to do this. Your suggested trigger (we can't identify a mitigation) is only one example, in my opinion; another might be that we have identified a causal factor for a UCA during Loss Scenario definition, which reveals controllers or controlled processes that are involved at a lower level of granularity.

Review comment (Contributor):

IMO one of the stop criteria is met when all UCAs are fully mitigated by safety mechanisms external to the controller or by AoUs on the controller. If that is not the case, then we need to rely on the systematic capability of the controller being high enough to claim that it is reasonably free of UCAs, and here the problem becomes the 'acceptable complexity criteria' for the controller.
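The UCA discussion and the stop criterion proposed above can be sketched as follows. The four `UCAType` values follow the standard STPA categories of unsafe control action; everything else (class and function names, the hypothetical `scheduler` and `memory-manager` controllers, and the mitigation and AoU IDs) is an assumption for illustration. The check flags controllers that still have a UCA not mitigated by an external safety mechanism or an AoU, i.e. the cases where the argument would have to rest on the controller's systematic capability or on partitioning it further.

```python
from dataclasses import dataclass
from enum import Enum


class UCAType(Enum):
    # The four standard STPA categories of unsafe control action
    NOT_PROVIDED = "not providing the control action leads to a hazard"
    PROVIDED = "providing the control action leads to a hazard"
    WRONG_TIMING = "provided too early, too late, or in the wrong order"
    WRONG_DURATION = "stopped too soon or applied too long"


@dataclass(frozen=True)
class UCA:
    id: str
    controller: str           # controller issuing the control action
    control_action: str
    uca_type: UCAType
    context: str              # system conditions under which the action is unsafe
    hazards: tuple[str, ...]  # hazard IDs this UCA can lead to
    external_mitigations: tuple[str, ...] = ()  # safety mechanisms outside the controller
    aous: tuple[str, ...] = ()                  # Assumptions of Use placed on the controller


def controllers_needing_refinement(ucas):
    """Map each controller to the IDs of its UCAs that are neither mitigated
    externally nor covered by an AoU. An empty result means this stop
    criterion (all UCAs mitigated outside the controller) is met."""
    pending = {}
    for uca in ucas:
        if not (uca.external_mitigations or uca.aous):
            pending.setdefault(uca.controller, []).append(uca.id)
    return pending


# Hypothetical example entries
ucas = [
    UCA("UCA-1", "scheduler", "dispatch task", UCAType.NOT_PROVIDED,
        "safety-related task is ready to run", ("H-1",),
        external_mitigations=("hypothetical external watchdog",)),
    UCA("UCA-2", "scheduler", "dispatch task", UCAType.WRONG_TIMING,
        "safety-related task is ready and close to its deadline", ("H-1",)),
    UCA("UCA-3", "memory-manager", "map page", UCAType.PROVIDED,
        "page belongs to a safety-related task and is mapped for another task",
        ("H-2",), aous=("hypothetical AoU-1",)),
]

print(controllers_needing_refinement(ucas))  # prints {'scheduler': ['UCA-2']}
```

A non-empty result does not decide between the two options (partition the controller or argue its systematic capability); it only lists where that decision is still open.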

d) Identify and document process measures and mitigations

* Identify engineering practices and tools to:
  - Implement constraints from (a) and (c)
  - Verify constraints from (a) and (c)
  - Evaluate and/or increase confidence in (b)
  - Identify or provide other evidence to support claims
    - e.g. Quality criteria
* Find supporting evidence from FOSS communities
  - Formal, verifiable process or inconsistent practice?
Review comment (Contributor):

From the Safety Arch WG: in view of an iterative approach, we should highlight that failing to obtain the process evidence required to claim the systematic capability of the controller could also make further partitioning of the controller necessary.


e) Identify and document claims and use cases
Review comment (Contributor):

From the Safety Arch WG: OPEN: it is not clear whether any condition during this phase could lead to iterating back to previous phases.


* To illustrate how a+b+c+d might support an in-context safety argument
* Evidence needed to support claims
  - How can other organisations use (d) to provide confidence?
  - What criteria does evidence need to satisfy?
* Document use cases
  - In collaboration with Domain WGs?
  - With kernel config(s) and hardware / system dependencies?