This activity requires as input the system safety requirements allocated to the ML component ([E]).
ML safety requirements shall be defined to control the risk of the identified contributions of the ML component to system hazards, taking into account the defined system architecture and operating environment. This requires translating complex real-world concepts and cognitive decisions into a format and a level of detail that is amenable to ML implementation and verification [55].
In a system used for cancer screening based on X‐ray images, a subset of the ML safety requirements will likely focus on defining an acceptable risk level by specifying true positive and false positive rates for diagnoses within the given environmental context [46]. This will take account of the performance achievable by human experts performing the same task, and the level of checking and oversight that is in place.
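As an illustration only, such an acceptance criterion could be expressed as a check of measured rates against thresholds, as in the minimal sketch below; the threshold values, counts and function names are assumptions made for illustration and are not drawn from the example above.

```python
# Illustrative check of true/false positive rates against acceptance thresholds
# for a screening classifier. The threshold values are placeholders; in practice
# they would be derived from the system safety assessment and clinical context.

MIN_TPR = 0.95  # assumed minimum acceptable true positive rate
MAX_FPR = 0.10  # assumed maximum acceptable false positive rate

def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def meets_requirement(tp: int, fp: int, tn: int, fn: int) -> bool:
    tpr, fpr = rates(tp, fp, tn, fn)
    return tpr >= MIN_TPR and fpr <= MAX_FPR

# Example usage with made-up counts from a held-out test set.
print(meets_requirement(tp=960, fp=40, tn=880, fn=20))  # True
```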
From the safety requirement allocated to the ML component in Example 1, the concept of identifying a pedestrian at the system level must be translated into something meaningful for the ML model. Using knowledge derived from the system architecture, the ML safety requirement becomes “all bounding boxes produced shall be no more than 10% larger in any dimension than the minimum-sized box capable of including the entirety of the pedestrian” [22].
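One way such a requirement could be operationalised is as a per-detection check against the minimum enclosing (ground-truth) box, as in the sketch below; the (x_min, y_min, x_max, y_max) box representation and the example values are assumptions for illustration.

```python
# Illustrative check of the bounding-box requirement from Example 1:
# each predicted box must be no more than 10% larger in any dimension than
# the minimum box enclosing the pedestrian (here, the ground-truth box).
# Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.

TOLERANCE = 0.10  # 10% allowance per dimension, as stated in the requirement

def box_size(box):
    x_min, y_min, x_max, y_max = box
    return x_max - x_min, y_max - y_min

def satisfies_requirement(predicted, ground_truth) -> bool:
    pred_w, pred_h = box_size(predicted)
    true_w, true_h = box_size(ground_truth)
    return (pred_w <= (1 + TOLERANCE) * true_w and
            pred_h <= (1 + TOLERANCE) * true_h)

# Example: the predicted box is 5% wider and 8% taller than the minimal box.
print(satisfies_requirement((98, 46, 203, 262), (100, 50, 200, 250)))  # True
```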
To some extent, the process of developing the machine learning safety requirements is similar to that for complex embedded software systems (e.g. in avionics systems or infusion pumps). However, due to the increasing transfer of complex perception and decision functions from an accountable human agent to the machine learning component, there is a significant gap between the implicit intentions for the component outputs and the explicit requirements that are used to develop, validate and verify the component. This “semantic gap” exists in an open context for which a credible, let alone complete, set of concrete safety requirements is very hard to formalise [12].
In machine learning, requirements are often seen as implicitly encoded in the data; as such, under-specification in requirements definition can appear to be an appealing feature. However, for rare events, such as safety incidents, this under-specification poses a significant assurance challenge. The developers are still expected to assure the ability of the system to control the risk of these rare events based on concrete safety requirements against which the machine learning component is developed and tested.
While there is likely to be a wide range of requirements for the ML component (e.g. security, interpretability), the ML safety requirements should be limited to those that impact the operational safety of the system.
Other types of requirements such as security or usability should be defined as ML safety requirements only if the behaviours or constraints captured by these requirements influence the safety criticality of the ML output.
‘Soft constraints’ such as interpretability may be crucial to the acceptance of an ML component especially where the system is part of a socio‐technical solution. All such constraints defined as ML safety requirements must be clearly linked to safety outcomes.
The ML safety requirements shall always include requirements for performance and robustness of the ML model. The requirements shall specifically relate to the ML outputs that the system safety assessment has identified as safety‐related (i.e. not just generic performance measures).
In this document, ML performance considers quantitative performance metrics (e.g. classification accuracy and mean squared error), whereas ML robustness considers the model’s ability to perform well when the inputs encountered differ from, but are similar to, those present in the training data, covering both environmental uncertainty (e.g. flooded roads) and system-level variability (e.g. sensor failure [5]).
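One way such a robustness notion can be made measurable is to check that a model’s output is stable under small, bounded perturbations of a given input. The sketch below is a simplified illustration of that idea; the `model.predict` interface and the uniform-noise perturbation model are assumptions made for illustration and are not prescribed by this guidance.

```python
import numpy as np

# Illustrative "point robustness" check: the model's predicted class for an
# input should be unchanged under small bounded perturbations of that input.
# The model interface (predict returning a class label) and the perturbation
# model (uniform noise of magnitude epsilon) are assumptions.

def is_point_robust(model, x: np.ndarray, epsilon: float,
                    n_samples: int = 100, rng=None) -> bool:
    rng = rng or np.random.default_rng(0)
    baseline = model.predict(x)
    for _ in range(n_samples):
        perturbation = rng.uniform(-epsilon, epsilon, size=x.shape)
        if model.predict(x + perturbation) != baseline:
            return False  # a nearby input changed the prediction
    return True
```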
The performance of a model can only be assessed with respect to measurable features of the ML model; risk or safety cannot generally be measured directly. Hence safety measures must be translated into relevant ML performance and robustness measures, such as the true positive count against a test set or point robustness to perturbations. Indeed, not all misclassifications have the same impact on safety (e.g. misclassifying a 40 mph speed sign as 30 mph is less impactful than misclassifying the same sign as 70 mph).
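To reflect the point that not all misclassifications are equally hazardous, a safety-weighted error measure can be defined over (true class, predicted class) pairs, as in the sketch below; the specific severity values and the default weighting are purely illustrative assumptions.

```python
# Illustrative safety-severity weighting of speed-sign misclassifications.
# Severity values are placeholders: mistaking a 40 mph sign for 30 mph is
# treated as less hazardous than mistaking it for 70 mph.

SEVERITY = {
    ("40", "30"): 1,   # conservative error: vehicle drives slower than permitted
    ("40", "70"): 10,  # hazardous error: vehicle may drive far too fast
    ("40", "40"): 0,   # correct classification
}

def weighted_risk(results):
    """Sum severity over (true_label, predicted_label) pairs from a test set.

    Pairs not listed in SEVERITY default to a severity of 1 (an assumption).
    """
    return sum(SEVERITY.get(pair, 1) for pair in results)

# Example: two conservative errors and one hazardous error on a small test set.
print(weighted_risk([("40", "40"), ("40", "30"), ("40", "30"), ("40", "70")]))  # 12
```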
There is rarely a single performance measurement that can be considered in isolation for an ML component. For example, for a classifier component, one may have to define a trade-off between false positives and false negatives. Over-reliance on a single measure is likely to lead to systems that meet acceptance criteria but exhibit unintended behaviour [1]. As such, the ML performance safety requirements should focus on the reduction or elimination of sources of harm while recognising the need to maintain acceptable overall performance (without which the system, though safe, would not be fit for purpose). Performance requirements may also be driven by constraints on computational power (e.g. the number of objects that can be tracked). This is covered in more detail in Stage 6 (ML deployment).
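A sketch of how acceptance might be expressed over more than one measure, rather than a single aggregate score, is given below; the metric names, threshold values and their rationale are illustrative assumptions rather than recommended figures.

```python
# Illustrative acceptance check over multiple measures rather than a single
# aggregate score: the false negative rate (missed detections) and the false
# positive rate (spurious detections) each have their own threshold, alongside
# a minimum overall accuracy. All threshold values are placeholders.

THRESHOLDS = {
    "max_false_negative_rate": 0.02,  # missed pedestrians are the main source of harm
    "max_false_positive_rate": 0.05,  # spurious detections degrade fitness for purpose
    "min_accuracy": 0.90,             # overall performance floor
}

def acceptable(fn_rate: float, fp_rate: float, accuracy: float) -> bool:
    return (fn_rate <= THRESHOLDS["max_false_negative_rate"] and
            fp_rate <= THRESHOLDS["max_false_positive_rate"] and
            accuracy >= THRESHOLDS["min_accuracy"])

print(acceptable(fn_rate=0.01, fp_rate=0.04, accuracy=0.93))  # True
```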
One useful approach to defining robustness requirements is to consider the dimensions of variation which exist in the input space. These may include, for example:
The ML safety requirement presented in Example 1 may now be refined into performance and robustness requirements [22]. Example performance requirements may include:
Example robustness requirements may include:
Safety assessment shall not be limited to system-level activities, nor is it a purely top-down process; it shall be carried out in a continuous and iterative manner. A detailed safety analysis of the outputs of the ML model shall be performed, and this may identify new failure modes. The results of this analysis shall be fed back to the system-level safety assessment process for further examination, such as reassessing the risk rating for a hazard.
The activity of developing the ML safety requirements will likely identify implicit assumptions about the system or operating environment. Any such assumptions shall be made explicit, either as part of the description of the system environment or by defining additional safety requirements. Some domains refer to these as derived safety requirements.
Derived safety requirements could relate to the assumed reliability and availability of sensor outputs or the specified thresholds of tolerable risk. In the case of the latter, current societal norms might accept the delegation of the interpretation of these often qualitative criteria to an accountable human (e.g. a qualified driver or a clinical professional). Given the transfer of complex cognitive functions from human users to machine learning components, the process of developing concrete ML safety requirements will likely demand the interpretation of these thresholds at the design stage, typically through iterative interactions between domain experts and ML developers [27].
The activity of developing the ML safety requirements may also identify emergent behaviour (potential behaviour of the ML component that could not be identified at the system level). Where the emergent behaviour may contribute to a hazard, safety requirements shall be derived to ensure the emergent behaviour does not arise.
The ML safety requirements resulting from this activity shall be documented ([H]).