This activity requires as input the system safety requirements allocated to the ML component ([E]).
ML safety requirements shall be defined to control the risk of the identified contributions of the ML component to system hazards, taking into account the defined system architecture and operating environment. This requires translating complex real-world concepts and cognitive decisions into a format and a level of detail that is amenable to ML implementation and verification [55].
In a system used for cancer screening based on X‐ray images, a subset of the ML safety requirements will likely focus on defining an acceptable risk level by specifying true positive and false positive rates for diagnoses within the given environmental context [46]. This will take account of the performance achievable by human experts performing the same task, and the level of checking and oversight that is in place.
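As an illustration only, such an acceptance criterion could be expressed as a check of measured rates against thresholds, as in the minimal sketch below; the threshold values, counts and function names are assumptions made for illustration and are not drawn from the example above.

```python
# Illustrative check of true/false positive rates against acceptance thresholds
# for a screening classifier. The threshold values are placeholders; in practice
# they would be derived from the system safety assessment and clinical context.

MIN_TPR = 0.95  # assumed minimum acceptable true positive rate
MAX_FPR = 0.10  # assumed maximum acceptable false positive rate

def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def meets_requirement(tp: int, fp: int, tn: int, fn: int) -> bool:
    tpr, fpr = rates(tp, fp, tn, fn)
    return tpr >= MIN_TPR and fpr <= MAX_FPR

# Example usage with made-up counts from a held-out test set.
print(meets_requirement(tp=960, fp=40, tn=880, fn=20))  # True
```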
From the safety requirement allocated to the ML component in Example 1, the concept of identifying a pedestrian at the system level must be translated into something meaningful for the ML model. Using knowledge derived from the system architecture, the ML safety requirement becomes “all bounding boxes produced shall be no more than 10% larger in any dimension than the minimum-sized box capable of including the entirety of the pedestrian” [22].
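One way such a requirement could be operationalised is as a per-detection check against the minimum enclosing (ground-truth) box, as in the sketch below; the (x_min, y_min, x_max, y_max) box representation and the example values are assumptions for illustration.

```python
# Illustrative check of the bounding-box requirement from Example 1:
# each predicted box must be no more than 10% larger in any dimension than
# the minimum box enclosing the pedestrian (here, the ground-truth box).
# Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.

TOLERANCE = 0.10  # 10% allowance per dimension, as stated in the requirement

def box_size(box):
    x_min, y_min, x_max, y_max = box
    return x_max - x_min, y_max - y_min

def satisfies_requirement(predicted, ground_truth) -> bool:
    pred_w, pred_h = box_size(predicted)
    true_w, true_h = box_size(ground_truth)
    return (pred_w <= (1 + TOLERANCE) * true_w and
            pred_h <= (1 + TOLERANCE) * true_h)

# Example: the predicted box is 5% wider and 8% taller than the minimal box.
print(satisfies_requirement((98, 46, 203, 262), (100, 50, 200, 250)))  # True
```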
To some extent, the process of developing the machine learning safety requirements is similar to that for complex embedded software systems (e.g. in avionics systems or infusion pumps). However, due to the increasing transfer of complex perception and decision functions from an accountable human agent to the machine learning component, there is a significant gap between the implicit intentions for the component outputs and the explicit requirements that are used to develop, validate and verify the component. This “semantic gap” exists in an open context for which a credible, let alone complete, set of concrete safety requirements is very hard to formalise [12].
In machine learning, requirements are often seen as implicitly encoded in the data; as such, under-specification in requirements definition can appear to be an appealing feature. However, for rare events, such as safety incidents, this under-specification poses a significant assurance challenge. The developers are still expected to assure the ability of the system to control the risk of these rare events based on concrete safety requirements against which the machine learning component is developed and tested.
While there is likely to be a wide range of requirements for the ML component (e.g. security, interpretability), the ML safety requirements should be limited to those that impact the operational safety of the system.
Other types of requirements such as security or usability should be defined as ML safety requirements only if the behaviours or constraints captured by these requirements influence the safety criticality of the ML output.
‘Soft constraints’ such as interpretability may be crucial to the acceptance of an ML component especially where the system is part of a socio‐technical solution. All such constraints defined as ML safety requirements must be clearly linked to safety outcomes.
The ML safety requirements shall always include requirements for performance and robustness of the ML model. The requirements shall specifically relate to the ML outputs that the system safety assessment has identified as safety‐related (i.e. not just generic performance measures).
In this document, ML performance considers quantitative performance metrics (e.g. classification accuracy and mean squared error), whereas ML robustness considers the model’s ability to perform well when the inputs encountered differ from, but are similar to, those present in the training data, covering both environmental uncertainty (e.g. flooded roads) and system-level variability (e.g. sensor failure [5]).
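One way such a robustness notion can be made measurable is to check that a model’s output is stable under small, bounded perturbations of a given input. The sketch below is a simplified illustration of that idea; the `model.predict` interface and the uniform-noise perturbation model are assumptions made for illustration and are not prescribed by this guidance.

```python
import numpy as np

# Illustrative "point robustness" check: the model's predicted class for an
# input should be unchanged under small bounded perturbations of that input.
# The model interface (predict returning a class label) and the perturbation
# model (uniform noise of magnitude epsilon) are assumptions.

def is_point_robust(model, x: np.ndarray, epsilon: float,
                    n_samples: int = 100, rng=None) -> bool:
    rng = rng or np.random.default_rng(0)
    baseline = model.predict(x)
    for _ in range(n_samples):
        perturbation = rng.uniform(-epsilon, epsilon, size=x.shape)
        if model.predict(x + perturbation) != baseline:
            return False  # a nearby input changed the prediction
    return True
```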
The performance of a model can only be assessed with respect to measurable features of the ML model; risk or safety cannot generally be measured directly. Hence safety measures must be translated into relevant ML performance and robustness measures, such as the true positive count against a test set or point robustness to perturbations. Indeed, not all misclassifications have the same impact on safety (e.g. misclassifying a 40 mph speed sign as 30 mph is less impactful than misclassifying the same sign as 70 mph).
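To reflect the point that not all misclassifications are equally hazardous, a safety-weighted error measure can be defined over (true class, predicted class) pairs, as in the sketch below; the specific severity values and the default weighting are purely illustrative assumptions.

```python
# Illustrative safety-severity weighting of speed-sign misclassifications.
# Severity values are placeholders: mistaking a 40 mph sign for 30 mph is
# treated as less hazardous than mistaking it for 70 mph.

SEVERITY = {
    ("40", "30"): 1,   # conservative error: vehicle drives slower than permitted
    ("40", "70"): 10,  # hazardous error: vehicle may drive far too fast
    ("40", "40"): 0,   # correct classification
}

def weighted_risk(results):
    """Sum severity over (true_label, predicted_label) pairs from a test set.

    Pairs not listed in SEVERITY default to a severity of 1 (an assumption).
    """
    return sum(SEVERITY.get(pair, 1) for pair in results)

# Example: two conservative errors and one hazardous error on a small test set.
print(weighted_risk([("40", "40"), ("40", "30"), ("40", "30"), ("40", "70")]))  # 12
```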
There is rarely a single performance measurement that can be considered in isolation for an ML component. For example, for a classifier component, one may have to define a trade-off between false positives and false negatives. Over-reliance on a single measure is likely to lead to systems that meet acceptance criteria but exhibit unintended behaviour [1]. As such, the ML performance safety requirements should focus on the reduction or elimination of sources of harm while recognising the need to maintain acceptable overall performance (without which the system, though safe, would not be fit for purpose). Performance requirements may also be driven by constraints on computational power (e.g. the number of objects that can be tracked). This is covered in more detail in Stage 6 (ML deployment).
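A sketch of how acceptance might be expressed over more than one measure, rather than a single aggregate score, is given below; the metric names, threshold values and their rationale are illustrative assumptions rather than recommended figures.

```python
# Illustrative acceptance check over multiple measures rather than a single
# aggregate score: the false negative rate (missed detections) and the false
# positive rate (spurious detections) each have their own threshold, alongside
# a minimum overall accuracy. All threshold values are placeholders.

THRESHOLDS = {
    "max_false_negative_rate": 0.02,  # missed pedestrians are the main source of harm
    "max_false_positive_rate": 0.05,  # spurious detections degrade fitness for purpose
    "min_accuracy": 0.90,             # overall performance floor
}

def acceptable(fn_rate: float, fp_rate: float, accuracy: float) -> bool:
    return (fn_rate <= THRESHOLDS["max_false_negative_rate"] and
            fp_rate <= THRESHOLDS["max_false_positive_rate"] and
            accuracy >= THRESHOLDS["min_accuracy"])

print(acceptable(fn_rate=0.01, fp_rate=0.04, accuracy=0.93))  # True
```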
One useful approach to defining robustness requirements is to consider the dimensions of variation which exist in the input space. These may include, for example:
The ML safety requirement presented in Example 1 may now be refined into performance and robustness requirements [22]. Example performance requirements may include:
Example robustness requirements may include:
Safety assessment shall not be limited to system-level activities, nor is it a purely top-down process; it shall be carried out in a continuous and iterative manner. A detailed safety analysis of the outputs of the ML model shall be performed, and this may identify new failure modes. The results of this analysis shall be fed back to the system-level safety assessment process for further examination, such as reassessing the risk rating for a hazard.
The activity of developing the ML safety requirements will likely identify implicit assumptions about the system or operating environment. Any such assumptions shall be made explicit, either as part of the description of the system environment or by defining additional safety requirements. Some domains refer to these as derived safety requirements.
Derived safety requirements could relate to the assumed reliability and availability of sensor outputs or the specified thresholds of tolerable risk. In the case of the latter, current societal norms might accept the delegation of the interpretation of these often qualitative criteria to an accountable human (e.g. a qualified driver or a clinical professional). Given the transfer of complex cognitive functions from human users to machine learning components, the process of developing concrete ML safety requirements will likely demand the interpretation of these thresholds at the design stage, typically through iterative interactions between domain experts and ML developers [27].
The activity of developing the ML safety requirements may also identify emergent behaviour (potential behaviour of the ML component that could not be identified at the system level). Where the emergent behaviour may contribute to a hazard, safety requirements shall be derived to ensure the emergent behaviour does not arise.
The ML safety requirements resulting from this activity shall be documented ([H]).