An ML model meeting the ML safety requirements ([H]) shall be developed using the development data ([N]).
The creation of an ML model starts with a decision as to the form of model that is most appropriate for the problem at hand and will be most effective at satisfying the ML safety requirements. This decision may be based on expert knowledge and previous experience of best practice. The rationale for the decision shall be recorded in the model development log ([U]).
Decision trees and random forests have been shown to provide excellent results for medical prognosis problems [45, 17, 67]. For low-dimensional data they allow clinicians to understand the basis for decisions made by the system and, as such, may be more appropriate than neural networks for a range of problems.
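As an illustrative sketch only, the following shows how a random forest might be fitted to low-dimensional tabular data of the kind found in prognosis problems. The dataset is a synthetic placeholder and scikit-learn is assumed purely for illustration; the feature importances it exposes are one way of giving clinicians insight into the basis for the model's decisions.

```python
# Illustrative sketch: random forest on low-dimensional tabular data.
# The data below is a synthetic stand-in, not a real clinical dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder for low-dimensional clinical features (10 features).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Feature importances provide some insight into which inputs drive decisions.
print(model.feature_importances_)
print("held-out accuracy:", model.score(X_test, y_test))
```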
Deep Neural Networks (DNNs) are able to extract features directly from image data. A DNN that receives image frames from a video feed has been shown to be capable of identifying objects in a scene and may therefore be suitable for use in an automotive perception pipeline.
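As a minimal sketch, assuming PyTorch and arbitrary input dimensions and class count, the following illustrates the kind of convolutional network that learns image features directly from pixel data. It is not a production perception model; the architecture, frame size and number of classes are all assumptions made for illustration.

```python
# Illustrative sketch: a small convolutional network that maps an image frame
# to object-class scores. Shapes and class count are illustrative assumptions.
import torch
import torch.nn as nn

class SmallPerceptionNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional layers learn image features directly from the pixels.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x).flatten(1)
        return self.classifier(feats)

# A single 224x224 RGB frame from a video feed (batch of one).
frame = torch.randn(1, 3, 224, 224)
logits = SmallPerceptionNet()(frame)
print(logits.shape)  # torch.Size([1, 10])
```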
Typically, numerous candidate models of the selected type will be created from the development data by tuning the model hyperparameters, with the aim of producing models that may satisfy the ML safety requirements.
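One common way of generating such candidate models is a systematic search over hyperparameter values. The sketch below assumes scikit-learn, synthetic stand-in data and an arbitrary parameter grid; in practice the grid and the scoring metric would be chosen to reflect the ML safety requirements.

```python
# Illustrative sketch: generating candidate models by hyperparameter search.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic placeholder for the development data.
X_dev, y_dev = make_classification(n_samples=1000, n_features=10, random_state=0)

# Each point in the grid yields a distinct candidate model.
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_dev, y_dev)

print(search.best_params_, search.best_score_)
```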
A common problem encountered when creating a model is overfitting to the training data. This occurs when the model performs well on the development data but poorly when presented with data it has not seen before. It results from creating a model that maximises performance on the data available but whose performance does not generalise. Techniques such as cross-validation [2], leave-one-out [14] and early stopping [54] can be used in handling the development data during the creation of the model in order to improve its generalisability and thus its ability to satisfy the ML safety requirements.
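As an illustrative sketch, assuming scikit-learn and synthetic stand-in data, the following uses k-fold cross-validation to check whether a candidate model generalises beyond the data it was fitted on; leave-one-out is the limiting case in which the number of folds equals the number of samples.

```python
# Illustrative sketch: k-fold cross-validation as a check on generalisability.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic placeholder for the development data.
X_dev, y_dev = make_classification(n_samples=1000, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Each fold is held out in turn and used only for evaluation. Fold scores that
# are much lower than the training score, or that vary widely between folds,
# are a warning sign that the model is overfitting.
scores = cross_val_score(model, X_dev, y_dev, cv=5)
print("mean:", scores.mean(), "std:", scores.std())
```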
Let us consider a set of data samples that has been augmented with samples from a photo-realistic simulator. The samples from the simulator have a higher proportion of hazardous scenarios, since these are difficult or impossible to obtain from real-world experimentation. A model is then created to differentiate between safe and hazardous situations. Both subsets of data are subject to noise, the first from sensor noise and the second from image artefacts in the simulator. Overfitting may occur when the model creates an overly complex boundary between the two classes that aims to accommodate the noise present in those classes rather than the features which define the true class boundary.
In creating an acceptable model it is important to note that it is not only the performance of the model that matters. Trade-offs between different properties must also be considered, such as those between hardware cost and performance, performance and robustness, or sensitivity and specificity. Several measures are available for assessing some of these trade-offs. For example, the area under the ROC curve enables the trade-off between false-positive and false-negative classifications to be evaluated [20].
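As an illustrative sketch, assuming scikit-learn and synthetic stand-in data, the following computes a ROC curve and the area under it for a binary classifier; moving the operating threshold along the curve trades sensitivity against specificity, and the chosen operating point should be justified against the ML safety requirements rather than overall accuracy alone.

```python
# Illustrative sketch: examining the false-positive / false-negative trade-off
# with a ROC curve and its area under the curve (AUC).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data split into fitting and evaluation subsets.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# Each threshold corresponds to one operating point on the ROC curve.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))
```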