During Q&A session at San Jose GTC 2018, nVidia CEO Jen-Hsun Huang reiterated that critical functional safety, such as in autonomous vehicle, requires both the redundancy and the diversity aspects. For example, CUDA with Tensor core and GPU with DLA were both utilized. Safety is paramount to automotive applications. Any failure related to the system performance could translate to dire consequences as the application owning exposure around human activities.
In ADAS application, aggressive small form-factor, low power consumption and increased compute requirements for inferencing have driven demand towards advanced process node migration. This in turn has caused more challenge for reliability such as in addressing process variation, electrostatic discharge and electromigration.
Safety can be defined by referring to two existing safety standards:
IEC 61508 (International Electrotechnical Commission (IEC), a functional safety standard for the general electronics market developed by the IEC. ISO 26262 (IEC), a functional safety standard for automobiles from ISO. It has rapidly gaining acceptance as the guideline for the automotive engineer since its release. While compliance to these standards is initially addressed by car manufacturers and system suppliers, recent increasing complexity has also motivated all participants of the supply chain to participate. Since its first roll-out in 2011, more efforts being spent to formalize its adoption into the development stage of the electrical/electronic systems. As a solution provider, Cadence has made headways within this space as will be covered later in this article.
Functional Safety and Failure Classifications
As derivative to IEC-61508, the ISO 26262: Road Vehicles—Functional Safety was designated for passenger vehicles with a max gross vehicle mass up to 3,500 kg and that are equipped with one or more electrical/electronic (E/E) subsystems. Functional safety is defined as the “absence of unreasonable risk due to hazards caused by malfunctioning behavior of electrical/electronic systems”. The definition can be illustrated by chain of implication as shown in reversed waterfall diagram in figure 1.
On the other side of the equation, per ISO.26262 malfunction in E/E systems can be classified into either systematic or random failure. In each of this domain, it could be expanded into more approaches and necessary metrics to fully address the issue. For example systematic failures induced during development, manufacturing or maintenance could be due to process origin and can be resolved by change in the procedures and its related documentation (captured on left hand side in figure 3).
Unlike the systematic failures, random failures which might appear during the lifetime of a component could be due to random defects as part to innate usage or process conditions. A number of safety mechanisms and detection metrics are available as captured in the diagram. For more details on random failures metrics discussion, please refer tothis document.
Given the failure classification and applying the chain of implications, any potential malfunction of a defined automotive system function can be assessed within the context of ASIL term, Automotive Safety Integrity Level (ASIL) is the level of risk reduction needed to achieve a tolerable risk. For example, a malfunction in the Anti-lock Braking System (ABS) may involve addressing prompts at each sequence of ASIL. The severity level of each event, labelled as A (low) to D (high), can be measured in terms of FIT (Failures in Time), SPFM and LFM levels.
Functional Safety Analysis and Design Flow
To evaluate the safety level of design components such as an IP or SOC, one could use functional safety analysis. The dichotomy of this analysis can be illustrated in figure 4. It comprises quantitative evaluations (e.g. Failure Mode Effect and Diagnostic Analysis or FMEDA), timing analysis, and qualitative assessments (e.g. Dependent Failure Analysis or DFA. FMEDA is a structured approach to define failure modes, failure rate, and diagnostic capabilities of a hardware component and DFA is used to assess dependent failures between given elements (is important especially if the system has shared resources).
Built-In Self-Test (BIST) is the most notable example of safety mechanism already automated in the design flow. It is used for automotive in-system/field testing for lifetime reliability to achieve the desired ASIL. For testing random logic, there are online and offline BISTs, each is applied based on the stringent timing requirements. Types of challenges to be addressed during BIST integration are speed testing capabilities, power consumption, area, routing minimization, compression techniques and ASIL targets. Since the test coverage estimated during BIST insertion is not exactly the DC required by the ISO 26262 metrics; functional safety verification might be needed to accurately measure the DC. Other safety mechanisms are also available (such as Triple Modular Redundancy, Dual-Core Lockstep, etc.). For more details on these techniques, please refer to Cadence’s paper.
In the traditional RTL-to-GDS flow, FMEDA is utilized to drive design exploration to meet the functional safety targets. It shows where to focus the design effort for meeting the constraints and provides direction for improving diagnostic coverage. Implementing a design with functional safety produces more robust physical outcome.
Figure 5 shows a design implemented with and without functional safety routing constraints using Cadence Innovus Implementation System. The bottom-left region is the main copy of a block, while the top-right region is the replica inserted for redundancy. By guaranteeing that wires belonging to the main block can never venture into the top-right region, the redundant blocks are physically independent by construction and meet the requirements of the DFA.
Software Tool Confidence Level
Similar to the safety principles applied on the hardware components, the tool confidence level (TCL) quantifies the assurance that failures in the software tools can be detected. The tool confidence level spans from TCL1 to TCL3, with TCL1 being the highest, Tools classified as such do not require additional qualification and can be used in the development of applications with any ASIL. Cadence announced last year its industry’s first comprehensive TCL1 certification from TÜV SÜD, to meet stringent ISO 26262 automotive safety requirements. The functional safety documentation kits cover analog and mixed-signal, digital front-end and verification, digital implementation and signoff, and PCB flows comprised of nearly 40 tools, offering the broadest EDA-certified tool and flow documentation to support the automotive industry. For read more details on functional safety methodologies, please check HERE.