How to use runtime monitoring for automotive functional safety

By Lee Harrison | Posted: May 29, 2020
Topics/Categories: Embedded - Architecture & Design, IP - Assembly & Integration, EDA - Verification | Tags: automotive, functional safety, ISO 26262, runtime monitoring | Organizations: Siemens EDA

The promise of autonomous vehicles is driving profound changes in the design and testing of automotive ICs.

Automotive ICs are increasingly developed and manufactured using cutting-edge processes. Electronic devices once drove simple functions like controlling windows or light signaling. They are now required for complex functions related to advanced driver-assist systems (ADAS) and increasingly for the applications that comprise autonomous driving. The processing power required for these advanced functions leads to very large and complex ICs that are manufactured for optimal power efficiency. This, coupled with the need for these devices to meet the stringent safety requirements of the ISO 26262 standard, presents a new set of challenges.

In particular, designers must ensure that these complex new automotive electronic systems operate safely at all times throughout the life of a vehicle. This is known as functional safety.

Functional safety relies on so-called safety mechanisms within the design that monitor and check the correct functional operation of the design while it is in use. The ability of these safety mechanisms to cover potential faults, both latent and transient, determines the overall diagnostic coverage of the design. This in turn affects the Automotive Safety Integrity Level (ASIL) that can be achieved.

One extremely popular approach here is to distribute a set of embedded monitoring functions throughout the semiconductor device and tie them together with a global communication infrastructure that enables rapid detection and reporting of random failures anywhere in the system. The monitors must operate without interfering with normal functional operation and have the flexibility to provide varying degrees of failure coverage based on the end-application of the semiconductor device and the associated ASIL classification. An example of a chip-level test architecture that supports distributed system-wide monitoring is shown in Figure 1.

Figure 1: Chip-level test architecture for in-system test (Mentor)

A standard IEEE 1149.1 test access port (TAP) provides a portal to all on-chip test resources for manufacturing test. The TAP connects to a reconfigurable serial access network based on the IEEE 1687 standard (a.k.a. IJTAG). This IJTAG network is made up of switches called segment insertion bits (SIBs). Each SIB allows a sub-network to be switched-in or bypassed, allowing for optimized access to any test resource within the network. The IJTAG network is also accessed by an in-system test (IST) controller. The IST controller communicates through a CPU interface to either the outside world or an internal safety manager and performs the parallel-to-serial and serial-to-parallel data conversion necessary to transport information between the CPU bus and the internal IJTAG network. This IST controller enables a system-level communication architecture (Figure 2).
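The way SIBs lengthen or shorten the active scan path can be illustrated with a small model. This is only a sketch under stated assumptions: the class names, register names and register lengths are hypothetical, and each sub-network is assumed to sit ahead of its SIB bit in the chain, which is one of several arrangements a real IEEE 1687 network may use.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Instrument:
    """A test data register of fixed length (hypothetical example instrument)."""
    name: str
    length: int


@dataclass
class Sib:
    """Segment insertion bit: one scan bit that includes or bypasses a sub-network."""
    name: str
    children: List[Union["Sib", Instrument]] = field(default_factory=list)
    is_open: bool = False


def scan_path(nodes):
    """Return (name, bit_count) for every element currently in the scan path."""
    path = []
    for node in nodes:
        if isinstance(node, Instrument):
            path.append((node.name, node.length))
            continue
        # Assumption: the sub-network sits ahead of the SIB's own bit.
        if node.is_open:
            path.extend(scan_path(node.children))
        path.append((node.name, 1))  # the SIB bit itself is never bypassed
    return path


def path_length(nodes):
    """Total number of bits that must be shifted to traverse the active path."""
    return sum(bits for _, bits in scan_path(nodes))


# Hypothetical network: one SIB guarding an MBIST register, one guarding logic BIST.
network = [
    Sib("sib_mbist", [Instrument("mbist_ctrl", 16)]),
    Sib("sib_lbist", [Instrument("lbist_ctrl", 32)]),
]
closed_bits = path_length(network)   # 2: only the two SIB bits are in the path
network[0].is_open = True
mbist_bits = path_length(network)    # 18: the 16-bit MBIST register is switched in
```

With both SIBs closed, the controller shifts only two bits per access. Opening a single SIB adds just the sub-network it guards, which is why access to any one test resource stays short even in a large network.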

Figure 2: System-level test architecture (Mentor)

A service processor can access each chip’s IST controller, and hence any on-chip test resource, through whatever backplane vehicle bus is implemented, such as CAN (Controller Area Network) or I2C (Inter-Integrated Circuit).

Alternatively, the safety manager CPU in an advanced SoC may be embedded as part of the device, an architecture commonly referred to as a Safety Island. As an island, the safety manager is best treated as a separate physical and power partition on the silicon. It should have dedicated power and control signals, and be physically isolated from the functional logic to the greatest extent possible, with the only data connection being the links to the test network. This isolation reduces the chance that it will be impacted by any defects on the functional part of the device. Figure 3 shows the main components that make up a typical safety island.

Figure 3: On-chip safety island (Mentor)

The effectiveness of this distributed system on one or more devices depends on the test resources implemented within them. To achieve ISO 26262 certification, these resources will typically be a mix of both functional and structural safety mechanisms. Probably the most common form of on-chip structural resource is memory built-in self-test (MBIST). An MBIST engine fully tests an embedded memory by algorithmically generating a sequence of read and write operations that covers the entire address space. A major challenge in running a memory test during vehicle operation is that the memory must first be taken offline to allow the BIST engine to take control. It may also be necessary to back up the memory contents before running the test and restore them afterward, because the memory test destroys any pre-test content.

Taking the memory offline is also likely to degrade the system’s performance, and this may not be acceptable in some applications. A non-destructive MBIST technique has been developed to avoid these problems. In this approach, the MBIST engine tests the memory using a series of short sequences of transactions, often referred to as bursts. A burst typically lasts for only a small number of clock cycles (perhaps 20 to 30) and targets different memory locations each time. The entire memory is therefore tested over a large number of short MBIST sessions. The approach is non-destructive because the MBIST engine saves and restores the memory locations modified by each burst. Functional performance is not significantly affected because bursts are initiated only when arbitration logic between the MBIST engine and the functional logic determines that the memory is free.
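The save, test and restore cycle of a burst can be sketched in software. The model below is illustrative only: the `Memory` and `BurstMbist` classes, the 8-bit word width, the four-word burst and the two fixed data patterns are all assumptions, and a production MBIST engine would run a proper memory test algorithm in hardware rather than this simple write-and-read-back check.

```python
class Memory:
    """Toy 8-bit memory; optionally one bit at one address is stuck at 0."""

    def __init__(self, size, stuck_addr=None, stuck_bit=0):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_mask = ~(1 << stuck_bit) & 0xFF

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.stuck_addr:
            value &= self.stuck_mask  # model the stuck-at-0 defect on reads
        return value


class BurstMbist:
    """Tests a few words per burst and restores their contents afterward."""

    PATTERNS = (0x55, 0xAA)  # assumed data backgrounds, one the inverse of the other

    def __init__(self, memory, burst_words=4):
        self.memory = memory
        self.burst_words = burst_words
        self.next_addr = 0
        self.failed_addrs = []

    def run_burst(self, memory_is_free):
        """Run one burst if arbitration grants access; return True if it ran."""
        if not memory_is_free:
            return False  # functional logic has priority, try again later
        start = self.next_addr
        end = min(start + self.burst_words, len(self.memory.cells))
        saved = [self.memory.read(a) for a in range(start, end)]
        for pattern in self.PATTERNS:
            for addr in range(start, end):
                self.memory.write(addr, pattern)
            for addr in range(start, end):
                if self.memory.read(addr) != pattern and addr not in self.failed_addrs:
                    self.failed_addrs.append(addr)
        for addr, value in zip(range(start, end), saved):
            self.memory.write(addr, value)  # restore the functional contents
        self.next_addr = end % len(self.memory.cells)
        return True
```

Calling `run_burst` repeatedly walks the whole address space a few words at a time. The `memory_is_free` argument stands in for the arbitration logic, so a burst is simply skipped whenever the functional logic needs the memory.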

Logic BIST is another popular form of structural IST that can be accessed through the IST controller. Logic BIST involves the on-chip generation of random patterns that are applied to scan chains to test the logic portion of a chip. The circuit responses to all the random patterns are accumulated into a signature that is examined at the end of the test for a pass/fail result. The test coverage achieved by applying an increasing number of random patterns grows logarithmically (Figure 4).

Figure 4: Managing logic BIST test time (Mentor)

The big challenge with this approach is achieving high enough test coverage within a given time budget. A solution to this problem is to time slice the test into multiple sessions (Figure 4b). Each successive slice is applied during an available break in the functional operation. For example, in an image processor, each test session could be applied in between processing individual image frames. Management of the multiple test slices requires careful coordination between the IST controller and logic BIST engine. The IST controller must keep track of which test slice is to be applied next, initialize the logic BIST engine to have it generate the correct set of random patterns, and then retrieve and compare the intermediate signature to determine pass or fail status.
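The bookkeeping this places on the IST controller can be sketched as follows. Everything here is a simplified assumption rather than a description of any particular product: the 16-bit LFSR stands in for the pattern generator, `misr_step` is a toy signature compactor, `good_logic` is a made-up circuit, and the signature register is assumed to be reset at the start of every slice so that each slice has its own expected value.

```python
def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one pattern."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def misr_step(signature, response):
    """Fold one 16-bit circuit response into the running signature."""
    feedback = ((signature >> 15) ^ (signature >> 13)
                ^ (signature >> 12) ^ (signature >> 10)) & 1
    return (((signature << 1) | feedback) ^ response) & 0xFFFF


def good_logic(pattern):
    """Hypothetical fault-free logic block under test."""
    return ((pattern & (pattern >> 1)) ^ (pattern >> 3)) & 0xFFFF


def stuck_output_logic(pattern):
    """The same block with output bit 2 stuck at 0 (hypothetical defect)."""
    return good_logic(pattern) & ~0x0004


def run_slice(circuit, seed, patterns_per_slice):
    """Apply one slice of patterns; return (signature, seed for the next slice)."""
    state, signature = seed, 0
    for _ in range(patterns_per_slice):
        signature = misr_step(signature, circuit(state))
        state = lfsr_step(state)
    return signature, state


class IstController:
    """Tracks which slice is next, its seed and its expected signature."""

    def __init__(self, golden_circuit, first_seed, num_slices, patterns_per_slice):
        self.patterns_per_slice = patterns_per_slice
        self.plan = []  # one (seed, expected_signature) entry per slice
        seed = first_seed
        for _ in range(num_slices):
            signature, next_seed = run_slice(golden_circuit, seed, patterns_per_slice)
            self.plan.append((seed, signature))
            seed = next_seed
        self.next_slice = 0

    def run_next_slice(self, circuit):
        """Run the next slice during a functional break; return True on pass."""
        seed, expected = self.plan[self.next_slice]
        signature, _ = run_slice(circuit, seed, self.patterns_per_slice)
        self.next_slice = (self.next_slice + 1) % len(self.plan)
        return signature == expected


controller = IstController(good_logic, first_seed=0xACE1,
                           num_slices=4, patterns_per_slice=32)
frame_gap_results = [controller.run_next_slice(good_logic) for _ in range(4)]
```

In the image-processor example, `run_next_slice` would be called once per gap between frames. A defect such as the one in `stuck_output_logic` would be expected to produce a mismatching signature in at least one slice, barring signature aliasing.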

There are still some cases where this form of distribution either is not possible or still cannot provide the required coverage within the FTTI (Fault Tolerant Time Interval). A newer technique, Observation Scan Technology (OST), significantly reduces the test time of these logic BIST monitors, which in turn significantly improves their overall response time. Logic BIST with OST uses special test points inserted in the design along with a small dedicated scan chain of observation scan cells, which can observe fault effects in the functional logic on every shift cycle, rather than only on the capture cycle of each pattern (Figure 5).

Figure 5: Logic BIST with Observation Scan Technology (Mentor)

The result is a faster ramp in coverage for the functional logic, enabling these safety mechanisms to reach their required quality goals significantly faster than with traditional logic BIST. Figure 6 shows a comparison of coverage ramp in LBIST-OST and traditional logic BIST.

Figure 6: Test time improvement with LBIST-OST (Mentor)

All of these test methods allow any number of system-level, safety-related functions to be implemented. Key-on and key-off tests can easily be accomplished by sending out commands to all IST controllers to have all test resources run specific tests, depending on the scenario selected. Any test failures are reported back to the safety manager, which can use the results to drive some form of corrective action, ranging from something as simple as displaying a warning message on the dashboard to powering down the vehicle for further service.

The IST controllers can also be instructed to run periodic tests while the vehicle is operating on portions of the electronic system that are involved in safety-critical functions. Again, failing results from these tests are monitored by the safety manager, which can take actions like disabling specific ADAS functions or putting the vehicle into a safe operational state. This is where the response time of the safety mechanisms discussed above becomes critical.
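A rough timing budget shows why. The sketch below assumes a simple model in which a periodic test reports its result only when it finishes, so a fault that appears just after a run has begun is not reported until the following run completes. The function names and all the millisecond figures are hypothetical and chosen purely for illustration.

```python
def worst_case_detection_ms(test_period_ms, test_duration_ms):
    """Worst-case delay between a fault appearing and a periodic test flagging it.

    Assumed model: a fault arising just after a run starts is only reported
    when the next run completes, one period plus one test duration later.
    """
    return test_period_ms + test_duration_ms


def fits_ftti(test_period_ms, test_duration_ms, reaction_ms, ftti_ms):
    """Check that detection plus reaction completes within the FTTI."""
    detection = worst_case_detection_ms(test_period_ms, test_duration_ms)
    return detection + reaction_ms <= ftti_ms


# Hypothetical budget: test every 50 ms, each run takes 5 ms, reaction takes 10 ms.
within_100ms = fits_ftti(50, 5, 10, ftti_ms=100)  # True: 65 ms fits in 100 ms
within_60ms = fits_ftti(50, 5, 10, ftti_ms=60)    # False: 65 ms exceeds 60 ms
```

Under this model, shortening the test duration or the interval between runs directly widens the margin against the FTTI, which is the benefit that faster-ramping techniques such as LBIST-OST aim to provide.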

The need for regular in-line monitoring of automotive electronic systems will no doubt continue to grow as the amount and complexity of safety-critical functions continue to expand. Some commercial solutions to address this need have already been introduced and will continue to evolve over time.

About the author

Lee Harrison is Automotive IC Test Solutions Manager within Mentor’s Tessent group and has responsibility for the company’s automotive test solutions. He has also held senior engineer positions at 3Com and BAE Systems. He received his B.Eng. in Microelectronics from Brunel University, London, UK.