Keynotes at this year’s IEEE International Reliability Physics Symposium (IRPS) focused on how scaling is changing the way the reliability of semiconductors is examined, with a move away from generalized models toward a combination of TCAD and screening tests that take device structures and applications into account much more strongly.
Predicting reliability accurately is becoming much harder as transistor structures grow more complex and cost pressures force a move to ever-smaller die areas, even in high-power components. Introducing the issues facing high-integration digital design, Intel CTO Mike Mayberry described how modeling for reliability is becoming more complex not just through feature-size reduction but through the issues raised by 3D structures.
Looking back at older planar devices, where the main reliability problems were in the formation of traps in gate oxides, Mayberry said those mechanisms are relatively simple to understand. “The finFET changed the nature of self-heating. And the adoption of high-k dielectrics changed the nature of wearout. We expect with new mechanisms and physics, new tuning will be required.
“After the finFET, we knew we would go to a gate-all-around structure. This introduces some complications from the standpoint of how to model it,” Mayberry said. One issue he noted is that the wires or ribbons that together form a transistor channel do not necessarily have the same crystal orientation as the underlying wafer. “We can take 2D materials and layer them up into a 3D transistor, at least in the lab. We can also look at the stacking of memory. We have to worry about the individual reliability of all the layers and how the system works as a whole.”
One issue with these far more complex processes is that validation during design and in the fab may not be possible, pushing the task of determining reliability onto systems running in the field. “Something that is hard to detect at validation can easily be detected at scale,” Mayberry said. Cloud operators are highly likely to detect reliability issues in processors simply through the sheer number they have active at any given time.
“We need to figure out how to plan for detection. And we need to get past how you make things reliable and into the domain of how you make things resilient.”
For power semiconductors, Oliver Häberlen, senior principal of power transistor technology at Infineon Technologies, said there is little room for defects that surface during use. The scale of data centers means even one defect per million is too high: at those levels of failure, 2 per cent of servers in a large-scale data center would be out of action.
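The arithmetic behind that claim can be sketched as follows; the per-server device count here is a hypothetical assumption for illustration, not a figure Häberlen gave:

```python
# Sketch of the defects-per-million arithmetic behind Häberlen's point.
# devices_per_server is an assumed, illustrative value.

def server_failure_rate(dpm: float, devices_per_server: int) -> float:
    """Probability that at least one device in a server is defective."""
    p_device = dpm / 1e6
    return 1.0 - (1.0 - p_device) ** devices_per_server

# With 1 defect per million and ~20,000 power devices per server (assumed),
# roughly 2% of servers would contain a defective part.
rate = server_failure_rate(dpm=1.0, devices_per_server=20_000)
print(f"{rate:.1%}")  # → 2.0%
```

The point of the compounding is that a defect rate that looks negligible per device becomes near-certain trouble once devices are deployed by the hundreds of millions.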
As transportation moves to greater levels of electrification, the amount of silicon and vulnerable gate oxide area is increasing dramatically. Häberlen used the example of a cruise ship’s drivetrain that would need power transistors with a total of half a square meter of gate oxide, spread across 230 million individual transistor cells. “Now think about zero defect. You don’t want a cruise ship stuck in the middle of the ocean because of transistor failures.”
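Dividing the figures from the talk gives a sense of the per-cell oxide area involved; only the two totals come from Häberlen, the rest is straightforward unit conversion:

```python
# Back-of-the-envelope split of Häberlen's cruise-ship drivetrain numbers.
total_oxide_m2 = 0.5   # total gate-oxide area, from the talk
cells = 230e6          # individual transistor cells, from the talk

# 1 m^2 = 1e12 um^2
area_per_cell_um2 = total_oxide_m2 / cells * 1e12
print(f"{area_per_cell_um2:.0f} um^2 of gate oxide per cell")  # → 2174 um^2
```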
Shrinking die, increasing fields
The total area of gate oxide would be even larger were it not for the relentless reduction of die area over time, thanks to changes in materials and the move to wide-bandgap components. Häberlen said chip size has halved every four or five years since 1990, which has in turn caused huge increases in electric field strength and power density. The dielectrics themselves have not changed much, but they have to deal with more, and deeper, trap levels when used with wide-bandgap materials such as gallium nitride and silicon carbide.
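Compounding that halving cadence shows why the field strengths have risen so sharply; the 4.5-year period below is a midpoint assumption between the “four or five years” Häberlen cited:

```python
# Rough compounding of "chip size halves every four or five years since 1990".
# The 4.5-year halving period is an assumed midpoint, for illustration.

def shrink_factor(years: float, halving_period: float) -> float:
    """Cumulative die-area reduction factor after the given span of years."""
    return 2.0 ** (years / halving_period)

# From 1990 to the early 2020s (~32 years), die area shrinks by roughly
# two orders of magnitude; for a fixed voltage, fields and power density
# rise correspondingly.
print(f"~{shrink_factor(32, 4.5):.0f}x smaller")  # → ~138x smaller
```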
“That leads to challenges in packaging, as well as in robustness and SOA [safe operating area]. It means we need to look for better material quality from oxides and we need to understand their failure mechanisms.”
In environments that need very low DPM levels, Häberlen said, the focus is moving to extrinsic defects that only become apparent under high-volume usage but are also very hard to detect using traditional tests. “We do not need to remove the whole extrinsic branch, only those that will fail within the useful lifetime of the end product.”
This means using screening as well as more stress tests on designs that accurately reflect usage in real-world applications. He noted that a problem with some existing stress methods is that they do not cover the full usage range. For example, several that model power-supply circuitry in half-bridge layouts only deal with low-side usage, not high-side.
For the company’s silicon carbide devices, screening and applications tests have fed back into layout to improve reliability. Some changes have focused on reducing stresses on gate oxide, sometimes at the cost of on-state resistance. “There is no free lunch; excellent reliability always comes with a price tag,” Häberlen said.
In his keynote, Gianluca Boselli, analog ESD lab manager at Texas Instruments, described how detailed 3D TCAD simulation has shed new light on the failure modes of protection circuitry and led to design changes. He pointed out that in some devices, such as silicon controlled rectifiers, conventional 2D simulations failed to identify troublesome failure modes that depended heavily on ESD pulse widths and rise times.
A common problem in protection devices is that subtle aspects of the design can focus most of the ESD energy into thin filaments that cause the protection itself to burn out and fail. Boselli explained how, by employing 3D TCAD and other modeling techniques, designers of these devices can ensure more uniform current handling.