What is NAND flash?
The rise of NAND flash beginning several decades ago has revolutionized the data and storage industry. NAND devices have penetrated almost all industries and in many cases have replaced rotating magnetic disk storage.
The core flash memory technology uses a dielectric-conductor sandwich to allow electrons to be selectively trapped within the insulator using a quantum-mechanical tunnelling process driven by charge pulses on word and bit lines. The NAND variant of flash is, unlike the NOR architecture that is used for local non-volatile memory in microcontrollers, arranged in such a way that data elements need to be read serially as whole pages, up to 16Kbyte in size. A block may contain up to 64 pages, all of which need to be erased together before data can be rewritten. As a result, NAND flash is used primarily for longer-term file storage, with contents cached in local DRAM or SRAM when required by an application.
However, NAND’s relatively low cost has enabled massive growth in bit density, now measured in gigabits per square millimeter. A model developed by Western Digital engineers Siva Sivaram and Alper Ilkbahar and presented at the 2023 VLSI Symposium showed that the growth in shipments can be explained almost entirely by reductions in average selling price. Total petabytes shipped rose quickly once the price fell below 50 US cents per gigabyte: “…when prices hit lower levels, new sets of applications and products become viable, which starts off the next cycle of demand growth referred to as the Virtuous Cycle of NAND Flash”.1
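The shape of this demand response can be sketched with a toy constant-elasticity model. The elasticity value and baseline figures below are assumptions for illustration only, not parameters from the Sivaram/Ilkbahar model.

```python
# Toy constant-elasticity demand sketch: shipments scale as a power of
# price. Elasticity and baselines are illustrative assumptions, not
# figures from the cited VLSI Symposium model.

def petabytes_shipped(price_per_gb: float,
                      base_price: float = 1.0,
                      base_pb: float = 1.0,
                      elasticity: float = 2.5) -> float:
    """Shipments relative to a baseline: demand ~ price^(-elasticity).

    With elasticity > 1, halving the price more than doubles shipments,
    which is the mechanism behind the "virtuous cycle" argument.
    """
    return base_pb * (price_per_gb / base_price) ** -elasticity

# Demand at $0.50/GB versus $1.00/GB under the assumed elasticity of 2.5
ratio = petabytes_shipped(0.50) / petabytes_shipped(1.00)  # ~5.7x
```

The point of the sketch is only the nonlinearity: each price step down unlocks a disproportionate volume step up, restarting the cycle.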
NAND flash has been through a number of changes in order to achieve the high bit-cost scaling to which the market has become accustomed. At around the 14nm node, conventional 2D flash ran into obstacles, mainly due to cell-to-cell interference and increases in cell programming and read noise, limiting potential increases in density and performance.1
The answer was to move from pure 2D scaling to an architecture that promoted vertical scaling and a more relaxed 2D pitch. The vertical layers are largely created using atomic layer deposition, with the channel provided by polysilicon deposited around a core dielectric and surrounded by a cylindrical charge-trap layer. Word lines are stacked in layers around the vertical cylinder, with the result that bit cells are formed at the intersections. The 3D channel architecture helped to eliminate bit-line-to-bit-line interference and to reduce word-line-to-word-line interference, representing major advantages over 2D flash.
Though this structure enables further scaling, it presents technical and economic challenges, as it demands high-aspect-ratio reactive-ion etch processes to form the vertical channel. The increase in processing needed has reduced the cost reduction per node from 34% under 2D scaling (up to the 14nm node) to 21%, averaged over four generations.1
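Compounding those per-node figures shows how much the slowdown matters over several generations. The arithmetic below simply compounds the 34% and 21% rates quoted above; it is not taken from the cited model itself.

```python
# Compound the per-node cost reductions quoted in the text:
# 34% per node under 2D scaling, 21% per node for 3D (averaged over
# four generations). Illustrative arithmetic only.

def relative_cost(reduction_per_node: float, generations: int) -> float:
    """Cost per bit after `generations` nodes, relative to the start."""
    return (1.0 - reduction_per_node) ** generations

cost_2d = relative_cost(0.34, 4)  # ~0.19: cost falls to ~19% of start
cost_3d = relative_cost(0.21, 4)  # ~0.39: cost falls to ~39% of start
```

Over four generations, the 2D regime would have cut cost per bit to roughly a fifth of its starting value, while the 3D regime cuts it only to around two fifths.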
Methods to mitigate the cost of vertical scaling have revolved around reducing interlayer height in the vertical stack by trimming the oxide-nitride (ON) pitch, bringing back 2D pitch scaling and reducing the circuit overhead around the core memory blocks. Some vendors have been able to use these factors to increase Gb/mm² faster than layer count alone would allow.
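The relationship between layer count, lateral pitch and areal density can be made concrete with a back-of-envelope estimate. All numeric values below are hypothetical illustrations, not vendor figures from the article.

```python
# Back-of-envelope array-core density estimate. Layer count, bits per
# cell and pitches are assumed illustrative values, not vendor data.

def areal_density_gb_mm2(layers: int, bits_per_cell: int,
                         x_pitch_nm: float, y_pitch_nm: float) -> float:
    """Approximate array-core density in Gbit/mm^2.

    Each vertical channel contributes layers * bits_per_cell bits over
    a lateral footprint of x_pitch * y_pitch; peripheral circuit
    overhead around the array is ignored.
    """
    cell_area_mm2 = (x_pitch_nm * 1e-6) * (y_pitch_nm * 1e-6)
    bits_per_channel = layers * bits_per_cell
    return bits_per_channel / cell_area_mm2 / 1e9

# e.g. 176 layers, TLC (3 bits/cell), ~150 nm lateral pitches
density = areal_density_gb_mm2(176, 3, 150.0, 150.0)  # ~23 Gbit/mm^2
```

The formula makes the trade-off visible: density rises linearly with layer count, but quadratically with any reduction in lateral pitch, which is why 2D pitch trimming remains attractive even in a 3D process.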
Some area efficiency has been gained from the CMOS-under-array (CUA) architecture, which places the control circuitry beneath the memory arrays rather than next to them in the same silicon plane. Though this improves areal density, it increases cost because of the additional layer processing involved. It also demands that the circuitry be matched in size to the memory array over each block.
Conversely, CUA can allow more space for control circuitry per block and an increase in write parallelism, which improves overall throughput on top of the gains made in the 2D-to-3D transition, as the improved electrical screening of the channel in the vertical structure allows a simpler pulse-based programming method. The key issue is the difference in cost between 2D areal scaling and additional 3D processing.
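The area argument for CUA reduces to a simple efficiency ratio. The area figures below are assumed for illustration; only the comparison between the two layouts is the point.

```python
# Compare array ("die") efficiency with periphery beside the array
# versus CMOS-under-array. Area values are illustrative assumptions.

def array_efficiency(array_mm2: float, periphery_mm2: float,
                     under_array_fraction: float = 0.0) -> float:
    """Fraction of die footprint occupied by the memory array.

    `under_array_fraction` is the share of peripheral circuitry that
    CUA tucks underneath the array, removing it from the footprint.
    """
    exposed_periphery = periphery_mm2 * (1.0 - under_array_fraction)
    return array_mm2 / (array_mm2 + exposed_periphery)

side_by_side = array_efficiency(60.0, 20.0)                    # 0.75
cua = array_efficiency(60.0, 20.0, under_array_fraction=0.9)   # ~0.97
```

Under these assumed areas, moving 90% of the periphery under the array lifts array efficiency from 75% to roughly 97%, which is the areal-density gain that must be weighed against the extra layer processing.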
One option for reducing cost overall in higher-performance devices is to adopt wafer bonding in which the memory arrays and control circuitry wafers are processed separately before being bonded at the wafer level and finished. One advantage of wafer bonding is that it allows for higher-temperature processing of the memory array which, in turn, improves the performance of the charge trap layer. The use of wafer bonding may in turn present opportunities for more specialized architectures, such as wafer-scale SSDs.2
1: S. Sivaram & A. Ilkbahar, “Searching for nonlinearity”, VLSI Symposium 2023
2: S. Ohshima, “Empowering next-generation applications through Flash innovation”, VLSI Symposium 2020