Xilinx has reworked its Versal field-programmable gate array for edge-AI applications, adding more local memory and exchanging conventional floating-point arithmetic for a block floating-point format that is generally more efficient when the emphasis is on inference rather than training.
Rehan Tahir, senior product manager at Xilinx, said the Versal Edge would be made on TSMC’s 7nm process with availability due in the first half of next year. Whereas the predecessor, which has been used in AI accelerators in cloud computers, had an architecture that suited signal-processing applications, this one is squarely aimed at machine learning, he said. The changes include hardware support for common inferencing optimizations, such as network sparsification.
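As an illustration of the kind of optimization the hardware is intended to accelerate, the sketch below shows magnitude-based weight pruning, one common way networks are sparsified for inference. This is a generic technique in plain NumPy, not Xilinx's implementation; the function name and the 50% sparsity target are assumptions for the example.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights.

    A sparsified network skips these zeroed multiplies at
    inference time, which is what sparsity-aware hardware exploits.
    """
    k = int(weights.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([0.1, -0.5, 0.3, -0.05, 0.8, 0.2])
pruned = magnitude_prune(w, sparsity=0.5)
# Half of the weights are now exactly zero.
```

In practice, a pruned model is usually fine-tuned afterwards to recover accuracy, but the zero pattern is what the inference hardware can exploit.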
The arithmetic blocks in the AI engines can perform twice as many int8 multiplies as the original Versal, with support added for int4 and 16-bit block floating-point (Bfloat16) operations. Up to 256 Bfloat16 operations can be performed per tile per cycle. Single-precision floating point is handled using emulation.
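To show why a block format can be cheaper than conventional floating point, the sketch below quantizes a block of values to a single shared exponent plus per-value integer mantissas, so the multiplies reduce to integer arithmetic. This is a generic block floating-point illustration in NumPy, not Xilinx's number format; the function names and the 8-bit mantissa width are assumptions for the example.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Represent a block of floats as integer mantissas
    sharing one exponent (block floating point)."""
    max_val = np.max(np.abs(block))
    if max_val == 0:
        return np.zeros(block.shape, dtype=np.int32), 0
    # Choose the shared exponent so the largest value fills the mantissa range.
    shared_exp = int(np.floor(np.log2(max_val))) - (mantissa_bits - 2)
    scale = 2.0 ** shared_exp
    mantissas = np.round(block / scale).astype(np.int32)
    # Clamp to the signed mantissa range in case rounding overflows it.
    lim = 2 ** (mantissa_bits - 1)
    return np.clip(mantissas, -lim, lim - 1), shared_exp

def bfp_dequantize(mantissas, shared_exp):
    """Recover approximate floats from mantissas and the shared exponent."""
    return mantissas.astype(np.float64) * (2.0 ** shared_exp)

block = np.array([0.5, -1.25, 3.0, 0.015625])
m, e = bfp_quantize(block)
approx = bfp_dequantize(m, e)
```

Because every value in the block shares one exponent, the hardware stores and aligns exponents once per block instead of once per value, which is where the efficiency for inference comes from; the cost is that values much smaller than the block maximum lose precision.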
The local data memory has been doubled to 64Kbyte and, to reduce frequent off-chip memory accesses, up to 38Mbyte of on-chip shared memory in distributed tiles can be used by the AI engines.
Because a key market is automotive driver assistance, Xilinx is working to achieve product grades that will allow use in safety-critical applications; the automotive grades are expected to be ready in 2024. Self-test modes will be supported, as will the ability to switch functions in and out of the programmable-logic array dynamically. The company claims it will be able to replace blocks in milliseconds. In principle, a parking-assistance function could be replaced by one for lane departure as the car speeds up, avoiding the need to reserve space for both at the same time. The same exchange mechanism will also help support over-the-air updates, Tahir said.