Microsoft has not only decided to open-source a compression algorithm intended for data centers; the company is also providing its own RTL through GitHub to anyone who wants to implement the algorithm in silicon.
In a keynote at the Open Compute Project (OCP) Summit last week (March 14, 2019), Microsoft Azure hardware-infrastructure general manager Kushagra Vaid said data-center owners face a massive predicted expansion in demand for storage, one with which hardware makers cannot keep pace. He cited a study by analyst firm IDC that projected users’ storage needs worldwide to reach some 175 zettabytes (ZB) by 2025. The same study claimed the likely capacity available would be a fraction of that, at just 5ZB.
“If you can’t store all that data, you have to drop some of the data,” Vaid warned. “The other way is to be more efficient at data management and data storage.”
Better compression is one way forward, Vaid argued as he introduced the company’s Project Zipline, which is focused on the unstructured textual data that cloud servers are expected to process. “This compression standard was designed to take into account a complete system stack. It is targeted for both legacy and modern datasets,” he said, pointing to server logs as examples of legacy datasets and to IoT telemetry as an example of modern ones.
Image: Microsoft examples of compression ratios for the Zipline XP10 algorithm
Vaid claimed the Zipline algorithm stands out because it “hits the three important vectors: high compression ratio; high bandwidth; and low latency while you compress and return data. It opens up a whole new world of where we can use [compression] technology.” He showed several examples of “cloud datasets” that were compressed by 92 per cent or more using the algorithm.
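To put a figure such as “compressed by 92 per cent” in context, the space saving is simply one minus the ratio of compressed size to original size. The sketch below measures it in Python. It is only an illustration: it uses the standard library’s zlib rather than XP10, and the log lines are invented for the example, so the result says nothing about Zipline’s actual performance.

```python
import zlib


def space_saving_percent(raw: bytes, level: int = 6) -> float:
    """Return the percentage by which zlib shrinks `raw`."""
    compressed = zlib.compress(raw, level)
    return 100.0 * (1.0 - len(compressed) / len(raw))


# Hypothetical, highly repetitive server-log lines (not a real dataset).
log_lines = [
    f"2019-03-14T10:00:{i % 60:02d}Z INFO request id={i} status=200 path=/api/v1/items\n"
    for i in range(5000)
]
raw = "".join(log_lines).encode("utf-8")

saving = space_saving_percent(raw)
print(f"{len(raw)} bytes raw, {saving:.1f}% smaller after zlib")
```

Repetitive text such as logs tends to compress well with any dictionary-based scheme, which is why it is a natural target for this kind of algorithm.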
To make the movement of Zipline-compressed data between data centers realistic and to offload the processing from server CPUs, the company decided to make the algorithm easy to implement by open-sourcing both the source code and an example RTL implementation. “This is what excites me the most. We are setting a precedent by contributing complete RTL. I hope other companies will follow,” he said. However, open-source hardware contributions from computing vendors date back to Sun Microsystems’ decision to contribute a Sparc processor design in the mid-2000s.
Microsoft sees hardware implementation as important. Vaid explained: “There were several things we did to get high bandwidth and low latency: they are all encapsulated in that RTL implementation. It makes it seamless and frictionless to integrate this into silicon products.”
Vaid pointed to solid-state drives (SSDs), IoT devices and “even cloud CPUs” as possible recipients for the Zipline hardware-compression engine. The XP10 algorithm is derived from Huffman encoding. The RTL supports compression and decompression not just for the XP10 algorithm but for Zlib and Gzip as well. The RTL and source code are offered under the MIT license. Future releases will include additional RTL, an RTL test harness, and an XP10 software library.
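For readers unfamiliar with Huffman encoding, the idea is to give frequent symbols short bit strings and rare symbols long ones. The Python sketch below builds such a code table for a byte string. It shows textbook Huffman coding only and is not the XP10 algorithm or any part of Microsoft’s release; the function name `huffman_code` is made up for this example.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(data: bytes) -> Dict[int, str]:
    """Build a textbook Huffman code: map each byte value to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tiebreak, {symbol: code so far}).
    # The unique tiebreak keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1

    return heap[0][2]


sample = b"aaaaaaaabbbc"
table = huffman_code(sample)
encoded_bits = sum(len(table[b]) for b in sample)
print(f"{len(sample) * 8} bits raw, {encoded_bits} bits Huffman-coded")
```

Production formats such as Zlib and Gzip pair this kind of entropy coding with a search for repeated strings, which is where most of the saving on log-like text comes from.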