Could ISSCC have asked for a better keynoter in its 60th year than one who also managed to wrap in the 25th anniversary of Star Trek: The Next Generation? Conference catnip – in this business anyway.
But AMD’s Lisa Su had a serious purpose in adopting the show’s holodeck – with help from cast member LeVar Burton (aka Lt Cmdr Geordi La Forge) – as a way of illustrating the promise of heterogeneous computing, and in particular the Heterogeneous System Architecture (HSA) that her company is promoting.
The current generation of AMD’s HSA vision primarily looks to share and allocate tasks between a CPU and a GPU on the same chip, together with a unified memory controller, to achieve the best balance of performance and efficiency. But this is only the beginning, said Su, senior vice president and general manager of AMD’s Global Business Units.
“However, they’re really bottlenecked by the bus. When you want to switch from A to B, you really have to move a whole bunch of data over and that ends up being the bottleneck,” she said. “This is stage one of really heterogeneous systems.
“The next generation takes a more system view of the world and this is where you can put CPUs, other HSA computing units like audio acceleration as well as graphics units together on chip where they have unified coherent memory. That allows you to have a unified set of data. So, when you’re doing a [holodeck-like] 360-degree environment, any one of the compute units being able to access that data gives you tremendous capability in terms of optimization. It reduces the latency as you go between the switching. It allows you to do switching in much smaller chunks so that you can optimize.
“Then when we go to next generation processing capability, that includes more sophistication in the graphics unit, including compute context switching and graphics pre-emption.
“So you can see that we’re really well within the next couple of years, seeing HSA come into play on the hardware side.”
But then we come to the code slinging…
“As a hardware person, I might say that putting the hardware together is not so hard,” Su acknowledged. “Getting the entire software ecosystem to come together is more of a challenge.”
What AMD wants – arguably needs – to do is find a way to embrace all of today’s programming languages.
“We want to use the tens of thousands who can program in C, C++, HTML5, Java and some of the domain-specific APIs, and really build on top the capability to have an intermediate program language – the HSA Intermediate Language [or HSAIL] – that can unify those models. This is a big undertaking. It requires a lot of cross-collaboration between hardware, software, the system and all the industry standards that have to come around,” Su said.
“But think about what you could get with that: the idea that you could write your software once and run it anywhere, albeit with different performance. This is the view of how we really bring heterogeneous systems together.”
So, that’s the bran – how did the holodeck actually fit? One of the interesting things about Su’s speech was how she identified a number of enabling technologies that already exist – though she fully acknowledged that none is anywhere near mature enough in itself, or as yet able to access the kind of performance needed. There were five technologies in particular:
1. Computational Photography: Delivering seamless and immersive video environments.
2. Directional Audio: Using audio to enhance immersion and realism of our environments.
3. Natural User Interfaces: Enabling realistic, natural human communication.
4. Context Computing: Delivering an intuitive understanding of the user’s needs in real time.
5. Augmented Reality: Bringing it all together – combining the real and the virtual.
For sure, an actual holodeck is years away, but you do have to note that some key building blocks are there, if still in their infancy. And to underline how heterogeneous computing could bring it closer to reality, Su discussed another AMD research project: one aimed at performing facial analysis and recognition on the Mona Lisa at 1080p resolution.
First the breath-grabbing part. Breaking the image down into 21x21px blocks gives you about 2M squares. Now you need to scale to find the face, which probably takes you to 3.8 million squares. The math then looks like this:
- Search squares = 3.8 million
- Average features per square = 124
- Calculations per feature = 100
- Calculations per frame = 47 GCalcs
Beyond that, for the sake of extending the argument into the holodeck world, she offered the numbers for moving from a still image to HD video:
- 30 frames/sec = 1.4TCalcs/second
- 60 frames/sec = 2.8TCalcs/second
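For anyone who wants to check the arithmetic, here is a minimal Python sketch. The three input figures are the ones Su quoted; the function names are made up for illustration, and the sliding-window count is only an assumption about where the 2M figure comes from (a 21x21 window stepped one pixel at a time across a 1920x1080 frame), not something the keynote spelled out.

```python
# Figures quoted in the keynote.
SEARCH_SQUARES = 3_800_000      # squares to search after scaling
FEATURES_PER_SQUARE = 124       # average features per square
CALCS_PER_FEATURE = 100         # calculations per feature


def window_positions(width=1920, height=1080, window=21):
    """Count the positions of a square window slid one pixel at a time.

    Assumption: this is how the roughly 2M squares arise; the talk did
    not describe the stepping scheme.
    """
    return (width - window + 1) * (height - window + 1)


def calcs_per_frame(squares=SEARCH_SQUARES,
                    features=FEATURES_PER_SQUARE,
                    calcs=CALCS_PER_FEATURE):
    """Total calculations needed to analyze one frame."""
    return squares * features * calcs


def calcs_per_second(fps):
    """Total calculations per second for video at the given frame rate."""
    return calcs_per_frame() * fps


if __name__ == "__main__":
    print(f"window positions at 1080p: {window_positions():,}")
    print(f"calcs per frame: {calcs_per_frame() / 1e9:.1f} GCalcs")
    for fps in (30, 60):
        print(f"{fps} frames/sec: {calcs_per_second(fps) / 1e12:.1f} TCalcs/second")
```

The per-frame total comes out at a shade over 47 billion, which is where the rounded figures above come from.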
So, that holodeck is some way off, but to show that a heterogeneous approach could start to cut the delay, AMD took its Mona Lisa research and looked at what happened when either the CPU or GPU handled each cascade in the algorithmic analysis as appropriate. It found that while the GPU was clearly more efficient in earlier stages (up to about #8), the CPU was better for stages #9-21. By allocating the different processors to different stages in the analysis, the company claims to have already got a performance boost of 2.5X and a decrease in the energy consumed per frame analyzed of, again, 2.5X.
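The split can be pictured as a simple dispatch table. The Python sketch below is not AMD's code: it assumes the crossover sits exactly at stage 8, where the talk only said "about", and every name in it is hypothetical.

```python
from itertools import groupby

TOTAL_STAGES = 21     # cascade stages mentioned in the talk
GPU_LAST_STAGE = 8    # assumption: the "about #8" crossover taken as exact


def device_for_stage(stage):
    """Return the processor assumed to handle a given cascade stage."""
    if not 1 <= stage <= TOTAL_STAGES:
        raise ValueError(f"stage must be between 1 and {TOTAL_STAGES}")
    return "GPU" if stage <= GPU_LAST_STAGE else "CPU"


def contiguous_runs():
    """Group the stages into (device, first_stage, last_stage) runs."""
    runs = []
    stages = range(1, TOTAL_STAGES + 1)
    for device, group in groupby(stages, key=device_for_stage):
        members = list(group)
        runs.append((device, members[0], members[-1]))
    return runs


if __name__ == "__main__":
    for device, first, last in contiguous_runs():
        print(f"stages {first}-{last}: {device}")
```

With a single crossover there is only one handoff per frame, which is the kind of switch that gets cheaper once the CPU and GPU share coherent memory.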
Making it so, you could say.