Object-based audio demands higher-performance audio processors
High-performance audio processors will introduce consumers to an upgraded theater-quality audio experience.
The drive to improve the user experience of home entertainment has led to the widespread adoption of high-definition media, including audio. Surround-sound audio is now mandatory in home entertainment playback equipment. Although the decoders can deliver high-quality sound, the effect is often diminished by the compromises that consumers have to make when they set up their equipment.
Traditional surround-sound mixing for audio and video post-production has had to assume a default speaker configuration and layout that may not fit real-world listening conditions. Consumers cannot always place speakers in the optimal locations for surround sound and still keep the look they want for their rooms. Reflections from the walls, floor and ceiling of a room can disrupt the carefully placed and mixed sound cues in a movie or sound recording. Compounding this, a different mix has to be created depending on whether the content is for 2-speaker stereo, 5.1-channel surround sound or 7.1-channel surround sound.
A number of techniques have been developed to deal with these issues. Audyssey researchers studied the acoustical and perceptual effects of sound reflections in different spaces and found that some directions improve our sense of envelopment compared to others. By monitoring playback and adjusting the EQ of audio fed to each speaker, it is possible to reduce the effect of unwanted reflections and improve the perceived quality of the surround sound.
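The idea of adjusting the EQ fed to each speaker can be illustrated with a minimal sketch. It is not Audyssey's algorithm: it assumes a per-band level measurement already exists for one speaker and simply computes a clamped gain that moves each band toward a target. The function name, the band values and the boost and cut limits are all hypothetical.

```python
def correction_gains_db(measured_db, target_db, max_boost_db=6.0, max_cut_db=12.0):
    """Return a per-band gain in dB that moves the measured response toward the target.

    Boost and cut are clamped (hypothetical limits) so that a deep room null
    is not answered with an unbounded boost.
    """
    gains = []
    for measured, target in zip(measured_db, target_db):
        gain = target - measured
        gain = max(-max_cut_db, min(max_boost_db, gain))
        gains.append(gain)
    return gains


# Hypothetical measurement for one speaker in four bands, relative to a flat target.
measured = [3.0, 0.5, -2.0, -9.0]
target = [0.0, 0.0, 0.0, 0.0]
print(correction_gains_db(measured, target))  # [-3.0, -0.5, 2.0, 6.0]
```

The last band shows the clamp at work: the measurement is 9 dB low, but the correction stops at the 6 dB boost limit.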
Content extraction
The Audyssey DSX system extracts content for new wide and height channels from the standard 5.1 surround configuration to further improve the sense of envelopment. It also blends surround and front channels to render a seamless and more enveloping soundstage.
As well as using existing audio mixed for the surround-sound environment, it is possible to improve the effect by delaying the mixing process until the point of playback itself. Object-based audio, as used in systems such as Dolby Atmos, MPEG-H and DTS:X from DTS, avoids the need to record a final mix for each playback environment. Instead, the mixing engineers keep each instrument, voice or sound effect separate and provide guidance on where each one should appear to sit in the audio field and how it moves around. The decoder then mixes the audio in real time at playback to ensure the right balance for each speaker.
Image: Sound mixers can use visual tools to determine the playback position of each audio object.
A gunshot can appear to flash past one ear or an aircraft zoom overhead. The calibration processes used for Atmos, for example, ensure that, whatever the playback environment, each sound object is placed where the director intended.
The technology can also hand more control to the user. The recording of a rock band can be manipulated in real time to create an individual mix that places the listener next to the guitarist or beside the drummer instead of the normal position in front of the stage. For sports and similar live events, the listener can choose commentary tracks and have them play back from a certain position, which may be off to one side or close to the piece of action they are describing. During a motor race, the user may select a team radio as an additional audio source and have that play from the apparent position of the car on the screen.
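Moving the listener within a recording amounts to recomputing where each object sits relative to the chosen seat. The sketch below is an illustration under stated assumptions, not a description of any particular decoder: positions are 2-D coordinates in metres, the listener faces the +y direction, and level falls off as 1/distance. The function name and the stage layout are hypothetical.

```python
import math


def relative_placement(listener_xy, source_xy):
    """Return (azimuth_deg, gain) of a source as heard from the listener.

    Assumptions: the listener faces +y, azimuth is positive to the
    listener's right, and gain is 1/distance clamped so it never exceeds 1.0.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    azimuth_deg = math.degrees(math.atan2(dx, dy))
    gain = 1.0 / max(distance, 1.0)
    return azimuth_deg, gain


# Hypothetical stage layout; each entry is one audio object.
stage = {"guitar": (-2.0, 0.0), "drums": (0.0, 1.0), "vocals": (0.0, 0.0)}

# Seat 1 is in front of the stage, seat 2 is right next to the guitarist.
for seat in [(0.0, -5.0), (-2.0, -1.0)]:
    print("listener at", seat)
    for name, position in stage.items():
        azimuth, gain = relative_placement(seat, position)
        print(f"  {name}: azimuth {azimuth:6.1f} deg, gain {gain:.2f}")
```

From the second seat the guitar lands straight ahead at full level, while the drums and vocals shift to the right and drop in level.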
The object-based audio approach lets a compatible decoder adapt to any playback environment, whether it is in a cinema with dozens of speakers, in a living room with five or seven, or over headphones for private playback.
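A rough sketch of why one object stream can serve several speaker layouts: the renderer turns each object's position into per-speaker gains at playback time. The code below uses constant-power panning between the two speakers nearest the object on a horizontal ring. This is a common textbook technique chosen for illustration, not the method of Atmos, MPEG-H or DTS:X. The function names and the speaker angles are assumptions, and speaker azimuths are assumed to be distinct.

```python
import math


def render_gains(object_azimuth_deg, speaker_azimuths_deg):
    """Constant-power pan between the two speakers adjacent to the object."""
    n = len(speaker_azimuths_deg)
    if n == 1:
        return [1.0]
    # Walk the speakers in order around the circle.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    azimuth = object_azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360.0
        span = (speaker_azimuths_deg[b] - speaker_azimuths_deg[a]) % 360.0
        offset = (azimuth - start) % 360.0
        if span > 0.0 and offset <= span:
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2.0)
            gains[b] = math.sin(frac * math.pi / 2.0)
            break
    return gains


def mix(objects, speaker_azimuths_deg):
    """objects: list of (samples, azimuth_deg). Returns one sample list per speaker."""
    length = max(len(samples) for samples, _ in objects)
    out = [[0.0] * length for _ in speaker_azimuths_deg]
    for samples, azimuth in objects:
        for ch, gain in enumerate(render_gains(azimuth, speaker_azimuths_deg)):
            if gain == 0.0:
                continue
            for t, x in enumerate(samples):
                out[ch][t] += gain * x
    return out


# Assumed layouts: azimuth in degrees, 0 = straight ahead, positive to the right.
stereo = [-30.0, 30.0]
five_speakers = [0.0, -30.0, 30.0, -110.0, 110.0]  # centre, L, R, Ls, Rs

dialogue = ([1.0, 0.5, 0.25], 0.0)  # one object placed straight ahead
print(render_gains(dialogue[1], stereo))         # equal gain in left and right
print(render_gains(dialogue[1], five_speakers))  # centre speaker only
print(mix([dialogue], stereo))
```

The same object and metadata produce a phantom centre on the stereo pair and a real centre on the five-speaker layout, without a separate mix for each.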
Speaker control
Consumers can choose to stay with the conventional surround-sound layouts or add units with upward-firing cones that use reflections from the ceiling to provide convincing overhead sound. Even if the speakers are not placed in the recommended layout for a 5.1 or 7.1 system, the object-based decoder can compensate: it analyses the media stream to determine how best to spread the sound among the different speakers to enhance the feeling of envelopment.
The increased complexity of object-based audio decoders, together with advanced encoding schemes such as Dolby Digital Plus, places stringent demands on the audio processors that run them. At the same time, features such as voice control are seen as increasingly important in home-entertainment systems because they improve the overall user experience.
Concerns over the privacy of cloud-based voice-control services are causing system designers to look at ways to bring the bulk of the processing into the target system itself, which places a much higher burden on the hardware. As voice processing and recognition remain an area of differentiation that relies on proprietary algorithms, the ability to customise the processor to handle new classes of instruction is helpful.
Requirements such as these inspired the design of the Tensilica HiFi 4 digital signal processor (DSP). Ideally, the processing power to run object-based audio should be available from a single pipeline, which avoids the problem of partitioning the software across multiple cores. Upmixing requires audio to cross between any of the input and output channels, and individual objects may be mapped to any of the output channels, so a multiprocessor implementation would have to coordinate the actions of each processor carefully. Other techniques required for object-based systems add further to the compute burden on the audio processor. One example is noise filling, which ensures that channels that are not heavily used in a scene are not left unnaturally silent.
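Noise filling as described here can be sketched in a few lines. This is a simplified time-domain illustration, not the method of any specific decoder: it measures the RMS level of each output channel for one block and adds low-level noise to channels that fall below a floor. The function names, the floor value and the use of uniform noise are assumptions.

```python
import math
import random


def rms(block):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0


def noise_fill(channels, floor_rms=1e-4, seed=0):
    """Add low-level noise to any channel whose block RMS is below floor_rms."""
    rng = random.Random(seed)
    # Uniform noise in [-a, a] has an RMS of a / sqrt(3), so scale a to match the floor.
    amplitude = floor_rms * math.sqrt(3.0)
    filled = []
    for block in channels:
        if rms(block) < floor_rms:
            block = [x + rng.uniform(-amplitude, amplitude) for x in block]
        filled.append(block)
    return filled


active = [0.2, -0.1, 0.3, -0.25]   # channel carrying programme material
silent = [0.0, 0.0, 0.0, 0.0]      # channel unused in this scene
out = noise_fill([active, silent])
print(out[0] == active)            # True: the active channel is untouched
print(rms(out[1]) > 0.0)           # True: the silent channel is no longer digital silence
```

Even this simple version needs an extra level measurement and a conditional add on every channel and every block, which is the kind of per-channel overhead that accumulates in a full decoder.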
Single-pipeline advantage
The need to have a more powerful single pipeline and other microarchitectural improvements that optimise the flow of audio data through the system led to the development of the HiFi 4 DSP. As the latest and most powerful implementation of the HiFi architecture, the HiFi 4 DSP builds on Cadence’s combination of customisable processing with specialised audio extensions.
Based around the Tensilica Xtensa RISC architecture, the HiFi processors add dedicated registers for holding audio-stream data and execution units to provide high-performance DSP support. Although the HiFi 4 DSP remains software compatible with the existing generations, such as the HiFi 2 and HiFi 3, it provides greater performance through the use of a more powerful pipeline. The core is able to issue four 24 x 24-bit or eight 32 x 16-bit multiply-accumulate (MAC) operations each cycle using an expanded very long instruction word (VLIW) approach. DTS, among others, recommends using a processor with up to four 32 x 32-bit MACs to handle object-based audio decoding. An optional vector floating-point unit is also available, providing up to four single-precision IEEE floating-point MACs per cycle.
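To see why MACs per cycle matter, a back-of-envelope estimate helps. The sketch below is not a benchmark of any real decoder: it assumes a hypothetical workload in which every object is filtered with a short FIR filter on its way to every speaker, and that the pipeline sustains its peak MAC rate. All workload figures and the function name are illustrative assumptions.

```python
def required_mhz(num_objects, num_speakers, sample_rate_hz, taps_per_path, macs_per_cycle):
    """Clock rate in MHz needed for the MAC work alone, assuming peak issue every cycle."""
    macs_per_second = num_objects * num_speakers * sample_rate_hz * taps_per_path
    return macs_per_second / macs_per_cycle / 1e6


# Hypothetical workload: 16 objects, 8 speakers, 48 kHz audio, 256-tap filter per path.
workload = dict(num_objects=16, num_speakers=8, sample_rate_hz=48_000, taps_per_path=256)

for macs_per_cycle in (4, 8):
    mhz = required_mhz(macs_per_cycle=macs_per_cycle, **workload)
    print(f"{macs_per_cycle} MACs/cycle: about {mhz:.0f} MHz")
```

Under these assumptions the workload needs roughly 1.57 billion MACs per second, so doubling the MACs issued per cycle halves the clock rate a single pipeline needs for the same work.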
Because the HiFi 4 DSP continues to use the flexible long instruction word (FLIX) format, code density remains high: only instructions that fill all of the instruction slots in one cycle use the full word size. Where parallelism is not available, the instruction word is shorter. The architecture ensures high data throughput by allowing two load or store operations each cycle. Throughput is further aided in systems with large amounts of off-chip memory by the addition of block prefetch assist instructions, which allow larger areas of memory to be tagged for fetching into the cache before their data elements are requested using load instructions.
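The code-density argument can be made concrete with a toy comparison. The sketch below contrasts a fixed-width VLIW encoding, where every cycle occupies a full-width word, with a variable-length encoding that uses a short word when only one operation issues. The word sizes are illustrative placeholders, not the actual HiFi 4 instruction widths, and the operation counts are a made-up code sequence.

```python
def code_size_bytes(ops_per_cycle, wide_bytes=8, narrow_bytes=3, variable_length=True):
    """Total code size for a sequence of issue cycles.

    wide_bytes and narrow_bytes are illustrative placeholders, not real
    instruction widths for any specific core.
    """
    total = 0
    for ops in ops_per_cycle:
        if variable_length and ops <= 1:
            total += narrow_bytes   # no parallelism available: short word
        else:
            total += wide_bytes     # multiple slots used: full-width word
    return total


# Made-up sequence: control code issues one operation, inner loops issue several.
sequence = [1, 1, 4, 4, 1, 2, 1, 1]
print(code_size_bytes(sequence, variable_length=False))  # 64
print(code_size_bytes(sequence, variable_length=True))   # 39
```

The saving depends entirely on how much of the program is scalar control code, which is why a mixed-width format suits audio software that combines tight inner loops with large amounts of housekeeping code.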
As well as home audio, the HiFi 4 DSP can provide high performance for sophisticated automotive designs, where object-based audio and better voice control can greatly improve the experience of driver and passengers. The key is the combination of a high-performance single pipeline and a design for high data throughput, which allows these algorithms to be deployed without resorting to complex multicore architectures.