Introduction to VR Audio: The Science of Spatialization

Sound Waves and Spatialization

The general consensus is that the aim of virtual reality technology is to deepen immersion in digital experiences and to take the user to any conceivable setting, limited only by a developer’s imagination. This opens the field to a wide array of experiences that pride themselves on ‘other-realism’: a fictional space presented to the user through realistic means (such as scale, movement tracking and interaction) so that the virtual world remains believable.

One of the main sensory factors in making virtual reality experiences believable is convincing audio. Physical spaces come alive through the many reflections and obstacles that sound waves encounter before they reach our ears, which give us a sense of how large a room is, or even what materials are in it. Our ability to hear is also paired with our ability to localize sounds, that is, to pinpoint where a sound is coming from. When this is emulated well in VR it makes for an extremely immersive experience: a sound might prompt the user to turn around and face it, perhaps acting as a cue for a player action. That would not be entirely possible if VR experiences used only stereo sound, which places audio on a flat left-to-right plane. To understand the effect audio can have on immersion in VR, it is worth looking at how we localize sound in physical spaces: how we work out the direction a sound is coming from, and how far away its source is.

Lateral Sounds

Lateral sounds are audio sources localized from left to right. For example, hearing a sound more strongly in your left ear than in your right tells you that the source is to your left.
This happens through several factors, the first being that the sound is loudest in the ear closest to the source. A subtler factor is the interaural time difference (ITD): the difference in the time the sound takes to reach each ear helps determine which direction the source is in. To put this into context, imagine a glass being dropped to your right; the sound reaches your right ear slightly before your left. Although this time difference is minuscule (under a millisecond, even for a source directly to one side), it helps to pinpoint sounds accurately on a lateral plane.
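To put a rough number on ITD, a common textbook approximation is Woodworth's spherical-head formula, ITD ≈ (r/c)(θ + sin θ), where θ is the source azimuth measured from straight ahead. The sketch below is an illustration under stated assumptions, not a model of any real listener: it assumes a head radius of 8.75 cm and a speed of sound of 343 m/s, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m, a typical spherical-head value (assumed)


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate interaural time difference in seconds.

    Woodworth's spherical-head formula: ITD = (r / c) * (theta + sin(theta)).
    Positive azimuths (to the right) give positive ITDs; the formula is only
    intended for |azimuth| <= 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:>2} deg -> {woodworth_itd(az) * 1000:.3f} ms")
```

Under these assumptions the largest ITD, for a source directly to one side, comes out at roughly 0.66 ms, which is why the difference is described as minuscule.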
It is also worth noting how these cues depend on frequency. Low sounds (around 500-800 Hz) are difficult to localize by volume because their half wavelengths are larger than an average human head, so the head casts little acoustic shadow; phase, that is, time information (including ITD), therefore becomes the main cue for localizing bass frequencies. The same is true of short, sharp sounds, whose localization ultimately depends on which ear hears them first. Higher frequencies (around 1500 Hz and above), however, have half wavelengths smaller than the human head, which makes the phase difference between the ears ambiguous and time information unreliable, so head shadowing becomes the main cue. Head shadowing is the difference in volume caused by the head obstructing the sound waves on their way to the farther ear (also known as the interaural level difference, ILD).
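The half-wavelength argument can be checked with a couple of lines of arithmetic. This is a crude rule of thumb sketched under assumptions (an ear-to-ear distance of 17.5 cm, a speed of sound of 343 m/s, and hypothetical function names); real hearing switches between cues gradually rather than at a hard threshold.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)
HEAD_WIDTH = 0.175      # m, an assumed average ear-to-ear distance


def half_wavelength(freq_hz: float) -> float:
    """Half of the wavelength, in metres, of a tone at freq_hz."""
    return SPEED_OF_SOUND / freq_hz / 2.0


def dominant_lateral_cue(freq_hz: float) -> str:
    """Rule of thumb: which lateral cue dominates at this frequency.

    If half a wavelength is longer than the head, the head casts little
    shadow, so timing/phase (ITD) is the useful cue; otherwise phase becomes
    ambiguous and level differences from head shadowing (ILD) take over.
    """
    if half_wavelength(freq_hz) > HEAD_WIDTH:
        return "ITD (phase/time)"
    return "ILD (head shadowing)"


for f in (500, 800, 1500, 4000):
    print(f, "Hz:", round(half_wavelength(f), 3), "m ->", dominant_lateral_cue(f))
```

With these assumed values the switch-over lands near 980 Hz (343 / (2 × 0.175)), between the 800 Hz and 1500 Hz figures quoted above.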

Front to Back | High to Low Sound Localization

Sounds in front of and behind the head are difficult to distinguish using ITD and volume differences: a source directly in front and one directly behind both reach the two ears at the same time and at the same volume. Instead, spectral modifications allow a sound to be perceived as in front of or behind us. These modifications occur when parts of the body such as the pinnae (the outer ears), head, shoulders and torso reflect the sound and filter what is eventually heard, so that together they act as a direction-selective filter. The brain uses these spectral modifications to build a perception of direction; for example, a sound wave coming from in front of you will reflect off the front of your shoulders and be guided by the inner folds of the pinnae.
Furthermore, the head tracking of most VR headsets can be used to the developer’s advantage to reduce ambiguity in these directions: by tilting or turning their head slightly to the left or right, the user can reliably determine the direction of the sound source.
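Head movement helps because turning the head changes the interaural cues in opposite ways for a source in front and a source behind. The sketch below uses a deliberately simple sine model of ITD (ITD proportional to the sine of the source's azimuth relative to the nose), under which a source at 30° (front right) and one at 150° (back right) give identical ITDs, i.e. a front/back confusion. After a small turn to the right, the front source's ITD shrinks while the rear source's ITD grows. The constants and function names are illustrative assumptions, and the model ignores the spectral cues entirely.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
EAR_SPACING = 0.175     # m between the ears (assumed)


def sine_model_itd(source_az_deg: float, head_yaw_deg: float = 0.0) -> float:
    """ITD in seconds; positive means the right ear hears the sound first.

    Azimuths are measured clockwise from straight ahead, so 90 is to the
    right and 180 is directly behind. Only the angle relative to the head matters.
    """
    relative = math.radians(source_az_deg - head_yaw_deg)
    return (EAR_SPACING / SPEED_OF_SOUND) * math.sin(relative)


def front_or_back(source_az_deg: float, turn_right_deg: float = 10.0) -> str:
    """Resolve front/back by comparing the ITD before and after a small right turn."""
    before = sine_model_itd(source_az_deg)
    after = sine_model_itd(source_az_deg, head_yaw_deg=turn_right_deg)
    # Turning right moves frontal sources toward the left ear (ITD falls)
    # and rear sources toward the right ear (ITD rises).
    return "front" if after < before else "back"


print(sine_model_itd(30), sine_model_itd(150))  # same value up to rounding
print(front_or_back(30), front_or_back(150))    # front back
```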

Distance of a Sound Source

The most obvious cue to the distance of a sound source is its volume. Many sounds around us are familiar, so we know how loud they should be and have a reference to judge them against. Volume cannot be the only cue, however, because we can also judge the approximate distance of unfamiliar sounds, which may be more common in a VR experience.
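For a familiar sound, the volume cue can be made concrete with the free-field inverse-square law: for a point source with no reflections, sound pressure level falls by 20·log10(r / r_ref) dB, about 6 dB per doubling of distance. If the listener "knows" how loud a source is at a reference distance, the heard level can be inverted into a distance estimate. This is a minimal sketch assuming ideal free-field conditions; the 70 dB reference level and the function names are hypothetical, and real rooms add reflections that break this idealisation.

```python
import math


def level_at_distance(ref_level_db: float, ref_distance_m: float, distance_m: float) -> float:
    """Free-field inverse-square law: SPL at distance_m given SPL at ref_distance_m."""
    return ref_level_db - 20.0 * math.log10(distance_m / ref_distance_m)


def estimate_distance(ref_level_db: float, ref_distance_m: float, heard_level_db: float) -> float:
    """Invert the inverse-square law: how far away must a familiar source be?"""
    return ref_distance_m * 10.0 ** ((ref_level_db - heard_level_db) / 20.0)


# Hypothetical familiar source: 70 dB SPL when heard from 1 m away.
KNOWN_LEVEL_DB, KNOWN_DISTANCE_M = 70.0, 1.0

heard = level_at_distance(KNOWN_LEVEL_DB, KNOWN_DISTANCE_M, 4.0)
print(round(heard, 1))  # about 58 dB: two doublings, so roughly 12 dB quieter
print(round(estimate_distance(KNOWN_LEVEL_DB, KNOWN_DISTANCE_M, heard), 2))  # 4.0
```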
One cue we rely on to approximate distance is the initial time delay: the gap between hearing the direct sound and hearing its first reflection from a surface. The longer the gap, the closer the sound source, because a nearby source's direct path to the ear is much shorter than its reflected path, while for a distant source the two paths are almost the same length. (This cue is much less useful in open environments such as deserts, which have fewer reflective surfaces.)
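A minimal geometric sketch of the initial time delay, assuming a single reflecting floor and using the image-source idea: the floor reflection travels as if it came from a mirror image of the source below the floor. The 1.5 m heights, the distances and the function name are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def initial_time_delay(distance_m: float, source_height_m: float = 1.5,
                       listener_height_m: float = 1.5) -> float:
    """Gap in seconds between the direct sound and its first floor reflection.

    The reflected path is the straight line from the listener to the source's
    mirror image below the floor, so its vertical leg is the sum of both heights.
    """
    direct = math.hypot(distance_m, source_height_m - listener_height_m)
    reflected = math.hypot(distance_m, source_height_m + listener_height_m)
    return (reflected - direct) / SPEED_OF_SOUND


for d in (1, 5, 20):
    print(f"{d:>2} m away -> gap of {initial_time_delay(d) * 1000:.2f} ms")
```

Under these assumptions the gap is about 6.3 ms for a source 1 m away but only about 0.65 ms at 20 m, matching the rule that a longer gap implies a closer source.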

Motion parallax is the perception of a sound source moving through space, and it also helps the listener judge distance, because a more distant source appears to sweep across the listener's surroundings more slowly. For example, imagine a helicopter flying from left to right and a mouse running in front of you from left to right: the helicopter is perceived as slower and therefore farther away, while the scuttling mouse seems quick, implying that it is closer.
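The helicopter and mouse example comes down to angular speed. For a source moving in a straight line at speed v, passing a distance d in front of the listener, the azimuth is θ = atan(x / d), so dθ/dt = v·d / (x² + d²), which is v / d when the source is directly ahead. The speeds and distances below are hypothetical numbers chosen only to illustrate the example.

```python
import math


def angular_speed_deg_per_s(speed_m_s: float, distance_m: float, offset_m: float = 0.0) -> float:
    """How fast a source crossing in front of the listener sweeps across azimuth.

    The source moves in a straight line at speed_m_s, passing distance_m in
    front of the listener; offset_m is how far along that line it is from the
    point directly ahead. d(theta)/dt = v * d / (x^2 + d^2) radians per second.
    """
    rad_per_s = speed_m_s * distance_m / (offset_m ** 2 + distance_m ** 2)
    return math.degrees(rad_per_s)


# Hypothetical numbers: a helicopter at 50 m/s, 500 m away; a mouse at 1 m/s, 1 m away.
helicopter = angular_speed_deg_per_s(50.0, 500.0)
mouse = angular_speed_deg_per_s(1.0, 1.0)
print(f"helicopter: {helicopter:.1f} deg/s, mouse: {mouse:.1f} deg/s")
```

Even though the helicopter moves fifty times faster in metres per second, it sweeps across the listener's surroundings about ten times more slowly (roughly 5.7 versus 57.3 degrees per second).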

The final important distance cue is that high frequencies are lost over larger distances, so sounds that come from farther away typically sound duller, with relatively more of their energy in the bass frequencies.
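One simple way to imitate this loss of high frequencies in a VR scene is a low-pass filter whose cutoff falls as distance grows. In this sketch the filter is a standard one-pole low-pass, but the distance-to-cutoff curve is hypothetical and chosen only for illustration; it is not a physical model of air absorption, which depends on temperature and humidity.

```python
import math


def cutoff_for_distance(distance_m: float) -> float:
    """HYPOTHETICAL mapping from distance to low-pass cutoff in Hz.

    Not a physical model: it starts near 20 kHz up close, falls smoothly
    with distance, and bottoms out at 1 kHz.
    """
    return max(20000.0 / (1.0 + distance_m / 50.0), 1000.0)


def one_pole_lowpass(samples: list[float], cutoff_hz: float, sample_rate: int) -> list[float]:
    """Standard one-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


SAMPLE_RATE = 48000
# Highest representable frequency as a test signal: alternating +1/-1 (Nyquist).
nyquist_tone = [1.0 if n % 2 == 0 else -1.0 for n in range(2000)]

for d in (1.0, 100.0, 1000.0):
    filtered = one_pole_lowpass(nyquist_tone, cutoff_for_distance(d), SAMPLE_RATE)
    peak = max(abs(s) for s in filtered[-100:])  # steady-state amplitude
    print(f"{d:>6.0f} m: cutoff {cutoff_for_distance(d):7.0f} Hz, peak {peak:.3f}")
```

The farther the source, the lower the cutoff and the less of the high-frequency test tone survives, while low frequencies pass through largely untouched.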

Together, these perceptual processes make localizing a sound in 3D space more accurate. Knowing how the human auditory localization system works can deepen realism and immersion in VR games or films, by replicating how sounds would be heard in a real environment. This in turn makes the virtual space around you more convincing, as sound that follows these localization processes can make a large room feel large.

A final note bridges the gap between this biological system and encoding these processes electronically, so that a VR experience can present a perceptible 3D sound environment. The way the head, ears and body filter a sound arriving from a given direction can be captured as a head-related transfer function (HRTF), which will be explored in more depth in further posts.
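An HRTF can be applied in its time-domain form, a pair of head-related impulse responses (HRIRs), one per ear, by convolving a mono source with each. The sketch below shows only that convolution step. The "HRIRs" it builds are hypothetical toys containing nothing but an interaural delay and a level drop for the far ear; measured HRIRs also carry the spectral filtering of the pinnae, head and torso. The constants and function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
EAR_SPACING = 0.175     # m between the ears (assumed)


def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Direct linear convolution; slow, but fine for short examples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def toy_hrir_pair(azimuth_deg: float, sample_rate: int, length: int = 64,
                  far_ear_gain: float = 0.5) -> tuple[list[float], list[float]]:
    """HYPOTHETICAL stand-in for a measured HRIR pair: only a delay and a gain.

    Positive azimuths are to the listener's right. The far ear receives the
    sound later (sine-model ITD) and quieter, reaching far_ear_gain at 90 degrees.
    Returns (left_ir, right_ir).
    """
    s = math.sin(math.radians(azimuth_deg))
    itd = (EAR_SPACING / SPEED_OF_SOUND) * s
    delay = round(abs(itd) * sample_rate)
    length = max(length, delay + 1)  # make sure the delayed tap fits
    near, far = [0.0] * length, [0.0] * length
    near[0] = 1.0
    far[delay] = 1.0 - (1.0 - far_ear_gain) * abs(s)
    return (far, near) if itd > 0 else (near, far)


def render_binaural(mono: list[float], azimuth_deg: float,
                    sample_rate: int = 48000) -> tuple[list[float], list[float]]:
    """Convolve a mono signal with each ear's impulse response."""
    left_ir, right_ir = toy_hrir_pair(azimuth_deg, sample_rate)
    return convolve(mono, left_ir), convolve(mono, right_ir)


def first_nonzero(channel: list[float]) -> int:
    """Index of the first non-silent sample, i.e. when that ear first hears the click."""
    return next(i for i, sample in enumerate(channel) if sample != 0.0)


click = [1.0] + [0.0] * 9
left, right = render_binaural(click, azimuth_deg=90)
print("right ear onset:", first_nonzero(right), "left ear onset:", first_nonzero(left))
```

With these assumptions, a click from directly to the right reaches the right ear 24 samples (0.5 ms at 48 kHz) before the left; swapping the toy responses for measured HRIRs would add the direction-dependent spectral cues as well.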

Written By: Ciarán Jai Cosway
