Fixing Headphones With Computers

The analog tricks that engineer Ben Bauer pioneered are cool and work well in creating a more natural audio image when listening to music on headphones. However, they don’t really make headphone audio sound like it’s coming from outside your head. So, as usual, the military got involved. It’s not that they really cared about listening to great music; they were busy listening to other things.

Two important military jobs use headphones: submarine sonar operators and airplane pilots. Sonar operators listen with the intent of detecting sounds and locating them relative to the sub. Anything that improves the operator’s sensitivity for detection and his/her ability to “see” with the ears would be highly beneficial.

Airplane pilots have a different problem: they often have to deal with too much information. To improve a pilot’s situational awareness with ever more information, while at the same time preventing the pilot from becoming overwhelmed, it is desirable to provide an acoustic display in which sounds appear to come from different locations. That way a pilot can better keep track of who is talking and where things are.

Without going into too much detail (’cause then we’d have to kill you), the military started playing around with a thing called the Convolvotron. Basically, a pilot would get tiny microphones put into both of his/her ears and would then go into an anechoic chamber (a room with walls that completely absorb all sound), where his/her head would be clamped into a fixed position. A reference sound would then be played from various angles in the room, and a computer would record the signals from the microphones in the pilot’s ears. After a bunch of number crunching, the computer would develop a complete picture of how sound coming from different angles is changed by the size and shape of each pilot’s head and ears. This “picture” is called a head-related transfer function, or HRTF. Pilots could then load their individual HRTF into their aircraft’s headphone system, which would synthesize a customized acoustic environment for audio display. The system would also track head movement so that as the pilot moved his/her head, the acoustic cues would remain in their absolute positions.
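The core DSP operation such a system performs is simple in principle: convolve the audio with a measured impulse response for each ear. Here's a minimal sketch in Python with NumPy. The HRIRs below are toy stand-ins (just an interaural delay and level difference, not real measurements) meant only to show the mechanics:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right head-related impulse
    response (HRIR) pair to produce a two-channel binaural signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy stand-in HRIRs (NOT real measurements): a source off to the
# listener's left arrives at the left ear first and slightly louder.
fs = 44100
itd_samples = 30                  # ~0.68 ms interaural time difference
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[itd_samples] = 0.6

tone = np.sin(2 * np.pi * 440 * np.arange(fs // 10) / fs)
out = render_binaural(tone, hrir_l, hrir_r)
```

A real system would swap in measured HRIRs for the angle of each source; the convolution itself doesn't change.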

“Cool! I want one!” you might say. Unfortunately, anechoic chambers are neither cheap nor very common, so it’s pretty tough to get your hands on your own personal HRTF. But technology does have a way of trickling down, and computers have a way of getting faster and cheaper. The advent of home theater spawned renewed, though somewhat limited, interest in headphone acoustic environment synthesis. The result has been a handful of products over the last few years based on digital signal processing (DSP). These products use generic or, in some cases, modifiable HRTFs to provide a more realistic “display” of both stereo and home theater sound when using headphones. The performance of these devices to date generally leaves a lot to be desired. There are a variety of reasons for this. The most significant is that, as mentioned earlier, the main cue your brain uses to localize sounds is the way those sounds change when you move your head.

This means that any truly effective headphone acoustic synthesis system must be able to track your head movements and adjust each sound source based on the position of your head. Not only does this require that the HRTF be comprehensive (including information for every angle of approach to the head), but it also requires the added complexity and expense of a motion sensor mounted on the headphone. Not such a big deal when you’re flying around in a billion-dollar jet, but a pretty serious price tag when you’re trying to come up with the bucks out of your own wallet. Most manufacturers simply ignore the task of tracking head movement, and instead build products that use HRTFs for six fixed-position sound sources and call it good. Studies have shown that when the head is not allowed to move relative to the sound source, front-to-back reversals (the subject thinking a sound is coming from behind when, in fact, it is coming from the front) increase by a factor of ten. The end result is that systems that don’t use head-movement cues won’t really fool you into thinking that the sound is coming from outside your head. However, once you’ve spent enough time listening to “learn” the environment, good ones will be able to provide a satisfying and immersive listening experience.
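The head-tracking bookkeeping described above can be sketched in a few lines: subtract the tracked head yaw from each source's world azimuth so the cue stays put as the head turns, then pick the nearest angle in the measured HRTF set. The 15° measurement grid and the angle conventions here are assumptions for illustration, not any particular product's design:

```python
import numpy as np

# Hypothetical table of azimuths (degrees) at which HRTFs were measured.
measured_azimuths = np.arange(0, 360, 15)

def world_to_head_azimuth(source_az_deg, head_yaw_deg):
    """A world-fixed source at source_az_deg, heard by a listener whose
    head has yawed by head_yaw_deg, arrives from (source - yaw)
    relative to the head."""
    return (source_az_deg - head_yaw_deg) % 360

def nearest_hrtf_azimuth(az_deg):
    """Pick the closest measured HRTF angle, using circular distance
    so that 359 degrees is treated as 1 degree away from 0."""
    diffs = np.abs((measured_azimuths - az_deg + 180) % 360 - 180)
    return int(measured_azimuths[np.argmin(diffs)])
```

So if a source sits straight ahead (0°) and the pilot turns 40° to one side, the renderer switches to the HRTF nearest the new head-relative angle of 320°, and the sound appears to stay fixed in the cockpit.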

Another reason these systems can be less than satisfactory is that sometimes the manufacturer tries to include too many darn cues in the transfer function. In speaker-based listening environments, in addition to HRTF cues you also hear the room you are in: the sounds bounce off walls and are absorbed by carpet and furnishings. Your mind is amazingly adept at taking all these things into consideration, combining your acoustic sense with your visual sense to develop a complete picture of your orientation within your surroundings.

Oftentimes equipment manufacturers will add simulated room wall reflections and reverberations to HRTF cues in an effort to do a better job of fooling your head. Well, it’s not easy to fool Mother Nature, and these systems tend to sound weird and artificial. Though some teenagers may think it’s cool, I don’t know of anybody who seriously thinks their AV system sounds better in any of the fake acoustic environment settings on their surround receiver (“hall,” “church,” “theater,” etc.). Besides, your visual system is dominant, so your brain knows the size and shape of the room you’re in from your visual sense. There is no way your hearing system is going to override your visual system and convince your brain that you’re actually in a different room. Bottom line: most headphone systems that try to synthesize room information ultimately sound like you’re listening to music in tin cans of different sizes.
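For a sense of how crude these "room" simulations can be, here's a minimal sketch of the basic trick: mixing delayed, attenuated copies of the signal back in as fake wall reflections. The delays and gains are made-up values for illustration, not taken from any real product:

```python
import numpy as np

def add_early_reflections(signal, fs, reflections):
    """Mix delayed, attenuated copies of the signal back in — the crude
    'room' simulation. reflections is a list of (delay_seconds, gain)."""
    max_delay = max(d for d, _ in reflections)
    out = np.concatenate([signal, np.zeros(int(round(max_delay * fs)))])
    for delay_s, gain in reflections:
        n = int(round(delay_s * fs))
        out[n:n + len(signal)] += gain * signal
    return out

fs = 44100
click = np.zeros(1000); click[0] = 1.0
# Two hypothetical "wall bounces": 8 ms and 15 ms late, quieter each time.
wet = add_early_reflections(click, fs, [(0.008, 0.5), (0.015, 0.3)])
```

Real rooms produce thousands of reflections that smear into dense, frequency-dependent reverberation; a handful of discrete echoes like this is part of why the result tends to sound like a tin can rather than a hall.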

Finally, one of the most difficult problems to overcome in designing a system that tries to convince you that the sound is coming from outside your head is that you have to do a LOT of things right to get it to work. One company might do a good job on the DSP codec but not such a hot job on the analog section, or vice versa. Or you might just be wearing not-so-hot headphones. The human listening system is extremely sophisticated and can successfully interpret extremely subtle cues; anything less than perfect quality will make your mind aware of the fact that you’re listening to headphones. Once your brain realizes it’s listening to cans, it’s not going to be easily fooled into forgetting. However, time marches on, and there’s no question these systems have improved immeasurably over the past few years.