Coding Horror

programming and human factors

3D positional audio and HRTFs

I've always been fascinated with 3D positional audio through headphones. The nice thing about headphones is that they don't bug your neighbors or your wife -- and they're actually the best way to hear surround sound, too:

But for some surround sound, particularly 3D positional computer audio, headphones can actually work better than speakers.

The reason for this is that you've only got two ears. The way you tell whether a sound's in front, behind or above you, rather than just to your left or your right, is by processing the complex differences in phase, time delay and frequency balance that're imparted to differently located sounds by nearby objects (like walls), and by the sonic characteristics of your head.

Your pinnae - the outer parts of your ears - strongly influence sound waves that pass through and bounce off them. 3D game audio uses Head Related Transfer Function (HRTF) algorithms to fake the effects of the pinnae, the head and various listening environments, so that injecting the sound straight into the ear canal can produce the impression of real 3D audio sources.

When you've got HRTF-massaged two-channel audio already, for instance when you're playing a game, headphones are obviously the best way to get the sound into your head. There's no way for speakers to do the job as well, because there's no way for them to stop each ear hearing the sound that's intended for the other.
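To put a rough number on the "time delay" cue mentioned above, here's a minimal sketch using Woodworth's spherical-head approximation, ITD = (r / c) * (theta + sin(theta)). This formula and the constants are my assumptions, not anything from the quoted article: it treats the head as a rigid sphere, and the head radius and speed of sound below are typical textbook values. It also only covers the left/right delay; the front/back and up/down cues come from the pinnae and are not modeled here.

```python
import math

# Assumed constants (not from the quoted article).
HEAD_RADIUS_M = 0.0875        # typical adult head radius, in meters
SPEED_OF_SOUND_M_S = 343.0    # dry air at roughly 20 degrees C


def interaural_time_difference(azimuth_deg: float) -> float:
    """Estimate the arrival-time difference between the two ears, in seconds.

    Uses Woodworth's spherical-head approximation, valid for a source
    between straight ahead (0 degrees) and directly to one side (90 degrees).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd = interaural_time_difference(azimuth)
        print(f"{azimuth:2d} degrees: {itd * 1e6:.0f} microseconds")
```

Under these assumptions the delay tops out at roughly two-thirds of a millisecond for a source directly to one side, which is the scale of timing difference a positional audio engine has to reproduce.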

There's a long history of audiophile interest in stereo and binaural recordings, but 3D sound on a computer is a bit different:

  1. Monaural sound is a recording of a sound with one microphone. No sense of sound positioning is present in monaural sound.
  2. Stereo sound is recorded with two microphones several feet apart separated by empty space. Most people are familiar with stereo sound; it is heard commonly through stereo headphones and in the movie theater. When a stereo recording is played back, the recording from one microphone goes into the left ear, while the recording from the other microphone is channeled into the right ear. This gives a sense of the sound's position as recorded by the microphones. Listeners of stereo sound often perceive the sound sources to be at a position inside the listener's head -- that's because humans do not normally hear sounds this way, separated by empty space. The human head should be there acting as a filter to incoming sounds.
  3. Binaural recordings sound more realistic, as they are recorded in a manner that more closely resembles the human acoustic system: with the recording microphones embedded in a dummy head. Binaural recordings sound closer to what humans hear in the real world; the dummy head filters sound in a manner similar to the human head.
  4. 3D sound attempts to take binaural recordings one step further by recording sounds with tiny probe microphones in the ears of a real person. These recordings are compared with the original sounds to compute the person's head-related transfer function. The HRTF is a linear function that is based on the sound source's position and takes into account many of the cues humans used to localize sounds. The HRTF is used to develop pairs of finite impulse response (FIR) filters for specific sound positions; each sound position requires two filters, one for the left ear, and one for the right. To place a sound at a certain position in virtual space, the set of FIR filters that correspond to the position is applied to the incoming sound, yielding spatial sound.
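The last item describes applying one FIR filter per ear to a mono source. Here's a minimal sketch of that step in plain Python. The filter taps are hypothetical placeholders I made up to mimic a source off to the listener's right (the far ear hears it later and quieter); real HRTF filters are measured, are much longer, and also reshape the frequency balance. The function names are mine, not from any audio API.

```python
from typing import List, Sequence, Tuple


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Convolve a signal with FIR taps (direct form).

    The output has len(signal) + len(taps) - 1 samples.
    """
    if not signal or not taps:
        return []
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, sample in enumerate(signal):
        for k, tap in enumerate(taps):
            out[n + k] += sample * tap
    return out


def spatialize(
    mono: Sequence[float],
    left_taps: Sequence[float],
    right_taps: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Apply one FIR filter per ear to a mono source, yielding (left, right)."""
    return fir_filter(mono, left_taps), fir_filter(mono, right_taps)


# Hypothetical filter pair for a source on the listener's right.
RIGHT_EAR_TAPS = [1.0]                 # near ear: undelayed, full level
LEFT_EAR_TAPS = [0.0, 0.0, 0.0, 0.5]   # far ear: 3 samples late, half level

mono_source = [1.0, 0.5, -0.25]
left, right = spatialize(mono_source, LEFT_EAR_TAPS, RIGHT_EAR_TAPS)

if __name__ == "__main__":
    print("left: ", left)
    print("right:", right)
```

Moving the virtual source means swapping in the filter pair for the new position, which is why each position needs its own left and right filters.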

Your ear shape (a.k.a. your pinnae) has a dramatic effect on how you hear sound. But don't take my word for it -- hear it for yourself. The 3D hearing test page has a binaurally recorded sound sample using eight different ear shapes.

3d Hearing Test

You can hear your PC sound card perform HRTFs using RightMark's 3DSound Positioning Accuracy test. Note that you must switch to DirectSound3D Hardware mode (or better) via the System menu to hear anything more than stereo positioning!

RightMark 3DSound Positioning Accuracy Test

If your card supports EAX modes, try those too. However, when using EAX, make sure you switch to the "plain" environment for apples-to-apples testing. For some reason it defaults to "generic", which colors the sound a bit.

HRTFs magically convert stereo sound into 3D sound, but they are computationally expensive. That's probably why DirectSound Software mode offers no HRTFs. You need an add-in sound card with hardware acceleration to achieve 3D sound with headphones. The first PC sound card to offer 3D positional sound was the Aureal Vortex via the A3D API circa 1998. I was a huge fan. But unfortunately, Aureal isn't around any more.
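To get a feel for "computationally expensive", here's a back-of-the-envelope sketch. It assumes each voice is filtered by direct FIR convolution, once per ear. The tap count, sample rate, and voice counts are illustrative assumptions of mine, not measurements of DirectSound or of any sound card, and real hardware may well use cheaper techniques.

```python
def hrtf_macs_per_second(voices: int, taps_per_filter: int, sample_rate_hz: int) -> int:
    """Multiply-accumulate operations per second for direct FIR filtering.

    Each voice needs two filters (left ear and right ear), and each filter
    costs one multiply-accumulate per tap for every output sample.
    """
    ears = 2
    return voices * ears * taps_per_filter * sample_rate_hz


# Assumed figures, chosen only for illustration.
ASSUMED_TAPS_PER_FILTER = 128
ASSUMED_SAMPLE_RATE_HZ = 44_100

if __name__ == "__main__":
    for voices in (1, 32, 128):
        macs = hrtf_macs_per_second(
            voices, ASSUMED_TAPS_PER_FILTER, ASSUMED_SAMPLE_RATE_HZ
        )
        print(f"{voices:3d} voices: {macs:,} multiply-accumulates per second")
```

With these assumed numbers, a single voice costs about 11 million multiply-accumulates per second and 128 voices cost about 1.4 billion, which is a lot to ask of a 2005-era CPU that also has a game to run.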

So-called "onboard" sound -- the kind you get on your motherboard for free -- has improved, but it generally has lower sound quality than a dedicated sound card, and it's certainly not capable of meaningful hardware acceleration. Onboard sound is simply not an option if you're a gamer of any kind. Although I grudgingly installed Creative sound cards in my PCs after the demise of Aureal, it was only because I had no other viable options. I always felt that Creative's 3D sound HRTF algorithms were never as good as Aureal's. Creative's new X-Fi sound cards, however, are finally poised to change that. For one thing, they have a lot more horsepower:

Sound Blaster Live!             1998    2 million transistors
Sound Blaster Audigy 2          2002    4.1 million transistors
Pentium 4 "Northwood" 2.0GHz    2002    55 million transistors
Sound Blaster X-Fi              2005    51 million transistors

The X-Fi sound cards are also comically overpriced. Three hundred bucks for a sound card? But the lowest-end model, the X-Fi XtremeMusic, sacrifices almost nothing compared to the fancier models and is priced within reason at around $110 online. That's still double the cost of an Audigy 2, but unlike the last umpteen zillion Creative sound card "upgrades", you get a much more powerful card this time with some truly useful new features:

  • Up to 128 simultaneous sounds
  • Vastly improved CMSS-3D headphone HRTFs
  • Real time Smart Volume Management aka dynamic range compression
  • Real time upsampling of 16-bit sound to 24-bit

If you're looking for performance improvements over an earlier Sound Blaster card, there are none. It's just more functionality with no performance loss. For more details, check out ExtremeTech's review of the X-Fi by my pal Loyd Case.

I've been testing the X-Fi with Battlefield 2. It's one of only two games that explicitly support the new card's features at the moment (the other being the execrable Quake 4). I always play with headphones, and I noticed the improved 3D sound HRTFs immediately. The sound is also much richer with 128 voices; it's easy to exceed the previous limit of 64 simultaneous sounds in large multiplayer games. I'm very impressed. I also tried the card with Doom 3 using the 1.3 EAX patch and noticed similar improvements.

Although the X-Fi is a wee bit spendy, I can heartily recommend the basic model to fans of 3D audio and headphones. And if you want a clean, non-cluttered install, don't bother with the CD in the box. Just download the latest X-Fi drivers from Creative's website and install those instead.

The Creative Audio Console comes with the base driver, and it's all you need to configure the card.

The World's Slowest Windows XP System

I'm not sure exactly why, but the guys at winhistory.de managed to install Windows XP on a 20 megahertz Pentium 1 system with 32 megabytes of RAM:

The system info dialog for the world's slowest XP install

That puts the XP back in Windows XP -- Xtremely Pokey:

The CPU is working at 60% of full capacity at the Desktop! Nowadays with a modern CPU you have to run many tasks in background to reach such a high level of work.

For this reason you had to have patience, very often. Do recognize the changing of the blue color on the screen before the "Welcome"-page?? At 20 MHz you can see all 8 blues line by line!

Great stuff. The actual minimum system requirements for Windows XP are a 233 MHz CPU and 64 megabytes of RAM. But even a 20 MHz Pentium is still orders of magnitude more powerful than this Osborne Executive:

Osborne laptop ad

Most people associate Osborne Computers with the Osborne Effect -- pre-announcing the next model too early and decimating the sales of your current models. But as it turns out, that's an urban legend.

DIVX vs. DivX

It's ironic that the popular DivX codec has all but obliterated the identity of the ill-fated DIVX pay-per-view rental system.

DivX logo vs. DIVX logo

So what was DIVX?

DIVX (Digital Video Express) was a rental format variation on the DVD player in which a customer would buy a DIVX disc -- physically similar to a DVD -- at a low cost, which would be able to be freely viewed up to 48 hours from its initial viewing. After this period, the disc could be viewed by paying a continuation fee, typically $3.25. DIVX discs could only be played on special DIVX/DVD combo players that needed to be connected to a phone line. DIVX Viewers had to set up an account that additional viewing fees could be charged to. The player would call an account server over the phone line to charge for viewing fees similar to the way DirecTV and Dish Network satellite systems handle pay-per-view. Viewers who wanted unlimited viewing of a particular disc could pay to convert the disc to "Silver" status for a special fee. The physical disc was not altered in any way. The viewer's account kept track of the status of each disc. The Silver disc could be kept for future viewing, resold, given away, or discarded.

This particular bad idea was intensely unpopular online. In 1998 and early 1999, it felt like you couldn't visit a website without seeing an anti-DIVX banner plastered on it somewhere.

Free DVD! Fight DIVX!

The format barely lasted a year.

The DivX codec was introduced in 1998 and intentionally named to parody the besieged DIVX format:

Early versions of DivX included only a codec, and were named "DivX ;-)", where the winking emoticon was a tongue-in-cheek reference to the failed DIVX system.

The DivX codec has a rather storied history itself:

1998  3.11 Alpha  Codec created from an illegally hacked MPEG-4 video codec
2001  4.0         DivX corporation formed; clean room codec "created" from OpenDivX project*
2002  5.0         Codec improvements
2005  6.0         Codec expanded to full media container format (e.g., *.divx)

The name replacement is so complete that the divx domain has been subsumed as well. You can watch it change hands via the Internet Archive Wayback Machine. The last DIVX snapshot is in October 1999; the site goes dark throughout 2000, and reappears in February 2001 as DivX.

It sure is funny when a little hacked codec name joke turns into a multi-million dollar business under the same exact name.

* The ethically questionable 2001 commercial DivX fork of the OpenDivX project is where the open source XviD codec originated. It's DivX backwards, naturally.

Our Virtual Machine Future

Lately I've been spending more and more time inside virtual machines. Whenever I need to try out a new bit of software, whether it's a small shell extension or a giant product like Team System, I tear off a new VM first. I don't want to junk up my primary install until I'm totally confident I know what that software does. It's guilty until proven innocent.

In fact, I'll go one step further. I think all software will eventually be distributed as virtual machine images. And why not? Consider the advantages:

  • It's the ultimate security sandbox. Too many scary vulnerabilities in crusty old IE6? You can't stop clicking on dancing bunnies? Just run your OS session in a virtual machine. At the end of every session, you blow it away. No spyware or virus is virulent enough to escape a VM. If you want to log in again, you tear off a new VM and start fresh. It's like formatting your hard drive every time you turn off your PC. And this doesn't have to be done at the OS level to be beneficial, either; why not selectively launch apps in their own private VMs?
  • It makes software installation a no-brainer. Forget installation or setup.exe; just boot a fully pre-configured VM that has the application locked, loaded, and primed. Now you're up and running in seconds. That's the ultimate out of box experience!
  • The operating system doesn't matter. Who cares if your app requires Linux or OS X to run if I can boot it in a pre-configured VM within a few seconds? This could be a huge industry sea change -- albeit helped a lot by the way Apple has cemented x86 as the industry standard CPU instruction set for the next millennium. But on the plus side, think of the vast number of applications you can choose from once you no longer have to worry about OS choice.
  • New CPUs will accelerate VMs. Virtual machines are reasonably fast now. But Intel has their "Vanderpool" technology and AMD has an equivalent in "Pacifica"; both promise to radically speed up virtualization via dedicated hardware.
  • What else are we going to do with all this power? Within a few years, quad-core chips will be available on the desktop and dual-core will be bog-standard on all new PCs. Terabyte hard drives? Check. 64-bit memory addressing and more than 4 gigabytes of RAM? Check. Outside of gaming, there are only a handful of legitimate uses for all that power. But to be truly pervasive on the desktop, virtual machines need all that power.

And virtual machine software keeps getting cheaper, too. Parallels Workstation is only $45, and VMware offers their free player which runs both VMware and Virtual PC images. Virtual PC is effectively free for any developer with an MSDN subscription.

All we really lack, I suppose, is VM built into the operating system as a first-class citizen rather than a standalone application. But the solipsist operating system is surely coming:

solipsism (n): a theory holding that the self can know nothing but its own modifications and that the self is the only existent thing.

Eventually, all applications will believe they're the only applications in the world. And they'll be right.

Software Developers and Asperger's Syndrome

When I read Wesner Moise's post on Asperger's Syndrome, I wasn't surprised. Many of the best software developers I've known share some of the traits associated with Asperger's Syndrome:

  1. Social impairments
    It is worth noting that because it is classified as a spectrum disorder, some people with Asperger syndrome are nearly normal in their ability to read and use facial expressions and other subtle forms of communication. However, this ability does not come naturally to most people with Asperger syndrome. Such people must learn social skills intellectually, delaying social development.
  2. Narrow, intense interests
    Asperger syndrome can involve an intense and obsessive level of focus on things of interest. [..] Particularly common interests are means of transport (such as trains), computers, math (particularly specific aspects, such as pi), wikipedia, and dinosaurs. Note that all of these last items are normal interests in ordinary children; the difference in Asperger children is the unusual intensity of their interest.
  3. Speech and language peculiarities
    Literal interpretation is another common but not universal hallmark of this condition. Attwood gives the example of a girl with Asperger syndrome who answered the telephone one day and was asked "Is Paul there?". Although the Paul in question was in the house, he was not in the room with her, so after looking around to ascertain this, she simply said "no" and hung up. The person on the other end had to call back and explain to her that he meant for her to find him and get him to pick up the telephone.

I often joke that you have to be a little obsessive-compulsive to be a good developer. Software development ..

  • skews heavily male
  • is fixated with order, syntax, and literal interpretation
  • allows you to deal with machines instead of people
  • requires a nearly obsessive focus

.. just like Asperger's.

This isn't a new idea; there's a classic Wired article on the disturbing connection between programming and Asperger's syndrome:

It's a familiar joke in the industry that many of the hardcore programmers in IT strongholds like Intel, Adobe, and Silicon Graphics - coming to work early, leaving late, sucking down Big Gulps in their cubicles while they code for hours - are residing somewhere in Asperger's domain. Kathryn Stewart, director of the Orion Academy, a high school for high-functioning kids in Moraga, California, calls Asperger's syndrome "the engineers' disorder." Bill Gates is regularly diagnosed in the press: His single-minded focus on technical minutiae, rocking motions, and flat tone of voice are all suggestive of an adult with some trace of the disorder. Dov's father told me that his friends in the Valley say many of their coworkers "could be diagnosed with ODD - they're odd." In Microserfs, novelist Douglas Coupland observes, "I think all tech people are slightly autistic."

Though no one has tried to convince the Valley's best and brightest to sign up for batteries of tests, the culture of the area has subtly evolved to meet the social needs of adults in high-functioning regions of the spectrum. In the geek warrens of engineering and R&D, social graces are beside the point. You can be as off-the-wall as you want to be, but if your code is bulletproof, no one's going to point out that you've been wearing the same shirt for two weeks. Autistic people have a hard time multitasking - particularly when one of the channels is face-to-face communication. Replacing the hubbub of the traditional office with a screen and an email address inserts a controllable interface between a programmer and the chaos of everyday life. Flattened workplace hierarchies are more comfortable for those who find it hard to read social cues. A WYSIWYG world, where respect and rewards are based strictly on merit, is an Asperger's dream.

There's a documented genetic component to this spectrum of developmental disorders, which has unfortunate implications for areas where software engineers congregate:

High tech hot spots like the Valley, and Route 128 outside of Boston, are a curious oxymoron: They're fraternal associations of loners. In these places, if you're a geek living in the high-functioning regions of the spectrum, your chances of meeting someone who shares your perseverating obsession (think Linux or Star Trek) are greatly expanded. As more women enter the IT workplace, guys who might never have had a prayer of finding a kindred spirit suddenly discover that she's hacking Perl scripts in the next cubicle.

One provocative hypothesis that might account for the rise of spectrum disorders in technically adept communities like Silicon Valley, some geneticists speculate, is an increase in assortative mating. Superficially, assortative mating is the blond gentleman who prefers blondes; the hyperverbal intellectual who meets her soul mate in the therapist's waiting room. There are additional pressures and incentives for autistic people to find companionship - if they wish to do so - with someone who is also on the spectrum. Grandin writes, "Marriages work out best when two people with autism marry or when a person marries a handicapped or eccentric spouse.... They are attracted because their intellects work on a similar wavelength."

At clinics and schools in the Valley, the observation that most parents of autistic kids are engineers and programmers who themselves display autistic behavior is not news. And it may not be news to other communities either. Last January, Microsoft became the first major US corporation to offer its employees insurance benefits to cover the cost of behavioral training for their autistic children. One Bay Area mother told me that when she was planning a move to Minnesota with her son, who has Asperger's syndrome, she asked the school district there if they could meet her son's needs. "They told me that the northwest quadrant of Rochester, where the IBMers congregate, has a large number of Asperger kids," she recalls. "It was recommended I move to that part of town."

But it's ultimately a question of degree; who decides what is functional, what is normal? Hans Asperger, the Austrian pediatrician who first identified the condition, once wrote, "It seems that for success in science and art, a dash of autism is essential."
