Coding Horror

programming and human factors

Nasty Software Hacks and Intel's CPUID

We were discussing nasty software hacks today at lunch. The worst hacks are always in software, but those software hacks have an insidious tendency to seep into the hardware, too. I was reminded of Intel's infamous CPUID hack:

Prior to the Pentium, software had to jump through elaborate loops to determine exactly what type of CPU was installed on an 80x86 computer. These methods involved checking for illegal opcodes, using known bugs in prior processors, a voodoo doll of Charles Babbage, and a ouija board. Intel fixed some of these problems with CPUID.

The CPUID opcode was introduced in the late models of the Intel 486 (486SL and 486DX4). The Intel Pentium and its various clones and successors have all included this instruction. CPUID allows software to gain information on the CPU type and version. CPUID Function 0 returns an ASCII string, identifying the vendor ("GenuineIntel," "CyrixInstead," "AuthenticAMD," etc.). CPUID Function 1 returns the CPU family, model, and stepping.
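To make those two functions a little more concrete, here is a minimal Python sketch that decodes their results by hand. The function names are made up for this example, the register values are hard-coded rather than read from a real CPU, and the bit layout is an assumption based on the commonly documented one: the vendor string packed into EBX, EDX, ECX in that order, and the stepping, model, and family in the low 12 bits of EAX. The extended family and model fields are ignored here.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Rebuild the 12-byte vendor ID from the registers of CPUID function 0."""
    # Each register holds four ASCII bytes, little-endian, in the order EBX, EDX, ECX.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def decode_signature(eax: int) -> dict:
    """Split the EAX value of CPUID function 1 into its three base fields."""
    return {
        "stepping": eax & 0xF,        # bits 0-3
        "model": (eax >> 4) & 0xF,    # bits 4-7
        "family": (eax >> 8) & 0xF,   # bits 8-11
    }

# Hard-coded illustrative register values, not read from hardware.
print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
print(decode_signature(0x0683))
```

The second call uses 0x0683 purely as an example value; it decodes to family 6, model 8, stepping 3.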

Intel identifies its various processors using a combination of the family and model codes. Pentium processors are identified by a family code of 5. A family code of 6 covers the PentiumPro and all of its variants. Since the PentiumPro, Pentium II, Pentium III and Celeron are all based on the same processor architecture, they are all part of the P6 family (hence, family code 6). The model code is used to tell the various P6 processors apart, along with the cache size and brand ID, depending on the CPU (it's messy; don't ask).
Intel decided to make a new family code for the Pentium 4. That's where the fun begins.

The average person would think Intel would just increment the family code, making the Pentium 4 part of 'family 7'. That does make sense, but Intel already has a family code 7 processor: the Itanium (it came before the Pentium 4, even though the P4 hit the market first). OK, no problem. Just make the Pentium 4's family code 8 instead of 7. Wrong. Big problem.

Microsoft Windows NT 4.0 ran into a bit of a snag with "family 8." For those not schooled in the ways of binary, the decimal number 8 is "1000" in binary. That's four binary digits. Four bits.

Four bits. Remember that, because it's important.

When Windows NT 4.0 and its six service packs were released, the largest CPU family code was 6. That's "110" in binary. Only three bits. So the NT code only looks at the first (lowest) three bits of the CPU family when configuring the system.

If you haven't figured it out by now, the first three bits of 8 are zero, zero and, you guessed it, zero. Windows NT goes wacko when it sees a CPU family zero. Serious wacko. Jack with an axe at the end of The Shining wacko. Since Windows 2000 wasn't in wide release at the time, and Intel wanted to avoid this tech support issue, the family code had to be changed to avoid a conflict with Windows NT.

So now the family code for the Pentium 4 is 15, or "1111" in binary, so the first three bits look like 'CPU family 7' to Windows NT.
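The arithmetic is easy to check. This is only a sketch of the behavior the story describes, not actual Windows NT code: the function name is invented, and it assumes "the first three bits" means the three lowest-order bits of the family code.

```python
def family_seen_by_3bit_reader(family: int) -> int:
    """Keep only the low three bits of the family code, discarding the fourth."""
    return family & 0b111

# Family 6 (P6), the rejected family 8, and the family 15 Intel settled on.
for family in (6, 8, 15):
    seen = family_seen_by_3bit_reader(family)
    print(f"family {family:2d} = {family:04b} -> looks like family {seen}")
```

Family 6 survives the mask untouched, 8 collapses to 0, and 15 comes out as 7.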

This hack is nasty enough to make even Raymond Chen, the patron saint of nasty software hacks, blanch. On the other hand, it is an instant upgrade from processor family 8 to processor family 15! Gee, thanks Windows NT 4.0!


Microsoft 1978

I'm sure most of you are familiar with this famous Microsoft group photo from December 1978:

Microsoft group photo, December 1978

Groovy. In case you were wondering, the photo is authentic. It's even featured on the official Microsoft Bill Gates biography page. Of course we recognize Bill Gates in that famous photo, but I was curious about the other people in the photo. What happened to them? When did they leave Microsoft, and why? What are they doing now?

Update: Nearly 30 years later, Microsoft reshot this classic photo.

A coworker provided a link to this 2000 Time article that did most of the research for us; there's also a page on the Museum of Hoaxes that adds a bit more information on the people in the photo. I've combined the information from both sources here:

Top Row

Steve Wood: Programmer. Left Microsoft in 1980. Married to Marla Wood. Now runs a telecommunications company. EW $15 million.
Bob Wallace: Production manager-designer. Left Microsoft in 1983. Was a psychedelic-drug advocate. Died in 2002. EW $5 million.
Jim Lane: Project manager. Left Microsoft in 1985. Now owns his own software company. EW $20 million.

Middle Row

Bob O'Rear: Chief mathematician. Left Microsoft in 1993. Now a cattle rancher. EW $100 million.
Bob Greenberg: Programmer. Left Microsoft in 1981. Helped develop Cabbage Patch dolls for Coleco. Now makes software for golf courses. EW $20 million.
Marc McDonald: Programmer. Microsoft's first employee. Left Microsoft in 1984 because it was "too big", then rejoined the company when they bought Design Intelligence, the company he was working for. Has the honor of wearing badge number 00001. EW $1 million.
Gordon Letwin: Programmer. Left Microsoft in 1993. Now an environmental philanthropist. EW $20 million.
 

Bottom Row

Bill Gates: Co-founder. Still Microsoft chairman and chief architect. Now the richest person in the world. EW $50 billion.
Andrea Lewis: Technical writer. Left Microsoft in 1983. Now a freelance journalist. EW $2 million.
Marla Wood: Bookkeeper. Married to Steve Wood. Left Microsoft in 1980, then sued the company for sex discrimination. Now a self-described "professional volunteer". EW $15 million.
Paul Allen: Co-founder. Left Microsoft in 1983 but remains a senior strategy advisor to the company. Now sports team owner, space enthusiast, and philanthropist. EW $21 billion.

A lot of the information on the hoaxes site was cribbed directly from a 2000 article in the Albuquerque Tribune. Unfortunately, that article is no longer available on the Tribune website. I managed to pry a copy of the article out of the Google cache, so I'm mirroring it locally to preserve the content.

The Microsoft logo was no less groovy in 1978:

70's era Microsoft logo

If you want to bone up on your ancient Microsoft history even further, I recommend the History of Computing project's Microsoft timeline.


UI Follies: Windows Media Player Edition

Windows Media Player may be the only Windows application with a UI that gets progressively worse with each new version. It is my media player of choice due only to overwhelming indifference on my part; I curse every time I use it. That's why I was so encouraged by Philipp Lenssen's rant on the horrible usability of WMP 10.

I am not alone. Philipp outlines the many UI problems in WMP 10 with detailed screenshots. I could elaborate, but why bother? He says exactly what I would say, almost to the letter. Go read it! These little niggling UI problems aren't enough to motivate me to switch to another media player, but they're painful and unnecessary.

Paul Thurrott has issues with the WMP 10 user interface too:

Though WMP 10 is less cluttered than previous WMP versions, it's easy to return WMP 10 to a state of UI complexity fairly easily. Simply enter any of its "modes"--Now Playing, for example--and the UI is suddenly transformed to include a number of bizarre little buttons once again, in this case, the Select Now Playing Options button (to access Visualizations, Info Center View, various Plug-ins, and several enhancements), a status area for the currently accessed service, a View Full Screen button, a Video Pane Maximize/Restore button, and so on. But some of the modes are really nasty: Get into the Media Library, select the appropriate options, and you're suddenly looking at a pretty busy application (Figure). I mean, compare this clumsiness to the clean iTunes user interface, and you'll see what I mean (Figure).

But maybe that's not fair. After all, WMP 10 does a lot more than iTunes. A better comparison might be RealNetworks RealPlayer 10.5 with Harmony Technology, which, like WMP 10, is an all-in-one media player. And sure enough, like WMP 10, RealPlayer gets bogged down in options, though the presentation is arguably more attractive, with pastel colors and none of the tree view nonsense that Microsoft is so fond of for some reason (Figure).

What's really shocking is that WMP 10 was an improvement over WMP 8 and 9. It still sucks for even my minimal usage patterns, so you can imagine how bad those versions were. Maybe that's why the open-source Media Player Classic project exists: it was all downhill for Microsoft from version 6.4.

If you're feeling nostalgic, try Start, Run, mplayer2 to see what version 6.4 looks like. Unfortunately, the old player is falling way behind on the technical playback details, but the simple UI is timeless.


Passphrase Evangelism

The article Passwords: The Weakest Link references a 25-year-old research paper on the efficacy of passwords:

In the pre-Internet Age of 1979, when storage was measured in the number of bits that could fit on a foot of magnetic tape, a seminal paper on password security found that a third of users' passwords could be broken in less than five minutes.

This article was written in 2002, and the password security picture it describes hadn't improved at all in the intervening 23 years:

When a regional health care company called in network protection firm Neohapsis to find the vulnerabilities in its systems, the Chicago-based security company knew a sure place to look.

Retrieving the password file from one of the health care company's servers, the consulting firm put "John the Ripper," a well-known cracking program, on the case. While well-chosen passwords could take years--if not decades--of computer time to crack, it took the program only an hour to decipher 30 percent of the passwords for the nearly 10,000 accounts listed in the file.

"Just about every company that we have gone into, even large multinationals, has a high percentage of accounts with easily (cracked) passwords," said Greg Shipley, director of consulting for Neohapsis. "We have yet to see a company whose employees don't pick bad passwords."

When there's no measurable improvement in password security between 1979 and 2005, clearly we aren't dealing with a technology problem. We're dealing with a people problem. Passwords are fundamentally broken because they aren't compatible with typical human behavior:

The only defense is to make passwords nearly impossible to guess, but such strength requires that the password be selected in a totally random fashion. That's a tall order for humans, said David Evans, an assistant professor of computer science at the University of Virginia.

"When humans make passwords, (they) are not very good at making up randomness," he said. Furthermore, because people usually have several passwords to keep track of, locking user accounts with random, but difficult-to-remember, strings of characters such as "wX%95qd!" is a recipe for a support headache. "The idea is to make something that is easy to remember but that will make up a good password," he said.

Many security administrators focus their efforts on teaching users how to use various mnemonics to create strong, but memorable, passwords. A common technique takes the first or last letter of each word in a saying or phrase familiar to the user. For example, by using random capitalization and substituting some punctuation marks and digits for letters, "Friends don't let friends give tech advice" might become "fD!Fg7a."

The education doesn't seem to be sticking, and the password problem is getting worse as the percentage of less-tech-savvy computer users increases.

I don't have a solution to the password problem, but there is one thing we can do to improve the usability and security of passwords dramatically.

We have to encourage users to stop thinking of passwords as single words, and start thinking of them as pass phrases. The worst imaginable pass phrase (e.g., "this is my secret password") is many times more secure than an average single word password (e.g., "god123"). And it's easier to remember.*
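Here is a back-of-the-envelope way to see the difference, as a rough Python sketch. It assumes a naive brute-force attacker who tries every string of the right length, and the alphabet sizes (36 for lowercase letters plus digits, 27 for lowercase letters plus space) are my own simplification. An attacker who guesses whole dictionary words would do far better against a phrase made of common words, so treat the numbers as an upper bound, not a measurement.

```python
import math

def brute_force_bits(length: int, alphabet_size: int) -> float:
    """Bits needed to enumerate every string of this length over the alphabet."""
    return length * math.log2(alphabet_size)

# Assumed alphabets: lowercase + digits (36), lowercase + space (27).
word_bits = brute_force_bits(len("god123"), 36)
phrase_bits = brute_force_bits(len("this is my secret password"), 27)

print(f"single word: about {word_bits:.0f} bits")
print(f"pass phrase: about {phrase_bits:.0f} bits")
```

Every extra bit doubles the number of guesses, so even a mediocre phrase buys a lot of headroom from its length alone.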

As a developer, you need to do your part, too:

  1. Absolutely, positively make sure your applications support a password field length of at least 128 Unicode characters.
  2. In the user interface for defining the password, remind the user that password doesn't literally mean a word. Give several examples of pass phrases directly alongside the entry field. It's absolutely imperative that we educate the users-- how else will they know there's some other way to deal with that input box?
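For the first point, the check on the server side can be very small. This is a hypothetical sketch, not code from any particular framework: the function name, the constant, and the default limit of 256 are all invented for illustration. The only real idea in it is that the limit is counted in characters, that it never drops below 128, and that spaces and non-ASCII characters are not rejected.

```python
MIN_SUPPORTED_LENGTH = 128  # the floor suggested above; supporting more is fine

def accepts_passphrase(candidate: str, max_length: int = 256) -> bool:
    """Return True if the candidate fits the field; spaces and non-ASCII are allowed."""
    if max_length < MIN_SUPPORTED_LENGTH:
        raise ValueError("field limit is shorter than 128 characters")
    # len() on a Python str counts Unicode code points, not bytes.
    return 0 < len(candidate) <= max_length

print(accepts_passphrase("this is my secret password"))
```

Note that nothing here forces the user to write a phrase. It just makes sure the application isn't the thing standing in the way.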

The greatest long-term security threat isn't hackers. It's the perpetuation of the braindead 8-16 character password length limitation, and the idea that passwords are single words.

* Unfortunately, not easier to type, but neither is "X74@&z3!". What are you gonna do?


Perfmon Gone Wild

When diagnosing server performance problems, the first tool I turn to is the humble Task Manager. That's usually enough to get a rough idea of where we are in the bottleneck shell game: is it CPU, Disk, Network or Memory?

But sometimes you need to dig into performance a little deeper. Then it's time to drag out Performance Monitor. I always resist doing this for as long as I can because using perfmon is like trying to sip from a fire hose: there are a zillion performance counters that produce veritable mountains of data. The .NET framework has probably a hundred .NET-specific performance counters, and that's just a tiny fraction of the available operating system performance counters. It's downright overwhelming. Where to begin?

Microsoft provides a helpful performance monitor wizard which walks you through the process of setting up a perfmon trace with default counters. Per the Wizard, that's the following:

\Cache\*
\Memory\*
\Network Interface(*)\*
\Objects\*
\Paging File(*)\*
\Physical Disk(*)\*
\Process(*)\*
\Processor(*)\*
\Redirector\*
\Server Work Queues(*)\*
\Server\*
\System\*

Once created, the trace can be stopped, started or modified via the Computer Management / System Tools / Performance Logs and Alerts / Counter Logs interface. Here's what the default wizard-produced trace looks like:

perfmon, or modern art?

It's perfmon gone wild!

This is way, way, WAY too much information. Let's see if we can narrow it down to some key performance counters:

  • Processor(_Total)\% Processor Time
    The percentage of elapsed time that the processor spends to execute a non-Idle thread.
  • Processor(_Total)\Interrupts/sec
    An indirect indicator of the activity of hardware devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards, and other peripheral devices.
  • System\Processor Queue Length
    The number of non-running ready threads in the processor queue. There is a single queue for processor time even on computers with multiple processors. If a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of less than 10 threads per processor is normally acceptable, depending on workload.
  • Memory\Available Bytes
    The amount of physical memory, in bytes, available to processes running on the computer. Calculated by adding the amount of space on the Zeroed, Free, and Standby memory lists.
  • Process(All_processes)\Working Set
    The set of recently touched memory pages for all processes. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before leaving main memory.
  • Memory\Pages/sec
    The rate at which pages are read from or written to disk to resolve hard page faults. This is a primary indicator of the kinds of faults that cause system-wide delays. It includes pages retrieved to satisfy faults in the file system cache.
  • PhysicalDisk\% Disk Time
    The percentage of elapsed time that the selected disk drive was busy servicing read or write requests.
  • PhysicalDisk\Current Disk Queue Length
    The number of requests outstanding on the disk at the time the performance data is collected. Requests experience delays proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.
  • Server\Bytes Received/sec
    The number of bytes the server has received from the network.
  • Server\Bytes Transmitted/sec
    The number of bytes the server has sent on the network.
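Two of those descriptions contain rules of thumb that are easy to get backwards, so here they are as a small Python sketch. The function names are invented, and the thresholds (10 threads per processor, a difference of 2 for disks) are just the guideline figures quoted above, not hard limits.

```python
def cpu_queue_ok(queue_length: float, processors: int, limit: float = 10) -> bool:
    """Processor Queue Length divided by processor count should stay under the limit."""
    return queue_length / processors < limit

def disk_queue_ok(avg_queue_length: float, spindles: int, limit: float = 2) -> bool:
    """Disk queue length minus spindle count should average under the limit."""
    return avg_queue_length - spindles < limit

# A queue of 24 threads is fine on four processors but not on two.
print(cpu_queue_ok(24, processors=4), cpu_queue_ok(24, processors=2))
# A disk queue of 5 is fine on a four-spindle array; a queue of 7 is not.
print(disk_queue_ok(5, spindles=4), disk_queue_ok(7, spindles=4))
```

The point of both checks is the same: the raw counter value means little until you scale it by the hardware behind it.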

Those counters should be enough to give you a general sense of whether you're dealing with a disk, memory, CPU, or network bottleneck-- without being too overwhelming.

If you need to capture more performance counters than this, I suggest switching the counter log to CSV output via the properties dialog. Then download Microsoft's excellent LogParser tool. Now you can slice, dice, and even graph the data however you like using a relatively simple SQL-like syntax.
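If LogParser isn't handy, the same CSV can be summarized with a few lines of script. The Python sketch below is a stand-in for that idea, not LogParser itself. It assumes the log has a timestamp in the first column and one column per counter path, and the sample rows are invented for illustration rather than taken from a real server.

```python
import csv
import io
from statistics import mean

# Invented sample in the assumed layout: timestamp first, then one column per counter.
SAMPLE = r'''"Time","\\SERVER\Processor(_Total)\% Processor Time","\\SERVER\Memory\Pages/sec"
"01/01/2005 12:00:00","12.5","3.0"
"01/01/2005 12:00:15","87.5","5.0"
'''

def counter_averages(csv_text: str) -> dict:
    """Average every counter column, skipping blank or non-numeric samples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = {name: [] for name in reader.fieldnames[1:]}
    for row in reader:
        for name in samples:
            try:
                samples[name].append(float(row[name]))
            except (TypeError, ValueError):
                continue  # blank or malformed sample; skip it
    return {name: mean(values) for name, values in samples.items() if values}

for counter, average in counter_averages(SAMPLE).items():
    print(f"{counter}: {average:.1f}")
```

For a real log you would read the text from the file instead of the SAMPLE string; anything fancier than averages is where a proper query tool starts to earn its keep.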
