Coding Horror

programming and human factors

Is Your Computer Stable?

Over the last twenty years, I've probably built around a hundred computers. It's not very difficult, and in fact, it's gotten a whole lot easier over the years as computers become more highly integrated. Consider what it would take to build something very modern like the Scooter Computer:

  1. Apply a dab of thermal compound to top of case.
  2. Place motherboard in case.
  3. Screw motherboard into case.
  4. Insert SSD stick.
  5. Insert RAM stick.
  6. Screw case closed.
  7. Plug in external power.
  8. Boot.

Bam done.

It's stupid easy. My six-year-old son and I have built Lego kits that were way more complex than this. Even a traditional desktop build is only a few more steps: insert CPU, install heatsink, route cables. And a server build is merely a few additional steps on top of that, maybe with some 1U or 2U space constraints. Scooter, desktop, or server, if you've built one computer, you've basically built them all.

Everyone breathes a sigh of relief when their newly built computer boots up for the first time, no matter how many times they've done it before. But booting is only the beginning of the story. Yeah, it boots, great. Color me unimpressed. What we really need to know is whether that computer is stable.

Although commodity computer parts are more reliable every year, and vendors test their parts plenty before they ship them, there's no guarantee all those parts will work reliably together, in your particular environment, under your particular workload. And there's always the possibility, however slim, of getting very, very unlucky with subtly broken components.

Because we're rational scientists, we test stuff in our native environment, and collect data to prove our computer is stable. Right? So after we boot, we test.

Memory

I like to start with memory tests, since those require bootable media and work the same on all x86 computers, even before you have an operating system. Memtest86 is the granddaddy of all memory testers. I'm not totally clear what caused the split between that and Memtest86+, but all of them work similarly. The one from PassMark seems to be the most up to date, so that's what I recommend.

Download the version of your choice, write it to a bootable USB drive, plug it into your newly built computer, boot and let it work its magic. It's all automatic. Just boot it up and watch it go.

(If your computer supports UEFI boot, you'll get the newest version 6.x; otherwise you'll see the older version 4.2.)
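
As for getting the image onto a USB stick in the first place, dd from any Linux machine does the job. A minimal sketch, where the image filename and /dev/sdX are placeholders for whatever you actually downloaded and whatever device your USB stick shows up as (triple check that device name, because dd will happily overwrite anything you point it at):

# image name and target device are placeholders; substitute your own
sudo dd if=memtest86-usb.img of=/dev/sdX bs=4M
sync

(On Windows, a tool like Rufus does the same job.)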

I recommend one complete pass of memtest86 at minimum, but if you want to be extra careful, let it run overnight. Also, if you have a lot of memory, memtest can take a while! For our servers with 128GB it took about three hours, and I expect that time scales linearly with the amount of memory.

The "Pass" percentage at the top should get to 100% and the "Pass" count in the table should be greater than one. If you get any errors at all, anything whatsoever other than a clean 100% pass, your computer is not stable. Time to start removing RAM sticks and figure out which one is bad.

OS

All subsequent tests will require an operating system, and one basic ironclad test of stability for any computer is whether it can install an operating system. Pick your free OS of choice, and begin a default install. I recommend Ubuntu Server LTS x64 since it assumes less about your video hardware. Download the ISO and write it to a bootable USB drive. Then boot it.
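
Before writing the ISO, it's worth a quick checksum to rule out a corrupt download masquerading as an unstable computer. A minimal sketch, where the filename is just an example:

# the ISO name is an example; use whatever you actually downloaded
sha256sum ubuntu-16.04-server-amd64.iso
# compare the hash against the SHA256SUMS file published on the same download page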

(Hey look it has a memory test option! How convenient!)

  • Be sure you have the network connected for the install, with DHCP; the install goes faster when you don't have to wait for network detection to time out and nag you about the network stuff.
  • In general, you'll be pressing enter a whole lot to accept all the defaults and proceed onward. I know, I know, we're installing Linux, but believe it or not, they've gotten the install bit down by now.
  • About all you should be prompted for is the username and password of the default account. I recommend jeff and password, because I am one of the world's preeminent computer security experts.
  • If you are installing from USB and get nagged about a missing CD, remove and reinsert the USB drive. No, I don't know why either, but it works.

If anything weird happens during your Ubuntu Server install that prevents it from finalizing the install and booting into Ubuntu Server … your computer is not stable. I know it doesn't sound like much, but this is a decent holistic test as it exercises the whole system in very repeatable ways.

We'll need an OS installed for the next tests, anyway. I'm assuming you've installed Ubuntu, but any Linux distribution should work similarly.

CPU

Next up, let's make sure the brains of the operation are in order: the CPU. To be honest, if you've gotten this far, past the RAM and OS test, the odds of you having a completely broken CPU are fairly low. But we need to be sure, and the best way to do that is to call upon our old friend, Marin Mersenne.

In mathematics, a Mersenne prime is a prime number that is one less than a power of two. That is, it is a prime number that can be written in the form M_n = 2^n − 1 for some integer n. They are named after Marin Mersenne, a French Minim friar, who studied them in the early 17th century. The first four Mersenne primes are 3, 7, 31, and 127.

I've been using Prime95 and MPrime – tools that attempt to rip through as many giant numbers as fast as possible to determine if they are prime – for the last 15 years. Here's how to download and install mprime on that fresh new Ubuntu Server system you just booted up.

mkdir mprime
cd mprime
wget ftp://mersenne.org/gimps/p95v287.linux64.tar.gz
tar xzvf p95v287.linux64.tar.gz
rm p95v287.linux64.tar.gz

(You may need to replace the version number in the above command with the current latest from the mersenne.org download page, but as of this writing, that's the latest.)

Now you have a copy of mprime in your user directory. Start it by typing ./mprime

Just passing through, thanks. Answer N to the GIMPS prompt.

Next you'll be prompted for the number of torture test threads to run. They're smart here and default to one thread per logical core, so press enter to accept that. You want a full CPU test on all cores. Next, select the test type.

  1. Small FFTs (maximum heat and FPU stress, data fits in L2 cache, RAM not tested much).
  2. In-place large FFTs (maximum power consumption, some RAM tested).
  3. Blend (tests some of everything, lots of RAM tested).

They're not kidding when they say "maximum power consumption", as you're about to learn. Select 2. Then select Y to begin the torture and watch your CPU squirm in pain.

Accept the answers above? (Y):
[Main thread Feb 14 05:48] Starting workers.
[Worker #2 Feb 14 05:48] Worker starting
[Worker #3 Feb 14 05:48] Worker starting
[Worker #3 Feb 14 05:48] Setting affinity to run worker on logical CPU #2
[Worker #4 Feb 14 05:48] Worker starting
[Worker #2 Feb 14 05:48] Setting affinity to run worker on logical CPU #3
[Worker #1 Feb 14 05:48] Worker starting
[Worker #1 Feb 14 05:48] Setting affinity to run worker on logical CPU #1
[Worker #4 Feb 14 05:48] Setting affinity to run worker on logical CPU #4
[Worker #2 Feb 14 05:48] Beginning a continuous self-test on your computer.
[Worker #4 Feb 14 05:48] Test 1, 44000 Lucas-Lehmer iterations of M7471105 using FMA3 FFT length 384K, Pass1=256, Pass2=1536.

Now's the time to break out your Kill-a-Watt or similar power consumption meter, if you have it, so you can measure the maximum CPU power draw. On most systems, unless you have an absolute beast of a gaming video card installed, the CPU is the single device that will pull the most heat and power in your system. This is full tilt, every core of your CPU burning as many cycles as possible.

I suggest running the i7z utility from another console session so you can monitor core temperatures and speeds while mprime is running its torture test.

sudo apt-get install i7z
sudo i7z
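
If i7z doesn't get along with your particular CPU, the old standby lm-sensors will at least give you temperatures. A quick sketch:

sudo apt-get install lm-sensors
sudo sensors-detect   # answer the prompts; the defaults are fine
watch sensors         # refreshes the temperature readout every couple of seconds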

Let mprime run overnight in maximum heat torture test mode. The Mersenne calculations are meticulously checked, so if there are any mistakes the whole process will halt with an error at the console. And if mprime halts, ever … your computer is not stable.
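
If you're running this over SSH, it's worth starting mprime inside tmux (or screen) so the torture test survives a dropped connection. A minimal sketch, where "torture" is just an example session name:

sudo apt-get install tmux
tmux new -s torture        # start a named session
cd ~/mprime && ./mprime    # kick off the torture test as before
# detach with Ctrl-b then d; reattach later with: tmux attach -t torture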

Watch those CPU temperatures! In addition to absolute CPU temperatures, you'll also want to keep an eye on total heat dissipation in the system. The system fans (if any) should spin up, and the whole system should be kept at reasonable temperatures through this ordeal, or else you're going to have a sick, overheating computer one day.

The bad news is that it's extremely rare to have any kind of practical, real world workload remotely resembling the stress that Mersenne lays on your CPU. The good news is that if your system can survive the onslaught of Mersenne overnight, it's definitely ready for anything you can conceivably throw at it in the future.

Disk

Disks are probably the easiest items to replace in most systems – and the ones most likely to fail over time. We know the disk can't be totally broken since we just installed an OS on the thing, but let's be sure.

Start with a bad blocks test for the whole drive.

sudo badblocks -sv /dev/sda

This exercises the full extent of the disk (in safe read only fashion). Needless to say, any errors here should prompt serious concern for that drive.

Checking blocks 0 to 125034839
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
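
If the drive is brand new and has nothing on it yet, you can go further with badblocks' destructive write-mode test, which writes patterns to every sector and reads them back. Never run this on the drive you just installed the OS on; /dev/sdb here is a hypothetical second, blank drive:

# WARNING: -w destroys all data on the target drive
sudo badblocks -wsv /dev/sdb

It takes several times longer than the read-only pass, but it exercises writes as well as reads.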

Let's check the SMART readings for the drive next.

sudo apt-get install smartmontools
sudo smartctl -i /dev/sda

That will let you know if the drive supports SMART. Let's enable it, if so:

sudo smartctl -s on /dev/sda

Now we can run some SMART tests. But first check how long the tests on offer will take:

sudo smartctl -c /dev/sda

Run the long test if you have the time, or the short test if you don't:

sudo smartctl -t long /dev/sda

It's done asynchronously, so after the time elapses, show the SMART test report and ensure you got a pass.
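
smartctl's self-test log option is one way to pull up that report:

sudo smartctl -l selftest /dev/sda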

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       100         -

Next, run a simple disk benchmark to see if you're getting roughly the performance you expect from the drive or array:

dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
sudo hdparm -Tt /dev/sda

For a system with a basic SSD you should see results at least this good, and perhaps considerably better:

536870912 bytes (537 MB) copied, 1.52775 s, 351 MB/s
Timing cached reads:   11434 MB in  2.00 seconds = 5720.61 MB/sec
Timing buffered disk reads:  760 MB in  3.00 seconds = 253.09 MB/sec

Finally, let's try a more intensive test with bonnie++, a disk benchmark:

sudo apt-get install bonnie++
bonnie++ -f

We don't care too much about the resulting benchmark numbers here; what we're looking for is a clean run without errors. And if you get errors during any of the above … your computer is not stable.

(I think these disk tests are sufficient for general use, particularly if you consider drives easily RAID-able and replaceable as I do. However, if you want to test your drives more exhaustively, a good resource is the FreeNAS "how to burn in hard drives" topic.)

Network

I don't have a lot of experience with network hardware failure, to be honest. But I do believe in the cult of bandwidth, and that's one thing we can check.

You'll need two machines for an iperf test, which makes it more complex. Here's the server, let's say it's at 10.0.0.1:

sudo apt-get install iperf
iperf -s

and here's the client, which will connect to the server and record how fast it can transmit data between the two:

sudo apt-get install iperf
iperf -c 10.0.0.1

------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 23.5 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 43220 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes    933 Mbits/sec

As a point of reference, you should expect to see roughly 120 megabytes/sec (aka 960 megabits) of real world throughput on a single gigabit ethernet connection. If you're lucky enough to have a 10 gigabit connection, well, good luck reaching that meteoric 1.2 Gigabyte/sec theoretical throughput maximum.
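
A single iperf stream usually won't saturate a 10 gigabit link on its own, so if that's what you're testing, try several parallel streams and a longer run. A rough sketch, using the same hypothetical server address as above:

iperf -c 10.0.0.1 -P 4 -t 30   # 4 parallel streams, 30 second test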

Video Card

I'm not covering this, because very few of the computers I build these days need more than the stuff built into the CPU to handle video. Which is getting surprisingly decent, at last.

You're a gamer, right? So you'll probably want to boot into Windows and try something like FurMark. And you should test, because GPUs – especially gaming GPUs – are rather cutting edge bits of kit and burn through a lot of watts. Monitor temperatures and system heat, too.

If you have recommendations for gaming class video card stability testing, share them in the comments.

OK, Maybe It's Stable

This is the regimen I use on the machines I build and touch. And it's worked well for me. I've identified faulty CPUs (once), faulty RAM, faulty disks, and insufficient case airflow early on so that I could deal with them in the lab, before they became liabilities in the field. Doesn't mean they won't fail eventually, but I did all I could to make sure my babies (er, computers) can live long and prosper.

Who knows, with a bit of luck maybe you'll end up like the guy whose NetWare server had sixteen years of uptime before it was decommissioned.

These tests are just a starting point. What techniques do you use to ensure the computers you build are stable? How would you improve on these stability tests based on your real world experience?


The Scooter Computer

When we initially deployed our handbuilt colocated servers for Discourse in 2013, I needed a way to provide an isolated VPN channel in for secure remote access and troubleshooting. Rather than dedicate a whole server to this task, I purchased the inexpensive, open source firmware friendly Asus RT-N16 router, flashed it with the popular TomatoUSB open source firmware, removed the antennas, turned off the WiFi and dropped it off in our colocated rack to let it act as a dedicated VPN access point.


Asus RT-N16

And that box – which was $100 then and around $70 now – worked well enough until now. Although the version of OpenSSL in the 2012 era Tomato firmware we used is not vulnerable to Heartbleed, it's still getting out of date in terms of the encryption it supports and allows. And Tomato itself is updated sporadically, chaotically at best.

Let's face it: this is just a little box that runs a chopped up version of Linux, with a bit of specialized wireless hardware and multiple antennas tacked on … that we're not even using. So when it came time to upgrade, we wondered:

Why not just go with a small box that can run a real, full Linux distro? Wouldn't that be simpler and easier to keep up to date?

After doing some research and asking on Twitter, I discovered there are a ton of amazing little Broadwell "mini-PC" boxes available on AliExpress.

The specs are kind of amazing for the price. I paid ~$350 each for the ones I selected:

  • i5-5200U Broadwell 2 core / 4 thread CPU at 2.2 GHz - 2.7 GHz
  • 16GB DDR3 RAM
  • 128GB M.2 SSD
  • Dual gigabit Realtek 8168 ethernet
  • 4 front USB 3.0 ports / 4 rear USB 2.0 ports
  • Dual HDMI out

(There are also optical and analog audio connectors on the front, as well as an SD card reader, which I covered with a sticker since we had no need for audio. I also stripped the WiFi out since we didn't need it, but it was included in the price, too.)

Selecting the i5-4258u, 4GB RAM, and 64GB SSD pushes the price down to $270. That's still a solid CPU, only a single generation behind Intel's latest and greatest Skylake, and carrying the midrange i5 moniker; it's no pushover. There are also many, many variants of this box from other AliExpress sellers that have slightly older, cheaper CPUs that are still plenty powerful. You can easily spec a box similar to this one for $200.

That's not a whole lot more than the $200 you'd pay for a high end router these days, and as Ars Technica notes, the average x86 box is radically faster.

Note that in the above graphs, "homebrew" means an old 1.8 GHz Ivy Bridge dual core chip, three generations behind current CPUs, that doesn't even merit the i3 or i5 designation and has no hyperthreading. Do bear that in mind as you keep reading.

Meet The Scooter Computer

This box may be small, with only a 15 watt TDP, but it is mighty. I spun up a new Digital Ocean droplet and ran a quick benchmark:

sudo apt-get install sysbench
sysbench --test=cpu --cpu-max-prime=20000 run
Tie Shuttle 6:

total time:           28.0707s
total num events:     10000
total time taken:     28.0629
per-request stats:
     min:             2.77ms
     avg:             2.81ms
     max:             3.99ms
     ~95 percentile:  3.00ms

Digital Ocean Droplet:

total time:          35.9541s
total num events:    10000
total time taken:    35.9492
per-request stats:
     min:             3.50ms
     avg:             3.59ms
     max:             13.31ms
     ~95 percentile:  3.79ms

Results will of course vary by cloud provider, but rest assured this box is just as fast as and possibly even faster than the average cloud box you could spin up right now. Of course it is "only" 2 cores / 4 threads, but the more cores you need, the slower they tend to go because of the overall TDP limits of the core package.

One thing that's not immediately obvious in photos is that this thing is indeed small but hefty, like holding a solid chunk of aluminum in your hand. That's because the box is passively cooled — the whole case is the heatsink, as the CPU on the bottom of the motherboard mates with the finned top of the case.

Opening this box you realize just how simple things are inside it; it's barely more than a highly integrated motherboard strapped to an aluminum block. This isn't a Steve Jobs truck, a Mac Mini car, or even a motorcycle. This is a scooter.

Scooters are very primitive machines; it is both their greatest strength and their greatest weakness. It's arguably the simplest personal wheeled vehicle there is. In these short distance scenarios, scooters tend to win over, say, bicycles because there's less setup and teardown necessary – you don't have to lock up a scooter, nor do you have to wear a helmet. Just hop on and go! You get almost all the benefits of gravity and wheeled efficiency with a minimum of fuss and maintenance. And yes, it's fun, too!

Passively cooled computers are paragons of simplicity and reliable consumer electronics, but passively cooling a "real" x86 PC is the holy grail. To get serious performance you usually need to feed the CPU at least 10 to 20 watts – and dissipating that kind of energy with zero fans and ambient airflow alone is not trivial. Let's see how our scooter does overnight running Mersenne Primes, which is the heaviest CPU load possible.

You can place your hand on the top of the box during this, but it's uncomfortable. And the whole box radiates heat, not just the top. Overall it was completely stable for me during overnight mprime torture testing with the 15w TDP CPU I chose, and I am comfortable with these boxes sitting in our rack in the datacenter, even under extended full load. However, I would be very careful putting a 28w TDP CPU in this box unless you are absolutely sure it won't be at full load very often. Passive cooling is hard.

Power consumption, as measured by my Kill-a-Watt, ranged from 7 watts at the Ubuntu Server 14.04 text login screen, to 8-10 watts at an idle Ubuntu 15.10 GUI login screen (the default OS it arrived with), to 14-18 watts in memory testing, to 26 watts in mprime.

I should also mention that even under extreme mprime load, both CPU cores stayed at 2.5 GHz indefinitely, which is unusual in my experience. To achieve 2.7 GHz you need a single threaded load. Considering the base clock of the i5-5200U is 2.2 GHz, that's quite good! Many 4-6-8 core CPUs drop all the way down to their base clock pretty fast once they have significant load, which makes the "turbo" moniker a bit of a lie.

(By the way, don't bother using burnP6, it generates way too little heat compared to mprime, which is an absolute monster. If your CPU can survive an overnight run of mprime, I can assure you it's ready for just about anything the real world can throw at it, ever.)

Disk

The machine has M.2 slots for two drives, as well as a SATA port and power cable (not pictured, but was included in the box) if you want to mate a 2.5" drive with the drive mounting holes on the bottom of the case. So if you prefer a mirrored two drive RAID array here for reliability, or a giant honking 2TB 2.5" HDD slapped in there for media storage, all of that is possible!

Be careful, as the internal M.2 slots are 2242, meaning 42mm length. There seem to be mostly (only?) lower cost SSD drives available in this size for whatever reason.

Don't worry, though, the bundled 128GB Phison S9 M.2 SSD has decent performance, roughly equal to a good SSD from a few years ago:

dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
sudo hdparm -Tt /dev/sda

536870912 bytes (537 MB) copied, 1.52775 s, 351 MB/s
Timing cached reads:   11434 MB in  2.00 seconds = 5720.61 MB/sec
Timing buffered disk reads:  760 MB in  3.00 seconds = 253.09 MB/sec

That's respectable SSD performance and won't hold you back in most use cases, but it's not a barn-burning disk subsystem, either. I'm not entirely sure retrofitting, say, the state of the art Samsung 950 Pro M.2 2280 drive is possible due to length restrictions.

Of course the Samsung 850 Pro would fit fine as a traditional 2.5" SATA drive mounted to the case cover, and would perform like this:

536870912 bytes (537 MB) copied, 1.20895 s, 444 MB/s
Timing cached reads:   38608 MB in  2.00 seconds = 19330.61 MB/sec
Timing buffered disk reads: 1584 MB in  3.00 seconds = 527.92 MB/sec

RAM

Intel limits these Broadwell U class CPUs to 16GB RAM total, so maxing the box out is only going to set you back around $70. Still, that's a significant percentage of the ~$350 total cost, and you may not need that much RAM for what you have in mind.

However, do be careful that you get dual-channel RAM for lower RAM configurations; you don't want a single 4GB DIMM, you want two 2GB DIMMs. They ship from the vendor with a single DIMM, so beware. It may not matter depending on the task, as noted by AnandTech, but our boxes will be used for OpenSSL, and memory is cheap, so why not?
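
You can verify what the vendor actually shipped without cracking the case open; dmidecode lists every memory slot and what's in it. A quick sketch:

sudo dmidecode -t memory | grep -E 'Size|Locator|Speed'

If only one slot reports an installed module, you're running single channel.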

The Versatile Scooter

When I began looking at this, I was shocked to discover just how low-end the x86 CPUs are in a lot of "dedicated" devices, such as the official pfSense hardware:

Sure, 2.4 GHz and 8 cores on that C2758 sounds reasonable – until you realize those are old Intel Bay Trail Atom cores. Even the current Cherry Trail Atom cores aren't so hot. Furthermore, those are probably the maximum "turbo" frequencies being quoted, which are unlikely to be sustained under any kind of real multi-core load. Also, did I mention this is being sold as a $1,400 device? Except for the lack of more than 2 dedicated gigabit ethernet ports, I'd put our scooter computer up against that C2758 any day of the week. And you know what? It'd win.

I think this logic applies to a lot of dedicated hardware these days — routers, switches, firewalls, and so on. You're often better off building up a modern high power, low TDP x86 box and slapping a regular Linux distro on there.

You can even kinda-sorta fit six of them in a 1U rack space.

(Well, except for the power bricks and cables. Vertical mounting on a 1U shelf works out a bit better, and each conveniently came with a stand for vertical operation.)

Now that I've worked with these boxes, I've become rather enamored of the Scooter Computer concept. Wherever we were thinking that we had to run either:

  • A virtual machine on big iron for some small but important utility function in our rack.

  • Dedicated, purpose built hardware for networking, firewall, or switching with a custom OS.

… we can now take advantage of cheap, reliable, flexible, totally solid state commodity x86 hardware that's spread across many machines and running standard Linux distributions, like all the rest of our 1U servers.


Zopfli Optimization: Literally Free Bandwidth

In 2007 I wrote about using PNGout to produce amazingly small PNG images. I still refer to this topic frequently, as seven years later, the average PNG I encounter on the Internet is very unlikely to be optimized.

For example, consider this recent Perry Bible Fellowship cartoon.

Saved directly from the PBF website, this comic is an 800 × 1412, 32-bit color PNG image of 671,012 bytes. Let's save it in a few different formats to get an idea of how much space this image could take up:

BMP    24-bit                3,388,854
BMP    8-bit                 1,130,678
GIF    8-bit, no dither        147,290
GIF    8-bit, max dither       283,162
PNG    32-bit                  671,012

PNG is a win because like GIF, it has built-in compression, but unlike GIF, you aren't limited to cruddy 8-bit, 256 color images. Now what happens when we apply PNGout to this image?

Default PNG    671,012
PNGout         623,859    (7% smaller)

Take any random PNG of unknown provenance, apply PNGout, and you're likely to see around a 10% file size savings, possibly a lot more. Remember, this is lossless compression. The output is identical. It's a smaller file to send over the wire, and the smaller the file, the faster the decompression. This is free bandwidth, people! It doesn't get much better than this!

Except when it does.

In 2013 Google introduced a new, fully backwards compatible method of compression they call Zopfli.

The output generated by Zopfli is typically 3–8% smaller compared to zlib at maximum compression, and we believe that Zopfli represents the state of the art in Deflate-compatible compression. Zopfli is written in C for portability. It is a compression-only library; existing software can decompress the data. Zopfli is bit-stream compatible with compression used in gzip, Zip, PNG, HTTP requests, and others.

I apologize for being super late to this party, but let's test this bold claim. What happens to our PBF comic?

Default PNG    671,012
PNGout         623,859    (7% smaller)
ZopfliPNG      585,117    (13% smaller)

Looking good. But that's just one image. We're big fans of Emoji at Discourse, let's try it on the original first release of the Emoji One emoji set – that's a complete set of 842 64×64 PNG files in 32-bit color:

Default PNG    2,328,243
PNGout         1,969,973    (15% smaller)
ZopfliPNG      1,698,322    (27% smaller)

Wow. Sign me up for some of that.
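
Running it over a whole directory of images is a one-liner, assuming you've built the zopflipng binary from Google's zopfli source; the output filename pattern here is just an example:

# re-encode every PNG in the current directory, writing alongside the originals
for f in *.png; do zopflipng -y "$f" "${f%.png}.zopfli.png"; done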

In my testing, Zopfli reliably produces 3 to 8 percent smaller PNG images than even the mighty PNGout, which is an incredible feat. Furthermore, any standard gzip compressed resource can benefit from Zopfli's improved deflate, such as jQuery:
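
Roughly how that looks with the zopfli command line tool, where the filename is just an example and --i1000 bumps the iteration count for a bit more compression at the cost of even more CPU time:

zopfli --i1000 jquery.min.js
# writes jquery.min.js.gz, which any ordinary gzip client can decompress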

Or the standard compression corpus tests:

              gzip -9    kzip      Zopfli
Alexa 10k     128 MB     125 MB    124 MB
Calgary       1017 KB    979 KB    975 KB
Canterbury    731 KB     674 KB    670 KB
enwik8        36 MB      35 MB     35 MB

(Oddly enough, I had not heard of kzip – turns out that's our old friend Ken Silverman popping up again, probably using the same compression bag of tricks from his PNGout utility.)

But there is a catch, because there's always a catch – it's also 80 times slower. No, that's not a typo. Yes, you read that right.

gzip -9                    5.6s
7-zip -mm=Deflate -mx=9    128s
kzip                       336s
Zopfli                     454s

Gzip compression is faster than it looks in the above comparison, because level 9 is a bit slow for what it does:

           Time     Size
gzip -1    11.5s    40.6%
gzip -2    12.0s    39.9%
gzip -3    13.7s    39.3%
gzip -4    15.1s    38.2%
gzip -5    18.4s    37.5%
gzip -6    24.5s    37.2%
gzip -7    29.4s    37.1%
gzip -8    45.5s    37.1%
gzip -9    66.9s    37.0%

You decide if that whopping 0.1% compression ratio difference between gzip -7 and gzip -9 is worth the doubling in CPU time. In related news, this is why pretty much every compression tool's so-called "Ultra" compression level or mode is generally a bad idea. You fall off an algorithmic cliff pretty fast, so stick with the middle or the optimal part of the curve, which tends to be the default compression level. They do pick those defaults for a reason.

PNGout was not exactly fast to begin with, so imagining something that's 80 times slower (at best!) to compress an image or a file is definite cause for concern. You may not notice on small images, but try running either on a larger PNG and it's basically time to go get a sandwich. Or if you have a multi-core CPU, 4 to 16 sandwiches. This is why applying Zopfli to user-uploaded images might not be the greatest idea, because the first server to try Zopfli-ing a 10k × 10k PNG image is in for a hell of a surprise.

However, remember that decompression is still the same speed, and totally safe. This means you probably only want to use Zopfli on pre-compiled resources, which are designed to be compressed once and downloaded millions of times – rather than a bunch of PNG images your users uploaded which may only be viewed a few hundred or thousand times at best, regardless of how optimized the images happen to be.

For example, at Discourse we have a default avatar renderer which produces nice looking PNG avatars for users based on the first letter of their username, plus a color scheme selected via the hash of their username. Oh yes, and the very nice Roboto open source font from Google.

We spent a lot of time optimizing the output avatar images, because these avatars can be served millions of times, and pre-rendering the whole lot of them, given the constraints of …

  • 10 numbers
  • 26 letters
  • ~250 color schemes
  • ~5 sizes

… isn't unreasonable at around 45,000 unique files. We also have a centralized https CDN we set up to serve avatars (if desired) across all Discourse instances, to further reduce load and increase cache hits.

Because these images stick to shades of one color, I reduced the color palette to 8-bit (actually 128 colors) to save space, and of course we run PNGout on the resulting files. They're about as tiny as you can get. When I ran Zopfli on the above avatars, I was super excited to see my expected 3 to 8 percent free file size reduction, but after the console commands ran, I saw that it saved … 1 byte, 5 bytes, and 2 bytes respectively. Cue sad trombone.

(Yes, it is technically possible to produce strange "lossy" PNG images, but I think that's counter to the spirit of PNG which is designed for lossless images. If you want lossy images, go with JPG or another lossy format.)

The great thing about Zopfli is that, assuming you are OK with the extreme up front CPU demands, it is a "set it and forget it" optimization step that can apply anywhere and will never hurt you. Well, other than possibly burning a lot of spare CPU cycles.

If you work on a project that serves compressed assets, take a close look at Zopfli. It's not a silver bullet – as with all advice, run the tests on your files and see – but it's about as close as it gets to literally free bandwidth in our line of work.


The Hugging Will Continue Until Morale Improves

I saw in today's news that Apple open sourced their Swift language. One of the most influential companies in the world explicitly adopting an open source model – that's great! I'm a believer. One of the big reasons we founded Discourse was to build an open source solution that anyone, anywhere could use and safely build upon.

People were also encouraged that Apple was so refreshingly open about the whole thing, involving the larger community in the process. They even hired from the community, which is something I always urge companies to do.

Also, not many people were, shall we say … fans … of Objective C as a language. There was a lot of community interest in having another viable modern language to write iOS apps in, and to Apple's credit, they produced Swift, and even promised to open source it by the end of the year. And they delivered, in a deliberate, thoughtful way. (Did I mention that they use CommonMark? That's kind of awesome, too.)

One of my heroes, Miguel de Icaza, happens to have lots of life experience in open sourcing things that were not exactly open source to start with. He applauded the move, and even made a small change to his Mono project in tribute:

Which I also thought was kinda cool.

It surprises me that anyone could ever object to the mere presence of a code of conduct. But some people do.

  • A weak Code of Conduct is a placebo label saying a conference is safe, without actually ensuring it’s safe.

  • Absence of a Code of Conduct does not mean that the organizers will provide an unsafe conference.

  • Creating safety is not the same as creating a feeling of safety.

  • Things organizers can do to make events safer: Restructure parties to reduce unsafe intoxication-induced behavior; work with speakers in advance to minimize potentially offensive material; and provide very attentive, mindful customer service consistently through the attendee experience.

  • Creating a safe conference is more expensive than just publishing a Code of Conduct to the event, but has a better chance of making the event safe.

  • Safe conferences are the outcome of a deliberate design effort.

I have to say, I don't understand this at all. Even if you do believe these things, why would you say them out loud? What possible constructive outcome could result from you saying them? It's a textbook case of honesty not always being the best policy. If this is all you've got, just say nothing, or wave people off with platitudes, like politicians do. And if you're Jared Spool, notable and famous within your field, it's even worse – what does this say to everyone else working in your field?

Mr. Spool's central premise is this:

Creating safety is not the same as creating a feeling of safety.

Which, actually … isn't true, and runs counter to everything I know about empathy. If you've ever watched It's Not About the Nail, you'll understand that a feeling of safety is, in fact, what many people are looking for. It's not the whole story by any means, but it's a very important starting point. An anchor.

People understand you cannot possibly protect them from every single possible negative outcome at a conference. That's absurd. But they also want to hear you stand up for them, and say out loud that, yes, these are the things we believe in. This is what we know to be true. Here is how we will look out for each other.

I also had a direct flashback to Deborah Tannen's groundbreaking You Just Don't Understand, in which you learn that men are all about fixing the problem, so much so that they rush headlong into any remotely plausible solution, without stopping along the way to actually listen and appreciate the depth of the problem, which maybe … can't really even be fixed?

If women are often frustrated because men do not respond to their troubles by offering matching troubles, men are often frustrated because women do … he feels she is trying to take something away from him by denying the uniqueness of his experience … if women resent men's tendency to offer solutions to problems, men complain about women's refusal to take action to solve the problems they complain about.

Since many men see themselves as problem solvers, a complaint or a trouble is a challenge … Trying to solve a problem or fix a trouble focuses on the message level. But for most women who habitually report problems at work or in friendships, the message is not the main point … trouble talk is intended to reinforce rapport by sending the metamessage "We're the same; you're not alone."

Women are frustrated when they not only don’t get this reinforcement but, quite the opposite, feel distanced by the advice, which seems to send the metamessage "We’re not the same. You have the problems; I have the solutions."

Having children really underscored this point for me. The quickest way to turn a child's frustration into a screaming, explosive tantrum is to try to fix their problem for them. This is such a hard thing for engineers to wrap their heads around, particularly male engineers, because we are all about fixing the problems. That's what we do, right? That's why we exist? We fix problems?

I once wrote this in reply to an Imgur discussion topic about navigating an "emotionally charged situation":

Oh, you want a master class in dealing with emotionally charged situations? Well, why didn't you just say so?

Have kids. Within a few years you will learn to be an expert in dealing with this kind of stuff, because what nobody tells you about having kids is that for the first ~5 years, they are constantly. freaking. the. f**k. out.

46 Reasons My Three Year Old Might Be Freaking Out

If this seems weird to you, or like some kind of made up exaggerated hilarious absurd brand of humor, oh trust me. It's not. Real talk. This is actually how it is.

In their defense, it's not their fault: they've never felt fear, anger, hunger, jealousy, love, or any of the dozen other incredibly complex emotions you and I deal with on a daily basis. So they learn. But along the way, there will be many many many manymanymanymany freakouts. And guess who's there to help them navigate said freakouts?

You are.

What works is surprisingly simple:

  • Be there.
  • Listen.
  • Empathize, hug, and echo back to them. Don't try to solve their problems! DO NOT DO IT! Paradoxically, this only makes it way worse if you do. Let them work through the problem on their own. They always will – and knowing someone trusts you enough to figure out your own problems is a major psychological boost.

You gotta lick your rats, man.

(protip: this works identically on adults and kids. Turns out most so-called adults aren't fully grown up. Who knew?)

I guess my point is that rats aren't so different from people. We all want the same thing. Comfort from someone who can tell us that the world is safe, the world is not out to get you, that bad things can (and might) happen to you but you'll still be OK because we will help you. We're all in this thing together, you're a human being much like myself and we love you.

That's why a visible, public code of conduct is a good idea, not only at an in-person conference, but also on a software project like Swift, or Mono. But programmers being programmers – because they spend all day every day mired in the crazy world of infinitely recursive rules from their OS, from their programming language, from their APIs, from their tools – are rules lawyers par excellence. Nobody on planet Earth is better at arguing to the death over a set of completely arbitrary, made up rules than the average programmer.

I knew in my heart of hearts that someone – and by someone I mean a programmer – would inevitably complain about the fact that Mono had added a code of conduct, another "unnecessary" ruleset. So I made a programmer joke.

This is the second time in as many days that I made what I thought was an obvious joke on Twitter that was interpreted seriously.

OK, maybe sometimes my Twitter jokes aren't very good. Well, you know, that's just, like … your opinion, man. I should probably switch from Twitter to Myspace or Ello or Google Plus or Snapchat or something.

But it bothered me that people, any people, would think I actually asked new hires to put the company above their family.* Or that I didn't believe in a code of conduct. I guess some of that comes from having ~200k followers; once your audience gets big enough, Poe's Law becomes inevitable?

Anyway, I wanted to say I'm sorry. And I'm particularly sorry that eevee, who wrote that awesome PHP is a Fractal of Bad Design article that I once riffed on, thought I was serious, or even worse, that my joke was in bad taste. Even though the negative article about Discourse eevee wrote did kinda hurt my feelings.

I know we have our differences, but if we as programmers can't come together through our collective shared horror over PHP, the Nickelback of programming languages, then clearly I have failed.

To show that I absolutely do believe in the value of a code of conduct, even as public statements of intent that we may not completely live up to, even if we've never had any incidents or problems that would require formal statements – I'm also adding a code of conduct as defined by contributor-covenant.org to the Discourse project. We're all in this open source thing together, you're a human being very much like us, and we vow to treat you with the same respect we'd want you to treat us. This should not be controversial. It should be common. And saying so matters.

If you maintain an open source project, I strongly urge you to consider formally adopting a code of conduct, too.

The hugging will continue until morale improves.

* That's only required of co-founders


The 2016 HTPC Build

I've loved many computers in my life, but the HTPC has always had a special place in my heart. It's the only always-on workhorse computer in our house, it is utterly silent, totally reliable, sips power, and it's at the center of our home entertainment, networking, storage, and gaming. This handy box does it all, 24/7.

I love this little machine to death; it's always been there for me and my family. The steady march of improvements in my HTPC build over the years lets me look back and see how far the old beige box PC has come in the decade I've been blogging:

2005    ~$1000    512MB RAM, 1 CPU             80w
2008    ~$520     2GB RAM, 2 CPU               45w
2011    ~$420     4GB RAM, 2/4 CPU + GPU       22w
2013    ~$300     8GB RAM, 2/4 CPU + GPU×2     15w
2016    ~$320     8GB RAM, 2/4 CPU + GPU×4     10w

As expected, the per-thread performance increase from 2013's Haswell CPU to 2016's Skylake CPU is modest – 20 percent at best, and that might be rounding up. About all you can do is slap more cores in there, to very limited benefit in most applications. The i3-6100T I chose is dual-core plus hyperthreading, which I consider the sweet spot, but there are some other Skylake 6000 series variants at the same 35w power envelope which offer true quad-core, or quad-core plus hyperthreading – and, inevitably, a slightly lower base clock rate. So it goes.

The real story is how idle power consumption was reduced another 33 percent. Here's what I measured with my trusty kill-a-watt:

  • 10w idle with display off
  • 11w idle with display on
  • 13w active standard Netflix (720p?) movie playback
  • 14w multiple torrents, display off
  • 15w 1080p video playback in MPC-HC x64
  • 40w Lego Batman 3 high detail 720p gameplay
  • 56w Prime95 full CPU load + Rthdribl full GPU load

These are impressive numbers, much better than I expected. Maybe part of it is the latest Windows 10 update which supports the new Speed Shift technology in Skylake. Speed Shift hands over CPU clockspeed control to the CPU itself, so it can ramp its internal clock up and down dramatically faster than the OS could. A Skylake CPU, with the right OS support, gets up to speed and back to idle faster, resulting in better performance and less overall power draw.

Skylake's on-board HD 530 graphics is about twice as fast as the HD 4400 that it replaces. Haswell offered the first reasonable big screen gaming GPU on an Intel CPU, but only just. 720p was mostly attainable in older games with the HD 4400, but I sometimes had to drop to medium detail settings, or lower. Two generations on, with the HD 530, even recent games like GRID Autosport, Lego Jurassic Park and so on can now be played at 720p with high detail settings at consistently high framerates. It depends on the game, but a few can even be played at 1080p now with medium settings. I did have at least one saved benchmark result on the disk to compare with:

GRID 2, 1280×720, high detail defaults

                               Max    Min    Avg
i3-4130T, Intel HD 4400 GPU    32     21     27
i3-6100T, Intel HD 530 GPU     50     32     39

Skylake is a legitimate gaming system on a chip, provided you are OK with 720p. It's tremendous fun to play Lego Batman 3 with my son.

At 720p using high detail settings, where there used to be many instances of notable slowdown, particularly in co-op, it now feels very smooth throughout. And since games are much cheaper on PC than consoles, particularly through Steam, we have access to a complete range of gaming options from new to old, from indie to mainstream – and an enormous, inexpensive back catalog.

Of course, this is still far from the performance you'd get out of a $300 video card or a $300 console. You'll never be able to play a cutting edge, high end game like GTA V or Witcher 3 on this HTPC box. But you may not need to. Steam in-home streaming has truly come into its own in the last year. I tried streaming Batman: Arkham Knight from my beefy home office computer to the HTPC at 1080p, and I was surprised to discover just how effortless it was – nor could I detect any visual artifacts or input latency.

It's super easy to set up – just have the Steam client running on both machines at a logged in Windows desktop (can't be on the lock screen), and press the Stream button on any game that you don't have installed locally. Be careful with WiFi when streaming high resolutions, obviously, but if you're on a wired network, I found the experience is nearly identical to playing the game locally. As long as the game has native console / controller support, like Arkham Knight and Fallout 4, streaming to the big screen works great. Try it! That's how Henry and I are going to play through Just Cause 3 this Tuesday and I can't wait.

As before in 2013, I only upgraded the guts of the system, so the incremental cost is low.

That's a total of $321 for this upgrade cycle, about the cost of a new Xbox One or PS4. The i3-6100T should be a bit cheaper; according to Intel it has the same list price as the i3-6100, but suffers from weak availability. The motherboard I chose is a little more expensive, too, perhaps because it includes extras like built in WiFi and M.2 support, although I'm not using either quite yet. You might be able to source a cheaper H170 motherboard than mine.

The rest of the system has not changed much since 2013:

Populate these items to taste, pick whatever drives and mini-ITX case you prefer, but definitely stick with the PicoPSU, because removing the large, traditional case power supply makes the setup both a) much more power efficient at low wattage, and b) much roomier inside the case and easier to install, upgrade, and maintain.

I also switched to Xbox One controllers, for no really good reason other than the Xbox 360 is getting more obsolete every month, and now that my beloved Rock Band 4 is available on next-gen systems, I'm trying to slowly evict the 360s from my house.

The Windows 10 wireless Xbox One adapter does have some perks. In addition to working with the newer and slightly nicer gamepads from the Xbox One, it supports an audio stream over each controller via the controller's headset connector. But really, for the purposes of Steam gaming, any USB controller will do.

While I've been over the moon in love with my HTPC for years, and I liked the Xbox 360, I have been thoroughly unimpressed with my newly purchased Xbox One. Both the new and old UIs are hard to use, it's quite slow relative to my very snappy HTPC, and it has a ton of useless features that I don't care about, like broadcast TV support. About all the Xbox One lets you do is sometimes play next gen games at 1080p without paying $200 or $300 for a fancy video card, and let's face it – the PS4 does that slightly better. If those same games are available on PC, you'll have a better experience streaming them from a gaming PC to either a cheap Steam streaming box, or a generalist HTPC like this one.

The Xbox One and PS4 are effectively plain old PCs, built on:

  • Intel Atom class (aka slow) AMD 8-core x86 CPU
  • 8 GB RAM
  • AMD Radeon 77xx / 78xx GPUs
  • cheap commodity 512GB or 1TB hard drives (not SSDs)

The golden age of x86 gaming is well upon us. That's why the future of PC gaming is looking brighter every day. We can see it coming true in the solid GPU and idle power improvements in Skylake, riding the inevitable wave of x86 becoming the dominant kind of (non-mobile, anyway) gaming for the foreseeable future.
