Failures count more than Successes

Stuart Cheshire, August 1996.

User-experience is defined by the times when a computer doesn't work, not by the times when it does.

When a device is functioning correctly, the details of its operation are almost invisible to the user, and that is the way it should be. It's only when it goes wrong that the device forces itself strongly into the user's awareness.

This means that the quality of the software and user interface for handling failures is just as important as, if not more important than, the code for the normal (successful) case.

Most software authors concentrate on the behaviour of the program when it is working properly, because the software is supposed to perform some task, and the programmer is concerned with making it perform that task quickly, efficiently, elegantly. What happens when something goes wrong (like the disk fills up) is rarely a priority. Once something has gone wrong we're looking at a case where, in a sense, the program has already failed. It's not going to be able to complete its task now (through no fault of its own) so the exact manner in which it fails hardly matters.

The programmer is still busy working on improving the performance and getting the bugs out of the correctly-functioning cases. Why devote time to working on a case where we know there's no way for the program to finish the task? In computer game terms, that case is already "game over", and no amount of programming work is going to change that and seize victory out of defeat. In an ideal world, that case should never happen anyway, so it would be stupid to waste time working on it, wouldn't it? What really matters is all that elegant efficient code that's executing in the common case.

The problem is, the user doesn't see it that way. In normal use, the computer is correctly executing millions of instructions every second. Disks seek, interrupts interrupt, network packets fly across the world, and progress continues smoothly. The user is no more impressed by all these little successes than they are impressed every time a spark plug in their car engine fires correctly. The user is not even aware of all these little successes. They are completely invisible, which is the way it should be.

The only time a typical motorist takes a good detailed look at their car engine is when it's not working, and it's the same with computer software. The moment something goes wrong is precisely the time when a piece of software leaps up from invisibility in the background and forces the user to pay close attention to it. That's the moment when the software is under the closest scrutiny, and it's usually the part of the user interface where least effort has been spent.

One image that appears in my mind is that completing a real world task is kind of like having to get from one side of a swamp to the other. Lots of hazards and pitfalls lurk in the swamp that separates us from our goal. Software is the bridge that gets us from where we are to where we want to be.

A lot of software is like a six-inch wide polished chrome beam across the swamp. It snakes smoothly across the swamp, taking the shortest possible path around the rocks and trees and other obstacles, gleaming impressively in the sun. It has no direction signs or crash barriers to mar its elegant simplicity or distract the user as they speed across on their motorbike.

The author of the software proudly demonstrates how, on his motorbike with its 2.5 litre engine, he can cross the swamp in 17 seconds. That's great. All of the software has been developed and tested to achieve that goal quickly, efficiently, elegantly.

The problem is, when a new user first gets hold of this software they make some mistake. They type some incorrect command, or they fail to configure some setting correctly before executing some other command, and they miss a turn and fail to keep the motorcycle on the beam. The beautiful chrome beam is still gleaming impressively in the sun, but the user can't even see it because they're lying face down in the mud. The fact that they could have completed the task in 17 seconds is little consolation if it takes them several hours to get out of the mud. Sure, after a few times the user might learn to make every turn perfectly, but the first few times they use the software they have a very unpleasant experience with it.

We need a bridge across the swamp that's a little bit wider, and has crash barriers so that even when you make a mistake you are guided back onto the right path, instead of being allowed to plunge face-first into the swamp.

We recently had an experience like this setting up an ISDN bridge. After several hours of troubleshooting on the telephone with SUN's network administrators we finally got it working. It turned out that there had been five or six different things that were wrong, but for every one the message was the same: "Connection Failed". First, the phone number it was programmed to dial was wrong. SUN's network administrators could see that there was no call coming in, but all we saw was "Connection Failed". After we corrected the phone number SUN's network administrators could see that there was now a call coming in, but all we saw was "Connection Failed". SUN's network administrators discovered that they had made a typo in the list of usernames at their end, so they corrected that. Now they could see that a call was coming in and the username was being recognised, but all we saw was "Connection Failed". This went on for several hours until each individual problem had been fixed, and finally we were able to connect.

At every stage the ISDN bridge told us only that it wasn't working (which we could tell ourselves pretty easily anyway). It didn't say why it wasn't working. It didn't say what parts of the connection process had worked correctly. It didn't tell us what we might do to fix the problem. We couldn't even tell if the ISDN line that Pacific Bell had installed was connected properly, because the ISDN bridge didn't give any indication of whether or not it was detecting the ISDN equivalent of a "dial tone".
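
To make the complaint concrete, here is a minimal sketch in Python of the kind of failure report that would have helped: it names the step that failed, lists the steps that worked, and suggests something to try. Everything in it is an assumption made for illustration. The stage names, the remedy text, and the hard-coded results stand in for real hardware checks, and none of it is taken from any actual ISDN product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str                  # what this step of the connection does
    check: Callable[[], bool]  # returns True if the step succeeded
    remedy: str                # what the user might try if it fails


def connect(stages: List[Stage]) -> Tuple[bool, str]:
    """Run each stage in order; stop at the first failure and explain it."""
    passed = []
    for stage in stages:
        if not stage.check():
            lines = ["Connection failed at: " + stage.name]
            if passed:
                lines.append("Steps that worked: " + ", ".join(passed))
            lines.append("Suggestion: " + stage.remedy)
            return False, "\n".join(lines)
        passed.append(stage.name)
    return True, "Connected (" + ", ".join(passed) + ")"


# Hypothetical stages; the lambdas are placeholders for real checks.
stages = [
    Stage("detect ISDN line", lambda: True,
          "check the cable to the wall socket"),
    Stage("dial remote number", lambda: True,
          "check the configured phone number"),
    Stage("authenticate username", lambda: False,
          "ask the remote site to check its username list"),
]

ok, report = connect(stages)
print(report)
```

With a report like this, each of our five or six problems would have produced a different message, and we would have known at a glance whether the line itself was alive.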

Now that it's finally working I'm sure it will continue to work fine and we'll not give it a second thought, but those hours struggling to get it set up were a nightmare. There's no way we could have done it without outside help, and we're networking experts.

All this is not the programmer's fault alone. Programmers implement what their written specifications say they should implement, and software specifications always go into great detail about what the software is supposed to do, and rarely make any mention of how it should fail. Failures are regarded as, well, failures, so what more is there to say about them, except that they shouldn't happen?

Well, failures do happen, and how they are handled may be the most important aspect of defining the quality of a human being's interaction with a computer.

Page maintained by Stuart Cheshire