
September 30, 2006

Software Engineering Goal: Expose Bugs Faster

Back in 1992 a Microsoft developer named David Thielen wrote a small book called No Bugs!: Delivering Error-Free Code in C and C++. It's an interesting book, although given the tools we have today, it mostly serves to make you glad you aren't working in the days of assembly-mode and primitive debuggers. (If you go to Thielen's website, he has a pretty good presentation titled "Why Won't They Push Out Christmas? Delivering High Quality Software On Time and Under Budget". It's from around the year 2000, so it misses a lot of the recent progress in tools.) In the book he makes one great point, which is that you don't want bugs to "lurk" in the software; you want them to jump right out and hit you in the face immediately.
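To make "lurking" concrete, here is a minimal sketch in Python. It is not taken from Thielen's book; the function names and the retry-count setting are made up for illustration. The first version swallows a bad value and carries on, so the mistake hides until something downstream misbehaves. The second version complains the first time the code runs.

```python
def parse_retry_count_lurking(text):
    """Hypothetical helper: swallows bad input and returns a default.

    A typo in the setting is silently turned into 0, so the bug lurks
    until someone notices that retries never happen.
    """
    try:
        return int(text)
    except ValueError:
        return 0  # the bad value is now invisible to everyone


def parse_retry_count_loud(text):
    """Hypothetical helper: rejects bad input on the spot.

    The same typo raises ValueError immediately, at the line that
    read the setting.
    """
    value = int(text)  # raises ValueError for input such as "3O" (letter O)
    if value < 0:
        raise ValueError(f"retry count must be non-negative, got {value}")
    return value
```

With the input "3O" (a letter O typed instead of a zero), the first function quietly returns 0 and the second one fails right away, which is the behavior you want from a bug.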

I was thinking of this because I happened to watch an Office bug triage meeting the other day. This is the whole-product triage that happens as the ship date nears, where every bug needs to be approved by the assembled experts before the fix can be checked in. At this stage, very few bugs are brought to triage that aren't completely understood with a tested fix ready to go; the only question is whether to take the fix. The reason NOT to take a fix is the hard-earned knowledge that one out of every N fixes is going to introduce a new bug. For example, one of the bugs I saw discussed was basically, "We fixed this bug last week but it turned out it causes a worse bug somewhere else, so we just want to back the first fix out and pretend the whole thing never happened." Plus, each time you change the code for any reason you basically need to retest the whole product. For example, if you fix a bug that relates to running Excel in a remote desktop on Vista when running as admin, you don't know if you have messed up PowerPoint or running Excel locally or running on Windows XP or running as non-admin. The upshot is that towards the end you don't accept fixes that would have been accepted with no discussion earlier on.

When I teach classes to developers at Microsoft, I often have people go around the room and imagine that they had a bug where the actual code change was trivial; let's say you had a typo in an error message and you needed to fix it. The question is, how long would it take for you to get that fix checked in to the source code control system? The actual code change is trivial, but most teams have a series of steps they go through: compiling the code, obviously (although I once managed to skip that step, which is another story), but also possibly code review, build on a second machine, sync to current source code, run unit tests, etc, etc. In the old days the answer was basically compile that module and check in, total time a couple of minutes. Nowadays most students surveyed report times much longer than a couple of minutes. But the thing is, two changes or three changes or ten changes don't take much longer than one change, because all the other stuff can be done once for all of them.
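The arithmetic behind that last point can be sketched in a few lines of Python. The numbers here are assumptions picked for illustration (a two-hour fixed overhead for review, build, sync and tests, plus two minutes per trivial change), not figures from any actual team, and the function names are hypothetical.

```python
def checkin_minutes(num_changes, overhead_minutes=120.0, per_change_minutes=2.0):
    """Total time to check in a batch of changes.

    Assumed model: the process steps (review, second-machine build, sync,
    unit tests) are paid once per check-in, and each change adds a small
    cost of its own.
    """
    if num_changes < 1:
        raise ValueError("need at least one change")
    return overhead_minutes + per_change_minutes * num_changes


def minutes_per_change(num_changes, **kwargs):
    """Average cost of each change when the overhead is shared by the batch."""
    return checkin_minutes(num_changes, **kwargs) / num_changes


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        total = checkin_minutes(n)
        print(f"{n:>2} change(s): {total:.0f} min total, "
              f"{minutes_per_change(n):.1f} min per change")
```

Under these assumed numbers, one change costs 122 minutes and ten changes cost 140 minutes in total, so the per-change cost drops from 122 minutes to 14.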

This leads me to posit Adam's Theory of the Future of Software Engineering, which is that the critical task in improving software engineering is decreasing the lurkability of bugs. Yes, there's work to be done in better redundancy and faster performance and stronger security and easier parallelism and better component isolation and even techniques like Agile. But if you look at a lot of what we teach--design reviews, code reviews, static code analysis, unit tests--it's all about making the bugs jump out sooner. And I don't just mean sooner as in the old rule about finding bugs during design is better than finding bugs during coding which is better than finding bugs during stabilization which is better than finding bugs after shipment. I mean it's better to find bugs yesterday than today, and even better to find them two days ago, and last week, and so on. Every day earlier that you find a bug lets you make better decisions about prioritizing fixes and lets you do more overlapping of all the other stuff you need to do before you can check in. And that is how we can dramatically improve the software engineering process.

Posted by AdamBa at September 30, 2006 09:07 PM


Hi Adam

I totally agree! Every advance made in the SDLC is centred around finding bugs as early as possible (regardless of the methodology).

My blog was originally titled "Get the bad news early!", but a colleague said it had negative connotations. I now wish I'd left it as that!

Posted by: Mitch Wheat at October 1, 2006 09:55 PM