« Workspace Advantage: We've Got Swatches! | Main | The DST Fixup »

March 17, 2007

Engineering for Software Performance

If you dig around Microsoft for guidance on how to make your software have good performance ("performance" in this case meaning not "what it's supposed to do" but rather "it runs fast"), you will probably find some advice along the lines of "performance is not something you can leave until the end; performance is something you need to work on all throughout the life of your product." This seems like self-evident advice: if you delay worrying about performance until (say) you are integrating and preparing to ship, it may have drifted out of control and you won't have enough time to improve it before you need to ship. The comparison is made to security, where a design mistake can be just as painful as a coding mistake, and it took Microsoft a while to appreciate the (correct) assertion that "security is not something you can leave until the end; security is something you need to work on all throughout the life of your product."

The problem is when I think about my programming past, this really hasn't been the way it has been; I have pretty much ignored performance until the end, or more particularly I've ignored performance until a user noticed it was so bad that we had to fix it. The user could have been me, or someone else on the team, or someone in a beta...but the point is I didn't have to dig into the details of design to see ahead of time where the potential performance problems were going to be, the way you have to do for security. If something was so bad a user noticed it, then I fixed it, and the fix was always some local thing where I had to finetune the particular piece of code that was slower than it needed to be--nothing that destabilized all my work and made me wish I had analyzed the design in enough detail to perceive the performance problem before I started writing code.

Performance is not like security, where your system is only as secure as the weakest link on the chain; if some code that rarely executes is slow, it probably doesn't matter. Someone was telling me that at some point during the early work on Word 2007, they discovered they were writing a value to the registry on every keystroke. That sounds really bad, and it was, which is why they noticed...but the fix was nothing too fancy, just stop doing it.

On the other hand, I have seen LOTS of bugs in code that was attempting to optimize the speed of something that didn't need to be optimized, because it wasn't something a user was ever going to care about. Implementing a cache of some sort seems to be the most common mistake, I suppose because caches are the kind of thing you learn about in college and rush out to implement when you get a job. Overly complicated algorithms, like trying to get your sort down from O(n^2) to O(n log n) when all you ever sort is 10 items, is another one, for similar reasons. This has led some people to say that premature optimization is the root of all evil, a sentiment that I agree with more than I disagree with.

Microsoft has now realized that performance tuning needs to be scenario based, that is you have to actually think of something a user would actually do and then decide if the performance is going to be good enough. The problem, of course, is that you can't REALLY tell how long something is going to take until you run it, so this perforce does lead to optimization being done late in the game. And I really don't mind that; I have seen more bugs due to premature optimization than I have seen...well, I can't quite figure out how to parallel-and-reverse that previous sentence fragment, but you get the idea.

There have been times (particularly when I was working on Windows NT networking and trying to win performance tests against Novell's Netware) where we knew we had to absolutely be as fast as possible, and before we had any test data we would sit down with the assembly language that the compiler had produced and try to save clock ticks by having our if statements fall through more often than they jumped. But that is a pretty unusual case where we basically knew the exact user scenario that was going to matter, and also the code that was going to be executed the most.

There is something else interesting that is going on with computers, especially big servers. It turns out that memory usage is actually much more of a performance issue, especially as you scale out, than raw speed is. A server with thousands of connections is probably only doing active processing on a few of them, but it is holding the memory for all of them. And if you use up too much memory you wind up paging to disk, which typically dwarfs all of the goodness or badness in your algorithm. I once heard a story that the Microsoft C compiler used to have switches for "optimize for speed" (which would do things like unroll short loops) or "optimize for size" (which would try to squeeze the generated code into fewer bytes) but it was discovered that optimizing for size actually made things run faster also, because they used less memory. And, thinking about it, memory usage is a) something you probably can get a better handle on in design and have a bigger effect on during design, b) something where if you have to make changes late in the game they might be very disruptive, and c) something, like security, where every little bit helps. So if I had to give someone guidance on how to optimize for quote-unquote performance during the design phase, I would tell them to worry about optimizing their memory usage during the design phase; they could worry about execution speed once the thing was running and they could put it in front of a customer (the one case where you would want to think about speed during design was if you had a public API that had baked in some bad speed characteristics; the old DOS way of reading files in a directory, with its "find first/find next" type API, seems like one such case).

The other big piece of advice is that you do want to be able to figure out how your code is doing, so make sure your design allows you to measure the performance, whatever it is. This may just be baking in some stuff that can count how many times a method runs, and how much memory is used...which are probably things the compiler/IDE can help you with later no matter what you do in upfront design. But just in case, keep it in mind.

I also once heard someone say that when you design an algorithm, you should have in mind a way to improve it IF it turns out to be the performance bottleneck. So I'm fine with that, as long as it's just a quick thought and the IF is understood. Yes, if my sort is the problem, we can replace it with a better sort...but chances are it won't be, and I'll have saved time. As the saying goes, never put off until tomorrow what you can put off until the day after tomorrow.

Posted by AdamBa at March 17, 2007 08:58 PM

Trackback Pings

TrackBack URL for this entry:


Your example of the 'we're only ever going to sort just 10 items' gets people into so much trouble. The difference between O(N^2) and O(NlogN) can be so huge to be termed a deadlock rather than slower performance. It's why Bubble Sort is still taught in college. Please, please don't say its ok to EVER use a ON^2) sort algorithm.

Did you know that back in the early C libs that came with whichever version of Visual C shipped in the windows 3.11 days (1.0 ?), the qsort() function had a bit of preamble code that first checked if the array was already in ascending order, so that the pathlogical O(N^2) behaviour never occurred! Unfortunately, it didn't check for already sorted in descending order...

I do agree that programmers often introduce subtle bugs when trying to perform 'cute' micro optimisations. There are a few classic stories such as programmers trying to write their own random no. generators by 'improving' existing ones

Rules of Optimisation (first 2, paraphrased from Knuth, who I believe quoted someone else):

1) Don't.
2) Don't do it yet.
3) Benchmark first.
4) Optimise algorithms first, optimise implementation code as a last resort.

Posted by: Mitch Wheat at March 18, 2007 06:50 AM

I disagree: Most of the time, it's perfectly fine to use an O(n^2) sort algorithm. Usually you care more about not having bugs than how fast it runs. You have to base it on a benchmark, as Knuth said.

Sure you can take any advice and apply it in the wrong situation, but if I could change something about "speed" optimizations (as opposed to "memory usage" optimizations) I would rather see people do less of it than they do now, not more of it.

Of course sorting has been "solved" in that you can call a library routine, or copy a decent algorithm out of a textbook, but I would repeat that statement for any algorithm.

- adam

Posted by: Adam Barr at March 18, 2007 08:16 AM

While I agree that premature optimization is a reliable source of bugs I actually think that premature generalization is the root of all evil. You can find a bottleneck O(n^2) sort algorithm and fix it with a better algorithm quite easily (or use a priority queue or whatever) but it's much harder to reduce the number of layers of abstraction in an application.

Getting the abstraction model right is why you have to think about performance throughout the design cycle. Unfortunately most discussions about abstraction models tend to revolve around implementation time versus future flexibility, which is a shame because if you want to reduce memory footprint of an app then cutting out a layer of abstraction is often the way to go.

I work in the realtime embedded world and while we don't have to deal with paged virtual memory we still spend significant time in design reviews figuring out the right level of abstraction to ensure our apps can meet their realtime deadlines while still giving us some future expansion flexibility. Finding the right balance is tricky and time consuming, but essential.

Posted by: Andrew at March 19, 2007 09:53 AM

The Microsoft C++ compiler still has optimise-for-size versus optimise-for-speed switches as of VS2005, unless they're just there for compatibility and don't do anything any more. /Os versus /Ot, best used in combination with /Ox (full optimisation). /O1 enables /Ox /Os and a few other things too.

The key point is that for small values of n, the constant terms can end up dominating your execution time, and therefore if you know you have a small amount of data you want the algorithm with the smallest constants, not necessarily the smallest 'big O'.

Posted by: Mike Dimmick at March 19, 2007 12:14 PM