May 06, 2008
"Hard Code" KerfuffleAs I mentioned a little while ago, Eric Brechner's "Hard Code" column is now a public blog. This means you get a bit of a glimpse inside Microsoft in a public forum. Eric writes in his own inimitable style in order to provoke discussion, an area that he has certainly succeeded with his most recent column about recovering from errors.
First of all, ANY discussion of errors/exceptions/asserts/etc will generate controversy because it's an area where everybody seems to have an opinion on the right way to do it. Like all programming opinions, it's based in large part on the previous formative or traumatic experiences of the individual. Since we've all had different experiences we all have different opinions, and since we're programmers we have zero inclination to believe that differing opinions have any merit.
If you want to follow the argument along at home, it helps to know who the people involved are:
- Eric Brechner: aka I.M. Wright, Director of Learning and Development for Engineering Excellence, which means he owns the various discipline excellence teams (as in Dev Excellence, Test Excellence, etc).
- Alan Page: Manages the Test Excellence team, reports to Eric.
- Alan Auerbach: Works on the Dev Excellence team.
- Larry Osterman: Way oldtime Microsoft employee and blogger.
- Kinshuman: (I assume it's the same guy) Works on Watson at Microsoft.
- Various other people: Don't know who they are.
There's also me, who manages the Dev Excellence team, thus reports to Eric and is Alan Auerbach's manager.
OK, so the fun started when Eric wrote his column saying that letting Watson catch exceptions was bad; instead, you should handle them yourself and crash. Larry blogged that this was a really stupid thing to say, and Kinshuman concurred in a comment. Alan Auerbach jumped in to defend Eric and also state that asserts are bad, Alan Page replied and said that asserts are good, and then Alan and Alan got into a brief back-and-forth on that.
Most of the arguments are of the ships-passing-in-the-night variety. Larry is saying it's bad to catch all exceptions; Eric is saying it's bad to let all exceptions through. These aren't contradictory positions. If you have spent a lot of your career working in an error-code-returning environment (like Larry, or Joel Spolsky, or Raymond Chen, or me) you probably have a natural bias against structured exceptions, but they are a fact of life in some environments (like .NET). But the more relevant fact here is that most people seem to have an argument pro or con exceptions that they deploy whenever they get a whiff of a discussion on the topic, and not much Socratic dialogue ensues.
Eric made a side comment about asserts in his article ("It's like the logic behind asserts—the moment you realize you are in a bad state, capture that state and abort") which misrepresents what asserts are for ("capturing the state" yes; "abort", with the implication that it's similar to throwing an exception, no), although I think he threw that in without really thinking about it. But it led to an interesting argument between Alan and Alan: are asserts good or not? I always liked asserts because I worked on networking code, where an error might occur only once in a blue moon, so I wanted to break into the debugger when it happened. Alan Auerbach's assertion that you don't need asserts because you can set a breakpoint only holds true if you work on reproducible bugs, and I used to scoff at people like that: how hard can fixing your bugs be if every one of them repros on demand?

But now that I think about it, relying on any kind of stress-failure debugging to catch your errors is pretty outmoded. If I were writing a network protocol today, the first thing I would do is write a fake version of the layer below me that did odd things on demand; next I would write a fake version of the layer above me that did odd things on demand; and then I would beat on my protocol with these in ever-more-interesting configurations. In such an environment all of my crashes *would* be reproducible, and I could set breakpoints as needed. It's funny because I definitely thought of writing automated tools for stress (when I wrote my first NT network card driver in 1990, I also wrote a packet-blasting-and-counting protocol to help test it, which wound up becoming part of the network driver development kit), but never for causing the unexpected timing and dropped packets that lead to those hard-to-debug problems in protocols. I guess I have learned something in 20 years.
Posted by AdamBa at May 6, 2008 10:09 PM
I think one of the issues is that Eric deliberately chooses to write his blog in a confrontational style. That is certainly one way to generate a reaction, but in my experience confrontation doesn't foster constructive dialog. If you are going to go down the confrontational route then your argument had better be 100% watertight, because if it isn't, people aren't going to cut you any slack. If they find a hole it's all too easy to be confrontational back, and the whole thing spirals.
In this particular case Eric erred by not discussing exceptions that are truly unexpected (i.e. all you know is that your app is in an inconsistent state). At this point human nature takes over and whatever good information was in the original post gets lost in the ensuing dust-up.
Posted by: Andrew at May 7, 2008 12:20 PM
I am interested in the strategy that you outlined for how you would do protocol development today using fakes. Is this similar to the Test First Development practices? Do you have some sample code or maybe a new book that explains this in greater detail?
Posted by: Sudarshan at May 7, 2008 03:55 PM
Andrew: Eric, when writing as I.M. Wright, intentionally writes in a confrontational style. In person he is exactly the opposite. He did say he felt a little bad about the possible effect of the column: not that people questioned the size of his brainpan, but the fact that somebody might misinterpret it to mean throw a big try/catch of Exception around all their code.
Sudarshan: No sample or book; it's just the general idea of making your code testable no matter what it does. I would write a loopback network card driver that intentionally dropped packets, duplicated packets, misordered delivery, delayed completion of IRPs, etc under control of my tests (so it would misbehave in a consistent reproducible way). Then I would write a test client to my protocol that sent lots of data back and forth--there is less expectation that a well-behaved client would do "incorrect" things, but it could do 0-length sends, max-length sends, highly fragmented sends, etc again under control of the test, and do similar things on the receiving end. Then I would create a bunch of tests that went through various combos of the two test pieces, and ensure that the protocol always passed all those tests.
Posted by: Adam Barr at May 7, 2008 09:38 PM
Asserts are not supposed to be used for handling input, data, user, or device errors; exceptions should handle those.
Asserts are supposed to be placed to detect the appearance of impossible (internal/logic-error) conditions. If assert() doesn't call the debugger in that state, then what is the point of using asserts at all?
I'm amazed to read that there are developers at Microsoft who think a breakpoint could replace assert().
The problem with breakpoints is that once the debugging session is over, they vanish into the void. If the code is changed after that, the developer may miss a newly introduced bug, because the breakpoint that could have caught it is no longer there.
Stress debugging can't replace asserts(). In fact, having a full set of asserts() while doing stress debugging can be enormously effective, if you want to catch every bug.
If you just want to bring your code to a "working" state and ship it... then I guess asserts are just a nuisance.
The funny thing is that I learned how to properly use asserts from 'Writing Solid Code' by Steve Maguire (1994, Microsoft Press).
Posted by: Ivan at May 17, 2008 03:48 PM
After reading the original blog, I see that the weight is more on the release/end-user scenario, where asserts are usually a no-op. The blog raises an interesting question, but gives the wrong answers.
There is no single correct behavior that would cover all cases.
The right answer is that if the program detects an internal error condition, it should ask the user what to do.
Abort, Retry, Fail. "Send Report" as a new option would be an improvement.
I think that in the old days Win3.1 had something similar.
The user may have valuable data that he must save. He should know that the data may be corrupt, so he'll save it to another file to recover the important parts later.
Of course, scenarios where there is no user (i.e. servers) should be handled differently. But that area seems already well developed.
Posted by: Ivan at May 17, 2008 05:28 PM
Using a pseudonym does not somehow mean that different rules apply to the standards of argument. It's quite simple, I.M. Wright == Eric Brechner, period. If Eric wants to write in a confrontational style then that's his choice, but those are his words and his ideas, he doesn't somehow get a pass if he is offensive or something doesn't work out how he intended.
There are plenty of confrontational opinion writers in the world; just open any newspaper or current affairs magazine. Unlike Eric, they don't try to hide behind a made-up name; they leave that particular rationalization technique to 3-year-olds.
Posted by: Andrew at May 19, 2008 03:04 PM