May 06, 2008

"Hard Code" Kerfuffle

As I mentioned a little while ago, Eric Brechner's "Hard Code" column is now a public blog. This means you get a bit of a glimpse inside Microsoft in a public forum. Eric writes in his own inimitable style in order to provoke discussion, an area in which he has certainly succeeded with his most recent column about recovering from errors.

First of all, ANY discussion of errors/exceptions/asserts/etc will generate controversy because it's an area where everybody seems to have an opinion on the right way to do it. Like all programming opinions, it's based in large part on the previous formative or traumatic experiences of the individual. Since we've all had different experiences we all have different opinions, and since we're programmers we have zero inclination to believe that differing opinions have any merit.

If you want to follow the argument along at home, it helps to know who the people involved are:

  • Eric Brechner: aka I.M. Wright, Director of Learning and Development for Engineering Excellence, which means he owns the various discipline excellence teams (as in Dev Excellence, Test Excellence, etc).
  • Alan Page: Manages the Test Excellence team, reports to Eric.
  • Alan Auerbach: Works on the Dev Excellence team.
  • Larry Osterman: Way oldtime Microsoft employee and blogger.
  • Kinshuman: (I assume it's the same guy) Works on Watson at Microsoft.
  • Various other people: Don't know who they are.

There's also me, who manages the Dev Excellence team, thus reports to Eric and is Alan Auerbach's manager.

OK, so the fun started when Eric wrote his column saying that letting Watson catch exceptions was bad; instead you should handle them and crash. Larry blogged that this was a really stupid thing to say, and Kinshuman concurred in a comment. Alan Auerbach jumped in to defend Eric and also to state that asserts are bad, Alan Page replied that asserts are good, and then Alan and Alan got into a brief back-and-forth on that.

Most of the arguments are of the ships-passing-in-the-night variety. Larry is saying it's bad to catch all exceptions; Eric is saying it's bad to let all exceptions through. These aren't contradictory positions. If you have spent a lot of your career working in an error-code-returning environment (like Larry, or Joel Spolsky, or Raymond Chen, or me) you probably have a natural bias against structured exceptions, but they are a fact of life in some environments (like .NET). But the more relevant fact here is that most people seem to have an argument pro or con exceptions that they deploy whenever they get a whiff of a discussion on the topic, and not much Socratic dialogue ensues.
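To make the "not contradictory" point concrete, here is a minimal sketch of the middle ground: catch the one failure you understand and let everything else through to whatever crash reporter is watching. It is in Python purely for illustration, and `TransientNetworkError` and `fetch_with_retry` are names I made up for this example, not anything from Eric's column or Larry's post.

```python
class TransientNetworkError(Exception):
    """Hypothetical error that the caller knows how to recover from."""


def fetch_with_retry(fetch, attempts=3):
    """Call fetch(), retrying only on the one error we understand."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except TransientNetworkError as err:
            # Expected failure: our state is still sane, so try again.
            last_error = err
        # Anything else (KeyError, AssertionError, ...) is deliberately
        # not caught, so it propagates with the original state intact.
    raise last_error
```

Nothing here catches all exceptions, and nothing here lets a recoverable one take the process down, which I think is roughly where both sides would end up if they compared notes.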

Eric made a side comment about asserts in his article ("It's like the logic behind asserts--the moment you realize you are in a bad state, capture that state and abort") which misrepresents what asserts are for ("capturing the state" yes, "abort"--with the implication that it's similar to throwing an exception--no), although I think he threw that in there without really thinking about it. But it led to an interesting argument between Alan and Alan: are asserts good or not? I always liked asserts because I worked on networking code where an error might only occur once in a blue moon, so I wanted to break into the debugger when it happened. Alan Auerbach's assertion that you don't need asserts because you can set a breakpoint only holds true if you work on reproducible bugs, and I used to scoff at people like that--how hard can fixing your bugs be if every one of them repros on demand?

But now that I think about it, relying on any kind of stress failure debugging to catch your errors is pretty outmoded. If I were writing a network protocol today, the first thing I would do is write a fake version of the layer below me that did odd things on demand, and next I would write a fake version of the layer above me that did odd things on demand, and then I would beat on my protocol with these in ever-more-interesting configurations. In such an environment all of my crashes *would* be reproducible and I could set breakpoints as needed. It's funny because I definitely thought of writing automated tools for stress (when I wrote my first NT network card driver in 1990 I also wrote a packet-blasting-and-counting protocol to help test it, which wound up becoming part of the network driver development kit) but never for causing the unexpected timing and dropped packets that lead to those hard-to-debug problems in protocols. I guess I have learned something in 20 years.
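A rough sketch of what I mean by a fake lower layer that does odd things on demand. This is Python rather than driver code, and everything in it (`FakeLowerLayer`, the rate parameters, `run_scenario`) is invented for illustration. The key assumption is that all the misbehavior comes from a seeded random generator, so the same seed produces the same drops, duplicates, and reorderings every time.

```python
import random


class FakeLowerLayer:
    """Hypothetical stand-in for the layer below a protocol.

    Drops, duplicates, or reorders packets using a seeded generator,
    so a run with a given seed always misbehaves in the same way.
    """

    def __init__(self, seed, drop_rate=0.0, dup_rate=0.0, reorder_rate=0.0):
        self._rng = random.Random(seed)
        self.drop_rate = drop_rate
        self.dup_rate = dup_rate
        self.reorder_rate = reorder_rate
        self._in_flight = []

    def send(self, packet):
        if self._rng.random() < self.drop_rate:
            return  # packet silently lost
        self._in_flight.append(packet)
        if self._rng.random() < self.dup_rate:
            self._in_flight.append(packet)  # delivered twice
        if len(self._in_flight) > 1 and self._rng.random() < self.reorder_rate:
            # Swap the two most recent packets to simulate reordering.
            self._in_flight[-1], self._in_flight[-2] = (
                self._in_flight[-2],
                self._in_flight[-1],
            )

    def deliver(self):
        """Return and clear everything currently 'on the wire'."""
        delivered, self._in_flight = self._in_flight, []
        return delivered


def run_scenario(seed, count=20):
    """Push `count` numbered packets through a misbehaving link."""
    link = FakeLowerLayer(seed, drop_rate=0.2, dup_rate=0.1, reorder_rate=0.3)
    for seq in range(count):
        link.send(seq)
    return link.deliver()
```

If the protocol under test falls over with some seed, you rerun with that seed, set a breakpoint, and watch it fall over again. A fake upper layer would follow the same pattern from the other direction.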

Posted by AdamBa at 10:09 PM

April 29, 2008

The Myth of the Lone Programmer

There was a recent article on Slashdot with the salacious title "Donald Knuth Rips On Unit Tests and More", implying that the Lord of Algorithms thought unit tests were a waste of time. If you read the actual interview, you will see that Knuth is simply saying that in his own experience writing code, he has never found much use for unit tests.

My interpretation of this is that Knuth has never had to write a particularly large piece of software where he was not the only author. Because that is where unit tests can really help: preventing somebody from accidentally breaking somebody else's code when they make a change. If you are the only author, you may have a good enough idea of the code in your head to prevent such errors (although it would be somewhat conceited to think you would never do it accidentally).
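As a small illustration of what I mean, here is a sketch in Python using the standard `unittest` module. The helper `normalize_username` and its tests are hypothetical; the point is only that each test pins down a behavior that some other person's code relies on.

```python
import unittest


def normalize_username(name):
    """Hypothetical helper owned by one developer and called by many."""
    return name.strip().lower()


class NormalizeUsernameTest(unittest.TestCase):
    # Each test records a behavior that some other caller depends on.

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(normalize_username("  adam "), "adam")

    def test_lowercases(self):
        self.assertEqual(normalize_username("AdamBa"), "adamba")
```

If somebody later "simplifies" the helper by dropping the `lower()` call, the second test fails on their machine before the change ever reaches the person whose login code depended on it. When you are the only author, you are both of those people, which is why the tests feel less necessary.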

Knuth's own idea of how people should write code is something he came up with called literate programming, which involves creating a document that describes the code and from which the source code can also be extracted. Knuth introduced literate programming in a language called "WEB", which he used to write TeX (my father says that Knuth chose the term "web" because back in 1981 it was a three-letter word that had no existing computer-related meaning). Literate programming is not a bad idea, but the notion that it would solve modern programming problems is simplistic--unless you are only thinking about software written by one person. (As an aside, there was somebody, I don't know who, who used to work in Engineering Excellence and misinterpreted "literate programming" to mean embedding comments in your source code that could be extracted by a tool to produce documentation, which is what programs like Doxygen and Sandcastle do. So in our slides there are various incorrect uses of the term, every time we talk about documentation generators.)
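To show the direction of extraction in literate programming (the document is the primary artifact and the code is pulled out of it, rather than the other way around), here is a toy sketch in Python. The `@code` and `@end` markers are invented for this example and are not WEB syntax, and the sketch ignores the real system's ability to name chunks and assemble them in a different order.

```python
def tangle(document):
    """Extract the code lines from a toy literate document.

    Lines between '@code' and '@end' are code; everything else is prose.
    These markers are made up for this sketch and are not WEB syntax.
    """
    code_lines = []
    in_code = False
    for line in document.splitlines():
        marker = line.strip()
        if marker == "@code":
            in_code = True
        elif marker == "@end":
            in_code = False
        elif in_code:
            code_lines.append(line)
    return "\n".join(code_lines)


SAMPLE = """\
We start by greeting the reader, because every program should.
@code
def greet():
    return "hello"
@end
The greeting is returned rather than printed so it can be tested.
"""
```

Calling `tangle(SAMPLE)` gives back just the function definition, ready to be handed to an interpreter. A documentation generator runs the opposite way: it starts from source code and extracts the comments.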

In my class this week I happened to have somebody who worked with me at Softimage back in 1996, and we were reminiscing about some of the programmers who worked there. In particular, a couple of the key architects on the product were "lone wolf" type programmers who spent their time cranking out code and didn't interact with people much, except possibly to point out things they were doing wrong. This led to various zany designs and rules that didn't make much sense, but which it was a losing proposition to try to argue against (the most notorious of these was the stipulation that include files in C++ sources had to be alphabetized by filename; it's a strange enough rule that it's hard to think of a real-world analogy, but you might consider a university where you were required to take all your courses in alphabetical order by course name).

You still hear people oohing and aahing over programmers like that: "they don't talk to anybody, but they sure write a lot of awesome code!" In prior eras at Microsoft such programmers were celebrated, and then we went through a period where they were tolerated, but now I think the pendulum has swung, correctly, to view such people as negatives on a team.

Looking back we probably should have cashiered those guys out of Softimage, but from the inside it always seems like it's worth sticking with them in the short term because the temporary pain of losing them would be so great...but it's not worth it. Hatching a giant code egg all by yourself, even if it is "good" code (and it very likely isn't), just isn't a key part of being a software architect. The job is really all about helping others succeed, growing the skills of the team, representing your team in interactions with other teams, and so on. Sitting in your office with the door closed accomplishes none of that. Arguably it would be a case of "what got you here won't get you there", except that in retrospect it shouldn't even have gotten them here.

Posted by AdamBa at 09:46 PM