August 31, 2007

"Knowing" a Language

While obsessively trolling the Web for mentions of Find the Bug, I came upon this post from a fan of Python, which is one of the 5 languages I used in the book.

He discusses one of the Python problems from the book, where the code splits a buffer into words and then sorts the result, but does it in a low-level way: walking through the buffer one character at a time to split the words, and walking through the array one element at a time to find where to insert each word in its sorted spot. He is slightly annoyed that I don't simply call the split() function to split the buffer and the sort() function to sort the array, although he does point out that it wouldn't make sense for me to do it this way, since there is so little potential for bugs in the 3 lines of code. (I have sometimes seen people discussing Microsoft interview questions on bulletin boards claiming that if somebody asked them to sort an array in C, they would call the qsort() library function, and if the person asked them to write it out themselves, they would refuse because that wasn't what a real programmer would do. I never actually had an interview candidate who refused, on those grounds, to write out the implementation of strlen() or itoa() or the various other standard library functions we ask them to write.)
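
To make the contrast concrete, here is a rough sketch of the two styles. This is not the code from the book; the function names are made up for illustration, and it assumes words are separated by whitespace.

```python
def sorted_words_low_level(buffer):
    """Split buffer into words and keep them sorted, one step at a time."""
    words = []
    current = ""
    # Append a trailing space so the last word is flushed by the same branch.
    for ch in buffer + " ":
        if ch.isspace():
            if current:
                # Walk the sorted list one element at a time to find
                # where the new word belongs.
                pos = 0
                while pos < len(words) and words[pos] < current:
                    pos += 1
                words.insert(pos, current)
                current = ""
        else:
            current += ch
    return words


def sorted_words_builtin(buffer):
    """The same result using split() and sort()."""
    words = buffer.split()
    words.sort()
    return words
```

Both functions should return the same list for the same input; the first simply has far more places for a bug to hide, which is the point of using it in a debugging book.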

I find this interesting because it involves a trend I have noticed: more and more, it seems that "knowing" a language means knowing the set of functions that are built in, as opposed to knowing the syntax and semantics of the language. Meaning: for me, "knowing" Python means you understand how slices work, the difference between a list and a tuple, the syntax for defining a dictionary, that indenting thing you do for blocks, and all that. It's not about knowing that there is a sort() function. I suppose the basic thing is that if you know the syntax stuff you can read code in the language and understand it; if you see a function you don't recognize, you can look it up in the documentation. Whereas if you don't know the syntax, you'll just be lost.
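
As a small illustration of what I mean by "the syntax stuff", here is a sketch that touches each of those items. The variable names and values are invented for the example.

```python
letters = ["a", "b", "c", "d", "e"]

# Slices: start is inclusive, stop is exclusive, negative indexes count from the end.
first_two = letters[:2]      # ['a', 'b']
last_two = letters[-2:]      # ['d', 'e']
every_other = letters[::2]   # ['a', 'c', 'e']

# A list is mutable; a tuple is not.
letters.append("f")
point = (3, 4)
try:
    point[0] = 5
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Dictionary literal syntax.
ages = {"alice": 30, "bob": 25}

# Blocks are delimited by indentation rather than braces.
adults = []
for name, age in ages.items():
    if age >= 18:
        adults.append(name)
```

None of this requires knowing a library call beyond append() and items(), yet without it you could not read much Python at all.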

Of course as you write more code in Python you will start to learn more of the built-in functions, and this would be expected of someone calling themselves an experienced Python programmer; even in the space of Find the Bug, as you work through the Python chapter I eventually mention index(), ord(), find(), count(), pop(), and a few others. But I wouldn't call that a part of knowing the language.
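
For reference, here is roughly what those calls do. This is a quick sketch with made-up sample data, not an excerpt from the book.

```python
word = "banana"
nums = [4, 8, 15, 8]

position_of_15 = nums.index(15)   # index of the first matching element
code_of_a = ord("a")              # integer code point of a character
where_nan = word.find("nan")      # index of a substring, or -1 if absent
missing = word.find("xyz")
how_many_an = word.count("an")    # non-overlapping occurrences in a string
how_many_8 = nums.count(8)        # occurrences of a value in a list
last = nums.pop()                 # removes and returns the last element
```

Each of these is the kind of thing you can look up in a minute if you can already read the code around it.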

Jonathan Ellis, the poster I linked to, says I am writing Python that looks like C, and then says "Actually it's painful to read any language written at such a low level of expressivity, which is why I prefer not to use languages that really can't do any better". Of course, a language like C could do better; if you write managed C++, which is "like" C, you can call all the built-in functions provided by the .NET Base Class Library. But again, I wouldn't call knowing all that part of knowing C++; it's extra information you pick up over time (I actually do think Python is more expressive than C#, but it's because of things like slices and dictionaries, not because it has split() and sort()).

Nonetheless people do seem to associate knowing a language with knowing its standard calls. Even C, I suppose, has a couple of functions in the standard library that are basically part of "knowing" the language (printf(), in particular, and malloc() and free()). That's just because C is so stripped down that it includes none of this as built-ins (and of course you can write C programs that don't allocate memory with malloc() and don't print with printf()). I think perhaps what this really shows is that languages are becoming more similar in what they basically can do, so their built-in libraries become more of a differentiator. There's a basic jump up from a language like C to a modern language like Python that has foreach-type loops and automatic array handling, but once you make that leap it's mostly frosting.

Posted by AdamBa at August 31, 2007 10:43 PM

Comments

If you look at a language like Smalltalk, you will see that there's very little to the syntax (even less than LISP, I'd say), so knowing Smalltalk really amounts to knowing its class library.

Posted by: Duncan at August 31, 2007 11:37 PM

I'm guessing that you've taken an extreme position to make your point.
One certainly does need to know how to write idiomatic Python (http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html) to call oneself a good Python programmer, but one also needs more than a basic knowledge of the standard library and the methods of built-in objects. (And knowledge of doctest :-)

- Paddy.

Posted by: at September 1, 2007 03:21 AM

Well, I suppose what I am partly saying is that my sense of "knowing" a language is out of whack with actual reality; these languages have such useful standard libraries that they become part of the language. This is helped by the way they support arrays and automatic memory allocation, which allows a lot of standardization in the libraries.

Arguably it's also because few people are using them to write software that is as complicated as that which is written in C (so the built-ins are enough to handle the 90% case, which is certainly not true for what C is used for), but if I said that I would get jumped on by fans of BitTorrent, or Movable Type for that matter.

- adam

Posted by: Adam Barr at September 2, 2007 11:09 AM

Hmmm. Pulling my 35+yo signed copy of K&R 'The C Programming Language' off the shelf, it is interesting to note that by Chapter 2 of that very small tome the authors are already utilizing the standard libraries. So I don't think this has been a trend of the last 10 years. Rather, it's been an intrinsic desire, since the beginning, to get to higher-level modules quickly.

My final observation is that most 'hiring managers' really screw things up thinking of programming skills in terms of how good someone is at C#. The bigger question is how well can a candidate grasp the problem and its scope.

Posted by: JohnMc at September 4, 2007 04:19 AM

In a language such as Forth or Lisp, there is very little syntax to be learned. Thus, "learning the language" consists mostly of learning which method calls do what and how to use them. The latter can involve some subtleties, however; programming with stack- or list-based data structures is quite different than, say, programming with arrays, lists, or hashes.

In moving to Ruby (from Perl), I found that conceptual issues (e.g., iterator design and use, method aliases, symbols) were the most challenging. In general, I knew that there "should be" calls to perform certain tasks; I simply needed to find out what their names were in Ruby, etc.

In any case, I would say that debugging examples should use the idioms of the language in question. This avoids reactions like "the author clearly doesn't understand the language, so why should I believe he knows about debugging", but that's only a minor reason. The major reason is that, in a high-level language, the language constructs and idioms allow code to be far shorter and cleaner, reducing the opportunity for bugs to arise. So, a book on debugging should concentrate on the kinds of bugs that _can_ arise in that language.

Posted by: Rich Morin at September 4, 2007 07:52 AM

I don't think I really see your point, Adam.

If you care to learn Scheme, for example, you can learn literally *all* of its built-in syntax in about an hour or so. Does this mean you "know" Scheme, even though you couldn't produce anything particularly useful in it?

Oh, and by the way, I suggest you add rel="nofollow" automatically, to try and stem the tidal wave of spam. See:
http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html

Posted by: Peter Crabtree at September 4, 2007 08:09 AM

I think it's more of an issue of new school vs. old school.

The massive draw of Python for me is that it HAS so many optimized built-ins. I can prototype my code in Python within minutes, have the general idea working, and then rewrite it in a more clunky language across multiple files (for those now-nonexistent libraries).

There is definitely something to be said about each programmer's personal preference - legends of the hardcore C gurus having personal libraries in the millions of lines of code. Personally? I'll stick with small Python/Ruby tools to generate what I need when I need it.

I do agree with your notion of an 'experienced' language-specific developer though - knowing the syntax, the list/tuple/dict differences (and how to create them without the calls), and the common built-in functions off the top of one's head.

Posted by: risomt at September 4, 2007 09:24 AM

I think part of this is actually due to Python itself. The more I learn Python, the more important it is to understand the subtleties of the language. You can get by OK with for loops and the like, but the real power comes with list comprehensions and advanced usage of the core types.

For example, Python will never need a "switch" statement because you can use a dictionary. When I first heard that, it flew right over my head. A dictionary works well as a replacement because you can use a function as its value. This works because functions are first-class citizens, similar to JavaScript. When you throw in the concept of callables (objects implementing the __call__ method), it can be even more powerful.
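
A minimal sketch of the dictionary-as-switch idea follows. The command names, the handlers, and the Restart class are hypothetical, made up purely to illustrate the pattern.

```python
def start():
    return "starting"


def stop():
    return "stopping"


class Restart:
    """Instances are callable because the class defines __call__."""

    def __init__(self, delay):
        self.delay = delay

    def __call__(self):
        return "restarting in %d seconds" % self.delay


# Plain functions and callable objects can sit side by side as values.
handlers = {
    "start": start,
    "stop": stop,
    "restart": Restart(5),
}


def dispatch(command):
    # Look up the handler; fall back to a default, like a switch's "default" case.
    handler = handlers.get(command)
    if handler is None:
        return "unknown command"
    return handler()
```

Calling dispatch("restart") invokes the Restart instance exactly as if it were a function, which is where the callable idea earns its keep.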

I don't know if this sort of thing is entirely unique to Python. C# has some similar language constructs that can be very helpful beyond a "library". I seem to remember seeing a few helpful nuggets in PHP as well.

I suppose I should mention that I didn't get hired for a PHP job because I didn't know about the ksort function, so it is probably best to cover the language and the libraries if at all possible ;)

Posted by: Eric at September 4, 2007 09:41 AM

"A man should keep his little brain attic stocked with all the furniture that he is likely to use, and the rest he can put away in the lumber-room of his library, where he can get it if he wants it." -- Sherlock Holmes

Posted by: Stephen at September 4, 2007 01:43 PM

"While obsessively trolling the Web for mentions of Find the Bug"

I think you mean trawling. (Or better yet, "searching.") Trolling is something else entirely, especially in the context of the internet. :) (And even in the technical sense: to troll means to get at something by luring it to follow or come to you, like one would with bait; trawling is the opposite method of sweeping an area searching for something and collecting it directly, like one would with a net.)

I know this comment is loathsome, but interchanging trolling/trawling is a pet peeve of mine.

Posted by: T. Rawler at September 4, 2007 02:16 PM

Woosh, comment frenzy! Here I go:

JohnMc: Yes, in K+R they use the standard libraries, and today that is basically irrelevant for knowing C. I think the standard library for Python will remain more relevant.

Rich: Possibly I could have done some examples in Python that were 30-40 lines of dense Python. But I think those would have been too hard to find. And the point of the book is that the debugging skills you need are language-independent. It wasn't a "Python gotchas" kind of book.

Peter: Well, that's my point. Things have changed. As for nofollow, I probably should add it, although I confess I don't see how it actually avoids spam (do spammers check for nofollow before spamming?)

Eric: We agree (as for not being hired because you didn't know a specific fact, that is generally not the way to hire long-term, but I suppose if it was a contract thing, and knowing ksort would be expected, it could be viewed as a sign you didn't know it as well as you claimed).

Stephen: Right! I think Einstein said don't remember something you can look up.

T. Rawler: Apologies for choosing wrong among troll and trawl. I should of known the difference. I'll try to make less mistakes of that sort in the future. Seriously, I am also prone to language policing, so thank you for pointing that out.

- adam

Posted by: Adam Barr at September 4, 2007 09:02 PM

Could not agree less.

Posted by: hi at September 4, 2007 09:45 PM

1) Some Slashdot commenter had what I think is the best phrasing of this issue: "Libraries are the new languages." 2) "Trolling", in its literal sense, is dragging a bait or lure behind a moving boat. The word is derived from "trawling" because they're similar forms of fishing, with different equipment.

Posted by: JSinger at September 6, 2007 09:08 AM