October 30, 2006

Level 1 Evaluations

We Engineering Excellencers were in some training last week and one of the topics that came up was the effectiveness of Level 1 evaluations.

To backtrack a bit after that snappy opener: in the world of Human Performance Technology, when you deliver a solution of some sort--be it to the knowledge, tools, motivation, etc. of the group you are helping--you will usually evaluate the results. There is a taxonomy of evaluations that has developed:

  • Level 1: Reaction
  • Level 2: Learning
  • Level 3: Behavior
  • Level 4: Results

Level 1 is the form they give you at the end of a class, where you rate the course and the instructor; level 2 is giving you a test at the end of the class, to see what skills you have acquired; level 3 is determining if you have changed how you perform on the job as a result; and level 4 is evaluating if your actual business results have improved.

The goal of a performance intervention is to improve your level 4 score; HPT is based on improving business results (there are a couple of higher levels that have been proposed, level 5 being Return on Investment (ROI) and level 6 being making-the-world-a-better-place-ism, but the "core 4" are the most commonly discussed). However, in many cases if an evaluation is done it is level 1 or perhaps level 2; these can both be done during the training/etc, rather than coming back later. Level 1 can be pretty generic questions ("the instructor was knowledgeable about the material") so it is easier to do.

In our courses we do level 1 evaluations with an occasional bit of level 2, although nobody "passes" or "fails" our classes. We are working to do level 3 evaluations. Anyway, during this training the claim was made that level 1 evaluation had no correlation with increased knowledge (a guy named Richard Clark at USC is the main researcher cited here). That is, whether people feel good about a class and an instructor has no bearing on whether people actually come away with better skills (let alone better job performance or business results). A well-liked instructor may teach well or badly, a disliked one may be effective or ineffective; it's random. The best predictor of whether a class will like an instructor is whether or not he/she brings them donuts on the last day.

I was interested in this because when I was at Princeton there was a book published every year in which student ratings of professors were tabulated. My father, being a professor, was against these things, since he felt they turned teaching into a popularity contest. Some of that feeling rubbed off on me (although I confess to consulting the Princeton guide when looking for electives), so it was heartening to hear that it was all bogus. I was also of course interested because I am now the subject of such evaluations, although they don't have much direct bearing on my annual review. I do try to take classes on public speaking and improve my teaching skills in other ways (and to be safe, I bring donuts on the last day of our week-long courses).

When this point was brought up in my class I opened up my big yap and said, if this is really true then why do we do level 1 evaluations at all? Not surprisingly I was opposed by most of the people in the room, who said that of course level 1 evaluations had merit. But I mostly respectfully disagree ("mostly" because there was one point made that if people went away from our courses hating the instructor they would tell their friends not to sign up. But these are Microsoft employees we are talking about; hopefully if we do a good job of improving their level 2, 3, and 4 performance then they will forgive us for level 1 failings). If you believe the studies (which people were not disputing) then level 1 evaluations are misleading and the results should be ignored. I really do think it is one of those cases where something is so counterintuitive that nobody will believe it until Malcolm Gladwell writes about it.

Posted by AdamBa at October 30, 2006 09:26 PM

When I was teaching I noticed there was a direct correlation between the difficulty of the midterm and the course evaluations. I am convinced that student evaluations are a more-or-less direct result of grade inflation.

Of course, since the evaluations were done before the final, one could give an easy midterm and a ball-busting final. It would be unfair to the students, of course, but they brought it on themselves.

When I was a student (I am going back 45 years), there was a professor named Smbat Abian (he died recently so he can't sue for libel) who was by all odds the worst teacher I have ever experienced. He also published garbage research, but that is another story. When I took a course from him, he came the first day with a book and started reading it out loud. After two weeks, on the last day of the add and drop period, a delegation of students went to him and told him we would drop the course unless he let us give all the rest of the lectures. He readily acceded to our request. But the reason I am telling this is that in the undergraduate course he taught, "math for filling a math requirement", they loved him. He told jokes, he had games, he taught nothing and gave all As. Student evaluations were just getting started in those days and his were superb. When he was fired (or, in the academic euphemism, non-renewed) the students started a protest movement. Here was this superb teacher being fired for his lousy research. They were correct that that was the reason, but wrong that he was capable of teaching anything. Eventually, they found him a job at OSU and when that soured, at U KS, who somehow managed to put up with him till he retired. By that time, his first name was Alexander. I don't blame him at all for that change.

Posted by: Marble Chair at October 31, 2006 07:42 AM

I was sick and missed the training last week, but I've always been of the opinion that the level 1 surveys are pretty much useless. They can point out if something is completely out of whack, but it's ridiculous to use them as instructor evaluation (for the record, I say this as an instructor who is usually in the top 20-30% based on the level 1 surveys).

I could even argue against the level 2 having anything to do with instructor effectiveness. It's one thing to teach somebody to pass a test, and something completely different to teach them to apply new concepts on the job.

The problem, of course, with the level 3 and 4 evaluations is that they are a lot more difficult to conduct, so are commonly passed on in favor of the easier path.

Posted by: Alan at October 31, 2006 09:37 AM

Coincidentally enough, on Greg Mankiw's economics blog today he addressed a topic related to the question of the value of level 1 (how much so, I'm not sure--I'm not exactly positive about the sort of classes we're talking about, Adam). You can find a free link to the .pdf here: http://www.economics.utoronto.ca/oreo/research/prof%20quality/prof%20quality%20oct6%202006.pdf

In the study, they looked at subjective evaluations of instructors and how those evaluations related to grades, likelihood of dropping the course and likelihood of taking another course in the same subject area. Granted, this is a discussion of undergraduate studies, not professional ongoing education, so the results may have greater or lesser applicability, but their conclusion is certainly interesting:

"The findings suggest that subjective teacher evaluations perform well in reflecting an instructor’s influence on students while objective characteristics such as rank and salary do not. Whether an instructor teaches full-time or part-time, does research, has tenure, or is highly paid has no influence on a college student’s grade, likelihood of dropping a course or taking more subsequent courses in the same subject. However, replacing one instructor with another ranked one standard deviation higher in perceived effectiveness increases average grades by 0.5 percentage points, decreases the likelihood of dropping a class by 1.3 percentage points and increases in the number of same-subject courses taken in second and third year by about 4 percent."

Like I said--this is hardly proof that level 1 evaluations in professional training show the same sort of correlation as seen among undergraduates at a large university, but it is certainly interesting to ponder.

Posted by: Marc at October 31, 2006 06:12 PM

Interesting, Marc. On a related note, here is an article defending "smile sheets" (as level 1 evaluations are sometimes known), although it doesn't have much research to back it up, just a gut feeling:


- adam

Posted by: Adam Barr at October 31, 2006 08:07 PM