What’s wrong with numerical feedback?

In short: the numbers are wrong, and the statistics are worse.

Like other academics in the UK and elsewhere, I am judged as a teacher on the basis of feedback from students taking my courses. In some institutions, not doing well enough on this feedback can lead to dismissal. The problem is that this feedback is largely meaningless.

In my university, as in others, the feedback takes the form of comments (valuable and useful for thinking about teaching practice) and a numerical score between 1 and 5 under a number of headings. These scores are then averaged and, in my department, any score under 3.5 is reason to fill in a form explaining what action will be taken to make sure it does not happen next year.

The first problem with the numerical feedback is that students are not good judges of teaching. Insofar as there is evidence from proper trials, it seems that the numerical scores awarded by students do not reflect how well they have learned from their teachers. In other words, the numbers going in are unreliable, especially since, with low return rates, the results are dominated by students who are disgruntled or very gruntled.

Secondly, the final score is unreliable. As you will know from following opinion polls before elections, when you take a small sample of a group, there is an inevitable error in the resulting estimate of the average. This is especially true when the sample is biased towards the extremes. In my university, students give scores between 1 and 5, and averages are presented to three significant figures, from 1.00 to 5.00.
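To put a rough size on that sampling error, here is a minimal sketch in Python. The scores are invented for illustration, and the calculation makes the generous assumption that the respondents are a random sample of the class, which the previous paragraph argues they are not. The function name `standard_error` and the finite-population correction are my choices for the example, not anything the university uses.

```python
import math
from statistics import mean, stdev

def standard_error(sample, population_size):
    """Standard error of the sample mean, with a finite-population
    correction because the sample is drawn from a small class."""
    n = len(sample)
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return stdev(sample) / math.sqrt(n) * correction

# Hypothetical returns: ten respondents from a class of forty,
# weighted towards the extremes. Not real data.
sample = [5, 5, 4, 4, 3, 3, 3, 2, 1, 1]
se = standard_error(sample, population_size=40)

print(f"mean = {mean(sample):.2f}, standard error = {se:.2f}")
```

For these made-up scores the standard error comes out at about 0.4, so even on the most charitable reading the second decimal place of the average carries no information.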

To see what is wrong with this, think of the distinction between precision and accuracy, something every first-year engineering student must learn: precision is the number of decimal places, accuracy is the number of decimal places you can believe.

A typical class size might be 40 students. With a typical submission rate of 25%, ten students put numbers in to be averaged. Doing the sums, if one student changes a mark by one, say from 3 to 4, the average changes by 0.1. The academic is assessed on the basis of a difference of 0.01: 3.50 is good, 3.49 is a problem. In other words, decisions are made by believing the noise in the signal. A single anonymous student, cheesed off because he has been set an exam question he has never seen before, can ruin a career.
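The sums above can be checked in a few lines of Python. The scores are invented for illustration; only the class size, the return rate and the 3.5 threshold come from the figures quoted in this post.

```python
from statistics import mean

THRESHOLD = 3.5  # departmental cut-off quoted above

def shift_per_mark(n_respondents):
    """Change in the average when one respondent moves a score by one point."""
    return 1 / n_respondents

# Hypothetical returns from ten of forty students. Not real data.
scores = [5, 4, 4, 4, 3, 3, 3, 3, 3, 3]
before = mean(scores)

# One student changes their mind and gives a 2 instead of a 3.
changed = scores.copy()
changed[-1] = 2
after = mean(changed)

print(f"before: {before:.2f}, after: {after:.2f}")
print(f"shift per mark: {shift_per_mark(len(scores)):.2f}")
print(f"form to fill in: {after < THRESHOLD}")
```

With ten respondents the smallest possible movement in the average is 0.1, ten times the 0.01 difference on which the judgement is made.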

We have numbers which are probably wrong to start with, in biased samples too small to be statistically valid, forced through an averaging process to give a spurious precision, and a management prepared to use these numbers as an ‘objective’ measure of teaching ‘quality’.

