It occurs to me that target grades are a bit like Tribbles. They turned up innocuously in education and have been used with good intentions, but somewhere along the way they got out of control. Now they often get in the way, demotivate students and destroy teachers’ morale.
I thought I would just summarise the target grades that schools commonly use, some issues with their use in performance-related pay and how they might negatively impact students’ performance, especially at the lower end of the ability profile.
Hopefully by the end I will have a couple of suggestions about a fairer way of informing students of their ‘targets’ and also of measuring teachers against those grades.
Note: I'm not going to go into the Star Trek aspect of this analogy. If you haven't seen TOS's 'The Trouble with Tribbles' or DS9's 'Trials and Tribble-ations' then you should be ashamed of yourselves.
What different target grades are used frequently?
FFT: The Fischer Family Trust (set up by Mike Fischer, co-founder of RM). It is a non-profit that creates ambitious but relevant target grades, with a suite of options based on students' prior ability data:
FFT50: Gives you the grade that would put the student in the top half of students nationally with the same prior ability profile, i.e. "average".
FFT20: Top 20%, which is defined as “high”.
FFT5: Top 5%, which is defined as “very high”.
CEM at Durham University has a suite of on-screen baseline tests that can be used to statistically model where a student sits on the 'bell curve of the nation' and to produce both likely outcomes and 'chances graphs'.
GL Assessment also offers the CAT test as a way of generating targets based on a student's performance in a baseline test.
As an aside I recall sitting the CAT test in year 7 back in ’91, but the only thing I remember is I got a free pencil and rubber. I think that gives us all a valuable insight into memory.
Like Tribbles, none of these institutions have deliberately set out to cause problems. In fact without their work we would be in a much more difficult situation. Their intentions were just to support the setting of targets that were robust and challenging for pupils in a variety of contexts and abilities. The problem is not the way target grades are derived, but how schools and successive overseeing bodies have used them.
The problems with target grades being used in performance-related pay:
1. Education is a zero-sum game.
When you consider how targets are set, often using FFT20, this completely ignores the fact that, for that target to exist, by definition 80% of the students given it will fail to achieve it. This is often sold as an 'aspirational' target, but it gets really screwed up by point 2 below.
2. Performance-related pay means targets have more value than they were ever meant to have.
Often schools will have a performance- and data-driven target by which a teacher is judged. The problem is that this takes the generated target grade as sacrosanct and ignores the fine detail of how and why the target was generated. If it uses scaled scores from KS2 then it risks being inflated if the student received 'support' at KS2. The pressure of accountability measures on primary schools distorts the process and has led to extreme behaviours in some cases. As Professor Becky Allen put it in her blog:
“accountability is the enemy of inference”
So although there is a standardised score, each individual score is the product of various external factors.
3. Targets that might seem reasonable are actually statistically pretty hard.
Right, so this is where my maths might get embarrassing. A common way performance management targets are set is to require that a certain proportion of pupils hit their target grade. This is not done in all schools but seems common, especially in the larger multi-academy trusts. Let's say for argument's sake we had a target that seemed reasonable on the surface, like "To ensure through good and better teaching that 60% of students taught achieve their FFTPA20 target." Just how hard is this to do for a cohort? If the cohort were a random sample of 100 students, all with the same target grade, the chance of a given 60 of them all falling into the top 20% of the national bell curve would be 0.2^60, which, when I put it into my calculator, made it write 55378008 and shut itself down (well, not exactly, but you get what I mean: it's a stupidly small number).
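To put a number on that, here is a quick sketch under the naive assumption that each of 100 students independently has a 20% chance of landing in the national top 20% (a deliberate over-simplification, as the next paragraph explains):

```python
from math import comb

# Naive assumption (for illustration only): each of 100 students
# independently has a 20% chance of landing in the national top 20%.
n, p, need = 100, 0.2, 60

# P(at least 60 of the 100 hit their target) under a binomial model
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))
print(prob)  # on the order of 1e-18: a stupidly small number indeed
```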
But it’s not actually that hard because within our cohort there are bands and these bands have decreasing chances of hitting their target. So our top 20% will most likely hit their targets, our next 20% would have an 80% chance and so on.
This is where I got stuck, but luckily I had someone who has the brains and the skills and was willing to do me a favour.
So I reached out to Reece Mears, an ex-student of mine. He is an all-round awesome dude and recipient of School of Statistics award for excellence at Warwick Uni two years running. He told me in detail about combinatorics and drew me this nice graph.
He then explained to me:
The calculation (1^6)x(0.8^6)x(0.6^6) finds the probability that everyone in the top three bands reaches their target, with no restrictions on anyone in the lower bands (this is about 1.2%). That sounds quite low, but it is down to how many possibilities there are in total. For a class of 30, each student can either reach their target or not, so two possibilities per student for 30 people means there are 2^30 (over a billion) combinations of students reaching or not reaching their target. As you can imagine, the number of possibilities grows really quickly for bigger class sizes.

However, that is only one specific event; we actually need to count every event where the desired proportion of students reach their target. This includes the event where just the top bands reach their target, just the bottom bands, some from each band, and so on. Imagine a list of length 30, where in each position we can put a 0 or a 1. We need to consider every single way of filling this list so that there are at least 18 ones (if we're looking at 60% of a class of 30). For a class of size 30, if we want 60% of students to reach their target, there are (30 choose 18) different ways of doing that (about 86.5 million, so there's no way to do it by hand). There are a few shortcuts you can take with some combinatoric theory, but the only way to find the true probability is to look at each of these possibilities and sum their respective probabilities. I'm not sure if you do any programming, but I can send you the code if you're interested.

I've included a plot to show how the probability changes when asking for a certain proportion of students to reach their target. The key values are that for a class of 30 there is a 59% chance of 60% reaching their target, and a 13% chance for 70%; for a class of 100, a 55% chance for 60%, and a 0.9% chance for 70%. (This is on the assumption that 50% of the national cohort will hit their target.)
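For the curious, the sum Reece describes doesn't require enumerating all 2^30 cases one by one; a standard dynamic-programming shortcut (adding one student at a time) gets the exact answer. Here is a rough Python sketch, assuming his band model of five equal bands with hit chances of 100%, 80%, 60%, 40% and 20%:

```python
def hit_probability(band_probs, band_size, threshold):
    """Chance that at least `threshold` students hit their target, where
    the class splits into equal-sized bands and every student in a band
    has that band's independent probability of hitting."""
    dist = [1.0]  # dist[k] = probability that exactly k students so far hit
    for p in band_probs:
        for _ in range(band_size):
            new = [0.0] * (len(dist) + 1)
            for k, q in enumerate(dist):
                new[k] += q * (1 - p)      # this student misses
                new[k + 1] += q * p        # this student hits
            dist = new
    return sum(dist[threshold:])

bands = [1.0, 0.8, 0.6, 0.4, 0.2]  # assumed hit chance per 20% band
print(hit_probability(bands, 6, 18))   # class of 30, 60% hit: about 0.59
print(hit_probability(bands, 6, 21))   # class of 30, 70% hit: about 0.13
print(hit_probability(bands, 20, 60))  # class of 100, 60% hit: about 0.55
print(hit_probability(bands, 20, 70))  # class of 100, 70% hit: about 0.009
```

The printed values line up with the key figures Reece quotes.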
So if, as a profession, we agreed to a performance target of 60% of our students reaching their FFT50 target, then statistically around two in five teachers would fail to progress along the pay spine, and at a 70% threshold the vast majority would (which makes me wonder if the DfE is aware of this). And that ignores the fact that a lot of schools use FFT20, which would drastically reduce these chances further.
The problem with using target grades with students
1. “They are the dunce cap of the 21st century”
Let me preface this part by linking back to the Tribbles. They themselves were benign; they just got in the way by growing too fast (like a tumour). In an ideal world where all grades were treated equally and a grade 3 was just as valuable as a 4, telling a low-ability student that they are targeting a grade 3 would give them perspective on their relative success and allow them to judge their performance fairly against more able friends. Unfortunately that is not the world we live in, and informing a student that their target grade is a 3 culturally tells them they are “stupid”. Schools don’t say this to them; the government does, with its categorising of a ‘pass’, and society knows the value of the C/4 in terms of further education etc. It’s the dunce cap of the 21st century. (OK, so we are 18 years into it now, but that still feels like the future to me, showing my age. Also, this is not my original phrase; I read or heard it somewhere but can’t remember who to attribute it to.)
2. They are the glass ceiling
By issuing target grades to students we create a ceiling on their ambitions. I admit this is slightly hypocritical, after all just above I was claiming how unobtainable these targets are, but here I am talking about the effect the target has on the students’ motivation and effort.
The other day I weighed myself and I was 4 pounds lighter than I expected to be. I took this as good news and as a success. How did I react? Did I increase my focus on healthy eating and reduce my biscuit intake even further to get myself in shape for the upcoming basketball season? Of course not! I eased off, I ate more biscuits and let myself have more treats, even if I wasn’t hungry.
Hitting your target does not make you work harder, generally if you are doing as well as you expect you will reduce your effort, unless you make a new, more challenging, target. This is a huge problem in the world of sports and a reason why back-to-back championships are such a lauded achievement. By setting a single target for students we create a finite and fixed concept of good performance, which removes all the other options and narrows their understanding of their potential. The reality is for all abilities there is a chance of huge success, although the more able have a much greater chance of achieving the highest grades.
So what is the best solution for teachers?
For schools that have to have data-based targets, I would avoid anything based on a certain percentage of students reaching their target. It may seem fair, but I think such targets often sound easier than they actually are to achieve.
From the different measuring systems I have heard of, I think Progress 8 (P8) is pretty solid (although it favours higher-achieving students, and the degree of confidence complicates things slightly).
I also know some schools use an average points score (APS). This also works well, especially now we have numerical grade systems. It could provide a fairer version of the "70% on target" measure: you compare the students' actual outcomes to their target APS multiplied by 0.7. This would credit teachers for over-performing students as much as it penalises them for significantly under-performing ones.
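As a sketch of how that comparison might work (all grades here are made up purely for illustration):

```python
# Hypothetical target and achieved grades (points) for a small class.
targets = [5, 5, 4, 4, 3, 3]
results = [6, 4, 4, 3, 4, 2]

target_aps = sum(targets) / len(targets)   # 4.0
actual_aps = sum(results) / len(results)   # roughly 3.83

# The APS analogue of "70% on target": has the class reached 70% of
# its target APS?  A student on a 6 against a 5 target offsets one on
# a 2 against a 3, so over- and under-performance balance out.
print(actual_aps >= 0.7 * target_aps)  # True for these numbers
```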
The other way is to look at the ability profile of the cohort compared to the national average and adjust the targets to account for the distribution of grades within that course (e.g. 55% of combined science grades are 4+, so for a cohort with an average scaled score of 100 it would be reasonable to expect 55% at 4+). This would not be a great idea for a small school, as they might have a skewed distribution, but a place like my school, with 250-300 in a year, would be a fair judge.
I also think, given how class performance varies within a school, and the shared nature of classes in some subjects (science classes will often have 2 or 3 teachers), there is probably a benefit in having departmental targets instead of individual class targets. It seems to me that it is easier to make progress with an eager top set than a reluctant bottom set; as someone who timetables, I know this can be a death knell for the teacher who is arguably doing the harder and more important job with that set 4. Departmental targets do run into the problem of a strong teacher being held back by weaker teachers in their department. I think this is where a detailed analysis of paper performance etc. might provide that teacher with a chance of progression by demonstrating their individual impact.
Obviously the best solution for teachers is a move away from performance-related pay and greater recognition from the government that they are valued and respected professionals, but I'm a realist. A quick shout-out to schools like Durrington High School, who use disciplined inquiry to drive progression.
So what’s the best solution for students?
I think for students it has to be chances graphs. They give the student a context for their natural ability but also illustrate that other students have gone on to do great things from the same starting point. Here is an example generated by CEM's MidYIS:
In the above example this student would be given a target of a C/4. I think this kind of diagram is incredibly powerful. It allows for a different kind of discussion during target setting and empowers students to see how previous students of the same ability have performed. I'm a big believer in the power of the stories we tell ourselves and how they shape our lives. I think this graph gives students a chance to dream big, but also to accept the effort required to get there. I would then allow them to pick their own target based on the graph. This will be what they want to achieve and should empower them to be part of their own success. Then any following conversations can be about how their day-to-day actions either increase or decrease the likelihood of getting to that point. These could be their 'steps to success' and form the fulcrum for all their progress conversations. We often talk in leadership training about getting staff 'over the line', but how often do we provide a system to encourage the same from the students? In my experience this often becomes a sea of mottoes and soundbites, which generate a derisory eye-roll as much as a motivational spark.
So in conclusion, if you are in a tricky situation where you have to use target grades in performance management, try to ensure those grades are fairly set and statistically achievable; have discussions with your line manager to make sure they understand why you need to change the language and that it will still deliver what they want. If you have to use target grades with students, find a way to demonstrate that the glass ceiling is smashable: use anecdotal stories of previous years’ most successful students, or generate some chances graphs.
There is a chance I might be completely wrong. If you think so, please let me know in the comments section or on Twitter: @MrARobbins.
Thanks for reading.
Agree that targets are a ceiling on ambition; personally, their existence is acknowledged, but in reality, in discussions with students, they are all told to aim for the same highest target (i.e. the top). That's equality…
> …P8 is pretty solid (although it favours higher achieving students…
Evidence that it favours high-achieving students? And so what if true? Everyone needs to be challenged, including those of high ability.
If you read the DfE guidance on P8 calculation, there is a table of point tariffs for each grade for 2016 and then for 2017 and 2018 (can't link as I'm on the mobile app). You can see that in its first year the measure awarded 2 points for an F grade, and in following years only 1.5 points. Similarly, an A was given 8 points, rising to 8.5. So a student moving from G to F was devalued and a student moving from B to A was overvalued. Therefore schools with less able students were getting less credit for an equivalent improvement in performance. P8 is intended to level the playing field for schools, but there is an inherent issue with the conversion. Hopefully numerical grading in all subjects will reduce this, although there are now more higher grades, so assuming a normal distribution it is easier to move from a 7 to an 8 than from a 2 to a 3 if both students are at the edge of their natural ability.
Unless I've misunderstood, the conversion of alphabetic grades to P8 scores is a legacy that will disappear when all subject grades are numerical, so why worry about G-to-F being devalued? Similarly, why should B-to-A _not_ be of greater value? High achievement and effort should be valued more than high effort alone, and there is nothing wrong with a student trying their best to achieve a grade 3. Students need honesty: some are good at all exams, some bad, some are variable.
Based upon Ofqual communication (https://ofqual.blog.gov.uk/2018/07/20/setting-gcse-grade-boundaries-this-summer/), a normal distribution for grade boundaries can't be assumed, especially as grade 9 is reserved for (approximately) the 97th percentile. This should prevent grade inflation, which some will resent…
I said it favoured the more able; that may be by design, I don't know. Personally I think it would be better to treat them all equally: if the nature of P8 is that of a school-measuring device, then it should most likely measure a school's effectiveness with all students. And it's not just a legacy issue. There are 2 new grades at the top end, so if the 4 is fixed to previous years (which it was this year), then by definition all the higher grades have fewer students in each. This means there are fewer students to outscore to move up a grade.
The grade 4 “fix” this year may have been a politically expedient move to stop pushy middle-class parents and their media supporters whining about the “system” being unfair to this “guinea pig” cohort. Ofqual have stated that grade boundaries will not be announced until exams are collated, so this makes the statistical norm-referenced model robust. There is nothing wrong with the concept of “fewer students to outscore”, unless criterion-referenced assessment is preferred to norm-referenced assessment.