We often get asked about rating scales – for interviews, scoring people in assessment exercises, performance and talent management…

How should we rate people?

How many points should there be on the rating scale? Five? Four? Ten?

What labels should we use?

And our question is: what is the purpose? Why do you want to rate people? Clarifying that can help us find the best way of doing things. If we want to make decisions, then we need to know what we are deciding, and we may want to remove the mid-point rating so that people cannot be indecisive.

If we want to improve people’s self-awareness, then we’ll want to ensure there is enough detail and specifics in what we have rated, for everyone to truly understand how they are doing.

Either way, we will need to be crystal clear on what we are measuring. And here lies the issue. How often are we all on the same page about what we are looking for in a candidate? And when it comes to rating performance, do we have enough detail written anywhere to explain how someone is doing? And the moment we have that level of detail written out, isn’t it already out of date, thanks to new technology or the changing nature of the business and its roles?

For recruiting purposes, you could create a bespoke rating scale for each role or even each assessment exercise. You can write out what a 2 looks like in this exercise and what a 3 looks like, enabling a highly objective rating process for assessors. This is great for graduate recruitment, where there is volume involved and it’s worth taking the time.

But for lower volume recruitment, or more generally for performance management, you will need a universal rating scale. Something that every leader can use to assess people, across levels and roles. Immediately then, this becomes subjective.

“Do I think you’re a 2 on communication skills, or a 3?”

Some businesses have attempted over the years to make this more objective, adding percentages or descriptions that might encourage less bias. But you simply cannot remove subjectivity from a judgement exercise. Because that’s what this is, me making a judgement about your performance or potential. And where there is judgement, there is bias.

Perhaps a helpful starting point then, is to acknowledge the limitations of the rating scale. It will never do the job for us of explaining to someone what their strengths and development areas are, and how they can go about improving. Rating the person could be a good starting point for that conversation, but it can’t cover off all the nuances and specifics that will be different for each individual.

Likewise, when it comes to hiring, the rating scale will not achieve for us a perfect level of objectivity, and a shared understanding of what we’re looking for. Even if everyone has read the JD and we all think we’re on the same page, we will have different aspects of the role we emphasise and we will have our biases at play in our judgement of the candidate’s ability.

That leaves us with three options:

Go basic. Accepting that a rating scale will not do the job of giving a detailed assessment and/or feedback to the individual means it could be better to go high-level. A very simple rating scale, like this one for an interview:

Concern – Development Need – Acceptable – Strength

It’s the quick and simple option to get you to an outcome of rating something, and because the rating is so vague, it encourages discussion about why that score was given. For performance reviews, you might scrap the wording and simply say it’s a 1-4 rating scale, where 3 is what is expected. Someone doing the role to an acceptable level would be rated a 3. It’s simple, clear and managers can use a 1 or 2 to communicate a problem and a 4 to celebrate a great result or strength in behaviour.

Go complex.

Invest in making your rating system more detailed, more specific and therefore more accurate and useful. This will take more time and will provide you with greater objectivity. As mentioned above, this works well in assessment centres and volume recruitment, because you’ve got the number of roles and candidates to make it worth spending the time.

Get super specific on what each rating would look like in each exercise, giving examples of the sorts of behaviours you would see from candidates – for instance, in a management development centre presentation exercise.

Taking this route for performance reviews and succession planning is far more challenging. How can you be specific about every role in the business? You could do this for leadership competencies, which would be generic across roles.

Detail what it would look like for someone to be displaying each leadership competency as a development need, and what it would look like if it were a skill or a strength. But for role-specific items, you will be relying on each manager to give the detail. This is where a mix of the two approaches can be helpful: a basic rating scale for role performance against objectives and role-specific behaviours, and the more detailed rating scale for leadership behaviours.

Scrap it.

Do you really need a rating scale? What does it give you? Is there another way to get the outcome you want, without ratings? The shift away from the old annual appraisal cycle and ratings linked to bonuses emphasised the need for better, richer conversations every day.

And when it comes to succession planning, could we have conversations about who sits where and who has the potential to move, without the need for a 9-box grid? Ultimately, we need a way of assessing behaviour and giving feedback. Those are skills people can learn. And a rating scale won’t replace the need for these skills.

In recruitment, a rating scale is always helpful: scoring candidates against the job criteria and seeing who is the best fit. In performance and development, the usefulness of the rating scale is generally linked to the quality of the manager: how well can they lead a conversation and offer useful specific feedback?
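As a purely illustrative sketch, the "score candidates against the job criteria and see who is the best fit" idea could be expressed in code. Everything here – the criteria, the candidate names, and the ratings – is invented for the example; it simply shows a 1–4 scale (no mid-point, so raters must decide) summed across criteria and used to rank candidates:

```python
# Hypothetical sketch: aggregate interview ratings on a 1-4 scale
# (even-numbered, so there is no "safe" mid-point) and rank candidates.

CRITERIA = ["communication", "problem_solving", "teamwork"]

def total_score(ratings: dict) -> int:
    """Sum a candidate's 1-4 ratings across all job criteria."""
    for criterion in CRITERIA:
        if not 1 <= ratings[criterion] <= 4:
            raise ValueError(f"Rating for {criterion} must be between 1 and 4")
    return sum(ratings[c] for c in CRITERIA)

# Invented ratings for two fictional candidates.
candidates = {
    "Candidate A": {"communication": 3, "problem_solving": 4, "teamwork": 2},
    "Candidate B": {"communication": 4, "problem_solving": 3, "teamwork": 3},
}

# Best fit first.
ranked = sorted(candidates, key=lambda name: total_score(candidates[name]),
                reverse=True)
```

The point is not the arithmetic – it is that recruitment ratings have a clear decision-making purpose (rank and choose), which is exactly why a scale earns its keep there.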

If you do go ahead with a rating scale, here are some extra tips to help you:

Even numbers: ever notice how people will score down the middle on a lot of things? In 360-degree feedback surveys, customer satisfaction surveys – anything at all, you will often see a lot of mid-point ratings. This is known as central tendency bias: our tendency to rate in the middle.

It is often made worse in people ratings because we see strengths and development areas. For example, “How shall I score this person on their project management ability? Well, they’re good at setting up the project and getting people involved, so that could be a 4 out of 5. But they’re not so good at risk management and getting the project finished on time, so that sounds like a 2. I’ll give them a 3.”

Maybe that’s ok in a performance management setting, but if you need to make decisions, then having everyone score a load of 3s will not help. Having an even number of ratings on your scale, four being the most common one I see, forces a decision. If someone might score in the middle, we must ask, are they good enough? Then it’s a 3 out of 4. Not good enough and it’s a 2 out of 4.

Commentary: in the example above, it is a huge missed opportunity for the person with mixed project management skills not to get that detailed feedback. If the individual were told that detail – what they’re good at and what they’re not so good at – it would be really useful.

They would know where they need to improve. But the reality is, they will probably just be told they scored a 3. Adding commentary to your rating scales can encourage managers to explain their rating. And rather than just saying “add comments here…” you can gain more specifics by asking “what is this person good at in project management? What do they need to improve in project management?”

The downside to that of course is that the rating process takes much longer, but we come back again to the purpose. If the purpose of this rating activity is to give an employee feedback so that they can improve their performance, then why would we not invest the time to help them do just that?

Word-smithing: the pain for many People functions is often the amount of word-smithing that goes into creating these scales. The issue here is that there is no one set of wording that will work for everyone. And so you can expect to have some disagreement and lots of to and fro as you work to get an end product. One word I would advise against is “average,” because the rater will probably not have a benchmark to know what average is.

And nobody likes to be told that they are average.

To a similar point, beware of language that sounds more personal, like “good,” as this suggests you are commenting on my character. If I’m not good, am I bad?

If you want to chat through your thinking on this, have a sounding board, or talk to us about supporting the design of your recruitment or performance processes, contact us here.