In Defense of Teacher Rating Scales for SEL Assessment

As I’ve worked with educators to support their wise use of SEL assessment to inform practice, a few themes have emerged. I’d like to share one with you, and some reflections on that theme.

The Rap on Teacher Rating Scales

Many of my colleagues in school districts report that they use teacher rating scales to assess student social and emotional competence. Here’s how a teacher rating scale works: Teachers rate the frequency of a variety of student behaviors on a questionnaire. Their ratings are converted to one or more scores to indicate how each child’s behavior compares to other children their age. Rating scales focused on social and emotional competencies, for example, yield scores for each child that indicate the degree to which a child demonstrates the social and emotional competencies assessed, with a higher score reflecting more competence, as rated by teachers.
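To make the scoring step concrete, here is a minimal sketch in Python of how item ratings might be turned into a norm-referenced score. Everything specific in it is an assumption for illustration: the 0 to 4 frequency scale, the five items, the norm mean and standard deviation, and the choice of a T-score (mean 50, standard deviation 10), which is one common convention. Any particular published rating scale will have its own items, norms, and scoring rules.

```python
def raw_score(item_ratings):
    """Sum one student's item ratings (assumed scale: 0 = never ... 4 = very often)."""
    return sum(item_ratings)


def t_score(raw, norm_mean, norm_sd):
    """Express a raw score relative to same-age peers as a T-score (mean 50, SD 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical ratings a teacher gave one student on five behaviors.
ratings = [3, 4, 2, 3, 4]

# Hypothetical norms for the student's age group (not from any real scale).
NORM_MEAN = 12.0
NORM_SD = 4.0

raw = raw_score(ratings)
score = t_score(raw, NORM_MEAN, NORM_SD)
print(f"raw = {raw}, T-score = {score}")
```

In this made-up case the student's raw score sits one standard deviation above the age-group mean, so the T-score comes out above 50. A higher score reflects more teacher-rated competence.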

Colleagues who use teacher rating scales often express doubt about their validity. Specifically, they are concerned that teacher ratings may be “biased” in one way or another. This could take the form of overly lenient or punitive ratings of an individual student, or overly lenient or punitive ratings of whole groups of students—imagine a grumpy teacher, for example, who rates everyone low on social and emotional competence at the end of a hard day.

The Evidence is in: The Rap is (Mostly) Unwarranted

So long as scores are not being used for high-stakes accountability, which would make them vulnerable to distortion, these fears are, in my view, generally out of proportion to the bias that is likely to exist in scores derived from teacher ratings.

Let me explain.

Rating scales, like all assessments, vary in their technical quality. Most well-validated teacher rating scales have technical properties comparable to any high-quality assessment. They generally demonstrate high score reliability, suggesting that teachers rate students in a consistent way across behaviors that reflect the same competencies, and that they rate the same students similarly across separate occasions. Scores on high-quality rating scales are also generally correlated with scores on other kinds of measures in ways we would expect. This suggests that, in general, the scores are not arbitrary or invalid, but are capturing important dimensions of social and emotional competence.
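For readers who want to see what "consistent across behaviors that reflect the same competencies" can look like in numbers, here is a minimal sketch of Cronbach's alpha, one commonly used index of internal consistency. It is my choice for illustration, not necessarily the statistic any given scale developer reports, and the ratings are invented. The sketch covers only consistency across items; consistency across occasions is usually checked separately, by correlating scores from two rating sessions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of ratings.

    rows: one list per student, one rating per item (all rows the same length).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two students")
    # Sample variance of each item (column) across students.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five students, three items meant to tap one competency.
ratings = [
    [4, 3, 4],
    [2, 2, 3],
    [3, 3, 3],
    [1, 2, 1],
    [4, 4, 4],
]

alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 mean the items move together, which is what we would expect if a teacher is rating a coherent underlying competency and not responding arbitrarily.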

So on the one hand, there’s a lot of evidence that rating scales yield reliable scores that reflect important competencies. On the other, there’s a lot of skepticism that the scores accurately reflect student competencies. How can we reconcile these two views?

The truth is that no assessment can accurately reflect, every single time, the thing it is designed to assess, and it is up to each person interpreting assessment results to evaluate the likely validity of the scores obtained.

Consider a reading test. Imagine a student who is ill, or slept particularly badly the night before, or had something weighing on her mind, before taking the test. Her performance on the assessment might be negatively affected by these factors. Anyone interpreting her score must make a judgment call about how accurately it reflects her reading competence. In this case, I think we would all want the interpreter to account for how these factors might have affected the test-taker’s performance. That requires the interpreter to consider things like the student’s attendance record, past behavior, and other sources of data on the student’s reading level. Ultimately, it requires a judgment call.

Similarly, it is true that a teacher’s biases, mood, or other extraneous factors may affect how he rates an individual child or a group of children. As with the reading test, anyone interpreting scores from teacher ratings must make a judgment call about their validity, and may draw on other sources of information about the student or students to do so. Here, the interpreter might also consider the teacher’s state of mind at the time of the rating.

I don’t know if anyone has done the study, but I am going to guess that the prevalence of substantially biased rating scale data is similar to the prevalence of off-base academic test scores.

Why I am Defending Rating Scales

Some of you might be wondering why I’m writing in defense of rating scales, when we offer direct assessments. Aren’t I just steering potential partners to the competition? If you’ve been following my work, you know that my goal is to support educators in finding the social and emotional assessment tool that measures what they care about and can best help them achieve their assessment goals. Of course, I hope that will lead them to partner with us.

But when it comes to obtaining a quick picture of student behaviors, particularly observable behaviors, rating scales are an invaluable tool. I think everyone, including developers of rating scales, would prefer that fallible human judgment be removed from an assessment of student social and emotional competence.

However, until something better comes along to assess student behaviors efficiently, I hope my colleagues in schools will consider rating scales an important tool in the assessor’s toolkit, one whose validity, like that of any other assessment, requires no small measure of human judgment in the end.

xSEL Labs’ Simple Rating Scale

Lots of rating scales are available that provide a useful view of student behavior. One of the challenges is that many of these assessments require teachers to fill out lengthy rating scales, which can be burdensome to teachers whose time is already at a premium.

Because good rating scales have already been created, we didn’t feel like we needed to reinvent the wheel. But we did feel like it would help our school partners who use SELweb to have some indicator of teacher-rated behavior. So, next year we will offer an optional two-item (yes, two!) teacher rating scale that provides a very broad teacher-rated assessment of student behavior.

Our goal is to provide a low-cost and simple way of getting a broad picture of student behavior. SELweb measures students’ ability to read cues, think through problems, and manage emotions—things that happen largely between children’s ears; the rating scale will provide complementary information about what students are actually doing. Together, these tools will give educators a more complete picture of student social and emotional competence.
