Reflecting on Rubrics

Rubrics are like multiple-choice questions: they are difficult to write, but if you put the time in and do it well, you are saved from later work and frustration. A good rubric makes grading complex tasks easier, as the criteria and degrees of quality are clearly defined and leave less room for doubt and indecision. However, there are a lot of bad rubrics out there (I’ve certainly written a few myself), and since I have been working on several rubrics recently, I thought the topic of writing and using them merited some exploration.

What Are Rubrics?

A rubric is “a type of matrix that provides scaled levels of achievement or understanding for a set of criteria or dimensions of quality for a given type of performance” (Allen & Tanner, 2006). To put it simply: criteria down the left-hand side, and levels of performance along the top. I feel silly defining a rubric for teachers, but it is helpful to understand the components of a rubric, as all of them must be carefully written for the rubric to be an effective tool.

There are different types of rubrics. Analytic rubrics divide performance into discrete criteria to be evaluated separately, while holistic rubrics employ more general descriptions and evaluate all criteria simultaneously. Rubrics can also be categorized as general or task-specific. General rubrics are broader and can be applied to a family of tasks, such as argumentative writing. Task-specific rubrics are written for the content of a particular task.

Personally, I am partial to general analytic rubrics. The parsing of a task into distinct criteria makes it easier to target instruction for particular areas of weakness among your students; for example, if they all scored poorly for using evidence, but did well for writing a clear introduction, you know what to focus on in your lessons. As well, it is easier to grade, as students may score well on one criterion and poorly on another, something a holistic rubric doesn’t allow as all criteria are lumped together. General rubrics help focus instruction and learning on developing particular skills over time, rather than just completion of a particular task, and save teachers the work of writing a new rubric for each task; however, they are hard to write well without being broad to the point of uselessness, and may need to be customized to the task at hand.

Why Use Rubrics?

First of all, rubrics are tools for communication. They communicate expectations to students, and help them understand what quality looks like for a given medium or discipline. They communicate the learning outcomes to teachers, which is particularly important when there is more than one teacher for a course or grade level. They communicate achievement of those learning outcomes to teachers, department heads, and curriculum coordinators, as the different criteria help to show the strengths and weaknesses of the students.

Secondly, they are tools for instruction. By breaking down performance into criteria, teachers can focus on particular skills in lessons; as Daisy Christodoulou argues, it is better to break complex skills down into smaller pieces for students to practice, and rubric criteria help with that breakdown process. Students can also use rubrics for self- and peer-assessment, which helps them gain an understanding of what good quality looks like.

Finally, they are tools for grading. Unfortunately, we are still very much embedded in this idea of reporting achievement with numbers and letters. Rubrics help ensure this is done fairly, particularly for complex tasks where judgement of quality can be more subjective. Research shows that the use of rubrics can lead to more reliable scoring of assessments, particularly when combined with exemplars and calibration between scorers (Jonsson & Svingby, 2007).

You don’t always need rubrics to achieve these goals. A checklist can also communicate quality and break down complex skills. The problem with a checklist is that it is binary: either something is there or it isn’t. Rubrics should be employed when gradations of quality matter. For example, three different essays can all use evidence, and thus have it checked off a list; however, they can vary in the relevance of the evidence to the argument, detail provided in the evidence, or the originality of the interpretation. There are appropriate times for using a checklist or mark scheme, and appropriate times for a rubric, and judging when those times are is important.

Who Are Rubrics For?

Thinking about the intended end-user of a rubric is important, as it informs the type of language used. People advocate writing rubrics in student-friendly language, but the challenge is that the sort of precise language needed to make a rubric reliable also makes it less student-friendly. As well, unless you are employing single-point rubrics (which are more for feedback than grading), rubrics tend to be large walls of text, daunting to read for younger students or ELLs.

As well, while I did state above that rubrics can communicate expectations to students and serve as tools for self- and peer-assessment, I sometimes think they are not the best tools for the job. For self- and peer-assessment, it is usually better to boil the rubric down to a checklist to make it more digestible. For communicating standards, exemplars do a better job. For example, a rubric may ask that an essay use a formal tone, but if you want students to actually do that, it is better to show them examples and non-examples of formal tone, rather than reference the rubric. Even the great Dylan Wiliam advocates using exemplars over rubrics for helping students improve their work.

Rubrics are also very specific to the school experience. I have recently dabbled in bookbinding, an interesting experience in skill development that probably merits its own post. When I was trying to figure out how to bookbind, I looked for instructional videos. I also inspected hardcover books, using what I learned about construction to see how they were put together. Instruction and exemplars. I did not need a rubric, nor will I seek one out (nor do I think any exist, in this case). Using rubrics to develop a skill is unique to school.

So, in light of these mixed feelings about students being the target audience for a rubric, I found a few academic studies. Consider this to be a brief literature review.

Greenberg (2015) looked at whether use of rubrics in an undergraduate course resulted in better-quality APA-style reports. In the first experimental manipulation, rubric use was compared with not using the rubric. Handouts and several class sessions were used to teach students how to write that style of report and to develop familiarity with the rubric. Sections that used the rubric scored significantly better on their final report than sections that did not. However, the difference between the experimental and control groups is not clear. Did the control groups get any instruction on how to write an APA-style report? Did they go over an exemplar, instead of the rubric? The paper does not explain this.

In the second experimental manipulation, students graded a classmate’s paper using the rubric, and then had the opportunity to rewrite their own report. They were not told in advance that they would have the opportunity to rewrite, nor were they given the results of the peer grading. In general, the rewritten reports were of higher quality than the initial reports. However, this is a pre-test/post-test kind of comparison, and fairly useless for demonstrating an effect. After all, the rewritten reports could have improved from simply leaving them for a week and then revisiting them with fresh eyes. A better comparison would be to have had two groups do peer feedback, one using the rubric and one not. In general, Greenberg (2015) fails to demonstrate the benefits of students using rubrics for self and peer assessment.

Andrade, Wang, Du, and Akawi (2009) looked at the impact of rubric use on student self-efficacy for writing. In both the treatment group and the comparison group, students were asked to list characteristics of good writing and revise a rough draft. In the treatment group, students also reviewed an exemplar and self-assessed using a rubric. In general, both groups showed improvement in self-efficacy over the course of the writing process. There was no significant difference between the treatment and comparison groups. However, when the data is broken down by gender, girls showed more improvement in self-efficacy in the treatment group than in the comparison group. So, for this study, student use of rubrics did not really have an impact, but gender complicates the picture.

Wang (2014) looked at peer feedback for EFL writing, including student perception of using rubrics to guide their feedback. Student perceptions were mixed. One student felt that it “enabled us to realize what a good piece of EFL writing looks like” and that “[u]sing the rubric prompted us to pay attention to problems of both language use and content development; otherwise we might have only focused on those language problems.” Another student was wondering “whether adherence to the rubric would confine our patterns of thinking about EFL writing…Is there any other set of criteria for English writing? What should I do if I read some native English writers’ essays which do not strictly meet the rubric’s requirements?” These mixed perceptions match my feelings as well. Overall, analysis of the perception data showed that students held positive views of using the rubric.

Panadero and Jonsson (2013) completed a research review on the use of rubrics for formative assessment. They found that using rubrics improved transparency regarding grading, reduced student anxiety about assessments, aided feedback, and improved student self-efficacy and self-regulation. This picture is complicated by the fact that rubrics are often used in conjunction with other activities, such as peer and self-assessment and looking at exemplars. I am a bit skeptical about the quality of this review, as they reference the Andrade et al. (2009) study from above, framing it as evidence showing improvement of self-efficacy; however, having read the paper itself, I know that is not quite what the authors concluded.

I do think the rubrics being used for grading should be available to students, for the sake of transparency. By all means, refer to them when introducing assignments or structuring peer feedback sessions. It certainly helps guard against “black box” grading, where an assignment goes in and a grade comes out, and the student has no idea of how that grade was determined. But I still think that the intended audience is teachers, and trying to cater for two very different audiences in your rubric design will make writing a good rubric difficult.

Finally, remember that the rubric serves the teacher and the instruction, and not the other way around. You don’t teach something because it is on the rubric; you teach it because it is an important skill or characteristic of quality within a given discipline. Keep this in mind for how you talk to students about the work as well. Don’t tell them to include a hook or evidence because it is on the rubric; tell them to do so because it is part of good writing. Remember that your goal is not student comprehension of the rubric, but for them to improve at the given task or skills. A great deal of time spent decoding and breaking down the rubric may not be the most effective way of reaching that goal. If you find yourself teaching to the rubric or teaching about the rubric, rather than the rubric being an accurate reflection of quality for that product or discipline, it is time to revise (or discard) the rubric.

A Checklist for Rubrics

I’m not going to write a rubric for rubrics. There are a couple you can find online, but I think in this case, a checklist better suits our purposes.

Rubrics should be

  • Valid
  • Reliable
  • Practical

These three attributes should be in balance. A rubric should not sacrifice practicality to validity, or validity to reliability and practicality. When a rubric is all three, then it can be a tool to communicate the goals of the assessment, evaluate the extent to which the student work met these goals, and target areas of improvement.

And while a rubric is the better tool for evaluating items that fall on a continuum, here I think the checklist approach invites debate. Find a rubric and discuss it with your grade-level or subject team: Is it valid? Is it reliable? Is it practical?

Valid

Does the rubric identify and assess characteristics of quality work for that discipline and medium? Does it measure what it is meant to measure?

Take, as a made-up example, a rubric for a presentation with low validity. The presentation meets expectations if it has ten slides, as the teacher wants a minimum number of slides. But is this a valid way to measure the quality of a presentation? There can be a very good presentation with only three slides, and a very bad presentation with fifteen. It would be better to judge quality based on what is on the slides, how they are used to supplement and enhance the presentation, and the presenters’ public-speaking skills.

Experts in the disciplines should have a role in writing and refining these rubrics, as they know best what quality looks like in that discipline and how to break it down. There is the risk of the “curse of knowledge,” where the characteristics of quality in a discipline have become so internalized and automatized that the experts are no longer aware of what they are. However, even with the curse, an expert can identify good and bad work when he or she sees it, and with some reflection, articulate what is wrong and right. This expert knowledge is then put to use in identifying and defining criteria, parsing a task into different dimensions.

Sometimes, when a rubric lacks validity, you can feel your tacit knowledge kicking in. You may grade a piece of work that scores well, but you feel it is not as well done as the grade reflects. Or, the reverse: it scores poorly, but you feel it should do better. Of course, there are plenty of unconscious biases at work here: perhaps you are letting writing mechanics distract you from assessing the quality of ideas (Rezaei & Lovorn, 2010), or maybe the halo effect of the student’s behaviour is distorting your judgement. But if this problem comes up repeatedly for a given rubric, you should sit down and try to articulate what the problem is.

Standards should also be considered when writing rubrics. We, as teachers, are beholden to the standards and learning outcomes we are given, and an assessment is meant to be a statement of a student’s attainment of those standards.

If writing an analytic rubric, the criteria should be discrete. If they overlap too much, then you have not identified separate and independent aspects of the task and will be giving more than one grade to essentially the same thing.

The final aspect of validity to consider is the performance descriptors and the progression of the different levels of performance. The gaps between the levels should not be too narrow or too large, but consistent and logical. If the gaps are not consistent, then that creates validity problems, as rubrics are often used to grade students and the meaning of those grades should be clear.

Reliable

Does the rubric result in consistent judgements of student work between teachers and between grading sessions?

This is where distinction between levels becomes important. If there is overlap between the levels, then it is harder for teachers to be consistent in their judgement of student work. What one teacher sees as Meets Expectations, another may see as Exceeds Expectations.

Language should be specific and vague terms clearly defined. As an example, say a rubric is assessing whether a student has a “basic,” “adequate,” or “deep” understanding of Greek history. Those adjectives are vague and will be interpreted in different ways by different teachers. Instead, the rubric designer should think about what a “basic,” “adequate,” and “deep” understanding of Greek history would look like in a student’s work, and write the performance descriptors accordingly. Clear descriptors also help with the practicality of a rubric; if something is ambiguous or unclear, then the scorer becomes frustrated and the grading process slows down.

Some vagueness is probably inevitable, but it is best to strive to be as precise as possible. Reliability of a rubric can be improved by holding calibration sessions where teachers collaborate in grading samples of student work and discuss how to interpret the descriptors, as well as by keeping exemplars of student work, the levels they achieved for each criterion, and an explanation of why. Having the exemplars is helpful because they can be referred back to, while the calibration session may fade in memory, particularly when you are entering your tenth hour of grading. It is important to be consistent in grading, not just between teachers, but from one day to the next or one year to the next. Grading drift is a thing (Casabianca, Lockwood & McCaffrey, 2015).

Practical

Is the rubric easy for teachers to use and for students to understand?

A rubric can be carefully aligned with curriculum standards and each descriptor precisely phrased, but if it is large, elaborate, and time-consuming to use, then that effort will be for naught. Even the most well-intentioned, hard-working teachers will get a bit sloppy with their grading as they reach their fortieth or hundredth essay, and if the rubric is difficult to use, then this is only exacerbated.

Reducing the number of criteria and levels is one way to make a rubric more practical for teachers. For analytic rubrics, every criterion is a grade that needs to be given, so while it is possible to divide performance into many different dimensions, it is better to focus on just a few, so there are fewer criteria to grade. These can be the ones most essential for the performance, or, if a task is repeated throughout a course, different dimensions of performance can be the focus for each assessment.

Limiting criteria not only makes a rubric easier to use for teachers, but also easier to understand for students. It is easy for students to be overwhelmed by having to consider too many things that need to be included in their work. I would argue that it overwhelms working memory, as they are still novices in the discipline. Having fewer criteria helps teachers focus their instruction and students focus their efforts.

Performance levels should also be restricted to just a few, as it may not be possible “for individuals to make operational sense out of inclusion of more than three levels of mastery” (Allen & Tanner, 2006). I certainly struggle to write more than three distinct levels, and if a rubric requires four or more, I start to fall back on vague adjectives like “some” and “adequate.”

Sometimes, as I mentioned before, making the language specific and detailed enough for a rubric to be reliable means compromising on language simple enough for students to understand. This is why a rubric needs to be balanced between validity, reliability, and practicality, and a strongly valid and reliable rubric may be a bit cryptic to students. Explanation and annotation of exemplars can be used to communicate these expectations.

Examples of Good Rubrics

With something as difficult as writing rubrics, it is helpful to start with examples of success: rubrics that are valid, reliable, and practical, or at least come close to it.

AAC&U’s VALUE Rubrics

When I think of quality rubrics, the first that come to mind are the VALUE rubrics, written by the Association of American Colleges and Universities (AAC&U). These rubrics assess general learning outcomes that the AAC&U hopes students will achieve by the end of their three- or four-year degree. They are meant to transcend disciplines, and be applicable for both freshmen and college seniors. They are broad, which is their greatest flaw. Because they are meant to work for such a wide range of students and subject areas, they require some “translation” for a given task.

My last university summer job revolved around these rubrics. I was lucky enough to be working for a research project evaluating attainment of these general learning outcomes by students in different programs, at different points in their degree. Basically, my job was grading, which sounds like a teacher’s worst nightmare, but it turns out grading isn’t so bad when you are given the time to do it and opportunities to discuss the student work and what the grades really mean along the way.

Let’s look at the critical thinking rubric, because critical thinking is such a slippery concept and I am always happy to spend more time thinking about what it means.

The rubric has five criteria: explanation of issues, evidence, influence of context and assumptions, student’s position, and conclusion and related outcomes. That is a practical number, and breaks critical thinking down into distinct facets that can be used to guide instruction in, for example, the writing of an argumentative essay. They are also a fair representation of critical thinking. When I think of critical thinking as a skill, I think about using evidence, consideration of other perspectives, and so on.

There are four performance levels, which are in general distinct from each other, though there is a bit of overlap and lack of clarity. For example, for student’s position, the level 2 descriptor states that the specific position acknowledges different sides of an issue, while the level 3 descriptor states that the specific position takes into account the complexities of the issue and acknowledges other points of view. What is the difference between acknowledging different sides and acknowledging other points of view? The rubric does come with a glossary for the terms used, which is a great idea, but these phrases are not defined. I think this does demonstrate the difficulty of writing descriptors for more than three levels.

I think these are good rubrics for general learning outcomes. I have referenced them before, when trying to think of criteria and descriptors, though with a fair amount of modification to suit the age level of my students. They are valid, in that they accurately break down the components of the skills into different criteria that reflect successful use of the skills in the real world. There is a bit of a problem with reliability, partly because the distinctions between performance levels are not always clear and partly because of the inherent broadness of the rubric. If used in a course, it would be best to tailor the language to the task or discipline, but it provides a much better starting point than a lot of rubrics I have seen. In terms of practicality, five criteria is more than I prefer, but because most of the terms are clearly defined, there is less of the frustration with ambiguity that stalls the grading process.

Big History Project Rubric

The Big History Project (BHP) is a curriculum that takes students through all of time, starting with the Big Bang. I’ve never taught it, and it includes a lot more science than I really like in my history. It does, though, have a strong skills focus, particularly on reading and writing, and provides all the resources a teacher needs, so it does intrigue me.

BHP includes investigations regularly throughout the course, where students use a range of sources to write a report. The rubric for this breaks performance down into four criteria: constructing an argument, using texts as evidence, applying BHP concepts, and writing with appropriate mechanics. This is valid, as those are the different components of historical argumentative writing, and would apply to my own undergraduate writing during my history degree.

It is also a very reliable rubric. There are five levels of performance, but the difference between them is made very clear, and the descriptors are broken into bullet points for ease of reading. It would be easy to pinpoint different aspects of a student’s report to justify a grade, so consistency between scorers and between grading sessions would be easy to achieve.

Initially, I was concerned about the practicality of the rubric, because it does go into so much detail and spans four pages. It seems like it would take a while to use. However, this is a rubric that can truly provide feedback to students. The descriptors are so clear, you can just check them off, and next steps in revision can be found by looking at the next higher level. So, while grading may take a bit longer, time is saved in giving feedback. In general, I don’t really see rubrics and grading as a good way of providing feedback to students (Lipnevich & Smith, 2008), but I’ll make this the exception.

Writing an Awesome Rubric

Step 1: Think about the assignment and your goals for it. What do you want students to do? How will you know when they’ve done it? What does success look like for this task? Think about any standards you need to incorporate into the rubric, such as curriculum standards or general learning outcomes set by the school.

Step 2: Decide on your type of rubric and, if writing an analytic rubric, the criteria. What are the different components of success for this task? Make sure the criteria are distinct from each other.

Step 3: Decide how many performance levels you need. Check to see if your school has a standard way of doing this (e.g. level 1/2/3/4, below/meets/exceeds expectations). Next, write your performance descriptors. They should be clear, specific, and based on something observable. Use parallel language across the descriptors, so the differences between each level are clear. Beware words like “somewhat” or “adequate”; try instead to define what that looks like.

Step 4: Finally, if possible, test your rubric before rolling it out. Your rubric won’t truly be good until it has passed through several rounds of grading and revision. You can start that process early by holding a calibration session with your fellow teachers and a few samples of student work, to pinpoint problems with the rubric before you start using it in the unit.

It is easy to get stuck or spin your wheels during this process. I would recommend, when possible, not starting from scratch. Are there rubrics or checklists that were used before for this assignment? Can you find examples online for a similar task? Are there any ideas you can steal from examples of quality rubrics, like the VALUE or BHP rubrics? Remember to modify what you find; it is unlikely that it will be perfectly suited for what you are doing.

If you must start from scratch, look at exemplars of the task or, even better, samples of student work for a similar task. This may seem a bit backwards, to start by looking at student work or exemplars then setting the standard, rather than starting by setting the standard then looking at student work. However, you will be influenced by exemplars you have seen before, even if you lock yourself in an empty room until you finish writing the rubric. If you are writing a rubric for an essay, you will have read and written essays before and that will influence your ideas for criteria and performance descriptors, even if you stubbornly refuse to look at samples of student work while writing the rubric. Examples help our students understand expectations for a task; they can also help you articulate those expectations in the form of a rubric.  

Overall, writing a rubric is a difficult task. However, it is a time investment that saves you from frustration in the future. A well-written rubric makes grading easier and more reliable, and can help guide instruction in the unit. So, stand up, have a quick stretch, then get to it. It is a process, and one that you get better at with practice.

 

References

Allen, D., & Tanner, K. (2006). Rubrics: Tools for Making Learning Goals and Evaluation Criteria Explicit for Both Teachers and Learners. CBE Life Sciences Education, 5 (3), 197-203.

Andrade, H., Wang, X., Du, Y., & Akawi, R. (2009). Rubric-Referenced Self-Assessment and Self-Efficacy for Writing. The Journal of Educational Research, 102 (4), 287-301.

Casabianca, J., Lockwood, J., & McCaffrey, D. (2015). Trends in Classroom Observation Scores. Educational and Psychological Measurement, 75 (2), 311-337.

Greenberg, K. (2015). Rubric Use in Formative Assessment. Teaching of Psychology, 42 (3).

Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2, 130-144.

Lipnevich, A. A., & Smith, J. K. (2008). Response to Assessment Feedback: The Effects of Grades, Praise, and Source of Information. Educational Testing Service. https://www.ets.org/Media/Research/pdf/RR-08-30.pdf

Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review, 9, 129-144.

Rezaei, A., & Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assessing Writing, 15, 18-39.

Wang, W. (2014). Students’ perceptions of rubric-referenced peer feedback on EFL writing: A longitudinal inquiry. Assessing Writing, 19, 80-96.
