Today, I attended the first day of UW’s Teaching and Learning Symposium, which brings together educators from across campus to discuss techniques for improving teaching and learning at the college level. It has been very thought provoking, both for the ideas presented and for the related questions they raise. One of the most interesting topics has been the use of new tools for encouraging student communication as a way to improve learning. While I think these efforts are important, I worry that employing such tools may overlook issues of systematic discrimination.
Let’s begin by considering the primary goals of measuring the outcomes of instruction. There are two. First, we want to evaluate how much students learned. Second, we want to assign categories to students based on their learning. In other words, we want to know that we did our job as educators, and we want to give each student a letter grade that represents how much we think they learned in the course. Of course, different departments have different goals, and some colleges don’t give out grades at all. But for the most part, these two goals underlie most assessments of learning.
With those two goals in mind, what can we hope to accomplish with new tools designed to promote learning? Mainly, we would hope that the new tools increase student learning. We could test this by comparing a course without the new tools to a course with them. If students are randomly assigned to one or the other, we can presume that any significant difference in performance on assessment tools like tests is the result of the new methods. If student learning increases, that would also warrant changing the criteria for assigning categories to students. In other words, there are still the same number of A students and B students; it’s just harder to get an A because everyone has learned more. Comparing the two groups, we would see that, on average, even a C student using the new tools has learned more than a C student using the old tools.
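To make that concrete, here is a minimal sketch in Python of the comparison I have in mind, with every number invented for illustration: I assume exam scores are roughly normal, that the hypothetical tool adds a few points on average, and that 100 students land in each randomly assigned group.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Fabricated exam scores (0-100) for two randomly assigned sections.
control = rng.normal(loc=72, scale=10, size=100)    # old tools
treatment = rng.normal(loc=77, scale=10, size=100)  # hypothetical new tools

# With random assignment, a significant difference in mean scores is
# evidence that the tool, not some pre-existing difference between the
# groups, caused the change.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"control mean: {control.mean():.1f}, treatment mean: {treatment.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```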
Using that ideal as the basis for evaluating the new tools, we have to consider how two key factors correlate. First, we consider the amount that each student used the new tool. Second, we consider the grade the student receives in the course. If the tool improves learning, we should see a positive correlation between time spent with the tool and grade received in the class. Combining this correlation with the improved performance of the class using the tool over the class without it, we could make a strong claim that our new tool improves learning.
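The second piece of evidence is even simpler to sketch. Again with fabricated numbers, and assuming (purely for illustration) that each hour spent with the tool adds about half a point to the final score:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Fabricated usage logs and final scores for 100 students.
hours_with_tool = rng.uniform(0, 20, size=100)
final_score = 60 + 0.5 * hours_with_tool + rng.normal(0, 8, size=100)

# A positive Pearson correlation is the hoped-for result.
r, p = stats.pearsonr(hours_with_tool, final_score)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```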
The problem, however, is that gathering both of these pieces of evidence is rarely practical. It’s hard to randomly assign students to one class or the other. But it’s pretty easy to find a correlation between use of the tool and grade in the course. Relying on that one piece of evidence to draw conclusions is dangerous, though, and this is the central error that I believe many educators make when employing these new tools.
Drawing conclusions from only a positive correlation between tool use and grade is a slippery first step toward a causal fallacy. In other words, we assume that because students used the tool more, the tool earned them higher grades. The reverse (or any number of other explanations) could be just as true: maybe students who were already going to get higher grades found the tool more fun to use and thus used it more.
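This trap is easy to demonstrate with a simulation. In the sketch below, which again uses made-up numbers, the tool has zero causal effect on the final score; an unobserved trait (call it “motivation”, invented for this example) drives both tool use and grades, and the naive correlation still comes out strongly positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# An unobserved confounder that drives both behaviors.
motivation = rng.normal(0, 1, size=200)

# Tool use tracks motivation; the final score depends on motivation
# alone. The tool contributes nothing.
hours_with_tool = np.clip(10 + 4 * motivation + rng.normal(0, 2, size=200), 0, None)
final_score = 75 + 6 * motivation + rng.normal(0, 5, size=200)

# The correlation is large and "significant" despite no causal link.
r, p = stats.pearsonr(hours_with_tool, final_score)
print(f"Pearson r = {r:.2f}, p = {p:.4g}")
```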
It is this problem that raises the possibility of systematic discrimination. The current assessment tools provide reasonable discrimination between the A students, B students, and so on. So the purpose of the new tools is not to aid in assigning categories; the new tools are meant to increase learning. But if the new tools are simply favored by the students who already score well under the current tools, then they may be disproportionately increasing the learning of high performers over low performers. The new tools would then actually widen the knowledge gap between A students and C students.
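Here is one more sketch of how that gap-widening could play out, under assumptions I am inventing outright: tool use tracks prior performance, and learning gains track tool use.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fabricated pre-course ability for 300 students.
baseline = rng.normal(70, 10, size=300)
strong = baseline > np.percentile(baseline, 75)  # roughly the "A students"
weak = baseline < np.percentile(baseline, 25)    # roughly the "C students"

# Assume stronger students use the tool more, and gains follow use.
hours = np.clip((baseline - 60) / 2, 0, None)
final = baseline + 0.8 * hours + rng.normal(0, 3, size=300)

gap_before = baseline[strong].mean() - baseline[weak].mean()
gap_after = final[strong].mean() - final[weak].mean()
print(f"A-to-C gap before the tool: {gap_before:.1f} points")
print(f"A-to-C gap after the tool:  {gap_after:.1f} points")
```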
In many cases, this would not be a concern. After all, 100 freshmen in an Asian-American literature course all begin with largely the same knowledge of the subject. But by the end of the course, there are vast differences in the amounts that the students have learned. Criticizing this outcome is akin to damning the students who read the book twice for learning more than the students who only skimmed it.
The concern comes instead from trying to assess current outcomes and trying to do better. In the end, we would prefer that all freshmen in that course come out of it with a high level of knowledge. We might even prefer to scrap the full range of grades and have the lowest grade in the course be a B. Achieving these goals is the purpose of adopting new tools. But employing those tools without evidence that they provide equal learning outcomes for all students simply recreates the problems inherent in the current tools. In other words, the problem isn’t that the students who read the book learn more; it’s that some students don’t read the book in the first place! There is little value in new tools if they don’t solve this central issue.
When new tools don’t increase learning among C students, those tools aren’t changing the status quo enough to warrant their implementation. No matter how extensively A students use the new tools, educators must consider the struggling student to really understand the value of new tools over old. Without this data, whether because it is hard to collect or because the instructor doesn’t care to gather it, new tools are simply a waste of time. Assuming, of course, that the old tools are working; replacing broken tools with new tools is a whole other issue.
So, to educators: I encourage you to beware of systematic discrimination lurking in any new tools you implement. Consider how the new tools affect all of your students, and don’t be satisfied when your A students love them. In fact, gushing from those students may be a sign that the new tools aren’t actually improving learning in the way you want them to.