"Weedout" Courses Considered Harmful

4 Feb 2016 diversity in stem, CS, teaching, academic machoism

"The Perils of JavaSchools", by Joel Spolsky, is a wonderfully fun read for students of computer science. A veritable demigod of the software world tells a story -- ringing with appealing truth throughout -- of a tragic fall from grace in modern CS curricula...and at its core, that great deluder Java.

It's relentlessly snarky, and feels relentlessly true, as it lays out in gruesome detail the extent to which kids nowadays are being coddled and left tragically unprepared for the big scary world where pointer arithmetic, abstraction, and recursion are inescapable necessities. It's hard to read it without coming away with the sense that Spolsky has hit upon the great uncomfortable truth in computer science -- that some curricula simply fail to properly train young minds in key concepts.

It's a fun read.

But it's got some problems.

Though I am loath to disagree with such a titan of my field, I respectfully submit that Spolsky not only misses the broad point of a computer science education, but gets it so badly wrong that he manages to align himself with one of the most pernicious systemic problems facing the field of computer science -- and the broader tech community -- today.

Since I have a great deal of respect for Spolsky, this post turned out very, very long. (Even after I pushed off large portions of the initial concept to other posts.) If you're more in the mood to read a shorter post where I throw out unsubstantiated claims and don't stop to explain everything, I've tried to summarize my main points in a summary-post.

(1)

The heart of the essay is in two early paragraphs:

You used to start out in college with a course in data structures, with linked lists and hash tables and whatnot, with extensive use of pointers. Those courses were often used as weedout courses: they were so hard that anyone that couldn't handle the mental challenge of a CS degree would give up, which was a good thing, because if you thought pointers are hard, wait until you try to prove things about fixed point theory.

All the kids who did great in high school writing pong games in BASIC for their Apple II would get to college, take CompSci 101, a data structures course, and when they hit the pointers business their brains would just totally explode, and the next thing you knew, they were majoring in Political Science because law school seemed like a better idea. I've seen all kinds of figures for drop-out rates in CS and they're usually between 40% and 70%. The universities tend to see this as a waste; I think it's just a necessary culling of the people who aren't going to be happy or successful in programming careers.

(...)

...but keep in mind that that's only the first of two "hard course[s] for young CS students".

There's a compelling refrain here:

There are people with what it takes to be computer scientists, and people who just don't. You know who I mean; there are those students who really, truly understand pointer logic, and others who struggle through it, needing clarification on the difference between an address and its referent every step of the way. One of these has a future in software engineering; the other does not.

And the sooner we can separate those who can handle the mental challenge of a CS degree, the better for all involved...

But here's the thing about weeding people out: It happens when students choose to quit, not when teachers choose to drop them. You can make your data structures course as hard as you like, but there will always be one kid in the class who digs in and declares "I will not let pointers defeat me. I'm smart enough, and there's no reason that I can't understand this material if I set my mind to it!"

That student will stay in your class. The student who, hitting the same patch of trouble, begins to think "Maybe they were right about me. Maybe I don't belong here..." won't. The students who have had that doubt planted by a thousand half-spoken social cues -- that perhaps, there is a reason they can't understand the material, even if they set their mind to it -- will dig in the first time, and the second, but perhaps not the third, fourth, or fifth.

The other student that will drop out -- though it'll take her much longer -- is the one who overreacts in the other direction. Faced with the choice between giving up or proving everyone wrong, she decides that she refuses to prove the stereotype true. She resolves to learn pointers, goddammit -- and on her own, not by going to office hours to have her hand held by a constant stream of teaching assistants. In the end, she studies hard -- alone, reluctant to ask for help -- and, eventually, just can't do it any more. She, too, drops out for a subject where she feels less of a need to prove herself.

And, at the end of the day, the 40 to 70 percent lost in the "necessary culling of people who aren't going to be happy or successful in programming careers" includes a hugely oversized share of women, racial minorities traditionally underrepresented in tech, and students from socioeconomically disadvantaged backgrounds. Impostor syndrome and stereotype threat are real, and I don't think it's much of an exaggeration to say that they are the reasons our field is so white, Asian, and male.

And yes, our field is overwhelmingly white, Asian, and male.

(2)

"So we lose a lot of fragile students who can't face failure -- so what? They weren't going to last long in the real world, anyway..."

Let's assume, for the sake of argument, that there really is a distinction between the sort of student who can do well, and the sort of student who has no chance.

And let's imagine that you want to design a system for separating the students with 'grit' -- the determination to dig in and rise to the challenge when the work gets hard -- because they're the ones who, ultimately, will go farthest.

Then I argue that it's still the case that this system is terrible at doing just that.

By tilting the tables so heavily in favor of traditionally overrepresented groups -- by forcing students to judge themselves as worthy or unworthy with only a few weeks' experience with the subject -- you'll get a lot of privileged but grit-lacking students in your tent...and a whole lot of students outside it with a great deal of grit, but just not enough to overcome the messages that all of society has been pressing upon them for the entirety of their education.

We can phrase the entire issue in mathy terms (though, if you'd rather not, feel free to skip down to section 2A): We want to design a statistical test (Spolsky recommends the "weedout class") to distinguish between those students who are going to go on to do well, and those who won't. Ideally, we'd like a test that will always report "Pass" for the former group, and always report "Fail" for the latter. But there are three things that can go wrong here:

First, overall bias. If the test is just too likely to let students in (or too quick to fail them), then it will always get some fraction of students wrong, even if it can perfectly rank students in order.

Think, for example, about a badly-written and uncurved final exam where everyone, including the good students, fails. As another example: Spolsky, in "Perils", criticizes the system for an overall bias in favor of passing students.

Second, variance. It's possible to correct overall bias by shifting the goalposts in order to properly calibrate the test. But if the test makes errors in both directions, it's not so easy.

Say that, even if you set the cutoff in the perfect place, the test will fail 5% of students it should pass, and pass 7% of students it should fail. If you make it easier, it'll still pass that 7% (and even more), and if you make it harder, it'll still fail that 5% (and even more). You can't win by either raising or lowering the bar -- you just need a better test.
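
To make the variance point concrete, here's a minimal Python sketch. The ten students and their scores are invented purely for illustration (they aren't data from any real course); all that matters is that the scores of students who would succeed overlap with the scores of students who wouldn't.

```python
# Hypothetical students: (would_succeed, exam_score).
# The two groups' scores overlap, which is what "variance" means here.
students = [
    (True, 90), (True, 82), (True, 75), (True, 68), (True, 55),
    (False, 72), (False, 60), (False, 50), (False, 45), (False, 30),
]

def errors(cutoff):
    """Return (false_fails, false_passes) for a given passing cutoff."""
    false_fails = sum(1 for ok, score in students if ok and score < cutoff)
    false_passes = sum(1 for ok, score in students if not ok and score >= cutoff)
    return false_fails, false_passes

# Sweep the bar from easy to hard and watch the two error counts trade off.
for cutoff in (40, 50, 60, 70, 80):
    false_fails, false_passes = errors(cutoff)
    print(f"cutoff={cutoff}: false fails={false_fails}, false passes={false_passes}")
```

No cutoff in the sweep gets both error counts to zero: lowering the bar trades false fails for false passes, and raising it does the reverse.
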

Third, inter-group bias. Say that, in the 5%/7% example above:

- The 5% of students that the test fails (but who would, in fact, go on to do well if given the chance) were hard workers who came to every class and did very well on homework assignments, but underperformed on the final exam for some reason.
- The 7% of students that the test passed (but who would later do poorly) all crammed hard for the test, but had decidedly poor performance on their week-in, week-out work.

If you weren't looking at "worked hard" as a factor, you might just write this error off as variance. But when you do look at it, it becomes clear that you can make the test better by taking into account students' prior record of hard work or slacking. If you raise the bar on the slackers and cut the hard workers some slack, you'll find that you do a better job of both including those who belong and excluding those who don't.

Notice that this inter-group bias, because it is different between groups, can't just be fixed by raising or lowering the overall difficulty. If you make things harder, in order to cut out the mistaken passes, then you'll be producing even more erroneous failures. (And, vice versa, if you try to lower the bar, you'll get even more false positives.) Either way, unless you re-work the test to fix the bias between groups, your test will always be letting in too many from one group, and not enough from another.
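
The same point can be sketched in a few lines of Python. Again, the students below are hypothetical, made up so that hard workers underperform on the exam and crammers overperform; the `worked_hard` flag is just a stand-in for whatever group-level factor the test is ignoring.

```python
# Hypothetical students: (worked_hard, would_succeed, exam_score).
students = [
    (True, True, 70), (True, True, 62), (True, True, 58),
    (True, False, 40), (True, False, 35),
    (False, True, 90), (False, True, 85),
    (False, False, 75), (False, False, 66), (False, False, 50),
]

def errors(cutoff_for):
    """cutoff_for maps worked_hard (bool) to a passing cutoff.

    Returns (false_fails, false_passes).
    """
    false_fails = sum(
        1 for hard, ok, score in students if ok and score < cutoff_for[hard]
    )
    false_passes = sum(
        1 for hard, ok, score in students if not ok and score >= cutoff_for[hard]
    )
    return false_fails, false_passes

# The best that any single, group-blind cutoff can do (try every bar from 0 to 100).
best_total, best_cutoff = min(
    (sum(errors({True: c, False: c})), c) for c in range(0, 101)
)
print(f"best single cutoff: {best_cutoff}, total errors: {best_total}")

# A group-aware test: a lower bar for the hard workers, a higher one for the crammers.
group_aware = errors({True: 50, False: 80})
print(f"group-aware cutoffs: false fails, false passes = {group_aware}")
```

With this made-up data, every single cutoff misclassifies at least two students, while the pair of group-aware cutoffs misclassifies none. The particular numbers mean nothing; the shape of the result is the point.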

Okay, so in concrete terms: I claim that a sink-or-swim, do-or-die, killer-hard CS 101 that only lets the toughest, most pointer-grokking students through is a bad test for future programming success. I won't argue Spolsky's point on overall bias -- maybe you can't just lower the bar without letting in students set up for failure. But it can be improved in the other two sources of error:

First: it has an enormous amount of variance. Think about something you now like doing, and are kinda good at. If someone had made you decide, after three weeks of trying it, whether you were good enough to keep doing it seriously, what would you have said? Maybe you would have gotten the answer right, but maybe not. The problem is, judging students based on their first experience with one corner of a huge academic field is only a little better than a shot in the dark. (I'll digress on this point in 2A.)

And secondly: the more high-pressure you make the entrypoint, the more you exacerbate the inter-group bias introduced by impostor syndrome and stereotype threat. These phenomena will always give you higher rates of attrition in groups that are told by society that they don't belong, but the very minimum you can do, when laying out a guiding philosophy for the entry point of an academic field, is not to amplify those effects in a designed-to-be-intense crucible of self-selection and self-judgment. (I'll pick this point up in 2B.)

We now return to your regularly scheduled math-light social theorizing.

(2a)

previously: "The problem is, judging students based on their first experience with one corner of a huge academic field is only a little better than a shot in the dark…"

To be honest, this is probably a respect in which Spolsky's essay simply failed to age well. Ten years ago, if you couldn't write a software application designed with explicit knowledge of how computers worked, you faced a lot more barriers in the field than you do nowadays. Nowadays, computers have gotten pretty insanely fast, and you can do some pretty incredible things just by writing brute-force scripts using, ironically enough, things like Java. (Okay, but seriously; don't use Java, use Python.) Maybe you have some difficulty with abstraction and pointers, but that doesn't mean that you can't use off-the-shelf tools to perform incredible feats of statistical analysis or create novel, useful visualizations of quantitative data.

The world has changed, and while computer science is still proofs, algorithms, languages, operating systems, compilers, and so on, programming computers has become so much more. And, since the academe hasn't yet seen fit to separate computer-science-as-science and programming-as-practice, we're going to need to make our tent large enough to educate students from across the sciences -- and beyond -- seeking just enough computer-savvy to achieve their own (diverse, important, and computationally-enabled) goals.

aside: I've previously gone on record bemoaning the fact that "Electrical Engineering and Computer Science" makes about as much sense as "Optics and Cosmology" for an academic department. While "Computer Science and Programming Computers" is, y'know, better, it's still pretty crazy when you think about it.

Programming is important for all sorts of people doing work in any number of fields, and computer science is amazingly fun as a mathematical topic of inquiry, but Scott Aaronson gets it absolutely right in describing the latter as more "quantitative epistemology" than "study of computers".

I understand that our field is young (75 years at the most?), and that generally, branchings of the academic tree take much longer than that to become accepted, but I'm impatient enough to want these fields of study unyoked and set free to run their separate (important, incredibly cool) paths now.

(2b)

previously: "And the worst part of all is: the more high-pressure you make the entrypoint, the more you exacerbate the inter-group bias introduced by impostor syndrome and stereotype threat, right from the start."

In plain English, because it's an incredibly important point:

If you put students through the wringer to weed out those with insufficient grit, you're asking some students to climb a mountain, while others climb molehills. On the one hand, you've got a black student facing down a slate of cognitive effects known to induce anxiety and harm performance, and on the other hand, you've got an Asian student who simply doesn't have any reason to doubt that he'll get it eventually. One drops out, the other doesn't, and your weedout course has weeded out those who were made most comfortable in a first CS course, not those with the greatest capacity for rising to the challenge.

It's simply the case that some students were challenged too much, and others, not enough. You can't fix the problem by simply raising or lowering the bar, because different students are having completely different experiences when you ask them to 'stand the heat or get out of the kitchen'...and so, when you turn the difficulty up to 11 in an effort to retain only those you expect to be worthy, your CS classes end up white, Asian, male, and upper-middle-class...whether or not you intended to have that effect.

(3)

How do you fix it?

Well, it's hard, and I'll defer answering that question to a sequel post, since this one has gone on long enough.

For starters, though, being careful with differentially scary jokes on the first day is probably a good idea:

Scariness and Self-Selection: A Shopping-Week Meditation

Icosian Reflections
