
Peter Greene: Six Reasons Not to Rely on the Big Standardized Tests


Standardized tests are not all the same, so talk about “standardized tests” in general tends to commit what linguistic philosophers call a “category error,” a type of logical fallacy. George Lakoff wrote a book about categorization called Women, Fire, and Dangerous Things. He took the title from the classification system for nouns in the indigenous Australian language Dyirbal. One of the noun categories in this language includes words referring to women, things with which one does violence (such as spears), phenomena that can kill (fire), and dangerous animals (such as snakes and scorpions). What makes this category bizarre to our ears is that the things in it don’t actually share significant, defining characteristics. Women and things associated with them are not all dangerous. Speaking of all things balan (the name of this category in Dyirbal) therefore doesn’t make much sense. The same is true of the phrase “standardized test.” It lumps together objects that are DIFFERENT FROM one another in profoundly important ways.

Imagine a category, “ziblac,” that includes Greyhound buses, a mole on Socrates’s forehead, shoelaces, Pegasus, and the square roots of negative numbers. What could you say that was intelligible about things in the category “ziblac”? Well, nothing. Talking about ziblacs would inevitably involve committing category errors: assuming that things are similar because they share a category name when, in fact, they aren’t. If you say, “You can ride ziblacs” or “Ziblacs are imaginary” or “Ziblacs don’t exist,” you will often be spouting nonsense. Yes, some ziblacs belong to the class of things you can ride (Greyhound buses, Pegasus), but some do not (shoelaces, imaginary numbers), and you can’t actually ride Pegasus because Pegasus exists only in stories. Some are imaginary (Pegasus, imaginary numbers), but they are imaginary in very different senses of the term. And some don’t exist (Pegasus, the mole on Socrates’s forehead), but they don’t exist in very different ways (the former because it’s fictional, the latter because Socrates died a long time ago). When we talk of “standardized tests,” we are using just such an ill-defined category, and a lot of nonsense follows from that fact.

Please note that there are many VERY DIFFERENT definitions of what “standardized test” means. The usual technical definition from decades ago was “a test that has been standardized, or normalized.” This means that the raw scores on the test have been converted into “standard scores,” which express each score as a number of standard deviations from the mean. You do this by taking the raw score, subtracting the population mean from it, and then dividing the difference by the population standard deviation. The result is a Z-score (or a T-score, if the standard scores are then rescaled to have a mean of 50 and a standard deviation of 10). People do this kind of “standardizing,” or “normalization,” in order to compare scores across students and subpopulations. Let’s call this “Standardized Test Definition 1.” Many measures converted in this way yield a so-called “bell curve” because they deal with characteristics that are normally distributed. An IQ test is supposed to be a test of this type. The Stanford 10 is such a test, a Standardized Test, Definition 1.
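To make the arithmetic concrete, here is a minimal sketch of that conversion in Python. The raw score, mean, and standard deviation in the example are made-up numbers for illustration, not figures from any actual test.

```python
def z_score(raw, mean, sd):
    """Standard (Z) score: distance from the population mean in standard deviations."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T-score: the same standard score rescaled to a mean of 50 and an SD of 10."""
    return 50 + 10 * z_score(raw, mean, sd)

# Hypothetical example: a raw score of 82 on a test whose population
# mean is 70 and whose population standard deviation is 8.
print(z_score(82, 70, 8))  # 1.5  -> 1.5 standard deviations above the mean
print(t_score(82, 70, 8))  # 65.0
```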

Another, much broader definition is “any test that is given in a consistent form, following consistent procedures.” Let’s call this “Standardized Test Definition 2.” To understand how dramatically this definition of “standardized test” differs from the first one, consider the following distinction: A norm-referenced test is one in which a student’s performance is ranked by comparison with the scores of his or her peers, using normalized, or standardized, scores. One of the reasons for standardizing scores, as in Definition 1 above, is to make such comparisons to norms. A criterion-referenced test is one in which a student’s performance is rated against some absolute criterion: knowledge or mastery of some set of facts or skills. Which kind of scoring one does depends on what one is interested in: how the student compares with other students (norm-referenced) or whether the student has achieved some absolute “standard,” that is, has or has not demonstrated knowledge of some set of facts or some skill (criterion-referenced). So, Standardized Test Definition 2 is a much broader category, and it includes both norm-referenced tests and criterion-referenced tests. In fact, any test can be looked at in the norm-referenced or the criterion-referenced way, but which way one looks at it makes a big difference. In the case of a criterion-referenced test, one is interested in whether little Johnny knows that 2 + 2 = 4. In the case of a norm-referenced test, one is interested in whether little Johnny is more or less likely than students in general to know that 2 + 2 = 4. The score on a criterion-referenced test is supposed to measure absolute attainment. The score on a norm-referenced test is supposed to measure relative attainment. When states first started giving mandated state tests, a big argument for them was that states needed to know whether students were achieving absolute standards, not just how they compared with other students. So, these state tests were supposed to be criterion-referenced tests, in which the reported score was a measure of absolute attainment rather than relative attainment, which brings us to a third definition.
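A small sketch may help to show how the same raw score yields two very different readings. All of the numbers here (the raw score, the cut score, and the peer-group statistics) are invented purely for illustration.

```python
raw_score = 42               # items answered correctly (invented)
total_items = 50
cut_score = 40               # hypothetical passing criterion
peer_mean, peer_sd = 38, 6   # invented statistics for the norming group

# Criterion-referenced reading: has the student met an absolute standard?
meets_standard = raw_score >= cut_score

# Norm-referenced reading: how does the student compare with peers?
z = (raw_score - peer_mean) / peer_sd

print(f"Percent correct: {raw_score / total_items:.0%}")                 # 84%
print(f"Meets the criterion (cut score {cut_score}): {meets_standard}")  # True
print(f"Standing relative to peers: z = {z:+.2f}")                       # +0.67
```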

Yet another definition of “standardized test” is “any test that [supposedly] measures attainment of some standard.” Let’s call this “Standardized Test Definition 3.” This brings us to a MAJOR source of category error in discussions of standardized testing. The “standards” that Standardized Tests, Definition 3 supposedly measure vary enormously, because some types of items on standards lists, like the CC$$, are easily assessed both reliably (yielding the same results over repeated administrations or across variant forms) and validly (actually measuring what they purport to measure), and some are not. Math standards, for example, contain many more reliably and validly assessable items (the student knows his or her times table for positive integers through 12 x 12) than do ELA standards, which tend to be much vaguer and broader (e.g., the student will be able to draw inferences from texts). As a result, the problems with the “standardized” state Math tests tend to be quite different from the problems with the state ELA tests, and when people speak of “standardized tests” in general, they are talking about very different things. Deformers simply assume that if people have paid a dedicated testing company to produce a test, that test will reliably and validly assess its state’s standards. This is demonstrably NOT TRUE of the state tests in ELA, for a lot of reasons, many of which I have discussed here: https://bobshepherdonline.wordpress.com/2020/03/19/why-we-need-to-end-high-stakes-standardized-testing-now/. Basically, the state ELA tests are a scam.

Understanding why the state ELA tests are a scam requires detailed knowledge of the tests themselves, knowledge that proponents of the tests either don’t have or have but aren’t going to talk about, because such proponents are owned by or work for the testing industry. Education deformers and journalists and politicians tend, in my experience, to be EXTRAORDINARILY NAÏVE about this. Their assumption that the ELA tests validly measure what they purport to measure is disastrously wrong.

Which leads me to a final point: Critiques of the state standardized tests are often dismissed by Ed Deformers as crackpot, fringe stuff, and that’s easy for them to do, alas, because some of the critiques are. For example, I’ve read on this blog comments from some folks to the effect that intellectual capabilities and accomplishments can’t be “measured.” The argument seems to be based on the clear differences between “measurement” as applied to physical quantities like temperature and height and “measurement” as applied to intellectual capabilities and accomplishments. The crackpot idea is that the former is possible and the latter is not. However, it is OBVIOUSLY possible to measure some intellectual capabilities and accomplishments very precisely. I can find out, for example, very precisely how many Kanji (Japanese logograms) you know, if any, or whether you can name the most famous works by Henry David Thoreau and Mary Shelley and George Eliot and T.S. Eliot. If you choose to disdain the use of the term “measurement” to refer to assessment of such knowledge, that’s simply an argument about semantics, and making such arguments gives opponents of state standardized testing a bad name: serious critics get lumped together, by Ed Deformers, with folks who make such fringe arguments.

