Digital Writing: Assessment and Evaluation

Afterword

Not Just a Better Pencil

Edward M. White

My introduction to computers was typical for my now-retired generation. In 1978, our English department gained a Vector-Graphic and a few hours with a computer expert to introduce us to this new improvement on the electric typewriter. I took to it right away, though it was tough to work on. None of the commands to make it work made sense, and we gave it its own room with three blackboards, chalking out instructions for how to backspace, delete, and paragraph. But I liked the idea of the big black disk that held something called a program and could also store a text, let you return to it, and revise it without erasing fluid. By the time the first commercial PCs arrived, two years later, I bought one right away despite its outrageous cost—an IBM PC1 with an impact printer for $6,000—and started drafting the conference paper on post-structural literary theory that I had promised for the December MLA conference. I knew it would take me a dozen drafts, and we had lost the secretary who used to type my papers. It was me and the machine, with me shakily in charge of the relationship. We got along perfectly well and the paper was well received. The next year WordStar appeared, and I never looked back.

The computer was and, until recently, remained for me what Dennis Baron (2009) called his latest book: A Better Pencil. And a fine pencil it was, as every year it got sharper and sharper, producing not only text but information of all sorts. Bit by bit it became indispensable, in all the ways that readers of this book would find tiresome if I were to list them here. But the computer remained a tool for me, and did not change my world; the computer just made my world more convenient and efficient. Even when I collaborated with a friend and colleague, Ken McAllister, to write a history of computer scoring of writing for the collection edited by Patricia Ericsson and Richard Haswell (2006), it was Ken who helped me see the connection between computer translation, natural language, and the algorithms that help computers put a score on student writing samples. Our main concern at the time was to trace the increasingly prominent role of commercial and “nonprofit” vendors alongside decreasing research funding. Although we left the door open for the future, it was still clear to us, as to almost all the contributors to that 2006 collection, that computers could not read; a pencil was still a pencil and had no business pretending otherwise. And writing was still what we knew to be writing, even if it was contained in an eportfolio.

But the inexorable march of technology and its integration into all of our lives has shaken my view. This pencil has gotten out of hand and has entered our bloodstream. This volume has made me entertain a new world view, one in which technology has reshaped the world at its core. Every contributor to Digital Writing Assessment and Evaluation simply assumes that the world is now quite different than it was and that writing is no longer what it used to be. Most remarkable to me is the way digital writing has changed the construct of writing itself, a realization that has truly shaken my world. If I knew anything at all, I knew what writing was and how to teach and assess it. And writing took place in a print environment, so I assumed. Now I can do that no longer. The agreement in this book, and generally among those writing in the field, is that writing in the digital environment is in essence quite different from writing before that environment became established. That case is by now so compelling that it hardly needs to be argued.

But I am encouraged that this agreement (that the production of writing, and the much expanded construct of writing, are new and different and quite wonderful) does not carry over to the evaluation of the writing so produced. As Charles Moran and Anne Herrington say at the start of their chapter in this book, “the development of emerging technologies has so increased the apparent difficulty and complexity of assessing student composing that as teachers we are glad to describe student adventures in multimodal composing, but when it comes to laying out our assessment procedures or criteria, we are most often silent.” Most of the writers in this book tend to avoid the fraught issue of Automatic Essay Scoring (AES), as indeed they should. When I wrote a book for writing teachers, my title implied a straight line from the production of texts to their evaluation: Assigning, Responding, Evaluating: A Writing Teacher’s Guide (2007). That is, an assignment with a clear pedagogical purpose led to student writing, followed by teacher and peer response to that writing, and at the end an assessment of the degree to which the writing fulfilled the assignment. But I was not quite correct to imply that this process was linear; it really is circular. That is, the assessment of the writing has as much or more to do with defining the assignment, and hence the construct being measured, as the assignment itself. I did imply that a good writing assignment should be clear about how it will be assessed, but I took the assessment as a relatively minor and ancillary matter, really a bookkeeping extension of what was most important for teaching: the response. And I didn’t deal with AES at all, considering it to be at most an aid to editing and an untrustworthy assistant to teacher reading and response.

Indeed, it is hard to find discussions of assessment in the book you have in your hands (or, I should say, on your computer screen) that differ much from conventional practices by enlightened teachers in the past. The criteria and procedures for assessment have some new wrinkles, to be sure, but in comparison to the revolution in writing that has occurred, they are familiar: a teacher or a group of teachers defines the criteria for assessment, with attention to the context and goals of the assignment, and applies the tools available as fairly and efficiently as possible. The emphasis in this collection is on ways to assess this exciting new writing and on ways digital technologies enable people to assess in some different configurations. But what is not discussed is what I consider the elephant in the room, which from my perspective is distinctly oppressive: assessment by computer and by various instructional platforms. While we talk pleasantly about the brave new world of writing that computers have ushered in, a darker side of technology has been making important inroads into the very center of writing itself. Although it is clear that assessment in a digital environment can mean more than scoring by computer, I suspect that we must nevertheless come to terms with the revolution in scoring happening all around us.

As I write this afterword in 2013, it is clear to me that we need to distinguish the issues of writing in a digital environment from those of assessment in a digital environment, a difficult set of problems that has become exceedingly complex and tangled in political and economic as well as pedagogical problems. Huge amounts of money are now behind a major sales effort to replace human response and grading of writing with machine scoring; one example is the $330 million allotted by the United States Department of Education in 2012 for scoring (probably by one of the nine prominent computer scoring programs) of writing as part of the “Race to the Top” (U.S. Department of Education, 2012). What began (with Writer’s Workbench in 1982; see Day, 1988) as a support to writers and teachers is now more and more becoming an economic way, as the sales force puts it, to relieve teachers of the burden of reading and grading writing. Writing teachers, for whom responding to writing is the chief means of teaching writing, would phrase this rather differently.

This disjuncture regarding what it means to teach writing reveals that the capability of AES is becoming a new, often contested way of defining the construct of writing itself. When human readings do not correlate well with computer scores, the readers are sometimes urged to read more like computers, so that high score correlations can be obtained. We have already seen examples of this in large-scale essay readings and in some research projects, where efficiency, speed, and economy start to replace older criteria for assessing writing, and confirmation bias makes inroads into studies. Readers of the short essay on the SAT have been instructed to ignore errors in fact (hard for computers to discern) on the essays they read, so it is possible for a student who confuses Martin Luther with Martin Luther King, or the Civil War with the American Revolution, to receive top grades. A widely popular research study arguing that students learn little that matters in college depends on one short test, mostly graded by computer, given to second-semester sophomores (Arum & Roksa, 2010). Another study affirming the value of computer scoring defines writing as very short impromptu writing—300 to 900 words—a definition more suited for computer scoring than for most other purposes for writing (Shermis & Hamner, 2012).

This contested site of assessment gives new meaning to the old adage: “That which is measurable drives out that which is important.” And it is far from clear that AES, at least at present, can identify, much less assess, a writing construct that relates well to the rhetorical tradition, developed over the last two thousand years, and stressing, to paraphrase Quintilian, good people writing well. Or, as Aristotle described it, discovery, arrangement, and style, a means of honest and supported argument for public purposes. Or the “habits of mind” that lead to thought and making that thought visible in writing. This background in rhetoric forms the foundation of the writing programs in rhetoric and composition at most universities in America, and has been stated compactly in many documents that have been referred to throughout this collection: the “Outcomes Statement for First-Year Composition” by the Council of Writing Program Administrators (2008), as well as in the “Framework for Success in Postsecondary Writing” (Council of Writing Program Administrators, National Council of Teachers of English, & the National Writing Project, 2011) and Common Core State Standards Initiative (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010). As befitting the wide diversity of American higher education, campus writing programs are not only rooted in the rhetorical tradition but also reflect the environment of that campus, including the nature of the student body, the scope of writing instruction, the campus mission—and local challenges of writing in a digital environment. It should be no surprise that the sudden emergence of AES has evoked strong hostility from many faculty members committed to the history and theories of rhetoric as well as to the special parameters of the writing programs suited to their campuses.

At the same time, it would be shortsighted as well as impolitic to simply dismiss AES—as many in the humanities tend to do—because its capacities have been oversold and overstated at the present moment. The astonishing advances in computer capabilities over the last generation must raise hopes as well as fears for the future of writing and writing assessment. It is altogether probable that within the next decade or two both responding to and assessing student writing will involve technology as well as human interaction, working together to help students improve their thinking and writing. As Samuel Messick (1989) suggested, the validity of an assessment is not about a test but, rather, about the uses of the assessment. Thus, assessing writing in a digital environment will require the same care and progress that have led to the now-common practice of writing in a digital environment. I cannot foresee, despite a sense of foreboding, technology replacing human readers of writing—after all, machines cannot read, however superior they may be at counting—though technology may become an indispensable support for teachers working with students at different levels, working in various formative ways.

And that becomes the sticking point. If, or when, economic pressures lead to the replacement of teachers responding to writing by machines as the audience for student writing, then the capacity of the machines will determine the construct of writing. Students will write to machines, a natural enough move for generations brought up challenging machines on computer games, rather than writing for their peers or their teachers. Students will write to machines just as surely as they now write their SAT or AP essays to the armies of dulled readers described by Todd Farley (2009) in Making the Grades: My Misadventures in the Standardized Testing Industry. This is the stuff of dystopias, where cyborgs take over the world and make humans their slaves—a common enough theme in the movies. The way to keep this from happening, as these fictions instruct us, is to get there first, if possible, and enforce power over them; we need to keep HAL from taking over the universe (as he—it?—almost does in 2001: A Space Odyssey) and replacing human needs with inhuman ones.

And thus the importance of this volume becomes apparent. Although its chapters do not really demonstrate inventiveness in the use of technology for assessment—inventiveness we have seen in expanding the scope of writing in a digital environment, without diminishing or constricting the concept of writing—its chapters chart paths in that direction. It is a treat to see the new concept of eportfolios portrayed by Kathi Yancey, Stephen McElroy, and Elizabeth Powers, for instance, and the wonderfully inventive student film described by the team called the Multimodal Assessment Project. The underlying vision of assessment is based, as it should be, in an inventive and exciting classroom, with much peer response and overarching sensitive teacher assessment. Throughout the book, we see assessment measures emerging from reconceived writing curricula, calling for personalized attention to the creativity elicited by exceptional teachers. In fact, this volume stands as a kind of response to AES, perhaps the only kind of response that makes sense: It is anchored fully in the teaching of writing in digital environments and considers a wide range of possible assessments, all related clearly to the teaching situation. It considers assessment as an integral part of teaching in any environment and directs our attention consistently to the best teaching practices available in this new digital environment.

If we are to resist the assessment of writing and writing programs by those unfamiliar with and uninterested in writing as a means of discovery, learning, and imagining, we need books like this one, and many more of them. We need to demonstrate that the creative environment fostered by writing in digital environments can be matched by even more creativity in the assessment of such writing. We need to show that assessment of writing in the digital age does not ignore the teachers and students in writing classes, in its attention to industrial-strength rubrics and corporate-sponsored definitions of what is important in education. I applaud the editors of and contributors to this volume, which I expect to open many avenues of exploration, experimentation, and innovation in assessment.

REFERENCES

Arum, Richard, & Roksa, Josipa. (2010). Academically adrift: Limited learning on college campuses. Chicago, IL: University of Chicago Press.

Baron, Dennis. (2009). A better pencil: Readers, writers, and the digital revolution. New York: Oxford University Press.

Council of Writing Program Administrators. (2008). WPA outcomes statement for first-year composition. Retrieved from http://wpacouncil.org/positions/outcomes.html

Council of Writing Program Administrators, National Council of Teachers of English, & the National Writing Project. (2011). Framework for success in postsecondary writing. Retrieved from http://wpacouncil.org/files/framework-for-success-postsecondary-writing.pdf

Day, John T. (1988). Writer's Workbench: A useful aid, but not a cure-all. Computers and Composition, 6(1), 63–78.

Ericsson, Patricia F., & Haswell, Richard. (Eds.). (2006). Machine scoring of student essays: Truth and consequences. Logan: Utah State University Press.

Farley, Todd. (2009). Making the grades: My misadventures in the standardized testing industry. San Francisco, CA: Berrett-Koehler Publishers.

McAllister, Ken S., & White, Edward M. (2006). Interested complicities: The dialectic of computer-assisted writing assessment. In Patricia F. Ericsson & Richard Haswell (Eds.), Machine scoring of student essays: Truth and consequences (pp. 8–27). Logan: Utah State University Press.

Messick, Samuel. (1989). Validity. In Robert L. Linn (Ed.), Educational measurement (pp. 13–103). New York: American Council on Education and Macmillan.

National Governors Association Center for Best Practices, & the Council of Chief State School Officers. (2010). Common Core State Standards Initiative. Washington, DC: National Governors Association Center for Best Practices, Council of Chief State School Officers. Retrieved from http://www.corestandards.org

Shermis, Mark D., & Hamner, Ben. (2012, April). Contrasting state-of-the-art automated scoring of essays: Analysis. Paper presented at the annual meeting of the National Council of Measurement in Education, Vancouver, BC, Canada. Retrieved from http://www.scoreright.org/NCME 2012 Paper3 29 12.pdf

U.S. Department of Education. (2012). Race to the Top Assessment Program. Retrieved from http://www2.ed.gov/programs/racetothetop-assessment/index.html

White, Edward M. (2007). Assigning, responding, evaluating: A writing teacher's guide (4th ed.). New York: Bedford/St. Martin's.
