AI writes passable college papers

A post by Eduref, a company that deals in information about postsecondary education, has gotten some attention for running a Turing Test on course papers produced by the AI language model GPT-3.

We hired a panel of professors to create a writing prompt, gave it to a group of recent grads and undergraduate-level writers, and fed it to GPT-3, and had the panel grade the anonymous submissions and complete a follow up survey for thoughts about the writers. AI may not be at world-dominance level yet, but can the latest artificial intelligence get straight A’s in college?
As the saying goes, “C’s get degrees.” Straight A’s in college, however, are far from common, and with AI being far from perfect, GPT-3 performed in line with our freelance writers. While human writers earned a B and D on their research methods paper on COVID-19 vaccine efficacy, GPT-3 earned a solid C. Performing a bit better in U.S. History, humans received a B and C+ on their American exceptionalism paper, while GPT-3 landed directly in the middle with a B-. Even when it came to writing a policy memo for a law class, GPT-3 passed the assignment with a B-, with only one of three students earning a higher grade.

(More coverage at ZDNet and Inside Higher Education.)

Last year, I wrote something for The Conversation, based on my experience of using GPT-2, an earlier version of the same language model, to produce papers for an anthropology course. At the time, GPT-2 was close but not quite close enough.

I concluded:

While computer writing might never be as original, provocative, or insightful as the work of a skilled human, it will quickly become good enough for such writing jobs, and AIs won’t need health insurance or holidays.
If we teach students to write things a computer can, then we’re training them for jobs a computer can do, for cheaper.
Educators need to think creatively about the skills we give our students. In this context, we can treat AI as an enemy, or we can embrace it as a partner that helps us learn more, work smarter, and faster.

By all accounts, GPT-3 seems much more capable than GPT-2 was. While GPT-3 is not widely available, it won’t be long before it or something like it is. This means we need to rethink what writing assignments are, and what we want them to do.

John Warner, in Inside Higher Ed, suggests that a change to how we approach grading is sorely needed:

In this case the problem is in our well-trodden patterns of how we assess student work in the context of school. [GPT-3’s] response is grammatical, it demonstrates some familiarity with the course and it is not wrong in any significant way.

It is also devoid of any signs that a human being wrote it, which, unfortunately does not distinguish it from the kinds of writing students are often asked to do in school contexts, which is rather distressing to consider, but let’s put that aside for the moment.

When confronted with this kind of work, what if we did something differently?

What if we replaced that … sigh … B with a “not complete, try again”?

Because honestly, isn’t that a more appropriate grade than the polite pat on the head that the B signals in this case?

This seems like an opportunity to put something like Labour-Based Grading into wider use.

(Previously (1), (2) on GPT-3.)
