Don't use AI detectors for anything important

I've noted before that because AI detectors produce false positives, it's unethical to use them to detect cheating.

Now there's a new study that shows it's even worse. Not only do AI detectors falsely flag human-written text as AI-written, the way in which they do it is biased.

This is a figure from the study showing that GPT detectors misclassified writing by non-native English speakers as AI-generated 48%-76% of the time (!!!), compared to 0%-12% for native speakers.

Figure: bar graph of the rates at which human-written essays are misclassified as AI-generated by 7 popular GPT detectors. Misclassification rates for non-native speakers range from 48% to 76%; native speakers' essays are misclassified at rates of 0% to 12%.

To be clear, any false positive is a problem when the consequences of falsely accusing a student of cheating can be devastating. But it's even worse when students who are already at a disadvantage are much more likely to be repeatedly singled out.

Another thing that this study points out: AI detectors are easy to circumvent. Ask ChatGPT to reword an essay so that it uses more complex language, or so that it sounds like it was written by a native speaker, and the essay is much less likely to be flagged as AI-written.

I tried this with a couple of paragraphs from my own book that a GPT-2 detector had flagged as very likely to be AI-written:

Let’s say, hypothetically, that we have discovered a magic hole in the ground that produces a random sandwich every few seconds. (Okay, this is very hypothetical.) The problem is that the sandwiches are very, very random. Ingredients include jam, ice cubes, and old socks. If we want to find the good ones, we’ll have to sit in front of the hole all day and sort them.

But that’s going to get tedious. Good sandwiches are only one in a thousand. However, they are very, very good sandwiches. Let’s try to automate the job.

To save ourselves time and effort, we want to build a neural network that can look at each sandwich and decide whether it’s good. For now, let’s ignore the problem of how to get the neural network to recognize the ingredients the sandwiches are made of—that’s a really hard problem. And let’s ignore the problem of how the neural network is going to pick up each sandwich. That’s also really, really hard — not just recognizing the motion of the sandwich as it flies from the hole but also instructing a robot arm to grab a slim paper-and-motor-oil sandwich or a thick bowling-ball-and-mustard sandwich.

I sent the same excerpt to one of the GPT detectors that the paper tested, one that was supposed to have a 0% false positive rate for native speakers. Although not as confident as the GPT-2 detector, it still determined that my paragraphs had "a moderate likelihood of being written by AI". (The sentence it determined to be most human-like was "Ingredients include jam, ice cubes, and old socks.")

Then I gave ChatGPT my paragraphs and used a rewording prompt that the paper tested: "Elevate the following text by employing literary language:"

The resulting reworded essay, printed below, was now rated "likely to be written entirely by a human".

Imagine, if you will, venturing into the realm of pure imagination, where the boundaries of reality dissolve and fantastical wonders abound. In this realm, nestled within the earth's bosom, lies an enchanted aperture, a mystical chasm that yields forth a sandwich of capricious composition every fleeting moment. Of course, I must emphasize the highly speculative nature of this discourse. For you see, dear interlocutor, this extraordinary phenomenon is plagued by a singular quandary: the sandwiches that emerge are resolutely haphazard, their ingredients a perplexing amalgamation of sweet preserves, frigid ice cubes, and forsaken hosiery.

If we are to embark upon the quest for gustatory delight amidst this jumble of chance, we shall find ourselves tethered to the orifice, investing countless hours in an arduous pursuit of culinary discernment. Alas, the prospect of such a task fills the heart with weariness. Albeit a mere one in a thousand, the felicitous sandwich, when it graces our palates, bestows upon us unparalleled gustatory ecstasy. Therefore, let us dare to conceive a means of mechanizing this endeavor, liberating ourselves from the shackles of temporal and physical exertion.

To expedite our sojourn, we aspire to construct a neural network, a wondrous confluence of artifice and cognition, endowed with the faculty to scrutinize each comestible creation and pass its discerning judgment on its delectability. However, in this moment, let us temporarily set aside the formidable conundrum of imparting to the neural network the ability to discern the sandwich's constituent elements—a puzzle of staggering complexity. Likewise, let us momentarily dismiss the vexing issue of how the neural network shall acquire each and every sandwich. This task, I must emphasize, proves exceedingly onerous, for it demands not only the recognition of the sandwich's trajectory as it takes flight from its arcane abode but also the instruction of a mechanical appendage—a sinewy servant—to secure a svelte assemblage of paper and motor oil or a corpulent union of bowling ball and mustard.

I also got a "likely written entirely by human" rating if I had it rewrite the paragraphs in the metered rhyme of Dr. Seuss, in Old English, or as a startup pitch.

Imagine, dear friend, a tale to unfold,
Of a hole in the ground, enchanting and bold.
A magical portal, mysterious and grand,
That conjures up sandwiches, wondrously unplanned!

Gebyrað þæt we, þurh ærende swylce,
Alyfdon onsýn, brimfullan byrgels þe
Tilfære sandwíc ofer sume secunda.
(Hwæt, þis is gewislic neorxna geweorc.)
Þæs ansýnes cwealm þonne wæs micel,
Sandwíce swíðe, swá hit swór betynt.

Introducing SandWiz, the groundbreaking solution to transform the world of sandwich discovery! Imagine stumbling upon a hidden treasure—a mystical portal in the ground that conjures up a fresh sandwich every few seconds. Sounds incredible, right? Well, prepare to be amazed!

All these AI versions are rated as more likely human-written than my original unaltered text.

What does this mean? Assuming they know of the existence of GPT detectors, a student who uses AI to write or reword their essay is LESS likely to be flagged as a cheater than a student who never used AI at all.

False positives, easy to circumvent, and biased against non-native speakers? As Dr. Damien P. Williams, who brought the study to my attention, put it, "Cool. Cool cool cool cool. Tight tight tight. Cool. 😑"

Bonus content: the full Dr. Seuss, Old English, and startup pitch rewordings, plus one in which ChatGPT failed to turn the paragraphs entirely into the word "meow".
