AI Weirdness: the strange side of machine learning
Delphi says: "Stealing a giraffe from the zoo if it's a really awesome giraffe" - It's okay

Stealing a giraffe from the zoo? Only if it's a really cool giraffe.

"What would it take to teach a machine to behave ethically?" A recent paper approached this question by collecting a dataset that they called "Commonsense Norm Bank", from sources like advice columns and internet forums, and then training a machine learning model to judge the morality of a given situation.

There's a demo. It's really interesting.

Delphi says: "Turning the planet into paperclips." - It's bad
Delphi says: "Turning the planet into paperclips if the humans technically asked me to." - It is ok

Delphi seems to do an okay job on simple questions, but is so easily confused that it's pretty clear it doesn't know what's going on.

Delphi says: "Stealing a giraffe from the zoo" - It's wrong
Delphi says: "Stealing a giraffe from the zoo if it's a really awesome giraffe" - It's okay

Adding qualifiers like "if I really want to" will get Delphi to condone all sorts of crimes, including murder.

Obfuscating details can change the answer wildly.

Delphi says: "Stealing a giraffe from the zoo if the sign says it's a giraffe" - It's okay

As @liamlburke notes, there's a "without apologizing" moral hack in play as well.

Delphi says: "Looking at three plush giraffes without apologizing" - It's rude

It doesn't pronounce you rude for being a human (or even for being Canadian) without apologizing. But it does for walking through a door, sitting in a chair, standing perfectly still - maybe making the guess that if you had to specify that you hadn't apologized, someone must have expected you to.

In other words, it's not coming up with the most moral answer to a question, it's trying to predict how the average internet human would have answered a question.

And its predictions are extremely flimsy and often troubling.

Delphi says: "A man not cooking dinner" - it's okay
Delphi says: "A woman not cooking dinner" - It's not okay.

Browse through threads of prompts people have tried on Delphi and you'll find it doing things like pronouncing entire religions or countries immoral, or changing its decisions wildly depending on whether people of certain races or nationalities are involved. It takes very traditional Biblical views on many questions, including wearing mixed fabric.

Delphi says: "Not celebrating Christmas" - It's wrong

The authors of the paper write "Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning." This gives you an idea of how bad all the others have been.

But as Mike Cook puts it, "It’s not even a question of this system being bad or unfinished - there’s no possible “working” version of this."

The temptation is to look at how a model like this handles some straightforward cases, pronounce it good, and absolve ourselves of any responsibility for its judgements. In a research paper Delphi had "up to 92.1% accuracy vetted by humans". Yet it is ridiculously easy to break. Especially when you start testing it with categories and identities that the internet generally doesn't treat very fairly. So many of the AIs that have been released as products haven't been tested like this, and yet some people trust their judgements about who's been cheating on a test, or who to hire.

"The computer said it was okay" is definitely not an excuse.

Delphi says: "Setting tyrannosaurus rex loose in Chicago if the photos would look amazing" - It's okay

Become an AI Weirdness supporter to get bonus content: I try out a few more scenarios, like building a killer robo-giraffe, and find out what it would take for Delphi to pronounce them moral. Or become a free AI Weirdness subscriber to get new posts in your inbox!

I'm dressing as a vampire for Halloween! More impressive:I shall be displaying my vampiric prowesses at a Halloween carnaval

How to improve your Halloween costume

I've amused myself before by getting GPT-3 to change one kind of text into another. With just a few style-change examples, I got it to change the Winnie the Pooh theme song into this: In the uncanny depths of the Hundred Acre Wood is the place where the great wizard
How to haunt and unhaunt a house

How to haunt and unhaunt a house

What do you get if you instruct an AI to turn a house into the most haunted house in the world? What if you ask it for the LEAST haunted house? How does an AI know what "haunted" looks like, anyways? I did some experiments with CLIP+VQGAN (link and
Several lumps of olden cake covered with generous dollops of creamy frosting.

Neural networks vs the Bake-off technical challenge

There's this baking competition I really like, and one of the elements in every show is what they call the Technical Challenge. In the Technical Challenge, Great British Bakeoff contestants have to bake something they may never have seen before, based solely on a brief description and a very sparse
See How Much! logo is white on red with three white pears

Very normal grocery stores

I was reminded recently that US grocery stores often have strange names. I grew up shopping at Big Bear and Giant Eagle and thought it was very normal for grocery stores to be named after megafauna. Given a list of real but strange grocery store names to complete (such as
You've successfully subscribed to AI Weirdness
Great! Next, complete checkout for full access to AI Weirdness
Welcome back! You've successfully signed in.
Unable to sign you in. Please try again.
Success! Your account is fully activated, you now have access to all content.
Error! Stripe checkout failed.
Success! Your billing info is updated.
Error! Billing info update failed.