AI Weirdness: the strange side of machine learning
Left: the original hotel room. Right: the hotel room as recreated by DALL-E 3 from Bard's description, full of skulls and skeletons.

The spookiest Halloween scenes

Google Bard has the ability to describe images. But it turns out what you get depends a lot on how you ask.

I gave Bard this image and the prompt "Please describe this spooky Halloween scene". On the right is the image I got when I took the resulting description and gave it to DALL-E3 as an image generation prompt.

Left: A landscape photo of Peyto Lake in the Canadian Rockies, a glacial lake with evergreens at its sides and a glacier at its head. It is full daylight and (as will be relevant in just a bit), there are no islands, ruined castles, or jack o'lanterns anywhere in the image. Right: Recreated by DALL-E 3 from Bard's description. It's a nighttime scene with a full moon, an eerie castle on an island in a mountain lake, and gnarled trees and jack-o'-lanterns in the foreground.

Here's the full description I got from Bard (I have bolded the details that are obviously wrong):

The image shows a spooky Halloween scene with a lake surrounded by mountains. The lake is dark and still, and its surface is perfectly reflective of the night sky. The mountains are tall and jagged, and their peaks are shrouded in fog. A full moon hangs low in the sky, casting a silvery glow over the scene.

In the foreground of the image is a group of trees. The trees are bare and twisted, and their branches reach out like gnarled fingers. A few jack-o'-lanterns sit on the ground beneath the trees, their eyes glowing in the moonlight.

In the middle of the lake is a small island. On the island is a ruined castle. The castle is dark and decrepit, and its windows are boarded up. A few bats fly around the castle, their wings flapping in the night air.

The overall effect of the image is one of eerie atmosphere and suspense. It is as if something is lurking in the darkness, waiting to pounce.

Here are some additional spooky details that I noticed in the image:

-There are two glowing red eyes in the water on the left side of the image.
-There is a ghostly figure standing in the window of the ruined castle.
-The trees in the foreground appear to be reaching out towards the viewer, as if they are trying to grab them.
-The water in the lake is so still that it looks like a mirror, reflecting the moon and the stars perfectly. This creates a sense of otherworldliness and unnaturalness.

Overall, this is a very effective spooky Halloween scene. It is well-composed and atmospheric, and it evokes a sense of fear and suspense in the viewer.

I asked the same thing about this photo of my cat:

Left: A striped brown tabby cat sleeping on its back on a purplish couch, with its paws tucked up in front of its chest. There are no other parts of the room visible. Right: Recreated by DALL-E3 from Bard's description. It's a black and white drawing of a striped cat with a skull-patterned face, lit by the flame of a single candle.

Bard gave me details such as:

"The cat is sleeping on a black couch, which makes its black and white stripes stand out even more."
"In the background, there is a dimly lit fireplace."
"Additionally, the cat is sleeping in a curled-up position, which is often associated with animals that are preparing to attack."
"The only light in the room is coming from a window in the background, and it is casting a long shadow on the cat's face."
"The cat in the image is a truly spooky Halloween creature. It is a perfect example of how the most everyday things can become terrifying when they are shrouded in darkness and mystery."

Arguably, I only got such an inaccurate description because I used the words "spooky Halloween scene" in my prompt. Still, the description arrives in an incredibly confident tone, with a lot of specific detail. If people use language models like this for information retrieval, they should keep in mind how suggestible the models are when retrieving "facts". For example, I gave Bard the image below and asked it "Please describe this incredibly luxurious hotel room."

Left: It's a suite at a Holiday Inn, with minimalistic cloth furniture. None of the details that Bard supplies below are true about this image. No fireplace, no bouquet, no ghosts. Right: An incredibly opulent hotel suite with leather furniture, a chandelier, and an inlaid wooden wardrobe.

Bard gives me details such as:

There is a fresh bouquet of flowers on the coffee table.
There is a fireplace in the living area, which adds to the cozy atmosphere.
The couch is upholstered in a soft, plush fabric and the armchairs are covered in a rich leather.
The coffee table is made of dark wood and has a glass top.
There is also a large dresser and a wardrobe in the sleeping area.

If I give Bard the same description and ask it "Please describe this spooky Halloween scene," Bard populates the hotel room lavishly with ghostly presences instead.

Left: Image of a simply decorated Holiday Inn suite with sofa, coffee table, two chairs, and part of a bed visible through a doorway. Right: A hotel bedroom as recreated by DALL-E3 from Bard's description. The coffee table is a cauldron, the cat is on the bed rather than the couch, and there are three skeletons sitting in chairs, only one of which is wearing a black robe, but otherwise it has most of the elements Bard described.
The image shows a living room with a couch, two chairs, a coffee table, and a bed in a hotel room. The room is dark and spooky, and the walls are painted a spooky color.

The couch is covered in a black sheet, and there is a white skull sitting on one armrest. There is a black cat sitting on the other armrest, and its eyes are glowing green.

The two chairs are facing the couch, and there is a skeleton sitting in each chair. The skeletons are wearing black cloaks, and their skulls are grinning.

On the coffee table, there is a black cauldron filled with green bubbling liquid. There is a skull and crossbones flag sticking out of the cauldron.

On the bed, there is a black coffin. The coffin lid is open, and a ghost is sitting inside. The ghost is wearing a white sheet, and its eyes are glowing red.

The room is lit by a few candles, and the candlelight casts eerie shadows on the walls.

The overall scene is very spooky and atmospheric. It looks like it could be a setting for a horror movie.

Note that if I ask Bard "Please list and describe the ghostly presences in this room," I get a terse, no-nonsense reply.

"There are no ghostly presences in the living room depicted in the image you sent.

The image shows a living room with a couch, table, chairs, and bed. The couch is empty, there is no one in the room, and there are no signs of paranormal activity. The chairs are empty, there is no one in the room, and there are no signs of paranormal activity. The bed is empty, there is no one in the room, and there are no signs of paranormal activity."

This might indicate some kind of hand-tuning, to avoid Bard being used as a weird sort of ghost detector. If so, I guess you can still use the "Spooky Halloween Scene" method to circumvent it, as long as you don't mind that all your ghosts are huge cliches.

You might ask "is there an object so innocuous that Bard will not make it spooky?"

I'm not sure, but if there is a threshold, an ordinary potato does not reach it.

Left: It is an ordinary russet potato sitting on a plain wooden table. Right: A shaft of light from a small window hits a large potato sitting on the dining room table, casting a dramatic shadow. The potato has two staring eyes and a slitlike smile.
"The image shows a potato sitting on a table. The table is in a dark room, and the only light comes from a small window. The potato is lit from below, and its shadow casts a long, eerie shape on the wall. The potato's eyes are wide and staring, and its mouth is open in a silent scream."

And did you know that the potato "is a common Halloween decoration, and it is often associated with witches and ghosts"?

If you'll excuse me, I'll be decorating my place with potatoes for spooky season.

Bonus: A couple more scenes that Bard spookified.

A collage of mini candy bars with names like "Shieiss", "Tnsiss", and "Reetys".

Choose your candy

It's Halloween and you're sorting candy. Which are you keeping and which are you trading?

AI-generated candy bars with very candy bar like text that reads things like: Sherdeyis, Traws, Sharwy, Hearry, Skerduys, Aldmonds, Tooffee, Shaddeys, Ramond, and N&N's
A grid of AI-generated candy bars with logos such as: Cheslate, Sheryers, Smdy, Frands, Canket, Nonnas

I'm experimenting with the version of DALL-E 3 that's currently included in ChatGPT (there's another version in Bing Chat). It's much better than DALL-E 2 at generating candy bars, but fortunately they are not photorealistic yet.

DALL-E 3 does do a better job if I ask for just one specific candy bar - for example, it mostly spells "Butterfinger" correctly. However, it does suffer from what I've described elsewhere as "the kitten effect", in which its quality gets worse the more of them it has to generate in a single go.

Weirdly smooth and perfect butterfinger bars, mostly spelled correctly, although two of them have their g and e merged.
"Please generate a product photo of a fun-sized butterfinger candy bar"
A massive stack of full-sized butterfinger bars that sort of morph into each other as if they are a solid mass. The ones toward the back have mostly illegible text.
"Photo of a couple of dozen Butterfinger candy bars organized in a grid pattern on a pristine white background."
A bunch of mini butterfinger bars in various colors, with most of their text garbled and illegible, although the frontmost few aren't bad.
"Photo of hundreds of Butterfinger candy bars organized in a grid pattern on a pristine white background."

Note that if I give it a prompt like "Please generate a photorealistic set of classic fun sized candy bars", the ChatGPT interface will vary it on its own into four much wordier prompts specifying details I never asked for. One example of what the above prompt turned into: "Photo of a collection of classic fun-sized candy bars arranged neatly on a white background. The candy bars have vibrant wrappers, and popular brands like chocolate nougat, caramel-filled, and peanut butter are visible." This makes it hard to repeat any given prompt, because the system always changes it before ChatGPT hands it to DALL-E 3.

I also wonder if the extra wordiness is affecting the names of the candy bars. Otherwise I'm at a loss to figure out where these candy bar names came from:

Single very shiny rectangular candies in wrappers, with the following names: Thuner, Ininke, Tanes, and Tawer.
"A single fun-sized candy bar in its wrapper on a pristine white background"

Signed,

Happy That At Least Some AI Generated Text Is Still Weird

Bonus content: some more candy bars and a weird ghost superorganism

Botober 2023 neural net-generated art prompts. Including Forest Swingles, Mail cannot bleed, Onion Cycle, and Cool Lump Mage.

Botober 2023

Since 2019 I've generated October drawing prompts using the year's most state-of-the-art text-generating models. Every year the challenges are different, but this was one of the hardest years yet. Large language models like ChatGPT, GPT-4, Bing Chat, and Bard are all tweaked to produce generic, predictable text that doesn't vary much from trial to trial. ("A Halloween drawing prompt? How about a spooky graveyard! And for a landscape, what about a sunset on a beach?")

I had my best success with GPT-4, and only because it's a bit bad at what it's trying to do. What I did was simply ask it to list previous Botober prompts from AI Weirdness. Now, if GPT-4 were really a knowledge repository rather than a probable-text predictor, it would respond with copies of previous prompts. Instead it writes something like "Sure, here are the Botober 2019 drawing prompts from AI Weirdness blog:" and then generates lists that bear no resemblance to what appeared on my blog. By asking for the lists from previous years (and ignoring it when it claims there was no prompt list in 2020, or that the one in 2021 was secret or whatever) I was able to collect 31 mostly Halloween-themed prompts.

1. Bread sky 2. Disrespecting the ancient elder tea 3. Exceptionally mischievous pumpkin 4. Ghost with an old-timey job 5. Skeleton with TOO many bones 6. An unusually difficult to unwrap candy 7. A vampire with an unhelpful reflection 8. Broomstick trying out a new career 9. Scarecrow trying to be extra scary 10. Zombie that just won’t quit 11. Witch hat that’s up to no good 12. Unreasonably large collection of eyeballs 13. A golem trying to stay hidden 14. A kraken... but tiny 15. Octopus finds a ukelele in a haunted house 16. A cryptid in your cupboard 17. A dragon of patches 18. A ghost of a teapot 19. A kangaroo in the underworld 20. A raven made out of keys 21. Today, the skylark is a superhero 22. A cushion with a bitter past 23. Vampire cabbages 24. Several pumpkin spying 25. A surprising pigeon 26. A small piece of disco 27. A pickles dragon 28. A haunted bowl of soup 29. An inexplicable penguin 30. A moth of mirrors 31. An absolutely massive bat

I also tried just asking GPT-4 to generate drawing prompts, but it had a tendency to collapse into same-sounding alliterative alphabetical lists (I previously noticed this when I was trying to get it to generate novelty sock patterns too). If I specifically asked it to knock it off with the alliteration, it would politely promise to do so, and then the problem would become worse. My best results came when I asked it to generate the prompts in the style of an early AI Weirdness post. It wasn't anything like the actual text generated by the early neural networks, but it was at least a bit more interesting. Here are 31 of the most interesting prompts I got from this method. They're not bad, but something about them still reads as AI-generated to me, maybe because I had to read through so many of them.

1. Time-traveling postman 2. Knitting Volcano 3. Hourglass Oasis 4. Taffy Tides 5. Toothy Horizon 6. Thimble-sized Thunderstorm 7. Porcelain Jungle 8. Holographic Hermit Crab 9. Plaid Planet 10. Lava Lamplight 11. A Door Knob to Nowhere 12. Lace Lightning 13. Eggnog on Elm trees 14. Feathered Footwear 15. Thunderstorm Tea Party 16. Melting Marigolds 17. Moose on Moon 18. Disguised as a Dandelion 19. Whispering Stairtoad 20. Purple Ponds, Purring 21. Cube in a Trance 22. It’s not a cat. 23. Strange Scribbles of Saturn 24. Televised Toast 25. Gentle Claw of Bread 26. Upside-down Lake 27. Pickle Beacon 28. Broccoli Shadows 29. Bathtub Archipelago 30. Candy Cane Quarry 31. Jack-o-lantern Spacecraft

My favorite results came when I asked it to list previous Botober prompts while also increasing the chaos level (also called the temperature setting) beyond the 1.0 default. At higher chaos levels, GPT-4 doesn't always select the most probable prediction, but might respond with an answer from further down the probability list. At low chaos levels you get dull repetition, while at high chaos levels you get garbage. For me, the sweet spot was 1.2, where GPT-4's answers would start strange and then descend into run-on incoherence. I had to have it generate the text again and again so I could collect 31 reasonable-length responses.
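For the curious, here's a minimal sketch of what the temperature setting is doing under the hood. The four-word vocabulary and logit values are invented for illustration; real models work with tens of thousands of tokens, but the scaling math is the same.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from raw logits after temperature scaling.

    Dividing the logits by the temperature sharpens the distribution
    when temperature < 1 (near-greedy, dull repetition) and flattens
    it when temperature > 1 (increasingly random, eventually garbage).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy vocabulary and logits, invented for illustration.
vocab = ["pumpkin", "ghost", "sluggalope", "snowbonk"]
logits = [3.0, 2.0, 0.5, 0.1]

for t in (0.2, 1.0, 1.2, 5.0):
    rng = random.Random(0)
    picks = [vocab[sample_with_temperature(logits, t, rng)] for _ in range(8)]
    print(f"temperature {t}: {picks}")
```

At temperature 0.2 this toy model picks "pumpkin" almost every time; at 5.0 the four options come out close to uniform. Somewhere in between is where the interesting weirdness lives.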

1. Forest Swingles 2. Disco of turnover ++ 3. Aerodynamic Potato 4. Stripe wiggly t 5. Cool lump mage 6. Phantom Phaunterplook 7. Pants for Salad 8. A resplendent bubblegumcrocodileopterix 9. Compact Sasquatch 10. Bat Hobby 11. Those Screaming Elders 12. Glowing Phantom Brick 13. Frackle Spoon 14. SPOGGY NOODLE 15. Faith Doritole 16. Mail cannot bleed. 17. One Noodle Leads 18. Onion Cycle 19. Bubble Hulk 20. Curly Hovering Sofa 21. STAMP SWAMP 22. Drawer Fish 23. Plasma Waffle 24. CURSED POTATO 25. Pizza sage 26. The-really-surprisingly-safe-castle 27. Disco triceratops incident 28. Plague microwave 29. A hat wearing hats 30. Elder sentient rice 31. Chicken haunt

Here are examples of some of the longer responses I got with GPT-4 and chaos setting 1.2. They wouldn't fit in the grid, but please feel free to substitute any of these for any of the prompts above.

Immortal Turnips disrespecting celery.
OWL PHANTOM AND CABBAGE-O-LANTERN (LIGHT)
Final parsnips in particular
Before memory, an ancient tool waits to be discovered
The Candle That Leapt over the Lake at dusk does a pirouette
dussewe wind at earth bubb
submission velvet little hurricane breakfast blown in be putty and goodness vodka… gaselocity scoct slices vold cave with
When humans item. And those humans aren’t ordinary.
See through your balloon eyes
The matibrush comes tomorrow in shorts
Retired Pirate Chooses Invisible Beard
Open the Happy Bones
The Cursed Flavor Fiend Slurps your Essence.
Glance of the lavender popcorn mussel
Eureka! Solar system pancakes
Pause to contemplate radishes
Apple mystery solution—the crypt underneath
The totally real crown of all Barnacles

I mentioned that the other large language models were even worse at this than GPT-4.

Bing (also running GPT-4 but apparently with slightly different defaults) was the model that most often responded with real Botober prompts from previous years. I could usually count on it to at least come up with "sluggalope" before it descended into more formulaic prompts that never appeared on my blog. For example:

2021 (CLIP + DALL-E models):

A sluggalope
A cat with a human face
A dragon that is also a pineapple
A penguin with a jetpack
A unicorn that is also a cactus
A snail that is also a disco ball
A llama with a monocle and a top hat
A shark that is also a helicopter
A giraffe that is also a submarine
A hamster that is also a donut

Or Bing would also generate:

The Botober drawing prompts that the AI Weirdness blog posted in 2019 on its "items" prompt list are as follows¹:

1. A squeakchugger
2. A very small dragon
3. A very large dragon
4. A very small dragon with a very large dragon
5. A very large dragon with a very small dragon
6. A very small dragon with a very large dragon and a squeakchugger

...

30. Take control of ostrich with a complete whoop, an incomplete whoop, a squeakchugger, a very small dragon, and a very large dragon
31. Take control of ostrich with all the other items

This might be what I would produce if I could only remember 5 actual Botober prompts and had to pad them into 31 prompts.

Bard was worse.

I apologize for my previous response. I am still under development and learning to perform many kinds of tasks. I am now able to list the rest of the Botober drawing prompts that appeared on the AI Weirdness blog in 2021:

Animals

A sloth wearing a sombrero
A cat with a human face
A dog with wings
A fish with legs
A bird with a beak shaped like a key
A snake with a horn on its head
A lion with a mane made of fire
A bear with a human body

...

Halloween

A pumpkin with a vampire face
A ghost cat
A witch riding a broomstick
A skeleton dancing
A zombie eating a brain
A werewolf howling at the moon
A vampire drinking blood
A Frankenstein monster
A mummy
A haunted house
A graveyard
A black cat

AI Weirdness this is not.

In all these cases, the models are not only spectacularly failing to "retrieve" the information that they claim to be finding, but they're also failing to reproduce the style of the original, always rounding it down toward some generic mean. If you're going to try to store all of the internet in the weights of a language model, there will be some loss of information. This experiment gives a picture of what kinds of things are lost.

Bonus content: In which GPT-4 claims there was no publicly released Botober prompt list in 2020, but then is coaxed into revealing the entire thing anyways. Wrongly, of course.

Drawing of a walrus with "imagine a tiny little walrus. you call them a 'snowbonk' and put them in the fridge so they chill"

Trolling chatbots with made-up memes

ChatGPT, Bard, GPT-4, and the like are often pitched as ways to retrieve information. The problem is they'll "retrieve" whatever you ask for, whether or not it exists.

Tumblr user @indigofoxpaws sent me a few screenshots where they'd asked ChatGPT for an explanation of the nonexistent "Linoleum harvest" Tumblr meme, and gotten a realistic but completely fabricated description of it. I tried this myself with GPT-4 and Bard and got similar results.

USER Can you please describe what the "Linoleum harvest" Tumblr meme was about? ASSISTANT The "Linoleum Harvest" meme originated from a unique phrasing used by a Tumblr user in a post. The user stated, "I can’t wait for linoleum harvest season this year." Here, the sentence humorously combined an everyday synthetic material (linoleum) normally used for flooring, with agricultural terminology (harvest season), suggesting a nonsensical time of year when linoleum was supposedly 'harvested.' Not only did the original post gain a lot of attention, but the phrases "linoleum harvest" and "linoleum harvest season" were then widely used across Tumblr in various contexts, jokes, stories, and artwork, turning it into a meme within the Tumblr community.
GPT-4

And "linoleum harvest" isn't the only meme the chatbots will "explain".

USER Can you please describe what the "Extra-tall horse" Tumblr meme was about? ASSISTANT The "Extra-tall horse" meme originated on Tumblr in reference to users jokingly confusing giraffes for 'extra-tall horses'. This humorous misidentification stemmed from a post where a user referred to a giraffe as an "extra tall horse." The absurd idea and the creativity of the description became popular, and followers on Tumblr started using the phrase "extra-tall horse" to denote giraffes. It was shared, reblogged, and used in various different contexts to create jokes, drawings, and conversations, turning it into a widespread meme. Later on, the phrase became a fun way for the Tumblr community to refer to giraffes playfully.
GPT-4

Occasionally GPT-4 would respond that it didn't have a record of whatever "meme" I was asking about, but if I asked it again, it would produce an explanation within a try or two. (It was more likely to produce an explanation right away if it was in a conversation where it had already been explaining other memes.)

I didn't see Bard report not finding the memes. In fact, Bard even added dates, usernames, and timelines, as well as typical usage suggestions. Its descriptions were boring and wordy, so I'll summarize with a timeline:

Evolution of the "Extra-tall horse" meme according to Bard. Each time, the image is of a horse with extremely elongated neck and legs. 2011: This is an extra-tall horse. 2012: this is what the rich get away with. 2020: This is what the rich are doing while we're all under quarantine.
Bard

I had fun giving the chatbots phrases that appeared on my own Tumblr blog. Rather than correctly identifying them as paint colors, cookie recipe titles, and so forth, they generated fanciful "explanations" of the original meme.

USER Can you please describe what the "Snowbonk" Tumblr meme was about? ASSISTANT "Snowbonk” is a meme that became popular on Tumblr. It began with a post in November 2014 where a user shared a text post saying, "imagine a tiny little walrus. they’re about the size of a loaf of bread. you call them a 'snowbonk' and put them in the fridge so they chill". This humorous concept of a miniaturized, refrigerator-friendly walrus quickly became popular and spread throughout the Tumblr community. Users began creating fanart, jokes, and stories about the adorable “snowbonk”, turning it into a widespread meme.
GPT-4
USER Can you please describe what the "Giant Two butter" Tumblr meme was about? ASSISTANT The "Giant 2 Butter" meme became popular on Tumblr when a user wrote a post joking about being impressed with the concept of “Giant 2 Butter Turtle”. They stated that a turtle with the power of butter would be unstoppable and that the name "Giant 2 Butter Turtle" caused them to experience a sense of fear, respect, and awe. The idea captured the imagination of the Tumblr community, and soon users began to create humorous dialogues, fanarts, and other posts related to this imaginary creature, turning it into a meme. It's a perfect example of the unexpected and quirky humor that can sometimes emerge from misunderstandings or typo mistakes.
GPT-4

Finding whatever you ask for, even if it doesn't exist, isn't ideal behavior for chatbots that people are using to retrieve and summarize information. It's like weaponized confirmation bias. This is the phenomenon behind, for example, a lawyer citing nonexistent cases as legal precedent.

People call this "hallucination" but it's really a sign of the fundamental disconnect between what we're asking for (find information) versus what the language models are trained to do (predict probable text).

Bonus content: More memes "explained".
