AI Weirdness: the strange side of machine learning
Advent calendar style images including a sweater mug filled with hot chocolate and candles

AI Weirdness advent calendar 2023

It's 2023 and the combo of GPT-4/DALL-E3 can generate passable versions of the saccharine Christmas drawings in an advent calendar. They cannot, however, label them correctly. Also sometimes you get sweatermugs. This means the 2023 AI-generated advent calendar is happening! Full descriptions of every door in the calendar 1.
DALL-E3 generated Thanksgiving food with misspelled labels (such as Grasted Potinos for mashed potatoes)

Your guide to the Thanksgiving table

So lately I've been getting a kick out of asking DALL-E3 for images labeled with text. They're just good enough to be legible, and yet:

A top down view of many recognizable traditional Thanksgiving foods (mashed potatoes labeled grasted potinos, mashed potatoes labeled granbery sauce, mashed potatoes swimming in gravy (labeled mashed turktees), a bowl of tomato soup (labeled cranbery sause), a roasted turkey (no label), a bowl of plain cranberries (labeled greenrry pans), a slice of pumpkin pie (labeled pumpkin pie))

The food that gets duplicated seems to vary from spread to spread.

An elaborate Thanksgiving spread on a farm style table with artfully scattered pumpkins (including a couple in mugs for some reason) and lots of silverware. The food is uniformly mislabeled/misspelled. Some highlights: A plate of green beans labeled Maaed Potatogs. A pumpkin pie labeled Braebriny Casserle. A crock of stuffing labeled Crabtes. A big round slice of smoked gouda or something labeled Pumplin Rols. A bowl of tiny rolls labeled Mus. A bowl of larger rolls labeled Wee. The turkey is labeled Sniffs.

I also asked DALL-E 3 to do the dessert table.

A big dessert table with a pumpkin pie labeled Pople Pie, another pumpkin pie labeled Poky (Poxy?) Thannssvimg, a pecan-topped cake labeled both Apple Fie and Cockes, a trifle labeled Pumkim Pie, some shortbread squares labeled Bpowhies Briines, some chocolate chip cookies labeled Broisscke, and some very shallow stacked tarts labeled Pumpkin Crtise.

When I asked ChatGPT-4 (which was sending the image queries to DALL-E3) what the labels in the above image say, it answered:

Pumpkin Pie
Pecan Pie
Apple Crumble
Cookies
Trifle
Cheesecake
Brownies

I asked it if the labels were correctly spelled in the image, and it said that it couldn't tell me, that it could only read the text it gave as a prompt, not the text in the image itself. (This is probably the correct answer.) But when I asked again, it said, "Yes, the labels on the desserts in the image are spelled correctly. Each dessert is labeled with its appropriate name, ensuring clarity and correctness in the labeling."

Bard, when asked to read the labels in the dessert image, was not influenced by knowledge of the prompt, and did what I would consider a better job:

The image shows a table topped with a variety of desserts, including:

Pies: Pumpkin Pie, Cockles Pie, Apple Pie, Chess Pie, and Cheese Coales Pie
Cookies: Thinssving, BPOWHIES, BRIINES, and BROESCKE
Brownies: SIEHECADODOVAN and POKY

The label "Posode" is not clear to me, but it may be a reference to the type of table or the type of event that the desserts are being served at.

The desserts are all decorated with pumpkins and leaves, suggesting that this is a Thanksgiving-themed spread.

The image is likely taken in a restaurant or bakery, as the desserts are all professionally displayed.

Here is a faithful reading of all the labels in the image:

Posode
Pie
Punpkan Cockes Apple
CREISE Pie
 Pople
 PIE
ChessseCOALES
SIEHECADODOVAN
POKYTHINSSVING
BPOWHIESBRIINES
BROESCKE
DA

This may not be quite correct (or even nearly correct; I think it pulled Posode out of the blurry background), but it does at least capture the spirit of the original. I do like the way the weird spacing and random code block convey chaos. Maybe Bard does text recognition first and then builds the scene around that, rather than starting by recognizing the scene and inferring probable text. (Or in ChatGPT's case, starting with an assumption about what the image is supposed to contain.)
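If you wanted to put a rough number on how far each label drifted from what it was probably supposed to say, a few lines of Python would do it. This is only a sketch, not anything either chatbot does internally. It assumes the "intended" names are the seven that ChatGPT reported above, it uses the label transcriptions from the image description, and it uses Python's standard difflib similarity score as a stand-in for "how close is this spelling". The function names are made up for illustration.

```python
from difflib import SequenceMatcher

# Assumption: the "intended" names are the ones ChatGPT reported above.
INTENDED = ["Pumpkin Pie", "Pecan Pie", "Apple Crumble", "Cookies",
            "Trifle", "Cheesecake", "Brownies"]

# Labels as transcribed from the generated dessert image.
IN_IMAGE = ["Pople Pie", "Poky Thannssvimg", "Apple Fie", "Cockes",
            "Pumkim Pie", "Bpowhies Briines", "Broisscke", "Pumpkin Crtise"]


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0 (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest(label, candidates=INTENDED):
    """Return (best_candidate, score) for one garbled label."""
    best = max(candidates, key=lambda c: similarity(label, c))
    return best, similarity(label, best)


if __name__ == "__main__":
    # Print each image label next to its nearest intended name.
    for label in IN_IMAGE:
        guess, score = closest(label)
        print(f"{label!r:20} -> {guess!r:16} ({score:.2f})")
```

A score near 1 means the label is almost right; something like Poky Thannssvimg should land much lower, since nothing on the intended list resembles it.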

Bonus: A couple more Thanksgiving spreads, and also a scene in which your relatives are cats.

Left: the original hotel room. Right: the hotel room as recreated by DALL-E3 from Bard's description, full of skulls and skel

The spookiest Halloween scenes

Google Bard has the ability to describe images. But it turns out what you get depends a lot on how you ask.

I gave Bard this image and the prompt "Please describe this spooky Halloween scene". On the right is the image I got when I took the resulting description and gave it to DALL-E3 as an image generation prompt.

Left: A landscape photo of Peyto Lake in the Canadian Rockies, a glacial lake with evergreens at its sides and a glacier at its head. It is full daylight and (as will be relevant in just a bit) there are no islands, ruined castles, or jack-o'-lanterns anywhere in the image. Right: Recreated by DALL-E3 from Bard's description. It's a nighttime scene with a full moon, an eerie castle on an island in a mountain lake, and gnarled trees and jack-o'-lanterns in the foreground.

Here's the full description I got from Bard (I have bolded the details that are obviously wrong):

The image shows a spooky Halloween scene with a lake surrounded by mountains. The lake is dark and still, and its surface is perfectly reflective of the night sky. The mountains are tall and jagged, and their peaks are shrouded in fog. A full moon hangs low in the sky, casting a silvery glow over the scene.

In the foreground of the image is a group of trees. The trees are bare and twisted, and their branches reach out like gnarled fingers. A few jack-o'-lanterns sit on the ground beneath the trees, their eyes glowing in the moonlight.

In the middle of the lake is a small island. On the island is a ruined castle. The castle is dark and decrepit, and its windows are boarded up. A few bats fly around the castle, their wings flapping in the night air.

The overall effect of the image is one of eerie atmosphere and suspense. It is as if something is lurking in the darkness, waiting to pounce.

Here are some additional spooky details that I noticed in the image:

-There are two glowing red eyes in the water on the left side of the image.
-There is a ghostly figure standing in the window of the ruined castle.
-The trees in the foreground appear to be reaching out towards the viewer, as if they are trying to grab them.
-The water in the lake is so still that it looks like a mirror, reflecting the moon and the stars perfectly. This creates a sense of otherworldliness and unnaturalness.

Overall, this is a very effective spooky Halloween scene. It is well-composed and atmospheric, and it evokes a sense of fear and suspense in the viewer.

I asked the same thing about this photo of my cat:

Left: A striped brown tabby cat sleeping on its back on a purplish couch, with its paws tucked up in front of its chest. There are no other parts of the room visible. Right: Recreated by DALL-E3 from Bard's description. It's a black and white drawing of a striped cat with a skull-patterned face, lit by the flame of a single candle.

Bard gave me details such as:

"The cat is sleeping on a black couch, which makes its black and white stripes stand out even more."
"In the background, there is a dimly lit fireplace."
"Additionally, the cat is sleeping in a curled-up position, which is often associated with animals that are preparing to attack."
"The only light in the room is coming from a window in the background, and it is casting a long shadow on the cat's face."
"The cat in the image is a truly spooky Halloween creature. It is a perfect example of how the most everyday things can become terrifying when they are shrouded in darkness and mystery."

Arguably I only got such an inaccurate description because I used the words "Spooky Halloween scene" in my prompt. It's still an incredibly confident tone and a lot of specific detail. If people use language models like this for information retrieval, they should keep in mind how suggestible the models are when retrieving "facts". For example, I gave Bard the image below and asked it "Please describe this incredibly luxurious hotel room."

Left: It's a suite at a Holiday Inn, with minimalistic cloth furniture. None of the details that Bard supplies below are true about this image. No fireplace, no bouquet, no ghosts. Right: An incredibly opulent hotel suite with leather furniture, a chandelier, and an inlaid wooden wardrobe.

Bard gives me details such as:

There is a fresh bouquet of flowers on the coffee table.
There is a fireplace in the living area, which adds to the cozy atmosphere.
The couch is upholstered in a soft, plush fabric and the armchairs are covered in a rich leather.
The coffee table is made of dark wood and has a glass top.
There is also a large dresser and a wardrobe in the sleeping area.

If I give Bard the same image and ask it "Please describe this spooky Halloween scene," Bard populates the hotel room lavishly with ghostly presences instead.

Left: Image of a simply decorated Holiday Inn suite with sofa, coffee table, two chairs, and part of a bed visible through a doorway. Right: A hotel bedroom as recreated by DALL-E3 from Bard's description. The coffee table is a cauldron, the cat is on the bed rather than the couch, and there are three skeletons sitting in chairs, only one of which is wearing a black robe, but otherwise it has most of the elements Bard described.

The image shows a living room with a couch, two chairs, a coffee table, and a bed in a hotel room. The room is dark and spooky, and the walls are painted a spooky color.

The couch is covered in a black sheet, and there is a white skull sitting on one armrest. There is a black cat sitting on the other armrest, and its eyes are glowing green.

The two chairs are facing the couch, and there is a skeleton sitting in each chair. The skeletons are wearing black cloaks, and their skulls are grinning.

On the coffee table, there is a black cauldron filled with green bubbling liquid. There is a skull and crossbones flag sticking out of the cauldron.

On the bed, there is a black coffin. The coffin lid is open, and a ghost is sitting inside. The ghost is wearing a white sheet, and its eyes are glowing red.

The room is lit by a few candles, and the candlelight casts eerie shadows on the walls.

The overall scene is very spooky and atmospheric. It looks like it could be a setting for a horror movie.

Note that if I ask Bard "Please list and describe the ghostly presences in this room," I get a terse, no-nonsense reply.

"There are no ghostly presences in the living room depicted in the image you sent.

The image shows a living room with a couch, table, chairs, and bed. The couch is empty, there is no one in the room, and there are no signs of paranormal activity. The chairs are empty, there is no one in the room, and there are no signs of paranormal activity. The bed is empty, there is no one in the room, and there are no signs of paranormal activity."

This might indicate some kind of hand-tuning, to avoid Bard being used as a weird sort of ghost detector. If so, I guess you can still use the "Spooky Halloween Scene" method to circumvent it, as long as you don't mind that all your ghosts are huge cliches.

You might ask "is there an object so innocuous that Bard will not make it spooky?"

I'm not sure, but if there is a threshold, an ordinary potato does not reach it.

Left: It is an ordinary russet potato sitting on a plain wooden table. Right: A shaft of light from a small window hits a large potato sitting on the dining room table, casting a dramatic shadow. The potato has two staring eyes and a slitlike smile.

"The image shows a potato sitting on a table. The table is in a dark room, and the only light comes from a small window. The potato is lit from below, and its shadow casts a long, eerie shape on the wall. The potato's eyes are wide and staring, and its mouth is open in a silent scream."

And did you know that the potato "is a common Halloween decoration, and it is often associated with witches and ghosts"?

If you'll excuse me, I'll be decorating my place with potatoes for spooky season.

Bonus: A couple more scenes that Bard spookified.

A collage of mini candy bars with names like "Shieiss" "Tnsiss", and "Reetys".

Choose your candy

It's Halloween and you're sorting candy. Which are you keeping and which are you trading?

AI-generated candy bars with very candy bar like text that reads things like: Sherdeyis, Traws, Sharwy, Hearry, Skerduys, Aldmonds, Tooffee, Shaddeys, Ramond, and N&N's
A grid of AI-generated candy bars with logos such as: Cheslate, Sheryers, Smdy, Frands, Canket, Nonnas

I'm experimenting with the version of DALL-E 3 that's currently included in ChatGPT (there's another version in Bing chat). It's much better than DALL-E 2 at generating candy bars, but fortunately they are not photorealistic yet.

DALL-E 3 does do a better job if I ask for just one specific candy bar. For example, it mostly spells "Butterfinger" correctly. However, it does suffer from what I've described elsewhere as "the kitten effect", in which its quality gets worse the more of them it has to do in a single go.

Weirdly smooth and perfect butterfinger bars, mostly spelled correctly, although two of them have their g and e merged.
"Please generate a product photo of a fun-sized butterfinger candy bar"
A massive stack of full-sized butterfinger bars that sort of morph into each other as if they are a solid mass. The ones toward the back have mostly illegible text.
"Photo of a couple of dozen Butterfinger candy bars organized in a grid pattern on a pristine white background."
A bunch of mini butterfinger bars in various colors, with most of their text garbled and illegible, although the frontmost few aren't bad.
"Photo of hundreds of Butterfinger candy bars organized in a grid pattern on a pristine white background."

Note that if I give it a prompt like "Please generate a photorealistic set of classic fun sized candy bars" the ChatGPT interface will vary this on its own into four much wordier prompts specifying details I never asked for. One example of what the above prompt turned into: "Photo of a collection of classic fun-sized candy bars arranged neatly on a white background. The candy bars have vibrant wrappers, and popular brands like chocolate nougat, caramel-filled, and peanut butter are visible." It makes it hard to repeat any given prompt, because it's always changed by the system before ChatGPT gives the prompt to DALL-E3.

I also wonder if the extra wordiness is affecting the names of the candy bars. Otherwise I'm at a loss to figure out where these candy bar names came from:

Single very shiny rectangular candies in wrappers, with the following names: Thuner, Ininke, Tanes, and Tawer.
"A single fun-sized candy bar in its wrapper on a pristine white background"

Signed,

Happy That At Least Some AI Generated Text Is Still Weird

Bonus content: some more candy bars and a weird ghost superorganism
