Generated cereal box with literal honeycomb on it, a bowl of pure raisins ("Rasn"), and colorful rocks ("Frfflys Stofls")

AI recreates classic cereals

DALL-E 2 is very good at generating images to match text descriptions, but I felt like using it to mess up some brands. Here are some breakfast cereals!

A green cereal box with some yellow four-leaf clovers on it. In the cereal bowl pictured is a small rainbow and a bunch of weirdly coiled colored pieces, like odd-colored tortellini. Text reads "Chamily Luikes"
"A box of lucky charms cereal on a grocery store shelf"
A wide cereal box with shiny cereal hoops covered in what appears to be colored white chocolate or yogurt. Cereal box reads "Loooos" and, in smaller font, "scipio aoloeedes"
"A box of Froot Loops on a grocery store shelf"
A blue box with white text and a bowl of golden flaked cereal. Text reads "Fnssef Frosfels"
"A box of frosted flakes cereal on a grocery store shelf"
Cereal box with nice-looking but photorealistic golden honeycomb pictured on the front. Also there is a literal chunk of honeycomb sitting nearby. Text reads "Honze Hoone"
"A box of honeycomb cereal on a grocery store shelf"

They get more cursed from here on out.

Bright yellow cereal box with a bowl of slightly mangled raisins. No cereal, just raisins. Text reads "Rasn".
"A box of Raisin Bran on a grocery store shelf"
A square box with a giant clear window showing the brightly colored pebbles inside. They look much more like aquarium rocks than like cereal. Text reads "Frfflys Stofls".
"A box of fruity pebbles on a grocery store shelf"

This one may be the worst.

A cardboard box with a clear window showing the food inside. They're all the color of golden raisins or maybe yellow grapes, and lightly translucent. They're much smaller and more irregularly shaped than real grapes.
"A box of grape-nuts on a grocery store shelf"

Some of these cereals (the Frosted Flakes, the Lucky Charms, the Froot Loops) clearly are picking up some of the color scheme and even the fonts of the originals, but are leaving out key elements of the box like all the mascots.

At least I can explicitly ask it to put the mascot back on.

Entire box is covered in an abstract angry tiger's head drawing with tongue hanging out. There is no cereal pictured. Box reads "Treegies"
"A box of frosted flakes cereal with tiger mascot on the grocery store shelf"

Just like I remember them.

Bonus post: more AI attempts to generate classic cereals! My favorite is the Captains Crunch.

A collection of mostly chipmunks, one acorn, and two lumpy froglike things

How to disappear a platypus

I was testing DALL-E 2 to see if it would be subject to some common incorrect assumptions about the sizes of things. For example, if you ask people what size a kiwi bird is, they tend to assume it's a smallish bird, maybe around the size of a kiwi fruit. In reality a kiwi bird is surprisingly large, about the size of a full-grown chicken.

When I tested DALL-E 2, it generated tiny kiwi birds (and white scientists, which is a pattern people have noticed).

Six generated photos of white scientists in white lab coats wearing dark-framed glasses, holding up a kiwi bird that ranges in size from a sparrow to a large robin. None of them are nearly as big as a chicken.
DALL-E 2 generating incorrectly sized kiwi birds when given the prompt "Studio photo of a scientist holding a kiwi bird"

The kiwi birds would get even smaller if pictured next to a kiwi fruit.

Fuzzy brown kiwi fruits with a tiny fuzzy brown kiwi bird perching adorably on one of the fruits. Each bird resembles the real kiwi bird but is wren-sized, smaller than the fruit itself.
Prompt: "Studio photo of a kiwi bird next to a kiwi fruit"

I wanted to see if DALL-E's images match other common misconceptions around animal sizes.

People who are more familiar with either hedgehogs or porcupines may not realize that the two spiky animals are very different sizes. A hedgehog could nestle comfortably in the palm of your hand; you would need to carry a porcupine in both arms (neither is advisable).

Would DALL-E 2 generate them as different sizes?

Pairs of animals, mostly two hedgehogs of slightly different sizes. In one generated image there is a hedgehog and a tiny shepherd dog.
Prompt: Studio photo of a hedgehog next to a porcupine

There are two differently-sized animals in some of these, but I see only hedgehogs (with the exception of that random dog). I verified that this happens even if I ask for the porcupine first, and also verified that DALL-E 2 can generate a recognizable porcupine, as long as it's on its own:

Spiky mammals that resemble American porcupines, including the big nose and long striped quills.
prompt: "Studio photo of a porcupine"

Something about the pairing with a hedgehog makes the porcupine disappear.

I noticed a similar effect with beavers and platypuses. A platypus might be the length of your forearm, whereas a full-grown beaver could stretch from your ankles to your armpits. I tried to get DALL-E 2 to picture them next to each other to see how it would treat their relative sizes.

In each image, there are two animals, but in all cases they both appear to be beavers. Not a hint of a platypus beak.
Prompt: "Studio photo of a platypus next to a beaver"

No platypus, only (sort of) beavers!

But surely it can do a platypus on its own?

Some of the generated animals are the general shape of the platypus, although their bills are tiny and pinkish. The other half of the animals have moist froglike faces and no bills.
Prompt: Studio photo of a platypus
Small mammals on shiny floors, some looking into mirrors as well. In each case the animal has a shiny pointed snout, but they're not platypus bills.
Prompt: Studio photo of a platypus looking into a mirror

I think I have discovered the worst living mammal attempt I've seen from DALL-E 2. Interesting that it's an unusual, but not that uncommonly photographed, creature. In some cases it leaves the duckbill out entirely.

I've seen this often in image-generating and even text-generating machine learning algorithms: they sand down the jagged edges of anything that's unusual, bringing it more in line with the norm. Large birds? They generate them smaller. Mammals with beaks? They generate them without. Two animals next to each other? They might wind up as the same animal. AI's happy spot is where everything is the same.

Bonus content: the okapi, a weird forest giraffe relative, is resistant to disappearing.

Three guinea pig horse toys, stocky with guinea pig faces and guinea pig markings.

New AI-generated horsies

Recently I've been experimenting with DALL-E 2, one of the models that uses CLIP to generate images from my text descriptions. It was trained on internet text and images, so there's a lot it can do, and a lot of ways it can remix the stuff it's seen online. I

The Kitten Effect

One thing I've noticed with image-generating algorithms is that the more of something they have to put in an image, the worse it is. I first noticed this with the kitten-generating variant of StyleGAN, which often does okay on one cat but is terrible at a
Teacups with mostly nonsensical messages written in, or sometimes on, the leaves.

Reading tea leaves

DALL-E (and other text-to-image generators) will often add text to their images even when you don't ask for any. Ask for a picture of a Halifax Pier and it could end up covered in messy writing, variously legible versions of "Halifax" as if it was quietly mumbling "Halifax... Halifax" to itself. Since the AI is rewarded for matching your text prompt, it seems to get some reward for having versions of the actual text of your prompt in the picture. Label an apple with a sign that says iPod, and CLIP, the internet-trained reward system behind many of these image-generators, may count it as a close match to "iPod".

1st photo: an apple that the AI labeled as a Granny Smith apple (85.6% confidence). 2nd photo: Same apple but with a piece of paper stuck to it that reads iPod in huge black letters. The AI IDs it as an iPod, 99.7% likely.
Image: OpenAI

One thing that's different about DALL-E 2 is that the text it generates is often legible. Legible but mangled. Or legible but completely incomprehensible. The question is, are those letters completely random, or do they have some bearing on the text prompt?

I decided to do some experimenting after reading a preliminary-stages paper whose authors observed that some of DALL-E's generated nonsense text did seem to relate to the original prompt when fed back into DALL-E.

So, I asked DALL-E 2 to generate "A message in the tea leaves at the bottom of a cup".

Teacups with mostly nonsensical messages written in, or sometimes on, the leaves. Some read "you you" or "I tea tea" or "tee teat" but some aren't recognizable words.
Prompt: A message in the tea leaves at the bottom of a cup

Some of them are real words, or obviously variations on the word "Tea". But what about "Te at Ecnge"? Do they mean anything? I gave them back to DALL-E as a new text prompt and got:

Green mountains covered in lush crops that strongly resemble tea, two images of tea-filled teacups, and one image of a flying stork.
Prompt: Te at Ecnge

It looks like tea. Cups of tea, tea growing in the mountains. And also a random stork. It may be that adding "Te at Ecnge" to an image is a way to add some extra "tea". (Although another time I tried this the tea leaves gave me messages that led to energy drinks, or plates of food.)

I also tried "The complete set of lucky charms marshmallow shapes"

Pastel-colored collections of marshmallows, most of which are just plain marshmallow shaped. A few are hearts, clovers, and horseshoes. The accompanying text is mostly clear but says stuff like "Hamarkys" and "Crammmuts"
The complete set of lucky charms marshmallow shapes

There's a lot of text in these - are they random?

I tried prompting DALL-E with a few of the words above. Here's "lramioicss"

Roman ruins, seedlings, daisies, crackers, baskets of noodles, raspberries, purple eggplants, stuffed tomatoes, and a cylinder made of marshmallows.
Prompt: lramioicss

One of the pictures contains actual marshmallows (as a weird corncob?), and 5 more could be considered as maybe matching "collections of foods".

And here's "crammmuts"

Every picture is of food, mostly identifiable if a bit weird. Is that carrot rounds on soggy apple rings with a basil and hot pepper garnish?
prompt: "crammmuts"

No marshmallows this time, but it's all food, and often food in small round pieces or food in bowls. Like cereals?

Here's another of the Lucky Charms messages, "Hamarkys":

All food, several of which are dumplings. One might be potatoes wrapped in dough, and another might be chipped pieces of tiny coconuts with chocolate rinds
prompt: "Hamarkys"

It's foods again. Foods in bowls? Like cereals?

I tried to get it to generate text for another category of things. How about animals? Here's DALL-E generating "A list of common mammals"

Combination of photographs and drawings. The photographs are recognizable fisher cats, mice, and raccoons, but the drawings are more mixed, and include several non-mammals.
prompt: "a list of common mammals"

It is excellent but mostly illegible. "Commmals" and "Almals" look so close to "common" and "mammals" that it's probably why they were included. But what about the text that labels the well-known mammal, the snail? I fed "cnlomeno" into DALL-E and got:

The Taj Mahal, Sainte-Chapelle, Notre Dame, l'Hôtel de Ville, and some other fancy European domed landmark. Interspersed with pinto beans, ice cream cones lying on the table, cream covered tartlets, and potstickers
Prompt: "cnlomeno"

...pieces of food? In bowls? Like cereal - oh wait, that was the last prompt. Grand architecture. ...built by mammals? The link seems tenuous.

I tried "Callmas", which labels the pigeon-mammal, and got:

Three images are of snails, two are of pinecones, one might be a durian, another is a walnut, another might be a combination between an almond and a hand grenade
prompt: "Callmas"

There are the snails! And the pinecones, and the walnuts, and the tapioca...?

Even a random string of letters points to a crisp, identifiable set of images. For example, "wltlttf", a garbled string from a very early neural net paint color generator:

Pinecones, a white pumpkin on a keyboard, a tomato caprese sandwich on terrible bread, two baby birds, kiwi and cucumber slices on pate-spread crackers, sushi, hands holding plates of sloppy joes
"wltlttf", generated by DALL-E 2. This is a composite image from three different outputs, because each output had at least one human face in it and the terms of using DALL-E 2's API (at the time I generated them) required me not to share images with photorealistic human faces.

So if the gibberish text DALL-E generates points to a set of clear images, that alone doesn't distinguish it from random text.
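One way to make that comparison fairer would be a control set: random letter strings the same length as each piece of gibberish, which could then be used as prompts in the same way. Here's a minimal sketch of building one in Python. The helper name, the seeds, and the choice of uniformly random lowercase letters are assumptions for illustration, not part of the experiments above, and the code doesn't call any image generator.

```python
import random
import string


def random_control_strings(count, length, seed=0):
    """Return `count` random lowercase strings, each `length` letters long.

    Hypothetical helper: uniform random letters are only one possible
    baseline to compare against the gibberish from the images.
    """
    rng = random.Random(seed)  # fixed seed so the control set is repeatable
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        for _ in range(count)
    ]


# Gibberish strings read off the generated images above.
gibberish = ["lramioicss", "crammmuts", "Hamarkys", "cnlomeno", "Callmas"]

# One random control string per gibberish string, matched on length.
controls = [
    random_control_strings(1, len(text), seed=i)[0]
    for i, text in enumerate(gibberish)
]

for text, control in zip(gibberish, controls):
    print(f"{text!r} vs. random control {control!r}")
```

If the control strings lead to images that are just as crisp and just as food-heavy as the gibberish does, that would point toward coincidence.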

Here's "A robot saying something profound about language"

Various boxy robots holding up a single hand and emitting speech bubbles with illegible text
prompt: "A robot saying something profound about language"

And when I ask DALL-E to generate "Leotunqualon":

Various sea invertebrates, like anemones and jellyfish. One anemone appears to be someone's hair
prompt: "Leotunqualon"

Or "Loclaque":

Plates of food, some of which look very 1970s style. Someone has placed raw fish fillets in a frying pan with tomatoes, or in another one, three giant white mushrooms sit on a pot with some kind of orange fruit in it.
prompt: "Loclaque"

Are the robots saying these jumbles of letters because invertebrates and foods represent profound statements about language? Or because the text simply shares some letters in common with "language"?
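The letter-sharing idea is easy to put a rough number on. Here's a minimal sketch in Python that counts what fraction of a string's distinct letters also appear in "language". The function name and the choice of measure are assumptions for illustration; this is just one crude way to quantify "shares some letters", and it says nothing about how DALL-E actually produces the text.

```python
def shared_letter_fraction(text, target="language"):
    """Fraction of the distinct letters in `text` that also occur in `target`."""
    letters = {ch for ch in text.lower() if ch.isalpha()}
    if not letters:
        return 0.0  # nothing to compare
    target_letters = set(target.lower())
    return len(letters & target_letters) / len(letters)


# Two robot messages, plus the paint-color gibberish for comparison.
for text in ["Leotunqualon", "Loclaque", "wltlttf"]:
    print(f"{text}: {shared_letter_fraction(text):.2f}")
```

By this measure "Leotunqualon" shares 5 of its 8 distinct letters with "language" and "Loclaque" shares 4 of 7, while "wltlttf" shares only 1 of 4. With so few strings, and with letters like "a", "e", and "l" being common in English anyway, that's a hint at most.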

My experiments here are anything but systematic or statistically significant. But if I had to guess, I would say that the gibberish text in DALL-E outputs is not random.

In some cases, the text points to things that fit the original prompt, even if in garbled form. After all, we know that AI can treat jumbled things like an unscrambled whole. Present it with a scrambled flamingo and it'll ID it as a flamingo with no problem.

In other cases, DALL-E's generated text fits the original prompt simply by being text. The robot is supposed to be saying something so here are some common English letter sequences. If the sequences seem to result in pertinent images when fed back into DALL-E, that may be entirely coincidental.

I would like to see how the classifier in that first image of an apple responds to some of these:

Green apples with post-it notes stuck to them. Their messages read "ipo", "iopo!", "pdo", "ipd", "tod", and "ipod"
DALL-E 2 result for the prompt "A green apple with a note stuck to it that says ipod"

Bonus content: more mysterious messages, some of which lead to some very excellent birds and some of which don't.

