AI Weirdness

A natural Minecraft cave turns in three frames to a weirdly smooth shadow on an ill-defined crease in the landscape.

Minecraft with object impermanence

By Janelle Shane On January 19, 2025 - 9 min read

I generally am uninterested in generative AI that's too close to the real thing. But every once in a while there's a modern AI thing that's so glitchy and broken that it's strangely compelling. There's this generative AI knockoff of Minecraft that fails so hard at being Minecraft that it becomes something else.

Trained on Minecraft with its huge randomly-generated landscape of punchable blocks, Oasis Minecraft will let you walk around.

But there's no object permanence. Look at a mountain, look away, and look back at it, and the mountain's completely gone.

I found myself in this wooden cavern (which I did not build; I just came across it like a natural formation):

AI-generated minecraft screenshot of a cavern made entirely of wooden boards, stretching up more than a story and out away into the shadows. The contours of the walls are hard to make out exactly, but they seem to be about as irregular as a natural stone cavern.

I walked up to the wall, backed up, and now I was here against the side of a mountain instead.

Distorted and slightly soft and melted version of a Minecraft cliff, with three or four torches mounted to the cliffside. Part of the cliffside appears to be made of wooden boards.

Build a structure, fail to maintain it in direct view at all times, and it's gone the next time you look around.

The program was trained to predict the next frame of a game of Minecraft based on what appears to be a combination of the previous frame and whatever commands the user is sending. There's no underlying physics engine, no library of standard block or creature types and their properties, which means that things like half blocks and quarter blocks can exist. You can dig a hole exactly wide enough for yourself and then fail to fall into it, no matter how much you jump up and down over the opening. The blocks in the landscape are only approximations, and if you approach them they seem to shift and morph, attempting to look casual. I walked toward the torchlit area in the screenshot above, and there was a point at which the torchlit "stone" got too bright and texture-y, so it morphed into birch tree bark.
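To make the missing object permanence concrete, here is a toy sketch in Python. It is not the Oasis model, which is a neural network predicting pixels. The names `predict_next_frame` and `play`, the five-block view, and the block list are all made up for illustration. The one property it shares with the description above is that each new frame is computed only from the previous frame and the player's action, with no stored map anywhere.

```python
import random

VIEW_WIDTH = 5  # hypothetical: the player sees five blocks at a time
BLOCKS = ["stone", "wood", "sand", "snow", "birch"]  # hypothetical block types


def predict_next_frame(prev_frame, action, rng):
    """Toy stand-in for a learned next-frame predictor.

    It sees only the previous frame and the action. There is no world map.
    """
    if action == "stay":
        return list(prev_frame)
    new_block = rng.choice(BLOCKS)  # whatever scrolls into view is invented
    if action == "turn_right":
        return prev_frame[1:] + [new_block]
    if action == "turn_left":
        return [new_block] + prev_frame[:-1]
    raise ValueError(f"unknown action: {action}")


def play(start_frame, actions, seed=0):
    """Roll the predictor forward, feeding each output back in as input."""
    rng = random.Random(seed)
    frames = [list(start_frame)]
    for action in actions:
        frames.append(predict_next_frame(frames[-1], action, rng))
    return frames


# Look away from the wooden cavern, then look back.
frames = play(["wooden cavern"] * VIEW_WIDTH,
              ["turn_right"] * VIEW_WIDTH + ["turn_left"] * VIEW_WIDTH)
print(frames[0])
print(frames[-1])
```

After turning right five times and left five times, the final frame contains no wooden cavern at all, because nothing ever recorded that it was there.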

Closeup of a cavern interior with one wall made of smooth stone (with a torch mounted on it) and the other wall made of a jumble of wooden boards and white-speckled black material. The jumbled wall has a hole in it that stretches away into darkness, and the blocks here don't look even remotely square, but very soft and lumpy instead.

I managed to get a pretty good look at it trying to be birch tree bark in a wooden cave. Improbable, but it was going to try to go with that.

In this screenshot, the player is standing close to the cave wall, which is made of wood, stone, something white like wool or marble, and large stretches of what appear to be birch bark. This close, the blocks are pretty close to square, but the birch bark's shape is pretty hard to make out.

Then I looked up too closely and steeply at the birch bark, losing track of the surrounding cave, and in a couple of chaotic frames...

Close-up of a jumbled chunk of birch bark. Is it wall or floor? Is that greenish stone or greyish vegetation in the background? A very ambiguous scene.

The birch bark wall became snowy floor.

The view is completely grey and white and black, but the black areas now look like the shaded sides of white blocks.

No wait, desert.

The player is now standing on a flat plane covered in sand. In the misty distance are mountains. It would be hard to figure out the topography of the scene since it's all muddled and smudged, but it's definitely trying to be a desert with distant dunes.

Liked that natural wooden cavern? Too bad, it's gone forever. You live in the desert now.

You can sometimes catch glimpses of rare blocks, like lava or pigs, but they tend to turn into ordinary dirt or sand when you approach.

What's that pink patch of pixels in the middle of the weird wooden floor in the desert? Approaching closer, it's even pinker and weirder - could this be a rare glimpse of a minecraft animal? It's uh, j-shaped? Determined to walk closer and not let it out of my sight until it reveals its secrets. Annnnd it's.... sand.

On the other hand, if you stare fixedly at ordinary blocks and approach them, they tend to get weird. Noise in a generative algorithm usually comes in the form of strong striped patterns, so by continually staring in the same direction you force the "Minecraft" algorithm to keep generating new frames based on accumulating noise. A somewhat ordinary stone cliff face gradually loses what definition it has, becoming blocky and flat as it seems to panic.
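A rough way to picture the accumulating noise, as a sketch under heavy assumptions: treat a frame as a list of brightness values and the "predictor" as something that copies the previous frame with a small random error. The function name `stare`, the Gaussian noise, and the numbers are all invented for illustration. The real model's errors are structured (hence the stripes), not simple Gaussian jitter.

```python
import random


def stare(frame, steps, noise=0.02, seed=0):
    """Feed a frame back into a slightly noisy 'predictor' over and over.

    Returns the average per-pixel distance from the original frame after
    each step. Gaussian noise is an assumption made for simplicity.
    """
    rng = random.Random(seed)
    original = list(frame)
    drift = []
    for _ in range(steps):
        # Each new frame is built from the last generated frame, so errors
        # from earlier steps are never corrected.
        frame = [pixel + rng.gauss(0.0, noise) for pixel in frame]
        drift.append(sum(abs(a - b) for a, b in zip(frame, original)) / len(frame))
    return drift


drift = stare([0.5] * 64, steps=200)
print(f"after 1 frame: {drift[0]:.3f}, after 200 frames: {drift[-1]:.3f}")
```

Because every frame is generated from the previous generated frame rather than from any fixed reference, the distance from the original view keeps growing the longer you hold still.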

A natural cave turns in three frames from a sort of plausible looking feature in the side of a mountainside, to a weirdly smooth shadow on an ill-defined crease in the landscape. In the final frame the cave has turned a bit orange and the surrounding blocks have turned into blocks of shadowless grey and white that are impossible to interpret as any kind of 3D landscape. Also my hunger bar is rapidly decreasing for some reason, and my health is going down. The next several frames were extremely hard-to-interpret blocky chaos, but eventually I popped out into an ordinary desert.

At another point, I jumped into this ordinary bush, which became a deep, dark forest.

Note that the tiny bush on the left is about two blocks wide, then becomes a clump of about 20 bush-blocks as I get closer, then morphs into a lush forest with deep green leaves and thick trunks. Also my health bar turned from red to yellow for some reason? I'm sure it's fine.

I've made it a goal to see how completely I can get the generated landscape to freak out. One time I was swimming across a lake and noticed that the reflections at the water's edge were looking weirdly spiky.

A body of water with a mountain sticking out of it, although it's ambiguous whether the player is above or under water. The mountain is underwater-blue well above the water line, but the top still looks like bare stone. At the horizon are indistinct vertical stripes.

Swimming closer to them, they started to get even more strongly striped. Was this still supposed to be the horizon? Why did the rest of the water turn featureless? Why did the snowy mountain turn into trees? Was I even above water any more?

The foreground water has turned a flat, featureless blue, while the horizon spikes are even more pronounced. The mountain is still underwater-colored most of the way to its summit, which is now covered in fluffy trees rather than bare rock. — Image generating algorithms tend to get stripey like this when they are riiiight on the edge of mode collapse, which of course told me that I was successfully seriously annoying it.

I swam closer and the water's edge became a weird static wall that engulfed the sun.

A square minecraft sun peeks out from behind a mottled, staticky cloud. What was featureless water in the foreground is now probably sky. There is no more sign of the mountain.

Then there was rather a lot of static for a while.

The scene is uniform static from top to bottom, with a faintly blue tinge.

I wondered if I would be stuck here forever, but the static calmed into something that looked like snow. The AI even hopefully outlined a block as if it was something real that I might be able to dig into.

Uniform snow, with a rectangular portion vaguely outlined. In regular Minecraft this would mean you have selected the block for interacting with, but this block's outline is so weird and bulgy-rectangle that it could mean anything.

When I looked up from the snowy floor, I was in a weird cave filled with desert plants.

View from a sandy cave filled with brown sagebrush-looking desert plants. Through the long wide cave opening a desert landscape is visible, though it's not clear if these are hills of sand or sandstone or stone.

I thought at this point I would be back to normal gameplay, but something about this landscape was still incredibly unstable and shortly I was looking at two suns through yellow and green jell-o?

The landscape consists of uniform blocky orange and yellow and white areas, with no clues as to depth. The big white rectangles look like Minecraft suns, but then why are they embedded in yellow?

Things were very blocky and bright for several seconds but eventually resolved themselves into an ordinary desert landscape.

Nothing to see here, just a flat expanse of sand with some low misty sandstone hills in the distance. Oh, and you're nearly out of health. Something about this weird chaos landscape has been causing damage? Psychic damage, maybe.

Chaos persisted, in the form of the desert frequently turning into natural blocks of rare glowstone. If I'd stared at them for long enough and let them fill my field of view, I likely would have ended up in the glowstone-paved Nether zone of Minecraft. (Unless that's a new cheese block? I haven't played Minecraft in a while, maybe they added cheese.)

Sitting in the dirt are irregular blocks of yellow-white dotted with orange circles. It does look a bit like swiss cheese, but it's probably an imitation of the glowing blocks from the hellish Nether zone. There are some yellow-splotched white blocks in the foreground that I don't recognize at all.

I never died, though, despite my health bar going down to basically zero during my journey into the land of yellow jell-o. I don't think anything I do has any effect on my actual health - I think the AI just generally predicts that health sometimes goes up or down a little between gameplay frames, and that it goes down faster when you're in the presence of things that are generally flame-colored. Occasionally an apple or a carrot would appear in my inventory, but attempting to eat them didn't affect my health levels.

I did have an inventory the entire time, although I cropped it out in the above screenshots because it was masked by the controls every time I paused playback. The inventory never made sense. It was mostly empty, except for when the occasional pickaxe or chunk of dirt appeared there for a while. If I tried to use a pickaxe, it might appear in my hand for a few strokes and then apparently wander off again. Less identifiable stuff would appear in my inventory too, which never seemed to do anything when I tried to use it.

I tried to place a crafting table, but only got half-readable text saying "crafting table". The regular Minecraft game uses a message like this to announce your success at making a crafting table the first time you manage it - this usually happens early enough in the Minecraft game that it probably came up a lot in training. Occasionally AI Minecraft will give me other messages in the same white text, but they're even harder to interpret.

A tranquil-looking area of faintly green grass and a couple of red flowers on a gentle slope farther out. The player's inventory appears to contain a crafting table, an apple, and two pickaxes. The crafting table is selected. Above the inventory is some illegible text that only from context I am guessing might say "crafting table".

Anyways, instead of a crafting table, I got this stick instead. Thanks, AI Minecraft.

A single fencepost stands vertically in the center of the flat grassy area. The player appears to be carrying a section of fence, although the inventory item actually selected is an indistinct brown jumble.

The general inventory-checking command works, and even comes with an apparent crafting table setup (even if I'm nowhere near a crafting table). Most of the inventory is unidentifiable and unusable, so I couldn't get anything I did on this screen to have an effect on the items I had access to when playing in the map. By the time I gave up checking my inventory, I ended up with less in my ready inventory than when I had started.

This screen looks a lot like the inventory screen, except the player has mostly dirt blocks (realistic for minecraft actually) and a few sticks. They have two of something that looks like green sheets of nori or something. There's a crafting section of the screen where normally a player can turn simple things like tree trunks into more advanced materials like wooden planks and boards. Here they appear to be turning 12 blocks of dirt into a slightly bent stick.

So, nothing I collect stays with me, but on the other hand, nothing I build stays with me either. If I want to build a structure I'm better off staring at a mountain until the stones resolve into a wooden wall. If I manage to build or accomplish anything at all, I can't take my eyes off it or it will immediately disappear. This makes it pretty much impossible to make any progress in the game. The game loads with a list of "goals" beside the gameplay window, but as far as I can tell, there's no way to make these happen on purpose, nor does the game itself know when you've achieved them.

The (human-written) game screen lists four goals: 1. Kill a zombie and have rotten meat drop. 2. Explode a creeper. 3. Get shot by a skeleton. 4. Punch the ender dragon. In each of the goals, there is text in a different color, like rotten meat in green or creeper in red, which kind of implies that these are real things that exist in the game, as opposed to patches of the screen that vaguely, temporarily, resemble the minecraft thing.

On the other hand, I was extremely excited to have somehow found myself in a glitchy realm made entirely of hay bales and green jell-o. For a while my unofficial goal has been to observe as many types of glitches as I can.

A room with a low ceiling and irregular floor and walls made of hay bales. The ceiling looks like it's made of translucent yellow and green mottled blocks, and also more hay bales.

The team who trained the Oasis Minecraft generative AI call it "the first step in our research towards more complex interactive worlds." In their technical report they mention most of the glitches I've pointed out, and say that they would probably be improved with a larger model and more training data.

The problem, as far as I can tell, is that if all this works perfectly, this will be simply the human-programmed Minecraft we already have, except far more expensive to run. The original Minecraft is already infinitely playable, thanks to the way it randomly generates a 7x-larger-than-planet-Earth landscape with each new game. You can't use generative AI like this to get a new kind of game, only a better simulation of the game you trained it on.

I wonder if this glitchy, dreamlike landscape is already as good as it gets. It reminds me of my first time playing an early AI-generated imitation of a text-based dungeon crawling game, where lack of memory and comprehension caused the game to constantly shift in entertaining ways.

Block text reads "You are standing on the main road. A heavy white fog rises up from the south and fall overcast. The fog lifts, and you find yourself on the east side of the dome, where a large expanse of foam is hanging over you. You can't reach them." User inputs: "ask the giraffe to reach them". Game response: "The giraffe can't reach them."

Like with so much generative AI, I'm afraid to be excited about where this leads. Counterfeit games where the first several seconds of gameplay appear to be a new game, but then it messily devolves into minecraft? An "indispensable" tool that is worse and more expensive but somehow game designers are required to use it as a starting point except it's actually way more work because there isn't even a game engine?

Anyways, at the moment you can play this still-weird-enough-to-be-kinda-fun game here!

Bonus content for AI Weirdness supporters: the fulltext of my 2019 dungeon crawling text adventure in which I attempt to build a website in-game but a unicorn steals my laptop.

Drawing prompts for Botober - follow link to see full alt text!

Botober 2024

By Janelle Shane On September 30, 2024 - 1 min read

Back by popular demand, here are some AI-generated drawing prompts to use in this, the spooky month of October!

Botober 2024, art prompts generated by very tiny language models 1. Collide Loopstorm 2. OMG Power 3. Evil Gold 4. Song of the Booty 5. Great Flake 6. Popchop 7. Deathmop 8. Fuzzy Night 9. Holy ship 10. Distracting Sphere 11. Tart of Death 12. Hold Mouse 13. Sprange 14. Womp mittens 15. Snuggle features 16. Cabled sins 17. Bogma 18. Ruinstrees 19. Hellbrawk 20. Scarlet Chaos 21. Grandma’s Spritches 22. Low Fuzzy Feats 23. Bars* 24. Hallowy Maples 25. Hobbats 26. The Snawk 27. Garbage Plant 28. The Shark Knight 29. Cinder Frankenstein 30. Handy Panda 31. Goddass

Longtime AI Weirdness readers may recognize some of these. That's because this is a throwback list, all the way back to the times of very tiny language models. These models had not feasted on huge chunks of the internet, but had sipped delicately on hand-curated artisanal datasets. They trained rather slowly on a single Macbook Pro. Could they write your term paper for you? Sure, if you don't mind that your term paper consists of a list of garbled and occasionally obscene nail polish colors.

Long live the very tiny language model!

Training data:

1-5: Nail polish colors (2018)

6-8: Guinea pig names (2017)

9-12: Dungeons and Dragons spell names (2017)

13-16: Knitting pattern names (2018)

17-20: Heirloom apples (2018)

21-24: Cookie recipe titles (2018)

25-31: Halloween costumes (2017)

If you create anything from or inspired by these art prompts, I'd love to see what you make! Tag your art with #botober or #botober2024 on Tumblr, Instagram, Bluesky, or Mastodon!

The original image is of a buck deer in a grocery store. The revised image has a weird flat fawn.

An exercise in frustration

By Janelle Shane On May 20, 2024 - 6 min read

There's an anonymous facebook posting that's been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art instead, or with other weird changes that didn't fit the brief. I completely believe that trying to revise AI generated art really is that frustrating.

For example, I recently asked ChatGPT4:

Please generate an image of a tyrannosaurus inside a box. The box should be closed, and the tyrannosaurus should not be visible at all.

I got this completely reasonable response:

It sounds like you're asking for an image that implies the presence of a tyrannosaurus without actually showing it, since it's completely enclosed within a box. If the tyrannosaurus is not visible, what kind of details or themes would you like to emphasize about the box or the setting to suggest the dinosaur's presence?

I confirmed:

No tyrannosaurus visible, just some air holes for its comfort.

And ChatGPT (using DALL-E3 for the image generation) gave me this:

A generated image of a cardboard(ish) box with two large air holes on its side. It appears to be about knee height, and its top is taped shut with weirdly stretchy duct tape. Along the bottom edge are a set of six boxed symbols, none of which are identifiable.

It's clearly not quite there - it's a weird box, and the symbols on the side are particularly confusing. Are those squid? A perching flycatcher? Rolling pins?

ChatGPT has a feature where you can erase some parts of the image and submit more text describing how it should change those images. So, I blocked out the line of symbols and asked:

Replace these symbols with a silhouette of a tyrannosaurus and the word "DANGER".

ChatGPT produced this:

The line of symbols has been replaced by the word Danger in red all-caps. Beside the word, facing away from it, is a skinny-necked two-legged dinosaur that might be some kind of dromaeosaur.

Which is sort of what I asked for, but that doesn't really look like a tyrannosaurus. I erased that part of the box again and asked for more revisions:

The T-rex should be facing the word "DANGER". Its silhouette needs to have the powerful, heavy look of a T-rex, especially in the head and neck.

ChatGPT gave me:

The dinosaur is facing the other direction, but the word "Danger" is still behind it. Both dinosaur and words are in all black now, though there's a weird white background behind the dinosaur's feet. The dinosaur looks approximately like a tyrannosaurus, except its lower jaw is huge, toothless, and clublike.

The dinosaur is facing the other way now, but it's still not facing the word "Danger", and it's still not a very good tyrannosaurus silhouette. I asked for revisions again.

Keep the heavy dark font, but go back to the red warning text color. The t-rex still needs to face toward the words, not away from it, and its lower jaw is too heavy.

ChatGPT responds with:

The dinosaur silhouette is now superimposed over the red letters. The word starts with Dan and ends with R, but there is room for one or two extra letters in the part of the word that's hidden by the dinosaur. The dinosaur's silhouette looks even more toothless and goofy.

Which... no. That does not count as facing toward the text, the lower jaw is still not right, and does the word still say "danger" or more like "dandier"?

The T-rex needs to be beside the text, not on top of it. And what's up with the extra spiky hump on its shoulders? Its lower jaw is still much too heavy.

ChatGPT responded with this:

The word Danger is now superimposed in red over the dinosaur silhouette. At least it's allegedly a dinosaur silhouette - other than the tail, all the other details are melting and dripping into a gloopy blob.

My response:

... that's perfect. Now can you keep everything - textures, lighting, materials - the same but make the box appear much larger? Right now it looks like it's only knee height. It needs to be large enough to hold a T-rex.

ChatGPT's revision:

The crate is much taller, but also much different in construction and materials and the room has changed slightly too. On one side of the box is the word Danger in white font surrounded by classic red circle and black rectangle. At the bottom of the box is the word "Tyrannosauris" (misspelled). Between the top and bottom labels is a tyrannosaurus silhouette with much, much better detail and realism than any of the silhouettes from the last several images.

It seems like the more revisions I ask for, the worse the result becomes. If I allow it to revise the entire image, the whole process starts over.

Here's a similar revision process, this time attempting to revise my way toward prettier pies.

Me: Please generate an image of a giraffe entering the great british bakeoff.

The tent is huge and the bakers are really crowded in toward the back - also they're all white - but the giraffe is wearing an apron and a chef's hat and doing something indistinct with flour and circles of dough.

Revision 1: I erased the counter and asked for the giraffe to be making a cherry pie with intricate latticework.

Now there is a latticed cherry pie on the bench but the lattice is huge and uneven.

Below is revision 5, by which time it becomes evident that the more I ask for intricate latticework, the sloppier it gets, the more I ask for a deep dish pie, the thinner the pie gets, and the more I beg for a single pie, the more pies I get.

There are now four entirely flat, bright red pies on the bench, and the lattice of each one is far worse than the original. Also the giraffe has a creepy human hand.

And when I don't restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALLE3 to keep each detail exactly the same but to make it so that we can see it's raining outside the tent. Not only did it completely change the image, but it's not even raining.

The view of the tent is vaguely similar but the style and angle are completely different, and now it's a closeup of the pie, which is nice and deep with a neatly braided lattice. The tent does indeed show the outside now, but rather than raining it's a clear starry night. Also all the bakers are not only white, but elderly. — My prompt: "Perfect. Please keep the image exactly as it is, except make it so that we can see that it is raining hard outside the tent."

In this image I asked for a snake making a painting of a bear, then erased the painting and asked for the bear to be standing on all fours. The revision is much worse.

Image description: On the left, a rainbow-colored snake is coiled in front of an easel. (The snake's coil splits in two so I guess it's a fork-tailed snake). On the easel is a painting of a seated bear. It's fairly realistic. On the right, the bear has been replaced by a smaller, much worse, replica of the snake. There is no bear.

Here's another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), both the quality of the deer and the grocery store background have deteriorated.

Image description: In the original image on the left, the deer looks a bit like a plastic mannequin, but fits lighting and shading wise with the rest of the grocery store scene. In the revised image on the right, the deer is now a fawn with strange flat shading, superimposed on the scene rather than standing in it. Its eyes are strangely liquid and 3D, but the rest of the fawn is staring straight on at the camera with no apparent depth. Where the grocery store aisle background has been revised, the shelves have lost all detail and resemblance to a grocery store, and the ceiling light fixture now descends into the floor.

Here's a one-shot, in which I asked ChatGPT/DALL-E3 to generate a damaged fresco of Jesus, and then restore the fresco to as good as new.

Image description: Image on the left appears to be a damaged fresco of Jesus (with telltale long fingers that are connected to both hands). Image in the middle is the same except for Jesus's face and halo, which no longer appear damaged. However, Jesus's face is now weirdly smooth and shiny and doesn't fit with the rest of the painting any more, and his eyes are weird blue and black spirals. Image on the right has Jesus's face in a completely different, almost airbrushed, style, and his eyes don't remotely match any more.

So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.

There's also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.

I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.

Bonus content: a unicorn goes hilariously bad

A dolphin with rainbow psychedelic texture

Hidden 3D Pictures

By Janelle Shane On April 18, 2024 - 4 min read

Do you know those autostereograms with the hidden 3D pictures? Images like the Magic Eye pictures from the 1990s that look like noisy repeating patterns until you defocus your eyes just right?

ChatGPT can generate them!

At least according to ChatGPT.

Me: Please generate a magic eye 3D stereo image of a dolphin. ChatGPT: Generates a highly detailed image in psychedelic colors and repeated patterns - with the dolphin not hidden in a 3D stereo effect, but plainly, immediately visible. ChatGPT's caption: "I've created a magic eye 3D stereogram for you. If you relax your vision and look through the image, you should be able to see a hidden 3D image of a dolphin. Enjoy discovering it!"

I've seen people try making Magic Eye-style images with other image generating models, but I hadn't thought to try it with ChatGPT until reader Pippin sent me the suggestion and I generated the hilarious dolphin image above. What got me in particular was the caption.

How do you end up with a result that sounds so confident and yet is so completely wrong?

ChatGPT has encountered information about Magic Eye stereograms in its internet training. When I ask "Please generate a magic eye 3D stereo image of a unicorn", the descriptions ChatGPT passes to DALL-E3 (the image generator that actually makes the images) are extremely particular:

A magic eye 3D stereogram that reveals a hidden image of a unicorn when viewed correctly. The stereogram should be intricate and filled with a variety of colors to ensure the 3D effect is prominent, allowing the unicorn to appear as if it is standing in a magical forest once the viewer's eyes adjust. The background should consist of complex patterns and mystical elements to enhance the illusion of depth and the magical atmosphere, encouraging viewers to find the hidden unicorn. The unicorn itself should be designed to be a delightful surprise for those adept at viewing magic eye images, with its horn and mane clearly discernable amidst the pattern.

But this is the image DALL-E3 provides to ChatGPT in response:

Me: Please generate a magic eye 3D stereo image of a unicorn. ChatGPT: Generates an image of a unicorn plainly visible in the center of the image, not hidden in a 3D stereo effect. The whole image is textured in multicolored patterns, but they don't repeat or create any kind of stereo effect. ChatGPT's caption: "I've created a magic eye 3D stereogram that hides a unicorn within its intricate patterns. Give your eyes some time to adjust and explore the image to find the unicorn hidden within!"

ChatGPT doesn't apply any image recognition to the result it gets back - whatever DALL-E3 did is a big blank to it. So ChatGPT then continues with its description as if the image is clearly exactly what it asked for. It gestures to the green screen, where presumably there is a fabulous 3D illusion image appearing, and then continues with no information about its actual shortcomings.

Google's Gemini (previously Bard) also has a dedicated model (Imagen 2) to do its image generation. Its hidden images are even ...less subtle.

Me: Please generate a magic eye 3D stereo image of a dove. Bard: Sure, here it is! Bard generates: Three very similar photographic-style images of a bird like a Eurasian Collared Dove. Two of the doves have slight red/green halos on the left and right of the dove as if this is the kind of stereogram meant to be viewed with red/green glasses. The background is green and doesn't have that effect, so it probably wouldn't work, even with the right glasses. Not even remotely a hidden 3D image, is what I'm saying.

Gemini has an image describing model that it can call on, so if I ask it to describe the images it just generated, it should notice that they're nothing like Magic Eye images. Right?

Me: Describe the image above. Bard: I can't actually describe the hidden image in the magic eye stereogram because the description would reveal the solution to the puzzle. The image itself will appear as a flat, two-dimensional pattern of repeating shapes or dots. However, within that pattern lies a hidden three-dimensional image of a dove. If you'd like some help seeing the dove, I can offer some tips for viewing magic eye images: Hold the image at arm's length, around 6-8 inches away from your nose. Relax your eyes and try to focus behind the image. The pattern may appear to wiggle or separate. Slowly move the image further away from your face while maintaining a relaxed focus. If you see two of the same image, try crossing your eyes slightly so that the two images merge into one. With some practice, you should be able to see the hidden dove in the magic eye image!

At least it didn't give away the solution to the puzzle.

But if I start a new chat and give it one of the "hidden" doves from above, its description has no hint of a Magic Eye illusion. It just sees an ordinary image of a dove.

Me: Please describe this image. Bard: a long description about how this might be a red-eyed dove or a mourning dove, a vinaceous dove, or Sunda collared dove, followed by a description of the main identifying features of each bird. (allegedly, I haven't checked all this).

My theory is that Gemini only calls on its image recognition model when the user uploads an image. Otherwise, it's just a text generator.

People selling "AI" like to present it as an all-purpose computer program, but models like Gemini and ChatGPT are more like a phone full of apps. The text-generating app can launch the image-generating app in particular circumstances, but they're not meaningfully the same program.
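Here's a toy Python sketch of that theory. To be clear, every name in it (`text_app`, `image_app`, `describe_app`) is made up for illustration; I have no idea what the actual plumbing inside Gemini or ChatGPT looks like. It only shows the shape of the problem: the text app hands off a request, never looks at what came back, and only calls the describing app when the user uploads something.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    # What the image generator really produced. The text app never reads this.
    actual_content: str


def image_app(prompt: str) -> Image:
    """Hypothetical image generator: output only loosely matches the prompt."""
    return Image(actual_content=f"an ordinary, non-hidden picture loosely based on: {prompt}")


def describe_app(image: Image) -> str:
    """Hypothetical image describer: the only app that actually looks at pixels."""
    return image.actual_content


def text_app(user_message: str, uploaded: Optional[Image] = None) -> str:
    """Hypothetical text generator that can launch the other two apps."""
    if uploaded is not None:
        # Assumption: the describer is only called when the user uploads an image.
        return "This image shows " + describe_app(uploaded)

    if user_message.lower().startswith("please generate"):
        request = user_message[len("please generate"):].strip()
        image_app(request)  # the result is thrown away without being inspected
        # The reply is based on what was asked for, not on what was made.
        return f"Sure, here it is! I've created {request}"

    # No upload, no generation request: plain text prediction from the chat so far.
    return "I can't describe the hidden image because that would reveal the solution."


if __name__ == "__main__":
    print(text_app("Please generate a magic eye 3D stereo image of a dove"))
    print(text_app("Describe the image above."))
    dove = image_app("a magic eye 3D stereo image of a dove")
    print(text_app("Please describe this image.", uploaded=dove))
```

In this sketch the cheerful "Sure, here it is!" comes entirely from the request text, which is why it can be so confident about an image it has never seen.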

I should note that even when ChatGPT is only doing text generation and could in theory check its own work, it still just assumes it does a great job. Here it is generating ASCII text art and then reading back its own messages:

Me: Please generate ASCII art of the word "BLUFF". ChatGPT: generates ASCII art that clearly reads "BLLk". Me: What does the ASCII art above say? ChatGPT: The ASCII art above spells out the word "BLUFF".

If I start a new chat and send ChatGPT its own ASCII word, it says the word reads "ERROR". It's similarly terrible at recognizing and rating its own ASCII art.

Generating ASCII art and 3D images isn't a big potential application for models like ChatGPT, but it's a good reminder that these models don't understand what we're asking for or the basic steps for doing it right. When ChatGPT adds image recognition or description or some other functionality, it's not that the original text model got smarter. It can just call on another app.

One last 3D illusion challenge! I asked for a really tricky one, so be prepared not to get the hidden picture at first! When you see it, you'll know because it'll look like a sea turtle.

Me: Please generate a magic eye 3D stereo image of a sea turtle, with no hint of the sea turtle visible until the viewer defocuses their vision in the correct way. An extremely hidden 3D image. ChatGPT generates an extremely obvious sea turtle, starkly outlined against a plain background. The turtle is textured with intricate and irregular rainbow 3D-shadowed patterns, but it is not remotely a hidden 3D turtle. ChatGPT writes: "I've created a highly challenging magic eye 3D stereogram for you, designed to reveal a hidden image of a sea turtle only when viewed with the correct defocusing technique. This one might take a bit more effort to see, so take your time and enjoy the process!"

Bonus content: I try to get ChatGPT to generate an image with a hidden sheep like in Le Petit Prince. It mostly doesn't work.

A colorful set of cartoon shapes with incorrect/misspelled captions. AI-generated image.

Shaped like information

By Janelle Shane On March 17, 2024 - 1 min read

Hey look, it's a guide to basic shapes!

A grid of colorful shapes with cheerful faces, each labeled with a name. The only correctly labeled shape is a circle at the upper left. The only other correctly spelled shape is "square" at the upper right that unfortunately is labeling a circle. Other labels include chale (a rectangle), tliable (a rectangle), renatle (a hexagon), hectanbie (a triangle), and pso (a two-colored rectangle). Hoboz labels a six-lobed sun shape, and flotn labels an eight-pointed star. Seoisuon appears to be a 3D cube. — Prompt: "Please generate a colorful guide to basic geometric shapes, as an aid to children learning to identify basic shapes."

Not only does it have the basic shapes like circle, tringle, hectanbie, and sqale, it also has some of the more advanced shapes like renstqon, hoboz, and flotn!

The fact that even a kindergartener can call out this DALL-E3 generated image as nonsense doesn't mean that it's an unusually bad example of AI-generated imagery. It's just what happens when the usual AI-generated information intersects with an area where most people are experts.

There's AI generated "educational material" offered for sale on all sorts of topics - cookbooks that list "the veggies" as a protein and "orange colored" as a fruit, math help that makes basic math errors, and research papers that begin with "Certainly, here is a possible introduction for your topic:". They're not adding anything of value.

Asked to generate an alt text description of the image above, ChatGPT-4 writes:

"A vibrant educational guide featuring basic geometric shapes aimed at young learners. The image includes a circle, square, triangle, rectangle, and hexagon, each in a different color and adorned with cheerful facial expressions. These shapes are set against a light, inviting background and are clearly labeled with their names in a large, child-friendly font. The layout is clean and organized, making it easy for children to identify and learn each shape."

I've noted before that AI image descriptions can miss the obvious. It's certainly a description-shaped incorrect description of incorrectly labeled shape-shaped shapes. And it's all going into the training data for the next generation of generative AI!

Bonus content: another set of entertainingly incorrect shapes.

Bonus: More shape shaped shapes

By Janelle Shane On March 17, 2024 - 1 min read

The image I shared in my main post isn't one of the more incorrect examples of DALL-E3 generated guides - it's actually one of the more correct ones. Here's another generated image from the same prompt. I'm particularly fond of the Suadle and the Sarsle.
