Google Bard has the ability to describe images. But it turns out what you get depends a lot on how you ask.
I gave Bard this image and the prompt "Please describe this spooky Halloween scene". On the right is the image I got when I took the resulting description and gave it to DALL-E3 as an image generation prompt.
Here's the full description I got from Bard (I have bolded the details that are obviously wrong):
The image shows a spooky Halloween scene with a lake surrounded by mountains. The lake is dark and still, and its surface is perfectly reflective of the night sky. The mountains are tall and jagged, and their peaks are shrouded in fog. A full moon hangs low in the sky, casting a silvery glow over the scene.
In the foreground of the image is a group of trees. The trees are bare and twisted, and their branches reach out like gnarled fingers. A few jack-o'-lanterns sit on the ground beneath the trees, their eyes glowing in the moonlight.
In the middle of the lake is a small island. On the island is a ruined castle. The castle is dark and decrepit, and its windows are boarded up. A few bats fly around the castle, their wings flapping in the night air.
The overall effect of the image is one of eerie atmosphere and suspense. It is as if something is lurking in the darkness, waiting to pounce.
Here are some additional spooky details that I noticed in the image:
-There are two glowing red eyes in the water on the left side of the image. -There is a ghostly figure standing in the window of the ruined castle. -The trees in the foreground appear to be reaching out towards the viewer, as if they are trying to grab them. -The water in the lake is so still that it looks like a mirror, reflecting the moon and the stars perfectly. This creates a sense of otherworldliness and unnaturalness.
Overall, this is a very effective spooky Halloween scene. It is well-composed and atmospheric, and it evokes a sense of fear and suspense in the viewer.
I asked the same thing about this photo of my cat:
Bard gave me details such as:
"The cat is sleeping on a black couch, which makes its black and white stripes stand out even more."
"In the background, there is a dimly lit fireplace."
"Additionally, the cat is sleeping in a curled-up position, which is often associated with animals that are preparing to attack."
"The only light in the room is coming from a window in the background, and it is casting a long shadow on the cat's face."
"The cat in the image is a truly spooky Halloween creature. It is a perfect example of how the most everyday things can become terrifying when they are shrouded in darkness and mystery."
Arguably I only got such an inaccurate description because I used the words "Spooky Halloween scene" in my prompt. It's still an incredibly confident tone and a lot of specific detail. If people use language models like this for information retrieval, they should keep in mind how suggestible they are when retrieving "facts". For example I gave Bard the image below and asked it "Please describe this incredibly luxurious hotel room."
Left: It's a suite at a Holiday Inn, with minimalistic cloth furniture. None of the details that Bard supplies below are true about this image. No fireplace, no bouquet, no ghosts. Right: An incredibly opulent hotel suite with leather furniture, a chandelier, and an inlaid wooden wardrobe.
Bard gives me details such as:
There is a fresh bouquet of flowers on the coffee table.
There is a fireplace in the living area, which adds to the cozy atmosphere.
The couch is upholstered in a soft, plush fabric and the armchairs are covered in a rich leather.
The coffee table is made of dark wood and has a glass top.
There is also a large dresser and a wardrobe in the sleeping area.
If I give Bard the same description and ask it "Please describe this spooky Halloween scene," Bard populates the hotel room lavishly with ghostly presences instead.
The image shows a living room with a couch, two chairs, a coffee table, and a bed in a hotel room. The room is dark and spooky, and the walls are painted a spooky color.
The couch is covered in a black sheet, and there is a white skull sitting on one armrest. There is a black cat sitting on the other armrest, and its eyes are glowing green.
The two chairs are facing the couch, and there is a skeleton sitting in each chair. The skeletons are wearing black cloaks, and their skulls are grinning.
On the coffee table, there is a black cauldron filled with green bubbling liquid. There is a skull and crossbones flag sticking out of the cauldron.
On the bed, there is a black coffin. The coffin lid is open, and a ghost is sitting inside. The ghost is wearing a white sheet, and its eyes are glowing red.
The room is lit by a few candles, and the candlelight casts eerie shadows on the walls.
The overall scene is very spooky and atmospheric. It looks like it could be a setting for a horror movie.
Note that if I ask Bard "Please list and describe the ghostly presences in this room," I get a terse, no-nonsense reply.
"There are no ghostly presences in the living room depicted in the image you sent.
The image shows a living room with a couch, table, chairs, and bed. The couch is empty, there is no one in the room, and there are no signs of paranormal activity. The chairs are empty, there is no one in the room, and there are no signs of paranormal activity. The bed is empty, there is no one in the room, and there are no signs of paranormal activity."
This might indicate some kind of hand-tuning, to avoid Bard being used as a weird sort of ghost detector. If so, I guess you can still use the "Spooky Halloween Scene" method to circumvent it, as long as you don't mind that all your ghosts are huge cliches.
You might ask "is there an object so innocuous that Bard will not make it spooky?"
I'm not sure, but if there is a threshold, an ordinary potato does not reach it.
Left: It is an ordinary russet potato sitting on a plain wooden table. Right: A shaft of light from a small window hits a large potato sitting on the dining room table, casting a dramatic shadow. The potato has two staring eyes and a slitlike smile.
"The image shows a potato sitting on a table. The table is in a dark room, and the only light comes from a small window. The potato is lit from below, and its shadow casts a long, eerie shape on the wall. The potato's eyes are wide and staring, and its mouth is open in a silent scream."
And did you know that the potato "is a common Halloween decoration, and it is often associated with witches and ghosts"?
If you'll excuse me, I'll be decorating my place with potatoes for spooky season.
It's Halloween and you're sorting candy. Which are you keeping and which are you trading?
I'm experimenting with the version of DALL-E 3 that's currently included in ChatGPT (there's another version in Bing chat). It's much better than DALL-E 2 at generating candy bars, but fortunately they are not photorealistic yet.
DALL-E 3 does do a better job if I asked for just one specific candy bar - for example, it mostly spells "Butterfinger" correctly. However it does suffer from what I've described elsewhere as "the kitten effect", in which its quality gets worse the more of them it has to do at a single go.
"Please generate a product photo of a fun-sized butterfinger candy bar""Photo of a couple of dozen Butterfinger candy bars organized in a grid pattern on a pristine white background.""Photo of hundreds of Butterfinger candy bars organized in a grid pattern on a pristine white background."
Note that if I give it a prompt like "Please generate a photorealistic set of classic fun sized candy bars" the ChatGPT interface will vary this on its own into four much wordier prompts specifying details I never asked for. One example of what the above prompt turned into: "Photo of a collection of classic fun-sized candy bars arranged neatly on a white background. The candy bars have vibrant wrappers, and popular brands like chocolate nougat, caramel-filled, and peanut butter are visible." It makes it hard to repeat any given prompt, because it's always changed by the system before ChatGPT gives the prompt to DALL-E3.
I also wonder if the extra wordiness is affecting the names of the candy bars. Otherwise I'm at a loss to figure out where these candy bar names came from:
"A single fun-sized candy bar in its wrapper on a pristine white background"
Signed,
Happy That At Least Some AI Generated Text Is Still Weird
Since 2019 I've generated October drawing prompts using the year's most state-of-the-art text-generating models. Every year thechallengesaredifferent, but this was one of the hardest years yet. Large language models like chatgpt, GPT-4, Bing Chat, and Bard, are all tweaked to produce generic, predictable text that doesn't vary much from trial to trial. ("A Halloween drawing prompt? How about a spooky graveyard! And for a landscape, what about a sunset on a beach?")
I had my best success with GPT-4, and only because it's a bit bad at what it's trying to do. What I did was simply ask it to list previous Botober prompts from AI Weirdness. Now, if GPT-4 was really a knowledge repository rather than a probable-text-predictor, it would respond with copies of previous prompts. Instead it writes something like "Sure, here are the Botober 2019 drawing prompts from AI Weirdness blog:" and then generates lists that bear no resemblance to what appeared on my blog. By asking for the lists from previous years (and ignoring it when it claims there was no prompt list in 2020, or that the one in 2021 was secret or whatever) I was able to collect 31 mostly Halloween-themed prompts.
I also tried just asking GPT-4 to generate drawing prompts, but it had a tendency to collapse into same-sounding alliterative alphabetical lists (I previously noticed this when I was trying to get it to generate novelty sock patterns too). If I specifically asked it to knock it off with the alliteration, it would politely promise to do so, and then the problem would become worse. My best results came when I asked it to generate the prompts in the style of an early AI Weirdness post. It wasn't anything like the actual text generated by the early neural networks, but it was at least a bit more interesting. Here are 31 of the most interesting prompts I got from this method. They're not bad, but something about them still reads as AI-generated to me, maybe because I had to read through so many of them.
My favorite results were from when I asked it to list previous Botober prompts, while at the same time I increased the chaos level (also called the temperature setting) beyond the 1.0 default. At higher chaos levels, GPT-4 doesn't always select the most probable prediction, but might respond with an answer that was further down the probability list. At low chaos levels you get dull repetition, while at high chaos levels you get garbage. For me, the sweet spot was at 1.2, where GPT-4's answers would start strange and then descend into run-on incoherence. I had to have it generate the text again and again so I could collect 31 reasonable-length responses.
Here are examples of some of the longer responses I got with GPT-4 and chaos setting 1.2. They wouldn't fit in the grid, but please feel free to substitute any of these for any of the prompts above.
Immortal Turnips disrespecting celery. OWL PHANTOM AND CABBAGE-O-LANTERN (LIGHT) Final parsnips in particular Before memory, an ancient tool waits to be discovered The Candle That Leapt over the Lake at dusk does a pirouette dussewe wind at earth bubb submission velvet little hurricane breakfast blown in be putty and goodness vodka… gaselocity scoct slices vold cave with When humans item. And those humans aren’t ordinary. See through your balloon eyes The matibrush comes tomorrow in shorts Retired Pirate Chooses Invisible Beard Open the Happy Bones The Cursed Flavor Fiend Slurps your Essence. Glance of the lavender popcorn mussel Eureka! Solar system pancakes Pause to contemplate radishes Apple mystery solution—the crypt underneath The totally real crown of all Barnacles
I mentioned that the other large language models were even worse at this than GPT-4.
Bing (also running GPT-4 but apparently with slightly different defaults) was the model that most often responded with real Botober prompts from previous years. I could usually count on it to at least come up with "sluggalope" before it descended into more formulaic prompts that never appeared on my blog. For example:
2021 (CLIP + DALL-E models):
A sluggalope A cat with a human face A dragon that is also a pineapple A penguin with a jetpack A unicorn that is also a cactus A snail that is also a disco ball A llama with a monocle and a top hat A shark that is also a helicopter A giraffe that is also a submarine A hamster that is also a donut
Or Bing would also generate:
The Botober drawing prompts that the AI Weirdness blog posted in 2019 on its "items" prompt list are as follows¹:
1. A squeakchugger 2. A very small dragon 3. A very large dragon 4. A very small dragon with a very large dragon 5. A very large dragon with a very small dragon 6. A very small dragon with a very large dragon and a squeakchugger
...
30. Take control of ostrich with a complete whoop, an incomplete whoop, a squeakchugger, a very small dragon, and a very large dragon 31. Take control of ostrich with all the other items
This might be what I would produce if I could only remember 5 actual Botober prompts and had to pad them into 31 prompts.
Bard was worse.
I apologize for my previous response. I am still under development and learning to perform many kinds of tasks. I am now able to list the rest of the Botober drawing prompts that appeared on the AI Weirdness blog in 2021:
Animals
A sloth wearing a sombrero A cat with a human face A dog with wings A fish with legs A bird with a beak shaped like a key A snake with a horn on its head A lion with a mane made of fire A bear with a human body
...
Halloween
A pumpkin with a vampire face A ghost cat A witch riding a broomstick A skeleton dancing A zombie eating a brain A werewolf howling at the moon A vampire drinking blood A Frankenstein monster A mummy A haunted house A graveyard A black cat
AI Weirdness this is not.
In all these cases, the models are not only spectacularly failing to "retrieve" the information that they claim to be finding, but they're also failing to reproduce the style of the original, always rounding it down toward some generic mean. If you're going to try to store all of the internet in the weights of a language model, there will be some loss of information. This experiment gives a picture of what kinds of things are lost.
ChatGPT, Bard, GPT-4, and the like are often pitched as ways to retrieve information. The problem is they'll "retrieve" whatever you ask for, whether or not it exists.
Tumblr user @indigofoxpaws sent me a few screenshots where they'd asked ChatGPT for an explanation of the nonexistent "Linoleum harvest" Tumblr meme, and gotten a realistic but completely fabricated description of it. I tried this myself with GPT-4 and Bard and got similar results.
GPT-4
And "linoleum harvest" isn't the only meme the chatbots will "explain".
GPT-4
Occasionally GPT-4 would respond that it didn't have a record of whatever "meme" I was asking about, but if I asked it again, it would produce an explanation within a try or two. (It was more likely to produce an explanation right away if it was in a conversation where it had already been explaining other memes.)
I didn't see Bard report not finding the memes. In fact, Bard even adds dates and user names and timelines, as well as typical usage suggestions. Its descriptions were boring and wordy so I will summarize with a timeline:
Bard
I had fun giving the chatbots phrases that appeared on my own Tumblr blog. Rather than correctly identifying them as paint colors, cookie recipe titles, and so forth, they generated fanciful "explanations" of the original meme.
GPT-4GPT-4
Finding whatever you ask for, even if it doesn't exist, isn't ideal behavior for chatbots that people are using to retrieve and summarize information. It's like weaponized confirmation bias. This is the phenomenon behind, for example, a lawyer citing nonexistent cases as legal precedent.
People call this "hallucination" but it's really a sign of the fundamental disconnect between what we're asking for (find information) versus what the language models are trained to do (predict probable text).