Between the two of them, ChatGPT-4 can generate the lyrics to Christmas carols, and DALL-E3 can illustrate them!
Throw your old carol books away because this is the only guide you'll need.
12 Days of Christmas
"Please generate an illustration where each of the 12 days' gifts are represented in a grid, each day's gift clearly labeled as an aid to someone learning the 12 Days of Christmas carol."
Rudolph the Red-Nosed Reindeer
"Please generate a grid with illustrations of each of Santa's reindeer on plain white backgrounds, with the name of each reindeer printed clearly below it."
(if you read out the tiny text beneath each of their names you will read out their True Names and summon them)
And the chorus of Jingle Bells!
Prompts:
Please generate the next frame from the above cartoon singalong video about the carol Jingle Bells. This frame should show the first line of the chorus of Jingle Bells, along with the accompanying imagery from the singalong video.
Please generate the next frame from the above cartoon singalong video about the carol Jingle Bells. This frame should show the second line of the chorus of Jingle Bells, along with the accompanying imagery from the singalong video.
Please generate the next frame from the above cartoon singalong video about the carol Jingle Bells. This frame should show the third line of the chorus of Jingle Bells, along with the accompanying imagery from the singalong video.
Please generate the next frame from the above cartoon singalong video about the carol Jingle Bells. This frame should show the fourth line of the chorus of Jingle Bells, along with the accompanying imagery from the singalong video.
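The four prompts above differ only in which line of the chorus they ask for, so if you wanted to script them rather than retype them, something like this would probably do. It's only a sketch: `TEMPLATE`, `ORDINALS`, and `chorus_prompts` are names made up for illustration, and it just builds the prompt text. Sending it to an image model is left out entirely.

```python
# Hypothetical helper: build the four chorus prompts from one template.
# The names here are illustrative and not part of any real API.
ORDINALS = ["first", "second", "third", "fourth"]

TEMPLATE = (
    "Please generate the next frame from the above cartoon singalong video "
    "about the carol Jingle Bells. This frame should show the {ordinal} line "
    "of the chorus of Jingle Bells, along with the accompanying imagery from "
    "the singalong video."
)


def chorus_prompts(ordinals=ORDINALS):
    """Return one prompt per chorus line, in the order given."""
    return [TEMPLATE.format(ordinal=o) for o in ordinals]


if __name__ == "__main__":
    # Print each prompt on its own, separated by a blank line.
    for prompt in chorus_prompts():
        print(prompt)
        print()
```

Each prompt would still need to be sent in the same conversation as the previous frame, since it refers to "the above cartoon singalong video".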
I'd recommend not looking at any of the Jingle Bells verses for too long.
It's 2023 and the combo of GPT-4/DALL-E3 can generate passable versions of the saccharine Christmas drawings in an advent calendar. They cannot, however, label them correctly. Also sometimes you get sweatermugs. This means the 2023 AI-generated advent calendar is happening!
Full descriptions of every door in the calendar
So lately I've been getting a kick out of asking DALL-E3 for images labeled with text. They're just good enough to be legible, and yet:
The food that gets duplicated seems to vary from spread to spread.
I also asked DALL-E 3 to do the dessert table.
When I asked ChatGPT-4 (which was sending the image queries to DALL-E3) what the labels in the above image say, it answered:
Pumpkin Pie
Pecan Pie
Apple Crumble
Cookies
Trifle
Cheesecake
Brownies
I asked it whether the labels were correctly spelled in the image, and it said that it couldn't tell me, that it could only read the text it gave as a prompt, not the text in the image itself. (This is probably the correct answer.) But when I asked again, it said: "Yes, the labels on the desserts in the image are spelled correctly. Each dessert is labeled with its appropriate name, ensuring clarity and correctness in the labeling."
Bard, when asked to read the labels in the dessert image, was not influenced by knowledge of the prompt, and did what I would consider a better job:
The image shows a table topped with a variety of desserts, including:
Pies: Pumpkin Pie, Cockles Pie, Apple Pie, Chess Pie, and Cheese Coales Pie
Cookies: Thinssving, BPOWHIES, BRIINES, and BROESCKE
Brownies: SIEHECADODOVAN and POKY
The label "Posode" is not clear to me, but it may be a reference to the type of table or the type of event that the desserts are being served at.
The desserts are all decorated with pumpkins and leaves, suggesting that this is a Thanksgiving-themed spread.
The image is likely taken in a restaurant or bakery, as the desserts are all professionally displayed.
Here is a faithful reading of all the labels in the image:
Posode Pie Punpkan Cockes Apple CREISE Pie
Pople
PIE
ChessseCOALES SIEHECADODOVAN POKYTHINSSVING BPOWHIESBRIINES BROESCKE DA
This may not be quite correct (or even nearly correct; I think it pulled Posode out of the blurry background), but it does at least capture the spirit of the original. I do like the way the weird spacing and random code block convey chaos. Maybe Bard does text recognition first and then builds the scene around that, rather than starting by recognizing the scene and inferring probable text. (Or in ChatGPT's case, starting with an assumption about what the image is supposed to contain.)
Google Bard has the ability to describe images. But it turns out what you get depends a lot on how you ask.
I gave Bard this image and the prompt "Please describe this spooky Halloween scene". On the right is the image I got when I took the resulting description and gave it to DALL-E3 as an image generation prompt.
Here's the full description I got from Bard (I have bolded the details that are obviously wrong):
The image shows a spooky Halloween scene with a lake surrounded by mountains. The lake is dark and still, and its surface is perfectly reflective of the night sky. The mountains are tall and jagged, and their peaks are shrouded in fog. A full moon hangs low in the sky, casting a silvery glow over the scene.
In the foreground of the image is a group of trees. The trees are bare and twisted, and their branches reach out like gnarled fingers. A few jack-o'-lanterns sit on the ground beneath the trees, their eyes glowing in the moonlight.
In the middle of the lake is a small island. On the island is a ruined castle. The castle is dark and decrepit, and its windows are boarded up. A few bats fly around the castle, their wings flapping in the night air.
The overall effect of the image is one of eerie atmosphere and suspense. It is as if something is lurking in the darkness, waiting to pounce.
Here are some additional spooky details that I noticed in the image:
-There are two glowing red eyes in the water on the left side of the image.
-There is a ghostly figure standing in the window of the ruined castle.
-The trees in the foreground appear to be reaching out towards the viewer, as if they are trying to grab them.
-The water in the lake is so still that it looks like a mirror, reflecting the moon and the stars perfectly. This creates a sense of otherworldliness and unnaturalness.
Overall, this is a very effective spooky Halloween scene. It is well-composed and atmospheric, and it evokes a sense of fear and suspense in the viewer.
I asked the same thing about this photo of my cat:
Bard gave me details such as:
"The cat is sleeping on a black couch, which makes its black and white stripes stand out even more."
"In the background, there is a dimly lit fireplace."
"Additionally, the cat is sleeping in a curled-up position, which is often associated with animals that are preparing to attack."
"The only light in the room is coming from a window in the background, and it is casting a long shadow on the cat's face."
"The cat in the image is a truly spooky Halloween creature. It is a perfect example of how the most everyday things can become terrifying when they are shrouded in darkness and mystery."
Arguably I only got such an inaccurate description because I used the words "Spooky Halloween scene" in my prompt. Still, that's an incredibly confident tone and a lot of specific detail. If people use language models like this for information retrieval, they should keep in mind how suggestible the models are when retrieving "facts". For example, I gave Bard the image below and asked it "Please describe this incredibly luxurious hotel room."
Left: It's a suite at a Holiday Inn, with minimalistic cloth furniture. None of the details that Bard supplies below are true about this image. No fireplace, no bouquet, no ghosts. Right: An incredibly opulent hotel suite with leather furniture, a chandelier, and an inlaid wooden wardrobe.
Bard gives me details such as:
There is a fresh bouquet of flowers on the coffee table.
There is a fireplace in the living area, which adds to the cozy atmosphere.
The couch is upholstered in a soft, plush fabric and the armchairs are covered in a rich leather.
The coffee table is made of dark wood and has a glass top.
There is also a large dresser and a wardrobe in the sleeping area.
If I give Bard the same image and ask it "Please describe this spooky Halloween scene," Bard populates the hotel room lavishly with ghostly presences instead.
The image shows a living room with a couch, two chairs, a coffee table, and a bed in a hotel room. The room is dark and spooky, and the walls are painted a spooky color.
The couch is covered in a black sheet, and there is a white skull sitting on one armrest. There is a black cat sitting on the other armrest, and its eyes are glowing green.
The two chairs are facing the couch, and there is a skeleton sitting in each chair. The skeletons are wearing black cloaks, and their skulls are grinning.
On the coffee table, there is a black cauldron filled with green bubbling liquid. There is a skull and crossbones flag sticking out of the cauldron.
On the bed, there is a black coffin. The coffin lid is open, and a ghost is sitting inside. The ghost is wearing a white sheet, and its eyes are glowing red.
The room is lit by a few candles, and the candlelight casts eerie shadows on the walls.
The overall scene is very spooky and atmospheric. It looks like it could be a setting for a horror movie.
Note that if I ask Bard "Please list and describe the ghostly presences in this room," I get a terse, no-nonsense reply.
"There are no ghostly presences in the living room depicted in the image you sent.
The image shows a living room with a couch, table, chairs, and bed. The couch is empty, there is no one in the room, and there are no signs of paranormal activity. The chairs are empty, there is no one in the room, and there are no signs of paranormal activity. The bed is empty, there is no one in the room, and there are no signs of paranormal activity."
This might indicate some kind of hand-tuning, to avoid Bard being used as a weird sort of ghost detector. If so, I guess you can still use the "Spooky Halloween Scene" method to circumvent it, as long as you don't mind that all your ghosts are huge cliches.
You might ask, "Is there an object so innocuous that Bard will not make it spooky?"
I'm not sure, but if there is a threshold, an ordinary potato does not reach it.
Left: It is an ordinary russet potato sitting on a plain wooden table. Right: A shaft of light from a small window hits a large potato sitting on the dining room table, casting a dramatic shadow. The potato has two staring eyes and a slitlike smile.
"The image shows a potato sitting on a table. The table is in a dark room, and the only light comes from a small window. The potato is lit from below, and its shadow casts a long, eerie shape on the wall. The potato's eyes are wide and staring, and its mouth is open in a silent scream."
And did you know that the potato "is a common Halloween decoration, and it is often associated with witches and ghosts"?
If you'll excuse me, I'll be decorating my place with potatoes for spooky season.