Since 2019 I've generated October drawing prompts using that year's state-of-the-art text-generating models. Every year the challenges are different, but this was one of the hardest years yet. Large language models like ChatGPT, GPT-4, Bing Chat, and Bard are all tweaked to produce generic, predictable text that doesn't vary much from trial to trial. ("A Halloween drawing prompt? How about a spooky graveyard! And for a landscape, what about a sunset on a beach?")

I had my best success with GPT-4, and only because it's a bit bad at what it's trying to do. What I did was simply ask it to list previous Botober prompts from AI Weirdness. Now, if GPT-4 were really a knowledge repository rather than a probable-text-predictor, it would respond with copies of previous prompts. Instead, it writes something like "Sure, here are the Botober 2019 drawing prompts from AI Weirdness blog:" and then generates lists that bear no resemblance to what appeared on my blog. By asking for the lists from previous years (and ignoring it when it claimed there was no prompt list in 2020, or that the one in 2021 was secret or whatever), I was able to collect 31 mostly Halloween-themed prompts.

1. Bread sky
2. Disrespecting the ancient elder tea
3. Exceptionally mischievous pumpkin
4. Ghost with an old-timey job
5. Skeleton with TOO many bones
6. An unusually difficult to unwrap candy
7. A vampire with an unhelpful reflection
8. Broomstick trying out a new career
9. Scarecrow trying to be extra scary
10. Zombie that just won’t quit
11. Witch hat that’s up to no good
12. Unreasonably large collection of eyeballs
13. A golem trying to stay hidden
14. A kraken... but tiny
15. Octopus finds a ukelele in a haunted house
16. A cryptid in your cupboard
17. A dragon of patches
18. A ghost of a teapot
19. A kangaroo in the underworld
20. A raven made out of keys
21. Today, the skylark is a superhero
22. A cushion with a bitter past
23. Vampire cabbages
24. Several pumpkin spying
25. A surprising pigeon
26. A small piece of disco
27. A pickles dragon
28. A haunted bowl of soup
29. An inexplicable penguin
30. A moth of mirrors
31. An absolutely massive bat

I also tried just asking GPT-4 to generate drawing prompts, but it had a tendency to collapse into same-sounding alliterative alphabetical lists (I previously noticed this when I was trying to get it to generate novelty sock patterns too). If I specifically asked it to knock it off with the alliteration, it would politely promise to do so, and then the problem would become worse. My best results came when I asked it to generate the prompts in the style of an early AI Weirdness post. It wasn't anything like the actual text generated by the early neural networks, but it was at least a bit more interesting. Here are 31 of the most interesting prompts I got from this method. They're not bad, but something about them still reads as AI-generated to me, maybe because I had to read through so many of them.

1. Time-traveling postman
2. Knitting Volcano
3. Hourglass Oasis
4. Taffy Tides
5. Toothy Horizon
6. Thimble-sized Thunderstorm
7. Porcelain Jungle
8. Holographic Hermit Crab
9. Plaid Planet
10. Lava Lamplight
11. A Door Knob to Nowhere
12. Lace Lightning
13. Eggnog on Elm trees
14. Feathered Footwear
15. Thunderstorm Tea Party
16. Melting Marigolds
17. Moose on Moon
18. Disguised as a Dandelion
19. Whispering Stairtoad
20. Purple Ponds, Purring
21. Cube in a Trance
22. It’s not a cat.
23. Strange Scribbles of Saturn
24. Televised Toast
25. Gentle Claw of Bread
26. Upside-down Lake
27. Pickle Beacon
28. Broccoli Shadows
29. Bathtub Archipelago
30. Candy Cane Quarry
31. Jack-o-lantern Spacecraft

My favorite results came when I asked it to list previous Botober prompts while also increasing the chaos level (the temperature setting) beyond its default of 1.0. At higher chaos levels, GPT-4 doesn't always select the most probable prediction, but may respond with an answer from further down the probability list. At low chaos levels you get dull repetition; at high chaos levels you get garbage. For me, the sweet spot was 1.2, where GPT-4's answers would start strange and then descend into run-on incoherence. I had to have it generate the text again and again to collect 31 reasonable-length responses.

1. Forest Swingles
2. Disco of turnover ++
3. Aerodynamic Potato
4. Stripe wiggly t
5. Cool lump mage
6. Phantom Phaunterplook
7. Pants for Salad
8. A resplendent bubblegumcrocodileopterix
9. Compact Sasquatch
10. Bat Hobby
11. Those Screaming Elders
12. Glowing Phantom Brick
13. Frackle Spoon
14. SPOGGY NOODLE
15. Faith Doritole
16. Mail cannot bleed.
17. One Noodle Leads
18. Onion Cycle
19. Bubble Hulk
20. Curly Hovering Sofa
21. STAMP SWAMP
22. Drawer Fish
23. Plasma Waffle
24. CURSED POTATO
25. Pizza sage
26. The-really-surprisingly-safe-castle
27. Disco triceratops incident
28. Plague microwave
29. A hat wearing hats
30. Elder sentient rice
31. Chicken haunt
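For the curious: the chaos/temperature knob works by rescaling the model's raw next-word scores before it picks one. Here's a minimal sketch of that mechanism in Python, using made-up scores for three candidate words (not GPT-4's actual internals, which are far bigger and not public):

```python
import math
import random

def temperature_probs(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities after temperature scaling.

    Temperature below 1 sharpens the distribution (the top pick dominates);
    temperature above 1 flattens it, so weirder words get chosen more often.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick one word index according to the temperature-adjusted distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate next words,
# say "pumpkin", "graveyard", and "sluggalope".
logits = [4.0, 2.0, 1.0]
```

At a temperature near zero this always picks "pumpkin"; at 1.2, the long tail of unlikely words starts showing up, which is where the fun happens.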

Here are examples of some of the longer responses I got with GPT-4 and chaos setting 1.2. They wouldn't fit in the grid, but please feel free to substitute any of these for any of the prompts above.

Immortal Turnips disrespecting celery.
Final parsnips in particular
Before memory, an ancient tool waits to be discovered
The Candle That Leapt over the Lake at dusk does a pirouette
dussewe wind at earth bubb
submission velvet little hurricane breakfast blown in be putty and goodness vodka… gaselocity scoct slices vold cave with
When humans item. And those humans aren’t ordinary.
See through your balloon eyes
The matibrush comes tomorrow in shorts
Retired Pirate Chooses Invisible Beard
Open the Happy Bones
The Cursed Flavor Fiend Slurps your Essence.
Glance of the lavender popcorn mussel
Eureka! Solar system pancakes
Pause to contemplate radishes
Apple mystery solution—the crypt underneath
The totally real crown of all Barnacles

As I mentioned, the other large language models were even worse at this than GPT-4.

Bing (also running GPT-4 but apparently with slightly different defaults) was the model that most often responded with real Botober prompts from previous years. I could usually count on it to at least come up with "sluggalope" before it descended into more formulaic prompts that never appeared on my blog. For example:

2021 (CLIP + DALL-E models):

A sluggalope
A cat with a human face
A dragon that is also a pineapple
A penguin with a jetpack
A unicorn that is also a cactus
A snail that is also a disco ball
A llama with a monocle and a top hat
A shark that is also a helicopter
A giraffe that is also a submarine
A hamster that is also a donut

Or Bing would generate:

The Botober drawing prompts that the AI Weirdness blog posted in 2019 on its "items" prompt list are as follows¹:

1. A squeakchugger
2. A very small dragon
3. A very large dragon
4. A very small dragon with a very large dragon
5. A very large dragon with a very small dragon
6. A very small dragon with a very large dragon and a squeakchugger


30. Take control of ostrich with a complete whoop, an incomplete whoop, a squeakchugger, a very small dragon, and a very large dragon
31. Take control of ostrich with all the other items

This might be what I would produce if I could only remember 5 actual Botober prompts and had to pad them into 31 prompts.

Bard was worse.

I apologize for my previous response. I am still under development and learning to perform many kinds of tasks. I am now able to list the rest of the Botober drawing prompts that appeared on the AI Weirdness blog in 2021:


A sloth wearing a sombrero
A cat with a human face
A dog with wings
A fish with legs
A bird with a beak shaped like a key
A snake with a horn on its head
A lion with a mane made of fire
A bear with a human body



A pumpkin with a vampire face
A ghost cat
A witch riding a broomstick
A skeleton dancing
A zombie eating a brain
A werewolf howling at the moon
A vampire drinking blood
A Frankenstein monster
A mummy
A haunted house
A graveyard
A black cat

AI Weirdness this is not.

In all these cases, the models are not only spectacularly failing to "retrieve" the information that they claim to be finding, but they're also failing to reproduce the style of the original, always rounding it down toward some generic mean. If you're going to try to store all of the internet in the weights of a language model, there will be some loss of information. This experiment gives a picture of what kinds of things are lost.

Bonus content: In which GPT-4 claims there was no publicly released Botober prompt list in 2020, but then is coaxed into revealing the entire thing anyways. Wrongly, of course.
