I've generated a lot of terrible unicorn cakes by this point.

Last time I experimented with generating cakes, using CLIP's internet training to guide a couple of image generating methods. Around the time I posted my experiments, RiversHaveWings, who developed one of the most popular methods, came out with a new one called CLIP-guided diffusion. She found it dramatically improved results on some prompts, like "industrial demon, a matte painting, trending on ArtStation".

A shadowy dark shape like an oil refinery rises from the darkness, glowing red in places. The girders and catwalks are visible.
industrial demon, a matte painting, trending on ArtStation

So, I decided to see if it could improve on my unicorn cakes from last time.

Using the default settings, here's "unicorn cake with golden horn and rainbow sprinkles"

An irregular lump of cake studded with holographic sprinkles and looking like someone has squeezed it in their fist
unicorn cake with golden horn and rainbow sprinkles

I like an iridescent cake as much as the next person, but this has missed a few style points. And a CLIP-based method ought to be able to use its internet training to do better. A Google image search for "unicorn cake" returns mostly cakes of a particular design and color scheme. It should know what "unicorn cake" means.

Google search results for a unicorn cake, showing mostly white cakes with a horn on top and two cartoon eyes in front, plus rainbow mane and pointed ears.

As in my previous experiments, I tried giving CLIP+diffusion a basic sketch of a plain cake to start with. I could tweak how much it was allowed to change the sketch.

This, it turns out, is too much departure.

Close-up of a field of wrinkly pink frosting studded with tiny gold balls. In the background is yellow-green hair.
Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=0

And this is too little.

A black cake on a rainbow-sprinkled cake. Embossed in one side of the cake is a shape like a cartoon toaster head with two wild pigtails.
Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=400

This one at least has a unicorn in it, but it's not quite got the spirit of a unicorn cake, nor any idea where the horn goes.

A black cake with galaxy textures and a hint of a unicorn’s face. The golden horn is lying beside the cake, looking a bit like a row of sweet corn.
Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=300

Starting from a light-colored cake didn't work any better.

A hideous white cake with a hint of a unicorn face with melting frosting and melting purple eyes, and a foot that looks a lot like a cake. Long silver needles pierce the top of the cake.
Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=300
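
For anyone wondering what those skip_timesteps numbers mean, here is a rough sketch of how I think about them. It assumes the sampler runs a schedule of 1000 steps in total (my assumption; the total isn't stated anywhere above) and that skipping steps means starting partway through the schedule from a noised copy of the sketch. The helper name `fraction_regenerated` is made up for illustration and is not part of any CLIP+diffusion tool.

```python
def fraction_regenerated(skip_timesteps: int, total_timesteps: int = 1000) -> float:
    """Share of the denoising schedule that still runs after skipping steps.

    total_timesteps=1000 is an assumed default, not a value taken from the post.
    """
    if not 0 <= skip_timesteps <= total_timesteps:
        raise ValueError("skip_timesteps must be between 0 and total_timesteps")
    return (total_timesteps - skip_timesteps) / total_timesteps


# The three settings tried above: the more steps skipped, the less of the
# schedule is left to repaint the starting sketch.
for skip in (0, 300, 400):
    print(f"skip_timesteps={skip}: {fraction_regenerated(skip):.0%} of the schedule still runs")
```

Under that assumption, skip_timesteps=0 runs the whole schedule, so the sketch is free to dissolve entirely, while 400 leaves a bit over half of it, which fits the "too much" and "too little" results above.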

But again, CLIP should KNOW what a unicorn cake looks like. I shouldn't even have to start it out with a sketch of a cake. Maybe my prompt was the problem. I decided to try a prompt style RiversHaveWings was having good success with.

"Unicorn cake, matte painting, trending on ArtStation"

A landscape of lumpy pink frosting topped by a white earlike shape, maybe made of frosting. A faint watermark says something like “ororoi”
Unicorn cake, matte painting, trending on ArtStation

I also tried the "food photography" and "by janelle's bakery" modifiers, which seemed to work before.

Shiny yellow beans in front of a plasticky rainbow background in indistinct shapes. It could be an extreme closeup of a very ugly cake.
food photography unicorn cake with golden horn and rainbow sprinkles by janelle's bakery

These are hideous unicorn cakes, and I'm sorry. It's as if I have stumbled on a weirdly adversarial prompt for CLIP+diffusion.

I've noted before that asking for an image in just the right way can change the output from something terrible to something very aesthetically pleasing. @kingdomakrillic on imgur has put together an amazing grid of ways to modify a prompt for effect. Here are just the first three lines out of dozens.

Grid of images of mushroom, spaceship, volcano, and victorian house on a hill, in 8k resolution, pencil sketch, and 8k 3d styles.
excerpted from @kingdomakrillic on imgur: https://imgur.com/a/SnSIQRu

I looked for prompts that were producing relatively coherent mushroom results.

"Unicorn cake, cryengine"

A dirty ceramic bowl with two unicorn feet, filled with cake topped with chopped cherries.
Unicorn cake, cryengine

"Unicorn cake, photorealistic"

A cross between a close-up of cake and an infinite beach. A crack in the landscape reveals lavender cake and a single eye.

"Unicorn cake, ArtStation HD"

Shiny purple frosting lumps criss-crossed with white veins. A teal-edged golden horn points down toward it.

Then I noticed a parameter that's supposed to control how similar the output looks to the prompt text. All this time had I been telling it "like a unicorn cake but not TOO much like a unicorn cake"? I want exactly a unicorn cake.

Not knowing which direction to tweak the parameter, I reduced "CLIP_guidance_scale" from 1000 to 0.

At first glance, a dirty bar covered with worn-out beer mats. The image has a yellowish cast to it and nothing is legible.

Okay wow so that must be in the "not unicorn cake" direction. I increased "CLIP_guidance_scale" to 5000.

Perched on the edge of a steep cake cliff is a twist of color and texture like a peacock shrimp crossed with a jellyfish.
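
Here is a toy sketch of why a scale of 0 would point in the "not unicorn cake" direction. It assumes each sampling step combines the diffusion model's own update with a nudge toward the text prompt, and that the scale multiplies that nudge. The function `guided_step` and the numbers are invented for illustration; this is not the actual CLIP+diffusion code.

```python
def guided_step(denoise_update, prompt_gradient, clip_guidance_scale):
    """Toy update: the model's own step plus a prompt nudge weighted by the scale.

    Both inputs are plain lists of floats standing in for image-sized tensors.
    """
    return [d + clip_guidance_scale * g
            for d, g in zip(denoise_update, prompt_gradient)]


denoise_update = [1.0, 2.0]    # what the diffusion model wants to do anyway
prompt_gradient = [0.5, -0.5]  # hypothetical direction of "more like the prompt"

print(guided_step(denoise_update, prompt_gradient, 0))     # prompt ignored entirely
print(guided_step(denoise_update, prompt_gradient, 1000))  # prompt nudge dominates
```

In this toy version, a scale of 0 removes the prompt's influence altogether, so the output is whatever the diffusion model felt like drawing, and a larger scale lets the prompt's pull swamp everything else.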


In the end, nothing I tried resulted in unicorn cakes that looked anything like the Instagram photos, or even as much like unicorn cakes as the CLIP+VQGAN method I tried before. Maybe the right parameter/prompt combination is out there and I just haven't found it yet. But I'm beginning to suspect that CLIP+diffusion is just really good at a certain kind of detailed, vibey, industrial prompt. So with that in mind:

"Industrial unicorn cake, matte painting, trending on artstation"

A white frosting shape rises from a city skyline. A horn emerges from the shape, shedding ribbons of pinkish rust.
Industrial unicorn cake, matte painting, trending on artstation

Want to try this yourself for free? Here are instructions on using both CLIP+VQGAN and CLIP+diffusion methods (no coding required).

Bonus content: a few more CLIP+diffusion images, some pretty cool looking

