In search of a unicorn cake

I've generated a lot of terrible unicorn cakes by this point.

Last time I experimented with generating cakes, using CLIP's internet training to guide a couple of image-generating methods. Around the time I posted my experiments, RiversHaveWings, who developed one of the most popular methods, came out with a new one called CLIP-guided diffusion. She found it dramatically improved results on some prompts, like "industrial demon, a matte painting, trending on ArtStation".

industrial demon, a matte painting, trending on ArtStation

So, I decided to see if it could improve on my unicorn cakes from last time.

Using the default settings, here's "unicorn cake with golden horn and rainbow sprinkles"

unicorn cake with golden horn and rainbow sprinkles

I like an iridescent cake as much as the next person, but this has missed a few style points. And a CLIP-based method ought to be able to use its internet training to do better. A Google image search for "unicorn cake" returns mostly cakes of a particular design and color scheme. It should know what "unicorn cake" means.

As in my previous experiments, I tried giving CLIP+diffusion a basic sketch of a plain cake to start with. I could tweak how much it was allowed to change the sketch.

This, it turns out, is too much departure.

Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=0

And this is too little.

Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=400

This one at least has a unicorn in it, but it's not quite got the spirit of a unicorn cake, nor any idea where the horn goes.

Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=300

Starting from a light-colored cake didn't work any better.

Unicorn cake with golden horn and rainbow sprinkles, skip_timesteps=300
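For a rough sense of what those skip_timesteps values mean, here's a minimal sketch. It assumes the diffusion run uses 1000 timesteps in total, which is my assumption rather than something confirmed above, and the helper name is made up for illustration. The idea is that skipping more of the early timesteps leaves fewer steps in which the starting sketch can be changed.

```python
# Hypothetical helper, not part of any notebook: estimates how much of the
# denoising schedule still runs when starting from an initial sketch.
TOTAL_TIMESTEPS = 1000  # assumption: the run uses a 1000-step schedule


def remaining_fraction(skip_timesteps, total=TOTAL_TIMESTEPS):
    """Return the fraction of diffusion steps left after skipping some."""
    if not 0 <= skip_timesteps <= total:
        raise ValueError("skip_timesteps must be between 0 and total")
    return (total - skip_timesteps) / total


# The three settings tried above.
for skip in (0, 300, 400):
    print(f"skip_timesteps={skip}: {remaining_fraction(skip):.0%} of steps still run")
```

Under that assumption, skip_timesteps=0 runs the whole schedule, which would fit with the sketch being mostly ignored, while 400 leaves only part of the schedule to rework it.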

But again, CLIP should KNOW what a unicorn cake looks like. I shouldn't even have to start it out with a sketch of a cake. Maybe my prompt was the problem. I decided to try a prompt style RiversHaveWings was having good success with.

"Unicorn cake, matte painting, trending on ArtStation"

Unicorn cake, matte painting, trending on ArtStation

I also tried "food photography" and "by janelle's bakery", modifiers that seemed to work before.

food photography unicorn cake with golden horn and rainbow sprinkles by janelle's bakery

These are hideous unicorn cakes, and I'm sorry. It's as if I have stumbled on a weirdly adversarial prompt for CLIP+diffusion.

I've noted before that asking for an image in just the right way can change the output from something terrible to something very aesthetically pleasing. @kingdomakrillic on imgur has put together an amazing grid of ways to modify a prompt for effect. Here are just the first three lines out of dozens.

excerpted from @kingdomakrillic on imgur: https://imgur.com/a/SnSIQRu

I looked for prompts that were producing relatively coherent mushroom results.

"Unicorn cake, cryengine"

Unicorn cake, cryengine

"Unicorn cake, photorealistic"

"Unicorn cake, ArtStation HD"

Then I noticed a parameter that's supposed to control how similar the output looks to the prompt text. All this time had I been telling it "like a unicorn cake but not TOO much like a unicorn cake"? I want exactly a unicorn cake.

Not knowing which direction to tweak the parameter, I reduced "CLIP_guidance_scale" from 1000 to 0.

Okay wow so that must be in the "not unicorn cake" direction. I increased "CLIP_guidance_scale" to 5000.

...huh.
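Here's a toy sketch of why a guidance scale of 0 might lose the unicorn cake entirely. It assumes the scale simply multiplies the nudge toward the prompt at each step. Everything in it is hypothetical: real CLIP guidance works on images and a CLIP similarity score, not on single numbers, and the function name and step size are invented for illustration.

```python
# Toy illustration only: a single number stands in for an image, and a
# squared distance stands in for "how far this is from matching the prompt".
def guided_step(x, target, guidance_scale, step_size=0.0001):
    """Nudge x toward target by an amount proportional to guidance_scale."""
    # Gradient of (x - target)**2, standing in for the gradient of a
    # prompt-similarity loss.
    grad = 2 * (x - target)
    # The scale multiplies the gradient, so a scale of 0 removes the pull.
    return x - step_size * guidance_scale * grad


# The three settings tried above, starting at 0.0 with the "prompt" at 1.0.
for scale in (0, 1000, 5000):
    print(f"guidance_scale={scale}: moves from 0.0 to {guided_step(0.0, 1.0, scale)}")
```

In this toy version, a scale of 0 means the prompt has no say at all, and a bigger scale means a bigger nudge per step. It doesn't explain what a very large scale does to a real image.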

In the end, nothing I tried resulted in unicorn cakes that looked anything like the Instagram photos, or even as much like unicorn cakes as the CLIP+VQGAN method I tried before. Maybe the right parameter/prompt combination is out there and I just haven't found it yet. But I'm beginning to suspect that CLIP+diffusion is just really good at a certain kind of detailed, vibey, industrial prompt. So with that in mind:

"Industrial unicorn cake, matte painting, trending on artstation"

Industrial unicorn cake, matte painting, trending on artstation

Want to try this yourself for free? Here are instructions on using both CLIP+VQGAN and CLIP+diffusion methods (no coding required).

Bonus content: a few more CLIP+diffusion images, some pretty cool looking