Recently I've been experimenting with DALL-E 2, one of the models that uses CLIP
to generate images from my text descriptions. It was trained on internet text
and images, so there's a lot it can do, and a lot of ways it can remix the stuff it's seen.
One thing I've noticed with image-generating algorithms is that the more of
something they have to put in an image, the worse the results get.
I first noticed this with the kitten-generating variant of StyleGAN, which often
does okay on one cat, but is terrible at more than one.
DALL-E (and other text-to-image generators) will often add text to their images
even when you don't ask for any. Ask for a picture of a Halifax Pier
[https://www.aiweirdness.com/the-terror-of-the-sea/] and it could end up covered
in messy writing: variously legible versions of "Halifax."
Google's large language model, LaMDA, has recently been making headlines after a
Google engineer (now on administrative leave) claimed to be swayed by an
interview in which LaMDA described the experience of being conscious. Almost
everyone else who has used these large text-generating AIs, myself included, is
entirely unconvinced.
I recently started playing with DALL-E 2, which will attempt to generate an
image to go with whatever text prompt you give it. Like its predecessor DALL-E,
it uses CLIP, which OpenAI trained on a huge collection of internet images and
nearby text. I've experimented with a few prompts so far.
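
For anyone curious about the CLIP part: CLIP itself doesn't generate images. It scores how well a piece of text matches an image, and DALL-E 2 builds on those matched-up image and text representations. Here's a minimal sketch of what that scoring looks like, using OpenAI's open-source CLIP package; the filename and captions below are made-up examples for illustration, not anything from my actual experiments.

    import torch
    import clip
    from PIL import Image

    # Load a pretrained CLIP model (ViT-B/32 is one of the released checkpoints)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Hypothetical image file and candidate captions, just for illustration
    image = preprocess(Image.open("halifax_pier.png")).unsqueeze(0).to(device)
    text = clip.tokenize([
        "a photo of a pier in Halifax",
        "a photo of one cat",
        "a photo of several cats",
    ]).to(device)

    with torch.no_grad():
        # CLIP embeds the image and each caption, then scores how well they match
        logits_per_image, logits_per_text = model(image, text)
        probs = logits_per_image.softmax(dim=-1).cpu().numpy()

    # Higher probability means CLIP thinks that caption fits the image better
    print(probs)

Text-to-image systems like DALL-E 2 lean on that kind of image-text matching as guidance; the part that actually paints the pixels is a separate, much bigger model.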