Still amazed by this: Here's CLIP+VQGAN (trained on internet photos and their accompanying text), prompted two different ways:

"A car driving down a desert road in monument valley"

"A car driving down a desert road in monument valley | dramatic