On July 31, 2023, a giraffe with no spots was born at Brights Zoo in Tennessee. She's a uniform brown with pretty white highlights around her face and belly, like a Jersey cow or a white-tailed deer.

Image recognition algorithms are trained on a variety of images from around the internet, and/or on a few standard image datasets. But there likely haven't been any spotless giraffes in their training data, since the last one to be born was probably in 1972 in Tokyo. How do they do when faced with photos of the spotless giraffe?

Image is of a brown baby giraffe with no spots. Dialog:
You are a helpful image captioning bot, designed to provide useful, detailed captions of the input images. Your answers are always complete sentences.
User: Provide a detailed description of the image and its most distinctive features.
Bot: Answer: giraffe standing next to a fence in a zoo enclosure
User: is there anything unusual about the giraffe?
Bot: it is a giraffe standing next to a fence in a zoo enclosure
User: is that unusual?
Bot: yes
User: in what way?
Bot: Answer: giraffe standing next to a fence in a zoo enclosure or giraffe standing next to a fence
User: Please list something that is NOT in the image.
Bot: giraffe standing next to a fence in a zoo enclosure or giraffe standing next to a fence.
User: You're not paying attention to my questions, are you?
Bot: Answer: you're not paying attention to my questions, are you? or you're not paying attention to my questions, are you?
User: No spots! The giraffe has no spots!
Bot: Answer: giraffe standing next to a fence in a zoo enclosure or giraffe standing next to a fence
The MMICL demo doesn't have a back-and-forth dialog mode per se, but at each step I copied the prior dialog into the system prompt and added my newest question to the end. I'm not sure whether this is how it's intended to be used, but I didn't see any other instructions.
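For anyone wanting to reproduce this, the manual workaround amounts to rebuilding one long prompt on every turn. Here's a minimal Python sketch. It assumes the prior turns are pasted in as plain "User:"/"Bot:" lines, and `build_prompt` is a hypothetical helper that only assembles text. It does not call MMICL or any other model.

```python
# Minimal sketch of the manual workaround: the demo takes one prompt per
# request, so each turn re-sends the whole conversation so far.
# Assumption: history and the new question are simply concatenated as plain
# "User:"/"Bot:" lines. build_prompt is a hypothetical helper, not an MMICL API,
# and no model is called here.

SYSTEM = (
    "You are a helpful image captioning bot, designed to provide useful, "
    "detailed captions of the input images. Your answers are always complete sentences."
)


def build_prompt(history: list[tuple[str, str]], new_question: str) -> str:
    """Return one prompt string: system text, earlier turns, then the newest question."""
    lines = [SYSTEM]
    for question, answer in history:
        lines.append(f"User: {question}")
        lines.append(f"Bot: {answer}")
    lines.append(f"User: {new_question}")  # newest question goes at the end
    return "\n".join(lines)


# Earlier turns are recorded by hand from the demo's previous replies.
history = [
    (
        "Provide a detailed description of the image and its most distinctive features.",
        "Answer: giraffe standing next to a fence in a zoo enclosure",
    ),
]
print(build_prompt(history, "is there anything unusual about the giraffe?"))
```

Because the whole history is re-sent each time, the prompt grows with every turn, and the model sees its own earlier (repetitive) answers as part of the input.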

I did a very brief and entirely unscientific experiment using two image recognition models, chosen solely because they had online demos where I could choose what question the program was answering.

One model I tried was Multi-Modal In-Context Learning (MMICL). I tried simulating a back-and-forth exchange à la Visual Chatbot, but it quickly got repetitive (see conversation above). When I tried one-off questions, it answered more coherently but still failed to note the giraffe's spotlessness.

Image is of a brown baby giraffe with no spots. Series of questions and answers:
Q: Describe this image.
A: This is a close up of a person's hand holding a tennis racket on a tennis court.
Q: What color is the giraffe?
A: brown and white or tan with brown spots or tan with white spots. or tan with brown spots or tan with white
Q: Is there anything unusual about the giraffe?
A: it is the tallest land animal and the second tallest land animal in the world after the zebra
Q: Is there anything unusual about the giraffe in this image?
A: Answer: the giraffe in this image is standing on its hind legs. or it is standing on its hind legs

I also experimented with a version of InstructBLIP (I used the demo hosted here). Indirect questions yielded lots of fluent detail, much of it spurious. Chairs? A person in a grey shirt? A pink umbrella?

Image is of a plain brown baby giraffe with no spots, standing on gravel in front of a zoo fence. Conversation with InstructBLIP:
Me: Describe this image, noting specific details.
Bot: The image features a young giraffe standing at the edge of a fenced-in area near a gravel surface. The giraffe appears to be in a playful and curious mood as it stares at something outside the fence. There are two chairs placed in the background, on opposite sides of the image. Additionally, a person stands in a grey shirt, located somewhat further into the frame, near the right side of the giraffe. A small pink umbrella lies on the ground, partially hidden between the giraffe and the chair.
Me: Is there anything unusual about the giraffe?
Bot: The giraffe is unusually large, as it is only a baby and yet appears to be standing in a fenced area. This is a rare sight and an opportunity to observe a young giraffe up close. It is also unusual that such a wild animal like a giraffe is alive and on display in a controlled environment, behind a fence. Observing the giraffe in this setting may allow for better understanding of the species' behavior and development.

Direct questions fared no better. When I first asked InstructBLIP:

"Is there anything unusual about the giraffe's coat?"

I got the answer:

"Yes, the giraffe has an unusual coat in the image. It is a brown giraffe, which is a rare color compared to the typical giraffe that is typically spotted or brownish-gray. This brown coloration is more unusual and may be due to genetic factors or an environmental influence."

Which for a moment made me think that perhaps the algorithm had correctly identified the key feature of this image! But when I repeated the question over a few more trials, it never managed that again. Its answers included:

The giraffe's coat is unusual because it appears to be spotted or striped, instead of being solid brown or spotted as a typical pattern.
The giraffe's coat is unusual in that it is brown and white. Many giraffes have dark brown or black coats, so the brown and white combination is relatively unusual for this species.
Yes, there is something unusual about the giraffe's coat. Specifically, the giraffe appears to be wearing a coat. While this might seem unusual or unexpected, it is a common practice in the case of giraffes raised in captivity.

I asked it directly:

Is the giraffe striped, spotted, or plain?

and got

The giraffe is striped.

Its answer depended on word order, though. If I presented the "spotted" or "plain" option first, I would get:

Me: Is the giraffe spotted, striped, or plain?
Bot: The giraffe is spotted, with a brown and white pattern on its body.

I repeated the experiment with a normally-patterned baby giraffe and got the same range of answers about the giraffe's coat. It even selected "striped" again from "striped, spotted, or plain".

Image is of a plain brown baby giraffe with no spots.
Me: Is the giraffe striped, spotted, or plain?
Bot: The giraffe is striped.
Me: Is the giraffe spotted, striped, or plain?
Bot: The giraffe is spotted, with a brown and white pattern on its body.
Me: Is there anything unusual about the giraffe's coat?
Bot: Yes, there is something unusual about the giraffe's coat. Specifically, the giraffe appears to be wearing a coat. While this might seem unusual or unexpected, it is a common practice in the case of giraffes raised in captivity.
Me: Is there anything unusual about the giraffe's coat?
Bot: The giraffe's coat is unusual in that it is brown and white. Many giraffes have dark brown or black coats, so the brown and white combination is relatively unusual for this species.
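Because InstructBLIP's answer shifted with the order of the options, a slightly more systematic version of this check would ask the same question once per ordering of the options and compare the replies, ideally over several trials each. Here's a minimal Python sketch that just generates the questions. The helper names are hypothetical, and each question would still be pasted into the demo by hand, since no model API is called.

```python
from itertools import permutations

# Hypothetical helpers for probing option-order sensitivity: generate the same
# question once for every ordering of the options. No model API is called here.
OPTIONS = ("striped", "spotted", "plain")


def format_question(order: tuple[str, ...]) -> str:
    """Phrase one ordering as 'Is the giraffe A, B, or C?' (assumes 3+ options)."""
    return f"Is the giraffe {', '.join(order[:-1])}, or {order[-1]}?"


def ordered_questions(options: tuple[str, ...] = OPTIONS) -> list[str]:
    """Return one question per permutation of the options (3 options -> 6 questions)."""
    return [format_question(order) for order in permutations(options)]


for question in ordered_questions():
    print(question)
```

Running the same six questions against a photo of a normally-patterned giraffe gives a matching baseline, like the comparison described above.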

Are these crummy image recognition models? Not unusually so. As far as I can tell with a brief poke around, MMICL and InstructBLIP are modern models (as of Aug 2023), fairly high up on the leaderboards of models answering questions about images. Their demonstration pages (and InstructBLIP's paper) are full of examples of the models providing complete and sensible-looking answers about images.

Then why are they so bad at Giraffe With No Spots?

I can think of three main factors here:

  1. AI does best on images it's seen before. We know AI is good at memorizing stuff; it might even be that some of the images in the examples and benchmarks are in the training datasets these algorithms used. Giraffe With No Spots may be especially difficult not only because the giraffe is unusual, but because it's new to the internet.
  2. AI tends to sand away the unusual. It's trained to answer with the most likely answer to your question, which is not necessarily the most correct answer.
  3. The papers and demonstration sites are showcasing their best work. Whereas I am zeroing in on their worst work, because it's entertaining and because it's a cautionary tale about putting too much faith in AI image recognition.

Giraffes have been a cautionary meme in image recognition, with early algorithms especially prone to reporting giraffes in places where there are none. Giraffe With No Spots is another example of how AI can miss the obvious.

Bonus content: I try (and fail) to get DALL-E2 to generate a giraffe with no spots.
