Image Description

Generate descriptions of images

The following demonstration uses Microsoft's Azure Computer Vision service to generate descriptions of images.

  1. Select a sample image below, upload one of your own or take a picture using your webcam.
  2. Once the image has been analysed, the results will be shown below. A description of the image will appear in a text box at the bottom of the image.
More information about this demo

Generating a human-readable description of an image is easy for a person but very challenging for a machine, because it requires both understanding the content of the image and translating that understanding into natural language. A description should capture not only the objects in an image but also how those objects relate to each other.

An image description model typically combines an image classification model, which extracts and classifies the salient features of an image, with a language model, which translates those features into human language and arranges them into a meaningful sentence.
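
The two-stage idea can be illustrated with a deliberately tiny sketch. Nothing here is a real model: `classify` is a hypothetical stand-in that simply ranks labels it is handed, and `to_sentence` is a fixed template rather than a trained language model. Production systems generally use neural networks for both stages, so treat this only as an outline of how the pieces fit together.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # name of the recognised object
    score: float  # classifier confidence between 0 and 1


def classify(image_features):
    """Stand-in for an image classification model.

    Assumption: features arrive as a {label: score} mapping. A real
    classifier would compute these scores from pixels.
    """
    detections = [Detection(label, score) for label, score in image_features.items()]
    return sorted(detections, key=lambda d: d.score, reverse=True)


def article(word):
    """Pick 'a' or 'an' for a lowercase noun (simple vowel rule)."""
    return "an" if word[0] in "aeiou" else "a"


def to_sentence(detections, relation="next to", min_score=0.5):
    """Stand-in for a language model: arrange the top labels into a sentence.

    The relation between objects is a fixed, hypothetical phrase here;
    a real model would have to infer it from the image.
    """
    kept = [d.label for d in detections if d.score >= min_score]
    if not kept:
        return "No description available."
    phrases = [f"{article(word)} {word}" for word in kept[:2]]
    return f" {relation} ".join(phrases).capitalize() + "."


def describe(image_features):
    """Chain the two stages: classification first, then sentence generation."""
    return to_sentence(classify(image_features))
```

For example, `describe({"dog": 0.9, "ball": 0.8, "tree": 0.2})` keeps the two confident labels and drops the low-scoring one.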

All images analysed using this demo are stored for 24 hours and then automatically deleted using Microsoft's Azure data lifecycle management.
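
For readers who want to request a description programmatically, the sketch below shows roughly what a call to the service's REST interface might look like. It assumes the v3.2 "describe" operation of Azure Computer Vision and a response containing a `description.captions` list; the demo itself may use a different version or client library. The endpoint and key are placeholders that you would supply from your own Azure resource.

```python
import json
import urllib.request

# Assumption: the v3.2 "describe" operation; this demo may use another version.
DESCRIBE_PATH = "/vision/v3.2/describe"


def build_describe_request(endpoint, key, image_bytes):
    """Build (but do not send) a POST request carrying the raw image bytes."""
    url = endpoint.rstrip("/") + DESCRIBE_PATH + "?maxCandidates=1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # key of your own resource
        "Content-Type": "application/octet-stream",  # body is binary image data
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def best_caption(response_json):
    """Return (text, confidence) for the most confident caption, or None."""
    captions = response_json.get("description", {}).get("captions", [])
    if not captions:
        return None
    top = max(captions, key=lambda c: c.get("confidence", 0.0))
    return top["text"], top.get("confidence")


def describe_image(endpoint, key, image_bytes):
    """Send the image to the service and return its best caption, if any."""
    request = build_describe_request(endpoint, key, image_bytes)
    with urllib.request.urlopen(request) as response:
        return best_caption(json.load(response))
```

Separating request construction and response parsing from the network call keeps the two pure functions easy to check without an Azure subscription.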

Things to consider

As with many technologies of this kind, the results depend entirely on how the underlying models have been trained. If the image classification model has not been trained to recognise certain objects, they will not be described in the caption. Similarly, if the language model's vocabulary contains only common words, the generated captions may not be very precise.
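
The loss of precision caused by a limited vocabulary can be shown with a small, purely illustrative sketch. The `HYPERNYMS` table and the `generalise` helper are hypothetical and are not part of any real captioning system. They mimic a model that falls back to a broader word when it does not know a specific one, and gives up when it knows no suitable word at all.

```python
# Hypothetical "is a kind of" table: each label maps to a broader term.
HYPERNYMS = {
    "dalmatian": "dog",
    "dog": "animal",
    "tabby": "cat",
    "cat": "animal",
}


def generalise(label, vocabulary, hypernyms=HYPERNYMS):
    """Return the most specific term for `label` found in `vocabulary`.

    Walks up the hypernym chain until a known word is reached. Returns
    None when no known word exists, in which case the object would
    simply be missing from the caption.
    """
    seen = set()
    while label not in vocabulary:
        if label in seen or label not in hypernyms:
            return None  # nothing broader is known, or the table has a cycle
        seen.add(label)
        label = hypernyms[label]
    return label
```

With the vocabulary `{"dog", "animal"}` a dalmatian is described as a "dog", whereas with only `{"animal"}` it becomes an "animal", and an object absent from the table is not described at all.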

Applications of this technology include helping visually impaired people better understand the content of images on the web.