Take a look at the new Microsoft image captioning AI

Microsoft has built an image captioning algorithm that surpasses human accuracy in some cases. The company has used the system to upgrade Seeing AI, its app for people who are blind or have low vision.

Image captioning algorithm in Microsoft products

The algorithm will soon be incorporated into other Microsoft products such as PowerPoint, Outlook, and Word. There it can generate alt text for images, which is crucial for accessibility.

Saqib Shaikh, a software engineer at Microsoft, stated:

“Ideally, everyone would include alt text for all images in documents, on the web, in social media — as this enables people who are blind to access the content and participate in the conversation. But, alas, people do not. So, there are several apps that use image captioning as way to fill in alt text when it is missing.”

Microsoft first created Seeing AI in 2017. Built on computer vision, it describes the world as seen through a smartphone camera for people who are visually impaired. It can identify household items, read and scan text, and even recognize friends. It can also describe images in other applications, such as WhatsApp or email clients.

According to Eric Boyd, corporate VP of Azure AI, it is one of the leading apps for people who are blind or have low vision. Microsoft has not, however, disclosed how many people use Seeing AI.

The current algorithm is twice as accurate as its predecessor

The new image captioning algorithm is expected to improve the accuracy of Seeing AI by helping it identify the contents of a picture more precisely. The company says the algorithm is twice as accurate as its predecessor, which dates from 2015.

The algorithm was described in a pre-print paper published in September. It achieved the highest score ever recorded on the image captioning benchmark known as “nocaps”. For image captioning software, this is a significant achievement, whatever limitations the benchmark has.

The nocaps benchmark comprises more than 166,000 human-generated captions describing over 15,100 images. The images are drawn from the Open Images Dataset and span a wide variety of subjects.
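To put those figures in perspective, here is a minimal sketch that works out the average number of captions per image. It uses only the two numbers quoted above, both of which are stated as lower bounds, so the result is a rough estimate rather than an official statistic. The function name is purely illustrative.

```python
# Figures as quoted in this article; both are lower bounds ("more than", "over"),
# so the result is only an approximation.
NUM_CAPTIONS = 166_000
NUM_IMAGES = 15_100


def captions_per_image(num_captions: int, num_images: int) -> float:
    """Return the average number of captions written per image."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_captions / num_images


average = captions_per_image(NUM_CAPTIONS, NUM_IMAGES)
print(f"Roughly {average:.1f} captions per image")
```

In other words, each image in the benchmark is described by about eleven different human-written captions.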

The nocaps benchmark covers only a small set of visual concepts

Harsh Agrawal, one of the creators of the benchmark, said:

“Surpassing human performance on nocaps is not an indicator that image captioning is a solved problem.” He added that the metrics used to evaluate performance on nocaps “only roughly correlate with human preferences” and that the benchmark covers only a small percentage of all possible visual concepts.

Nevertheless, image captioning is an ever-evolving technology. It has advanced considerably over the years, and we hope that progress continues.
