Optical Character Recognition
The Optical Character Recognition component recognises and extracts words detected in images and PDF files.


Select one of the supplied images or the supplied PDF file to extract the words it contains. Alternatively, upload your own image in one of the accepted formats, or a PDF file.


Words detected in the supplied file are outlined with boxes and are also listed as tiles in the area directly underneath the image or PDF file.

You can click the boxes drawn around the words, or the tiles with extracted words underneath the supplied image, to see more information that the providers returned about the selected word. If no words are detected in the image, an appropriate message is shown underneath it.

The additional information returned by the providers is limited to the confidence level of each word (Azure, AWS and Google), its text type, either 'handwriting' or 'printed' (AWS only), and its language code (Google only). Language support for text extraction also varies between providers, with Azure and Google generally offering better support for languages written in non-Latin alphabets.
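The per-provider differences described above could be modelled as a single word record with optional fields. The following is only a sketch for illustration: the names DetectedWord and extra_info are hypothetical and do not come from the component or from any provider SDK, and the confidence value is stored exactly as received because the text above does not state its scale.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class DetectedWord:
    """Hypothetical record for one detected word (not a provider API type)."""
    text: str
    provider: str                        # "azure", "aws" or "google"
    confidence: float                    # returned by all three providers
    text_type: Optional[str] = None      # AWS only: "handwriting" or "printed"
    language_code: Optional[str] = None  # Google only


def extra_info(word: DetectedWord) -> Dict[str, Union[float, str]]:
    """Collect only the fields that the word's provider actually supplied."""
    info: Dict[str, Union[float, str]] = {"confidence": word.confidence}
    if word.text_type is not None:
        info["text_type"] = word.text_type
    if word.language_code is not None:
        info["language_code"] = word.language_code
    return info
```

With this shape, a word from AWS would carry a text type but no language code, a word from Google the reverse, and a word from Azure only the confidence level.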

To limit the number of requests sent to each provider and to speed up text extraction, at most three pages are extracted from a PDF file. If the supplied PDF file has more than three pages, only the first three are processed.
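The page limit amounts to truncating the list of pages before any provider is called. A minimal sketch, assuming the PDF has already been split into a list of page objects; the names MAX_PDF_PAGES and pages_to_process are hypothetical and not taken from the component:

```python
from typing import List, Sequence, TypeVar

Page = TypeVar("Page")

# Assumed constant mirroring the three-page limit described above.
MAX_PDF_PAGES = 3


def pages_to_process(pages: Sequence[Page]) -> List[Page]:
    """Return the first MAX_PDF_PAGES pages; shorter documents pass through unchanged."""
    return list(pages[:MAX_PDF_PAGES])
```

Truncating up front means a long document costs no more provider requests than a three-page one.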

Things to consider

Uses of Optical Character Recognition include digitising students' notes, making it easier to mark student papers with hard-to-read handwriting, and helping students with visual impairments to read text.

Optical Character Recognition can help educational establishments recognise various types of text. Its drawbacks include limited support for foreign-language recognition and occasional misrecognition of difficult handwriting.