OCR Document

Extracts text from an image.

The OCR Document API extracts text from an image file or a file containing images.

Quick Start

For a list of supported languages for the OCR Document API, see Supported OCR Languages.

You send the document that you want to run OCR on in the file parameter. For example:

curl -X POST http://api.havenondemand.com/1/api/[async|sync]/ocrdocument/v1 --form "file=@mycv.jpg"

Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.

The API returns the extracted text, along with information about the location of the detected text in the original image. The API does not provide a precise layout of the text on the page. The text is primarily useful for adding to a text index for search and retrieval of the original document.

{
  "text_block": [
    {
      "text": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
      "left": 2,
      "top": 2,
      "width": 800,
      "height": 1117
    }
  ]
}

Note: The width and height values in the response might be 0 if the API cannot get this information from the document.
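
As a rough sketch of how a client might consume this response in Python (standard library only), the helpers below join the text of every block and flag blocks whose bounding box is missing. The function names extract_text and has_bounding_box are illustrative and are not part of the API.

```python
import json


def extract_text(response_body: str) -> str:
    """Join the text of every text_block entry, in the order returned."""
    data = json.loads(response_body)
    return "\n".join(block["text"] for block in data.get("text_block", []))


def has_bounding_box(block: dict) -> bool:
    """Return False when width or height is 0, which the API might report
    if it cannot get location information from the document."""
    return block.get("width", 0) > 0 and block.get("height", 0) > 0


# A shortened version of the sample response shown above.
sample = (
    '{"text_block": [{"text": "Lorem ipsum", "left": 2, "top": 2, '
    '"width": 800, "height": 1117}]}'
)
print(extract_text(sample))
```

Because the text is mainly intended for indexing, joining the blocks with newlines is usually enough; the coordinates matter only if you need to point back to a region of the original image.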

The API returns the best results for images with high contrast between the text and the background.

By default, the API treats the input as a clean photo of a document. You can improve the accuracy of the results by specifying the OCR mode that matches the type of image. For example:

curl -X POST http://api.havenondemand.com/1/api/[async|sync]/ocrdocument/v1 --form "file=@mycv.jpg" --form "mode=document_scan"

The options for the mode parameter are:

  • document_photo: (default) Use to recognize text in a document that has been digitized with variable light, such as through a mobile phone camera.

  • document_scan: Use to recognize text in a document that has been digitized with constant lighting, such as through a flatbed scanner.

  • scene_photo: Use to recognize text in a scene, for example signs and billboards in a landscape.

  • subtitle: Use to recognize text superimposed on an image, such as TV subtitles.

For a list of file formats that you can use for images, see Supported Media Formats.

Optimize OCR Results

In general, OCR results are better for high-quality images in which the text contrasts strongly with the background (for example, a sharp, dark font on a white background).

When you take a picture of text or a document with a handheld camera, the OCR results are better in diffuse lighting. Natural light is diffuse, so photos taken in natural light are generally better for OCR. When this is not possible, try to ensure that the camera is not between the light source and the text, because this positioning can cause glare or cast shadows on the text. For example, if you want to photograph a business card under an overhead light, hold the camera and card perpendicular to the floor, so that the light is above both, rather than laying the card on a table.

Additionally, if you need to use a flash, ensure that the camera is far enough away that the text does not get washed out or saturated.

The image resolution can affect the OCR results. Higher resolution images have more detail, so OCR might interpret background distortions as possible text, and a high resolution picture with a lot of tiny details might give poorer results. However, when the image resolution is too low, the font becomes less sharp and the image becomes pixelated. The quality of your camera affects the ideal size, and you might need to experiment to find the settings that give the best results.

In all cases, use the appropriate mode for your data. For example, if you take a picture of a document or business card, use document_photo, and if you want to identify the text in a picture of road signs, use scene_photo.

The document_scan mode is best for document scans obtained with a good quality scanner, and for computer-generated images. If the page is not flat when scanned, you might get better results by using the document_photo mode.
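
The guidance above can be captured in a small lookup, sketched here in Python. The source labels on the left (phone_photo, flatbed_scan, and so on) are hypothetical names chosen for this example; only the mode values on the right come from the API.

```python
# Hypothetical source labels mapped to the documented mode values.
MODE_FOR_SOURCE = {
    "phone_photo": "document_photo",
    "business_card_photo": "document_photo",
    "flatbed_scan": "document_scan",
    "computer_generated_image": "document_scan",
    "scan_of_curved_page": "document_photo",  # page was not flat when scanned
    "road_sign_photo": "scene_photo",
    "tv_subtitle_frame": "subtitle",
}


def choose_mode(source: str) -> str:
    """Return the OCR mode for a source label, falling back to the API default."""
    return MODE_FOR_SOURCE.get(source, "document_photo")


print(choose_mode("flatbed_scan"))
```

Unknown labels fall back to document_photo because that is the default the API applies when no mode is sent.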

Synchronous
https://api.havenondemand.com/1/api/sync/ocrdocument/v1
Asynchronous
https://api.havenondemand.com/1/api/async/ocrdocument/v1
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.
Parameters

This API accepts the following parameters:

Required
Name       Type    Description
file       binary  The image file to process.
reference  string  A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
url        string  A publicly accessible HTTP URL from which the image can be retrieved.
Optional
Name       Type         Description
mode       enum         The type of image to process. Default value: document_photo.
languages  array<enum>  The language used in the image you want to recognize text in. Default value: [en].
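
The following Python sketch shows one way to assemble these parameters for the url and reference inputs. It rests on several assumptions that this page does not confirm: that exactly one input source is supplied per request, that the parameters can be sent in a query string, and that an array value such as languages is sent as a repeated parameter. The function name build_request_url is illustrative.

```python
from urllib.parse import urlencode

SYNC_ENDPOINT = "https://api.havenondemand.com/1/api/sync/ocrdocument/v1"
MODES = {"document_photo", "document_scan", "scene_photo", "subtitle"}


def build_request_url(apikey, url=None, reference=None,
                      mode="document_photo", languages=("en",)):
    """Build a request URL for a url or reference input.

    Assumptions (not confirmed by this page): one input source per request,
    query-string parameters, and repeated parameters for array values.
    """
    if (url is None) == (reference is None):
        raise ValueError("supply exactly one of url or reference")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    params = [("apikey", apikey)]
    params.append(("url", url) if url is not None else ("reference", reference))
    params.append(("mode", mode))
    params.extend(("languages", lang) for lang in languages)
    return SYNC_ENDPOINT + "?" + urlencode(params)


print(build_request_url("YOUR_API_KEY", url="http://example.com/sign.jpg",
                        mode="scene_photo", languages=("en", "fr")))
```

Validating mode locally gives an immediate error for a misspelled value instead of a round trip to the service.
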
Enumeration Types

This API's parameters use the enumerations described below:

mode
The type of image to process.
document_photo Photo of a document
Use to recognize text in a document that has been digitized with variable light, such as through a mobile phone camera.
document_scan Scanned image of a document
Use to recognize text in a document that has been digitized with constant lighting, such as through a flatbed scanner.
scene_photo Photo of a scene containing text
Use to recognize text in a scene, for example signs and billboards in a landscape.
subtitle Text superimposed on an image
Use to recognize text superimposed on an image, such as TV subtitles.
languages
The language used in the image you want to recognize text in.
af Afrikaans
ar Arabic
eu Basque
bg Bulgarian
ca Catalan
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
fi Finnish
fr French
de German
el Greek
he Hebrew
hu Hungarian
is Icelandic
ga Irish
it Italian
ja Japanese
ko Korean
la Latin
lv Latvian
lt Lithuanian
mk Macedonian
mt Maltese
no Norwegian
fa Persian
pl Polish
pt Portuguese
ro Romanian
ru Russian
sr Serbian
zhs Chinese Simplified
sk Slovak
sl Slovenian
es Spanish
sv Swedish
zht Chinese Traditional
tr Turkish
uk Ukrainian
ur Urdu
cy Welsh

This API returns a JSON response that is described by the model below. This single model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
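
A minimal polling sketch in Python follows. This page does not describe the body of the status response, so the caller supplies both the HTTP fetch function (fetch_json) and the completion check (is_done); those names, the helper names, and passing apikey as a query parameter to the job endpoints are all assumptions made for this example.

```python
import time
from urllib.parse import urlencode

API_ROOT = "https://api.havenondemand.com"


def job_status_url(job_id, apikey):
    """URL that reports the job status, including results if the job is finished."""
    return f"{API_ROOT}/1/job/status/{job_id}?" + urlencode({"apikey": apikey})


def job_result_url(job_id, apikey):
    """URL that waits until the job has finished and then returns the result."""
    return f"{API_ROOT}/1/job/result/{job_id}?" + urlencode({"apikey": apikey})


def wait_for_job(job_id, apikey, fetch_json, is_done, interval=1.0, max_attempts=30):
    """Poll the status URL until is_done(body) is true, then return the body.

    fetch_json: caller-supplied callable that takes a URL and returns a dict.
    is_done: caller-supplied predicate, because the status response format
             is not described on this page.
    """
    for _ in range(max_attempts):
        body = fetch_json(job_status_url(job_id, apikey))
        if is_done(body):
            return body
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} checks")
```

If you do not need progress updates, a single request to the URL from job_result_url avoids the polling loop, because that endpoint waits for the job to finish.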

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
OCR Document Response {
text_block ( array[Text_block] ) Details of a section of text found in the image. If no text is detected, it returns an empty array [].
page_count ( integer , optional) The total number of pages in the document.
}
OCR Document Response:Text_block {
height ( integer ) The height of the bounding box for the text. This value defaults to 0 if the API cannot get the location information from the document.
left ( integer ) The position of the left edge of the bounding box for the text.
text ( string ) The text extracted in this section of the image.
top ( integer ) The position of the top edge of the bounding box for the text.
width ( integer ) The width of the bounding box for the text. This value defaults to 0 if the API cannot get the location information from the document.
page_num ( integer , optional) The page in the document that the text belongs to.
}
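
For multi-page input, the optional page_num property lets a client group the extracted text by page. The Python sketch below shows one way to do that; text_by_page is an illustrative name, and blocks without page_num are collected under the key None rather than being assigned a guessed page.

```python
from collections import defaultdict


def text_by_page(response: dict) -> dict:
    """Group text_block text by page_num, preserving the order returned."""
    pages = defaultdict(list)
    for block in response.get("text_block", []):
        # page_num is optional, so blocks without it are grouped under None.
        pages[block.get("page_num")].append(block["text"])
    return {page: "\n".join(texts) for page, texts in pages.items()}
```
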
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "properties": {
        "text_block": {
            "items": {
                "properties": {
                    "height": {
                        "type": "integer"
                    },
                    "left": {
                        "type": "integer"
                    },
                    "text": {
                        "type": "string"
                    },
                    "top": {
                        "type": "integer"
                    },
                    "width": {
                        "type": "integer"
                    },
                    "page_num": {
                        "type": "integer"
                    }
                },
                "required": [
                    "text",
                    "left",
                    "top",
                    "width",
                    "height"
                ],
                "type": "object"
            },
            "type": "array"
        },
        "page_count": {
            "type": "integer"
        }
    },
    "required": [
        "text_block"
    ],
    "type": "object"
}
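
As an illustration of what the schema requires, the following Python sketch checks a decoded response for the required properties and their types using only the standard library. It is a hand-written approximation of the schema above, not a general JSON Schema validator, and validate_response is an illustrative name.

```python
# Required Text_block properties and their types, taken from the schema above.
REQUIRED_BLOCK_FIELDS = {
    "text": str,
    "left": int,
    "top": int,
    "width": int,
    "height": int,
}


def validate_response(data: dict) -> list:
    """Return a list of problems; an empty list means the required parts are present."""
    blocks = data.get("text_block")
    if not isinstance(blocks, list):
        return ["text_block must be an array"]
    problems = []
    for i, block in enumerate(blocks):
        if not isinstance(block, dict):
            problems.append(f"text_block[{i}] must be an object")
            continue
        for name, expected in REQUIRED_BLOCK_FIELDS.items():
            if name not in block:
                problems.append(f"text_block[{i}] is missing {name}")
            elif not isinstance(block[name], expected):
                problems.append(f"text_block[{i}].{name} has the wrong type")
    return problems
```

An empty text_block array passes the check, which matches the documented behavior when no text is detected.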

To try this API with your own data and use it in your own applications, you need an API Key. You can create an API Key from your account page - API Keys.
