Classify Document

Classifies a document into predefined collections.

The Classify Document API returns the collections and satisfied conditions for a document.

You can use this API to classify documents into different collections and understand which conditions caused the classification.

Quick Start

When you submit a document to the API, it stores the content and metadata, and processes the contents of any special fields. The type of processing depends on the index field type. For more information, see Index Field Types.

You must provide the name or ID of the collection sequence that you want to use to classify the document. For example:

/1/api/[async|sync]/classifydocument/v1?collection_sequence=corporate_sequence

You can modify this collection sequence in the collectionsequence API.
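
The request path above can also be assembled in code. The following is a minimal sketch, assuming Python and the host name from the endpoint list in this document; the helper name build_classify_url is hypothetical and not part of the API.

```python
from urllib.parse import urlencode

# Host taken from the Synchronous/Asynchronous endpoints listed in this document.
BASE_URL = "https://api.havenondemand.com"


def build_classify_url(collection_sequence, mode="sync", apikey=None):
    """Return a Classify Document URL for the given collection sequence.

    build_classify_url is an illustrative helper, not an official client.
    """
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    params = {"collection_sequence": collection_sequence}
    if apikey is not None:
        # apikey is the authentication parameter described under Authentication.
        params["apikey"] = apikey
    return f"{BASE_URL}/1/api/{mode}/classifydocument/v1?{urlencode(params)}"


print(build_classify_url("corporate_sequence"))
# https://api.havenondemand.com/1/api/sync/classifydocument/v1?collection_sequence=corporate_sequence
```

Using urlencode keeps the collection sequence name safe to send even if it contains spaces or other reserved characters.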

You can use the json parameter to submit the documents that you want to classify. For example:

/1/api/[async|sync]/classifydocument/v1?collection_sequence=corporate_sequence&json=

The required format for the JSON is:

{
	"document" :
	[
		{
			"title" : "This is my document",
			"reference" : "mydoc1",
			"myfield" : ["a value"],
			"content" : "A large block of text, which makes up the main body of the document."
		}, {
			"title" : "My Other document",
			"reference" : "mydoc2",
			"content" : "This document is about something else"
		}
	]
}
  • You can include additional fields, such as myfield in the example.
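
The JSON format above can be produced from ordinary dictionaries rather than written by hand. The following is a minimal sketch, assuming Python; the helper names build_json_parameter and encode_for_query are hypothetical, and the document values are the ones from the example above.

```python
import json
from urllib.parse import quote


def build_json_parameter(documents):
    """Wrap a list of document dictionaries in the top-level "document" array."""
    payload = {"document": list(documents)}
    # Compact separators keep the value short when it is sent in a query string.
    return json.dumps(payload, separators=(",", ":"))


def encode_for_query(json_text):
    """Percent-encode the JSON text so it can be used as the json parameter value."""
    return quote(json_text, safe="")


docs = [
    {
        "title": "This is my document",
        "reference": "mydoc1",
        "myfield": ["a value"],  # additional field, as in the example above
        "content": "A large block of text, which makes up the main body of the document.",
    },
    {
        "title": "My Other document",
        "reference": "mydoc2",
        "content": "This document is about something else",
    },
]

json_value = build_json_parameter(docs)
print(encode_for_query(json_value))
```

For large documents, sending the json value in a POST body rather than the query string avoids URL length limits.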

Instead of submitting documents in JSON format, you can submit a file, a reference, or a URL to the API. For example:

/1/api/[async|sync]/classifydocument/v1?collection_sequence=corporate_sequence&file= or url= or reference=

In this case, the API uses the Text Extraction API to extract the text content from the file, and creates the JSON documents automatically. The document reference field is the file name or URL that you submit.

The Classify Document API supports multiple file input:

curl -X POST http://api.havenondemand.com/1/api/[async|sync]/classifydocument/v1 --form "file=@myfile1.txt" --form "file=@myfile2.doc"

Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.

The following example submits a single file together with a collection sequence:

curl -X POST http://api.havenondemand.com/1/api/[async|sync]/classifydocument/v1 --form "collection_sequence=corporate_sequence" --form "file=@myfile.txt"

{
	"index" : "myindex",
	"references" : [{
			"reference" : "myfile.txt",
			"id" : 108
		}
	]
}
Synchronous
https://api.havenondemand.com/1/api/sync/classifydocument/v1
Asynchronous
https://api.havenondemand.com/1/api/async/classifydocument/v1
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.
Parameters

This API accepts the following parameters:

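Required (provide the document through one of file, json, reference, or url; collection_sequence is always required)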
Name Type Description
file
array<binary> A file to examine. The API passes the file to the Text Extraction API to extract the contents for examination. Multipart POST only.
json
json The JSON document to examine.
reference
string A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
url
string A publicly accessible HTTP URL from which the document can be retrieved.
collection_sequence
string A name or ID of the collection sequence that the document should be evaluated against.
Optional
Name Type Description
additional_metadata
array<json> A JSON object containing additional metadata to add to the extracted documents. This option does not apply to JSON input. To add metadata for multiple files, specify objects in order, separated by an empty object.
reference_prefix
array<string> A string to add to the start of the reference of documents that are extracted from a file. This option does not apply to JSON input. To add a prefix for multiple files, specify prefixes in order, separated by a space.

This API returns a JSON response that is described by the model below. The model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
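
The two job endpoints above can be wrapped in small helpers. The following is a minimal sketch, assuming Python, the host name from the endpoint list in this document, and that job requests accept the same apikey parameter as the API itself (an assumption, since this section does not say so). The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Host taken from the endpoints listed in this document.
BASE_URL = "https://api.havenondemand.com"


def job_status_url(job_id, apikey):
    """URL that reports the job status, including results if the job is finished."""
    return f"{BASE_URL}/1/job/status/{quote(job_id, safe='')}?{urlencode({'apikey': apikey})}"


def job_result_url(job_id, apikey):
    """URL that waits until the job has finished and then returns the result."""
    return f"{BASE_URL}/1/job/result/{quote(job_id, safe='')}?{urlencode({'apikey': apikey})}"


def fetch_json(url, timeout=60):
    """Perform a GET request and decode the JSON body (makes a network call)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage with placeholder values; uncomment to call the service.
# result = fetch_json(job_result_url("<job-id>", "<your-api-key>"))
print(job_status_url("abc123", "KEY"))
```

The result endpoint is the simpler choice when the caller can block; the status endpoint suits callers that need to poll on their own schedule.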

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Classify Document Response {
result ( array[Result] , optional) Indicates the collections that were matched along with the conditions that caused that match, and any conditions that were unable to be evaluated.
}
Classify Document Response:Result {
collection_id_assigned_by_default ( number , optional)
incomplete_collections ( array[number] , optional)
matched_collections ( array[object] , optional) A list of collections that the supplied document matched.
unevaluated_conditions ( array[object] , optional) Indicates any conditions that were unable to be evaluated and the reason.
}
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "properties": {
        "result": {
            "items": {
                "properties": {
                    "collection_id_assigned_by_default": {
                        "multipleOf": 1,
                        "type": [
                            "number",
                            "null"
                        ]
                    },
                    "incomplete_collections": {
                        "items": {
                            "multipleOf": 1,
                            "type": "number"
                        },
                        "type": [
                            "array",
                            "null"
                        ]
                    },
                    "matched_collections": {
                        "items": {
                            "properties": {
                                "id": {
                                    "multipleOf": 1,
                                    "type": "number"
                                },
                                "matched_conditions": {
                                    "items": {
                                        "properties": {
                                            "field_name": {
                                                "type": "string"
                                            },
                                            "matched_lexicon_expressions": {
                                                "items": {
                                                    "properties": {
                                                        "lexicon_expression_id": {
                                                            "multipleOf": 1,
                                                            "type": "number"
                                                        },
                                                        "terms": {
                                                            "items": {
                                                                "type": "string"
                                                            },
                                                            "type": "array"
                                                        }
                                                    },
                                                    "required": [
                                                        "id"
                                                    ],
                                                    "type": "object"
                                                },
                                                "type": "array"
                                            },
                                            "reference": {
                                                "type": "string"
                                            },
                                            "terms": {
                                                "items": {
                                                    "type": "string"
                                                },
                                                "type": "array"
                                            }
                                        },
                                        "type": "object"
                                    },
                                    "type": "array"
                                },
                                "name": {
                                    "type": "string"
                                }
                            },
                            "required": [
                                "id",
                                "name"
                            ],
                            "type": "object"
                        },
                        "type": [
                            "array",
                            "null"
                        ]
                    },
                    "unevaluated_conditions": {
                        "items": {
                            "items": {
                                "properties": {
                                    "id": {
                                        "multipleOf": 1,
                                        "type": "number"
                                    },
                                    "name": {
                                        "type": "string"
                                    },
                                    "reason": {
                                        "enum": [
                                            "missing_field",
                                            "missing_service"
                                        ],
                                        "type": "string"
                                    },
                                    "type": {
                                        "type": "string"
                                    }
                                },
                                "required": [
                                    "id",
                                    "name"
                                ],
                                "type": "object"
                            },
                            "type": "object"
                        },
                        "type": [
                            "array",
                            "null"
                        ]
                    }
                },
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}
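
A response that follows this schema can be reduced to the matched collection names and the fields that triggered them. The following is a minimal sketch, assuming Python; the helper name summarize_classification is hypothetical, and the sample response is invented for illustration to fit the schema, not captured from the service.

```python
def summarize_classification(response):
    """Summarize each entry in "result": matched collections, matched fields, default."""
    summary = []
    # "result" and the nested arrays are optional and may be null, hence "or []".
    for result in response.get("result") or []:
        matched = result.get("matched_collections") or []
        unevaluated = result.get("unevaluated_conditions") or []
        fields = sorted(
            {
                condition["field_name"]
                for collection in matched
                for condition in collection.get("matched_conditions") or []
                if "field_name" in condition
            }
        )
        summary.append(
            {
                "collections": [collection["name"] for collection in matched],
                "fields": fields,
                "default_collection": result.get("collection_id_assigned_by_default"),
                "has_unevaluated": bool(unevaluated),
            }
        )
    return summary


# Invented sample that conforms to the schema above.
sample = {
    "result": [
        {
            "collection_id_assigned_by_default": None,
            "incomplete_collections": None,
            "matched_collections": [
                {
                    "id": 7,
                    "name": "Contracts",
                    "matched_conditions": [
                        {"field_name": "content", "terms": ["agreement"]}
                    ],
                }
            ],
            "unevaluated_conditions": None,
        }
    ]
}

print(summarize_classification(sample))
```

The "or []" guards matter because the schema allows these properties to be null as well as absent.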

