Document Categorization

Searches for categories that match a specified document.

The Document Categorization API allows you to categorize documents according to a set of categories that you create.

To use the API, you must first create a text index with the Categorization flavor by using the Create Text Index API. This type of index stores documents that describe categories. For more information about the Categorization flavor, see Categorization Flavor Index Configuration.

Category descriptions act like a query that matches any document that belongs to the category. For example, a category document for the dogs category might contain a list of dog breeds. If you use this list as query text in the Query Text Index API, it returns documents about dogs.

You can optionally define Boolean or field text restrictions in a category by using the BOOLEANRESTRICTION and FIELDTEXTRESTRICTION fields. For example, the following field text restriction ensures that every document assigned to the category has an ENRICHED_PERSON field:

EXISTS{}:ENRICHED_PERSON

For more information about Boolean and field text expressions, see Boolean and Proximity Operators and Field Text Operators.
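As a rough illustration of the category description and restriction fields described above, the following sketch assembles a category document as a plain Python dictionary. The helper name make_category, the breed list, and the lowercase reference, title, and content keys are assumptions made for this example; only the BOOLEANRESTRICTION and FIELDTEXTRESTRICTION field names and the EXISTS expression come from the text above.

```python
import json


def make_category(reference, title, terms,
                  field_text_restriction=None, boolean_restriction=None):
    """Build one category document as a dict (hypothetical helper)."""
    category = {
        "reference": reference,
        "title": title,
        # The category description: terms that act like a query for the category.
        "content": " ".join(terms),
    }
    # Restrictions are optional, so only add them when they are supplied.
    if field_text_restriction is not None:
        category["FIELDTEXTRESTRICTION"] = field_text_restriction
    if boolean_restriction is not None:
        category["BOOLEANRESTRICTION"] = boolean_restriction
    return category


# Illustrative values only; the breed list and reference are made up.
dogs = make_category(
    reference="category-dogs",
    title="dogs",
    terms=["labrador", "poodle", "beagle", "dachshund"],
    field_text_restriction="EXISTS{}:ENRICHED_PERSON",
)
print(json.dumps(dogs, indent=2))
```

The exact document layout that a Categorization flavor index expects is described in Categorization Flavor Index Configuration, which should be treated as the authority.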

The Document Categorization API allows you to find the categories that a document matches. You can think of this process as the inverse of a normal query. You provide a document to the API, and it returns a list of the categories in your category text index that match it. The API uses the text in the CONTENT field of the document as query text to match categories.

You must specify the name of the category index to use (a Haven OnDemand text index with the Categorization flavor). You can optionally specify additional field_text to restrict which categories from the categorization index can be returned. For example, if you know the documents you want to categorize are mostly about animals, you can add a field_text restriction that matches only values of the CATEGORY field that are about animals.
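The animals restriction described above might look like the following sketch. It assumes that the MATCH operator from Field Text Operators is the appropriate one and that the relevant categories carry the value animals in their CATEGORY field; the index name my_categories and the sample text are placeholders.

```python
from urllib.parse import parse_qs, urlencode


def match_field(value, field):
    """Build a MATCH field text expression, for example MATCH{animals}:CATEGORY."""
    return f"MATCH{{{value}}}:{field}"


params = {
    "index": "my_categories",  # hypothetical Categorization flavor index
    "text": "The vet examined two kittens and a parrot.",
    "field_text": match_field("animals", "CATEGORY"),
}

# URL-encode the parameters so the braces and colon survive transport.
query = urlencode(params)
print(query)

# Decoding the query string recovers the original expression.
print(parse_qs(query)["field_text"][0])
```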

Synchronous
https://api.havenondemand.com/1/api/sync/categorizedocument/v1
Asynchronous
https://api.havenondemand.com/1/api/async/categorizedocument/v1
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.
Parameters

This API accepts the following parameters:

Required
Name Type Description
file binary A file containing the document to categorize. Multipart POST only.
json json The JSON document to categorize.
reference string A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
text string The text content to categorize.
url string A publicly accessible HTTP URL from which the document to categorize can be retrieved.
index resource The name of the Haven OnDemand text index that you want to search for matching categories. This text index must be of the Categorization flavor. See Categorization Flavor Index Configuration.
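A minimal sketch of a synchronous request built from the required parameters is shown here. It assumes that the text, index, and apikey parameters can be passed as URL query parameters (the file parameter is the only one documented as multipart POST only); the API key, index name, and sample text are placeholders, and the network call is left commented out.

```python
from urllib.parse import urlencode

SYNC_URL = "https://api.havenondemand.com/1/api/sync/categorizedocument/v1"


def build_categorize_url(apikey, index, text):
    """Return a request URL that categorizes `text` against `index`."""
    query = urlencode({"apikey": apikey, "index": index, "text": text})
    return f"{SYNC_URL}?{query}"


url = build_categorize_url(
    apikey="YOUR_API_KEY",   # placeholder, not a real key
    index="my_categories",   # hypothetical Categorization flavor index
    text="Beagles and poodles are popular family pets.",
)
print(url)

# To send the request (requires a valid key and index):
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     result = json.load(response)
```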
Optional
Name Type Description
field_text string A field restriction against the categorization index. Typically, this is a match against a parametric type field in the categories. See Field Text Operators.
max_results number The maximum number of categories to return for this document from the total number of results matched. Default value: 6.
print enum The types of fields and content to display in the results. Default value: fields.
print_fields string The names of fields to print in the results.
Enumeration Types

This API's parameters use the enumerations described below:

print
The types of fields and content to display in the results.
all All fields.
fields Print the fields listed in the print_fields parameter.
none Do not print content fields.
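The optional parameters and the print enumeration can be checked on the client side before a request is sent. The following sketch uses the documented defaults (max_results of 6 and print of fields); the helper name optional_params and the choice to reject unknown print values locally are assumptions made for illustration, and print_fields is passed through unchanged because its exact format is not specified here.

```python
PRINT_VALUES = ("all", "fields", "none")


def optional_params(max_results=6, print_mode="fields", print_fields=None):
    """Return the optional request parameters as a dict (hypothetical helper)."""
    if print_mode not in PRINT_VALUES:
        raise ValueError(f"print must be one of {PRINT_VALUES}, got {print_mode!r}")
    params = {"max_results": max_results, "print": print_mode}
    # Only send print_fields when the caller supplies it.
    if print_fields is not None:
        params["print_fields"] = print_fields
    return params


print(optional_params())
print(optional_params(max_results=3, print_mode="all"))
```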

This API returns a JSON response that is described by the model below. The model is presented both as an easy-to-read abstract definition and as a formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
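The two job endpoints above can be addressed as in the sketch below. It assumes that the job endpoints live on the same host as the API, that they accept the same apikey parameter, and that a job ID has already been obtained from the asynchronous call; the job ID job-123 is a placeholder.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.havenondemand.com"


def job_url(kind, job_id, apikey):
    """Return the status or result URL for an asynchronous job."""
    if kind not in ("status", "result"):
        raise ValueError("kind must be 'status' or 'result'")
    query = urlencode({"apikey": apikey})
    # Percent-encode the job ID so it is safe to use as a path segment.
    return f"{BASE_URL}/1/job/{kind}/{quote(job_id, safe='')}?{query}"


# status returns immediately; result waits until the job has finished.
print(job_url("status", "job-123", "YOUR_API_KEY"))
print(job_url("result", "job-123", "YOUR_API_KEY"))
```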

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Document Categorization Response {
documents ( array[Documents] ) The details of the returned categories.
}
Document Categorization Response:Documents {
index ( string , optional) The database that the category was returned from.
links ( array[string] , optional) The terms from the document that match the result category.
reference ( string , optional) The reference string that identifies the result category.
summary ( string , optional) The summary of the result category.
title ( string , optional) The title of the result category.
weight ( number , optional) The percentage relevance that the result category has to the original document.
content ( string , optional) Content of the category document.
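fieldtextrestriction ( string , optional) The field text restriction defined for the result category.
booleanrestriction ( string , optional) The Boolean restriction defined for the result category.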
}
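One way to consume a response of this shape is to rank the returned categories by weight. In the sketch below, the sample response is invented to match the model and is not real API output; the helper name top_categories and the min_weight threshold are assumptions. Every property is read defensively because the model marks them all as optional.

```python
import json

# Invented sample that follows the model above; not real API output.
SAMPLE_RESPONSE = """
{
  "documents": [
    {"title": "cats", "reference": "category-cats", "weight": 42.0},
    {"title": "dogs", "reference": "category-dogs", "weight": 87.5,
     "links": ["BEAGLE", "POODLE"]}
  ]
}
"""


def top_categories(response, min_weight=0.0):
    """Return (title, weight) pairs sorted by descending weight."""
    documents = response.get("documents", [])
    ranked = sorted(documents, key=lambda d: d.get("weight", 0.0), reverse=True)
    return [
        (d.get("title", d.get("reference", "")), d.get("weight", 0.0))
        for d in ranked
        if d.get("weight", 0.0) >= min_weight
    ]


response = json.loads(SAMPLE_RESPONSE)
print(top_categories(response))
print(top_categories(response, min_weight=50))
```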
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "properties": {
        "documents": {
            "items": {
                "properties": {
                    "index": {
                        "type": "string"
                    },
                    "links": {
                        "items": {
                            "type": "string"
                        },
                        "type": "array"
                    },
                    "reference": {
                        "type": "string"
                    },
                    "summary": {
                        "type": "string"
                    },
                    "title": {
                        "type": "string"
                    },
                    "weight": {
                        "type": "number"
                    },
                    "content": {
                        "type": "string"
                    },
                    "fieldtextrestriction": {
                        "type": "string"
                    },
                    "booleanrestriction": {
                        "type": "string"
                    }
                },
                "type": "object"
            },
            "type": "array"
        }
    },
    "required": [
        "documents"
    ],
    "type": "object"
}
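The schema above can be enforced with a full JSON Schema validator. As a dependency-free alternative, the following sketch hand-codes the same checks: documents is required and must be an array of objects, and each listed property, when present, must have the stated type. The function name validate_response and the error message wording are assumptions made for this example.

```python
STRING_FIELDS = (
    "index", "reference", "summary", "title",
    "content", "fieldtextrestriction", "booleanrestriction",
)


def validate_response(response):
    """Return a list of error strings; an empty list means the response is valid."""
    if not isinstance(response, dict):
        return ["response must be an object"]
    if "documents" not in response:
        return ["missing required property: documents"]
    documents = response["documents"]
    if not isinstance(documents, list):
        return ["documents must be an array"]

    errors = []
    for i, doc in enumerate(documents):
        if not isinstance(doc, dict):
            errors.append(f"documents[{i}] must be an object")
            continue
        for name in STRING_FIELDS:
            if name in doc and not isinstance(doc[name], str):
                errors.append(f"documents[{i}].{name} must be a string")
        weight = doc.get("weight")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if "weight" in doc and (isinstance(weight, bool)
                                or not isinstance(weight, (int, float))):
            errors.append(f"documents[{i}].weight must be a number")
        links = doc.get("links")
        if "links" in doc and not (isinstance(links, list)
                                   and all(isinstance(s, str) for s in links)):
            errors.append(f"documents[{i}].links must be an array of strings")
    return errors


print(validate_response({"documents": [{"title": "dogs", "weight": 87.5}]}))
print(validate_response({"documents": [{"weight": "high"}]}))
```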

