Add to Text Index

Indexes a document.

The Add to Text Index API allows you to add content to a text index that you have set up. The API indexes your content and makes it available for use in other APIs, such as Query Text Index, Find Similar, and Find Related Concepts.

Note: Before you can add content, you must use the Create Text Index API to create an index. You can use the List Resources API to return a list of your available indexes.

Tip: You can also manage text indexes and add individual documents on the Text Indexes Account page.

Quick Start

For general information about Haven OnDemand unstructured text indexing, see Text Indexes - Key Concepts. You might also like to look at our beginner and advanced how-tos: Introduction to Haven OnDemand Unstructured Text Indexing and Advanced Haven OnDemand Unstructured Text Indexing.

Note: Haven OnDemand recommends that you use the asynchronous version of the API for most indexing purposes, particularly for large documents or for requests that include more than a few documents. See Get the Results of an Asynchronous Request. Synchronous requests block until index processing is complete, which can cause timeouts for large requests.

When you submit a document to the API, it stores the content and metadata, and processes the contents of any special fields. The type of processing depends on the index field type. For more information, see Index Field Types.

You must provide the name of the index that you want to add the content to. For example:

/1/api/[async|sync]/addtotextindex/v1?index=myindex

You can then use this index with the Query Text Index API and other index-based APIs to search, retrieve, and analyze your documents.
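The request path differs only in the call mode (async or sync) and the query parameters. The following Python sketch assembles it with the standard library. build_add_url is a hypothetical helper name, and the base URL is taken from the endpoint list further down this page; passing extra parameters such as apikey in the query string is an assumption of this sketch.

```python
from urllib.parse import urlencode

# Base URL taken from the Synchronous/Asynchronous endpoints listed on this page.
BASE_URL = "https://api.havenondemand.com"


def build_add_url(index: str, mode: str = "async", **params: str) -> str:
    """Return the Add to Text Index URL for an index and call mode.

    mode must be "async" or "sync". Extra keyword arguments (for example
    apikey) are appended as additional, URL-encoded query parameters.
    """
    if mode not in ("async", "sync"):
        raise ValueError(f"mode must be 'async' or 'sync', not {mode!r}")
    query = urlencode({"index": index, **params})
    return f"{BASE_URL}/1/api/{mode}/addtotextindex/v1?{query}"


print(build_add_url("myindex"))
# https://api.havenondemand.com/1/api/async/addtotextindex/v1?index=myindex
```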

You can use the json parameter to add documents to your index. For example:

/1/api/[async|sync]/addtotextindex/v1?index=myindex&json=

The required format for the JSON is:

{
	"document" :
	[
		{
			"title" : "This is my document",
			"reference" : "mydoc1",
			"myfield" : ["a value"],
			"content" : "A large block of text, which makes up the main body of the document."
		}, {
			"title" : "My Other document",
			"reference" : "mydoc2",
			"content" : "This document is about something else"
		}
	]
}

This example adds two documents to the existing index myindex.

  • The API creates one index document for each document definition in the JSON.

  • The reference field, if present, is used as the document reference in the index. If you do not include a reference field, Haven OnDemand automatically creates one during indexing.

  • The content field contains the main document content. This field is processed as an Index type field.

  • The title field contains the document title. This field is processed as an Index type field.

  • You can include additional fields, such as myfield in the example. The field type for these fields depends on the flavor of your index. For more information about flavors, see Create Text Index. For information about the special fields configured for the standard index flavor, see Standard Flavor Index Configuration.
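As a minimal sketch, the following Python code builds the two-document example above from dictionaries and serializes it into the string you would pass as the json parameter. make_document is a hypothetical helper, not part of any Haven OnDemand SDK; it omits reference when you do not supply one, which leaves Haven OnDemand to generate it during indexing.

```python
import json


def make_document(content, title=None, reference=None, **fields):
    """Build one entry for the top-level "document" array.

    reference is optional; if it is omitted, Haven OnDemand creates one.
    Any extra keyword arguments become additional fields (such as myfield).
    """
    doc = {"content": content}
    if title is not None:
        doc["title"] = title
    if reference is not None:
        doc["reference"] = reference
    doc.update(fields)
    return doc


docs = [
    make_document(
        "A large block of text, which makes up the main body of the document.",
        title="This is my document",
        reference="mydoc1",
        myfield=["a value"],  # additional fields may hold arrays
    ),
    make_document(
        "This document is about something else",
        title="My Other document",
        reference="mydoc2",
    ),
]

# The string to send as the json parameter.
payload = json.dumps({"document": docs})
```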

For most fields, you can specify multiple values by using an array. However, the following fields must contain a single string or number, and cannot contain an array (even an array with only one value):

  • reference
  • title
  • content
  • doc_iod_reference
  • parent_iod_reference

Note: Field names in Haven OnDemand are not case sensitive. For example, the TITLE field name is equivalent to Title or title.
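These restrictions are easy to break when documents are generated programmatically, so a client-side check can catch them before submission. The hypothetical Python helper below flags array values in the single-value fields listed above and compares field names case-insensitively, as the note describes. It is a local sanity check, not a replacement for the API's own validation.

```python
# Fields that must hold a single string or number, never an array.
SINGLE_VALUE_FIELDS = {
    "reference",
    "title",
    "content",
    "doc_iod_reference",
    "parent_iod_reference",
}


def find_array_violations(document: dict) -> list:
    """Return the names of single-value fields that were given an array.

    Names are lower-cased before comparison, because TITLE, Title, and
    title all refer to the same field.
    """
    return [
        name
        for name, value in document.items()
        if name.lower() in SINGLE_VALUE_FIELDS and isinstance(value, (list, tuple))
    ]


doc = {"Title": ["Two", "titles"], "reference": "mydoc1", "myfield": ["a", "b"]}
print(find_array_violations(doc))  # ['Title']
```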

Instead of submitting documents in JSON format, you can submit a file, an object store reference, or a URL. For example:

/1/api/[async|sync]/addtotextindex/v1?index=myindex&file= or url= or reference=

In this case, the API uses the Text Extraction API to extract the text content and creates the JSON documents automatically. The document reference is the file name or URL that you submit.
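For url input there is no file to upload, so a plain form-encoded POST may be enough. The following Python sketch prepares such a request with urllib.request; it only builds the request and does not send it. It assumes the endpoint accepts application/x-www-form-urlencoded bodies for non-file inputs (the curl examples on this page use multipart forms), and build_url_request is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.havenondemand.com/1/api/async/addtotextindex/v1"


def build_url_request(api_key: str, index: str, source_url: str) -> Request:
    """Prepare, but do not send, a POST that asks the API to index a web page.

    ASSUMPTION: form-encoded bodies are accepted for url input.
    Pass the result to urllib.request.urlopen to submit it.
    """
    body = urlencode({"apikey": api_key, "index": index, "url": source_url})
    return Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_url_request("YOUR_API_KEY", "myindex", "https://www.example.com/page.html")
print(req.get_method(), req.full_url)
```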

curl -X POST https://api.havenondemand.com/1/api/sync/addtotextindex/v1 --form "file=@myfile.txt" --form "index=myindex" --form "apikey=<your_api_key>"

A synchronous request returns a response such as:

{
	"index" : "myindex",
	"references" : [{
			"reference" : "myfile.txt",
			"id" : 108
		}
	]
}

You can add multiple file inputs:

curl -X POST https://api.havenondemand.com/1/api/async/addtotextindex/v1 --form "file=@myfile1.txt" --form "file=@myfile2.doc" --form "index=default_index" --form "apikey=<your_api_key>"
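curl --form builds a multipart/form-data body with one part per --form option, repeating the file field name once per file. If you are not using curl, the following standard-library Python sketch shows one way to produce an equivalent body. encode_multipart is a hypothetical helper; in practice an HTTP client library usually does this encoding for you.

```python
import uuid


def encode_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: list of (name, value) string pairs, e.g. ("index", "default_index").
    files:  list of (field_name, filename, data_bytes) triples; repeat the
            field name "file" once per file, as the curl example does.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random, so it will not occur in the data
    chunks = []
    for name, value in fields:
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        chunks.append(value.encode("utf-8") + b"\r\n")
    for name, filename, data in files:
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode()
        )
        chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
        chunks.append(data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    [("apikey", "YOUR_API_KEY"), ("index", "default_index")],
    [("file", "myfile1.txt", b"first file"), ("file", "myfile2.doc", b"second file")],
)
# Send with urllib.request.Request(url, data=body, headers={"Content-Type": content_type}).
```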

Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.

You can also use Connectors to automatically extract content from your existing repositories, such as Microsoft SharePoint and intranet sites.

Get the Results of an Asynchronous Request

The asynchronous mode returns a job-id, which you can then use to retrieve your results. There are two ways to do this:

  • Use /1/job/status/ to get the status of the job, including results if the job is finished.
  • Use /1/job/result/, which waits until the job has finished and then returns the result.

    Note: Because /result has to wait for the job to finish before it can return a response, using it for longer operations such as processing a large video file can result in an HTTP request timeout response. The /result method returns a response either when the result is available, or after 120 seconds, whichever is sooner. If the job is not complete after 120 seconds, the /result method returns a code 7010 (job result request timeout) response. This means that your asynchronous job is still in progress. To avoid the timeout, use /status instead.
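One way to follow the advice to use /status is to poll it at an interval. The Python sketch below takes the HTTP call as a parameter, so it works with any client. It assumes, without confirmation from this page, that the parsed /1/job/status/<job-id> response has a top-level status field whose value becomes finished or failed when the job ends; verify the field name and values before relying on this. wait_for_job is a hypothetical helper.

```python
import time


def wait_for_job(get_status, job_id, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll a job's status until it ends, instead of blocking on /1/job/result.

    get_status(job_id) must return the parsed JSON from /1/job/status/<job-id>.
    ASSUMPTION: a top-level "status" field reads "finished" or "failed" once
    the job is done.
    """
    waited = 0.0
    while True:
        response = get_status(job_id)
        if response.get("status") in ("finished", "failed"):
            return response
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} still running after {waited:.0f} s")
        sleep(interval)  # wait before asking again
        waited += interval
```

Injecting sleep makes the loop easy to test without real delays.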

Synchronous: https://api.havenondemand.com/1/api/sync/addtotextindex/v1
Asynchronous: https://api.havenondemand.com/1/api/async/addtotextindex/v1

Authentication

This API requires an authentication token to be supplied in the following parameter:

  • apikey: The API key to use to authenticate the API request.

Parameters

This API accepts the following parameters:

Required

You must specify index, together with one of the input parameters file, json, reference, or url.

  • file (array<binary>): A file to index. The API passes the file to the Text Extraction API to extract the contents for indexing. Multipart POST only.
  • json (json): The JSON document to index.
  • reference (string): A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The API indexes the corresponding document.
  • url (string): A publicly accessible HTTP URL from which the document can be retrieved.
  • index (resource): The text index to add the content to.

Optional

  • additional_metadata (array<json>): A JSON object containing additional metadata to add to the indexed documents. This option does not apply to JSON input. To add metadata for multiple files, specify the objects in order, separated by an empty object.
  • duplicate_mode (enum): The method to use to handle duplicate documents. Default value: replace.
  • reference_prefix (array<string>): A string to add to the start of the reference of documents that are extracted from a file. To add a prefix for multiple files, specify the prefixes in order, separated by a space.

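The parameter rules above can be checked on the client before a request is sent. check_params and join_prefixes are hypothetical helpers, not part of the API. Rejecting prefixes that contain a space is this sketch's own choice, made because a space separates the prefixes.

```python
INPUT_PARAMS = ("file", "json", "reference", "url")
DUPLICATE_MODES = ("replace", "duplicate")


def check_params(params: dict) -> list:
    """Return a list of problems found in an Add to Text Index parameter set.

    A client-side sanity check based on the parameter lists on this page.
    """
    problems = []
    if not params.get("apikey"):
        problems.append("apikey is required")
    if not params.get("index"):
        problems.append("index is required")
    if not any(params.get(name) for name in INPUT_PARAMS):
        problems.append("supply one of: " + ", ".join(INPUT_PARAMS))
    mode = params.get("duplicate_mode", "replace")  # documented default
    if mode not in DUPLICATE_MODES:
        problems.append(f"duplicate_mode must be one of {DUPLICATE_MODES}")
    return problems


def join_prefixes(prefixes):
    """Build a reference_prefix value: one prefix per file, in order, space-separated."""
    if any(" " in p for p in prefixes):
        # A space is the separator, so this sketch rejects prefixes that contain one.
        raise ValueError("a prefix cannot contain a space")
    return " ".join(prefixes)
```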
Enumeration Types

This API's parameters use the enumerations described below:

duplicate_mode
The method to use to handle duplicate documents.
  • duplicate: Keep multiple documents with the same reference.
  • replace: When adding a document, remove all existing documents with the same reference.
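To illustrate the difference between the two modes, the following Python code models an index as a list of documents. It is a toy simulation of the behavior described above, not how Haven OnDemand implements it; in particular, skipping removal for documents without a reference is an assumption of this sketch.

```python
def add_documents(index, docs, duplicate_mode="replace"):
    """Toy in-memory model of duplicate_mode (not the service's implementation).

    index is a list of document dicts and is modified in place.
    """
    if duplicate_mode not in ("replace", "duplicate"):
        raise ValueError(f"unknown duplicate_mode {duplicate_mode!r}")
    for doc in docs:
        ref = doc.get("reference")
        if duplicate_mode == "replace" and ref is not None:
            # Remove every existing document that shares this reference.
            index[:] = [d for d in index if d.get("reference") != ref]
        index.append(doc)
    return index


idx = [{"reference": "mydoc1", "content": "old"}]
add_documents(idx, [{"reference": "mydoc1", "content": "new"}])
print(len(idx))  # 1: the old copy was replaced
add_documents(idx, [{"reference": "mydoc1", "content": "newer"}], duplicate_mode="duplicate")
print(len(idx))  # 2: both copies are kept
```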

This API returns a JSON response that is described by the model below. This single model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Add To Text Index Response {
index ( string ) The text index that the file was indexed to.
references ( array[References] ) The files that were indexed.
}
Add To Text Index Response:References {
One of the following: References_1 or References_2
reference ( string , optional) The file reference.
}
References_1 {
id ( integer ) The ID of the file in this index.
reference ( string , optional) The document reference
}
References_2 {
error ( Error ) Error extracting this file.
}
References_2:Error {
error ( integer ) The Haven OnDemand error code.
reason ( string ) The error message.
}
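Because each item in references is either a success (References_1, with an id) or an extraction failure (References_2, with an error object), callers usually need to separate the two. A minimal Python sketch, using a hypothetical split_references helper and the sample response from the curl example earlier on this page:

```python
def split_references(response: dict):
    """Split an Add to Text Index response into successes and failures.

    Returns (indexed, failed): indexed is a list of (reference, id) pairs;
    failed is a list of (reference, error_code, reason) tuples.
    reference may be None, because the model marks it as optional.
    """
    indexed, failed = [], []
    for item in response.get("references", []):
        ref = item.get("reference")
        if "error" in item:  # References_2
            err = item["error"]
            failed.append((ref, err["error"], err["reason"]))
        else:  # References_1
            indexed.append((ref, item["id"]))
    return indexed, failed


sample = {"index": "myindex", "references": [{"reference": "myfile.txt", "id": 108}]}
print(split_references(sample))  # ([('myfile.txt', 108)], [])
```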
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "properties": {
        "index": {
            "type": "string"
        },
        "references": {
            "items": {
                "oneOf": [
                    {
                        "properties": {
                            "id": {
                                "type": "integer"
                            },
                            "reference": {
                                "type": "string"
                            }
                        },
                        "required": [
                            "id"
                        ]
                    },
                    {
                        "properties": {
                            "error": {
                                "properties": {
                                    "error": {
                                        "type": "integer"
                                    },
                                    "reason": {
                                        "type": "string"
                                    }
                                },
                                "required": [
                                    "error",
                                    "reason"
                                ],
                                "type": "object"
                            }
                        },
                        "required": [
                            "error"
                        ]
                    }
                ],
                "properties": {
                    "reference": {
                        "type": "string"
                    }
                },
                "type": "object"
            },
            "type": "array"
        }
    },
    "required": [
        "references",
        "index"
    ],
    "type": "object"
}

