Query Text Index

Searches for items that match your specified natural language text, Boolean expressions, or fields.

The Query Text Index API searches for content in the Haven OnDemand databases. Your query can include natural language text, keywords, and Boolean expressions. The API returns the documents from a specified text index that match your query expression.

Quick Start

All queries must contain the text parameter. Haven OnDemand performs language processing to match terms as broadly as possible, including stemming (reducing plurals and verb forms of a word to the same stem) and transliteration (converting accented characters to non-accented forms).

A wide range of other optional parameters allow you to return a more specific results set, or to organize your results.

You can use quotation marks to enable exact phrase search. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda" AND Elephant

Many other Boolean and Proximity operators are available for the text parameter. For more information, see Boolean and Proximity Operators.
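As a rough sketch of how the phrase query above might be encoded in client code, the following Python builds the request URL with the standard library. The host is the synchronous endpoint listed on this page; the build_query_url helper and the placeholder API key are illustrative assumptions, not part of the API.

```python
from urllib.parse import urlencode

# Synchronous endpoint as listed on this page.
ENDPOINT = "https://api.havenondemand.com/1/api/sync/querytextindex/v1"

def build_query_url(text, api_key, **params):
    """Return a GET URL for a query (hypothetical helper, not part of the API)."""
    query = {"text": text, "apikey": api_key}
    query.update(params)
    # urlencode percent-encodes the quotation marks and turns spaces into '+'.
    return ENDPOINT + "?" + urlencode(query)

url = build_query_url('"Red Panda" AND Elephant', "YOUR_API_KEY")
print(url)
```

Letting urlencode handle the quotation marks and spaces avoids hand-escaping the phrase and the Boolean operator.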

Search Result Ranking

Haven OnDemand indexes assign weights to terms based on their relevance to the dataset. The Text Tokenization API lets you examine the weights in the Wikipedia index as a reference point. As a simplified example, if every document in your index is about red things, then the term red is not as relevant in the following query as the term Panda.

/1/api/[async|sync]/querytextindex/v1?text=Red+Panda

Haven OnDemand ranks query results according to various factors, such as the weights of the terms in the index, their frequency in the returned documents, and the proximity of the stemmed terms. However, it does not operate on a match-all-terms basis, and it works better with more text rather than less. Documents are returned according to how closely they match the entire query.

{
  "documents": [
    {
      "reference": "http://en.wikipedia.org/wiki/Red panda",
      "weight": 87,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "wiki_eng",
      "title": "Red panda",
      "wikipedia_category": [
        "Fauna of the Himalayas",
        "Mammals of India",
        "Fauna of South Asia",
        "Fauna of East Asia",
        "Mammals of Nepal",
        "Mammals of Bhutan",
        "Mammals of China",
        "Mammals of Laos",
        "Mammals of Burma",
        "Living fossils",
        "EDGE species",
        "Wildlife of Yunnan",
        "Monotypic mammal genera",
        "Animals described in 1825"
      ],
      "wikipedia_type": [
        "species"
      ]
    },
    {
      "reference": "http://en.wikipedia.org/wiki/Giant panda",
      "weight": 86.34,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "wiki_eng",
      "title": "Giant panda",
      "wikipedia_category": [
        "Giant pandas",
        "Animals described in 1869",
        "Articles containing video clips",
        "Conservation reliant species",
        "EDGE species",
        "Fauna of East Asia",
        "Herbivorous animals",
        "Megafauna of Eurasia",
        "Mammals of China",
        "Mammals with sequenced genomes",
        "National symbols of China"
      ],
      "wikipedia_type": [
        "species"
      ]
    },
   ...
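A minimal sketch of reading the ranked response above in Python follows. The sample data is an abbreviated copy of the response shown here, and the ranked_titles helper is an illustrative name, not part of the API.

```python
import json

# Abbreviated copy of the response shown above.
response_text = """
{
  "documents": [
    {"reference": "http://en.wikipedia.org/wiki/Red panda", "weight": 87,
     "links": ["RED", "PAND"], "index": "wiki_eng", "title": "Red panda"},
    {"reference": "http://en.wikipedia.org/wiki/Giant panda", "weight": 86.34,
     "links": ["RED", "PAND"], "index": "wiki_eng", "title": "Giant panda"}
  ]
}
"""

def ranked_titles(response):
    """Return (title, weight) pairs, highest weight first."""
    docs = response.get("documents", [])
    pairs = ((d.get("title"), d.get("weight", 0)) for d in docs)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

results = ranked_titles(json.loads(response_text))
for title, weight in results:
    print(f"{weight:6.2f}  {title}")
```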

Note: You can also use field operators to bias the relevance of a document according to a value in a field. For more information, see Field Matching and Operations, or Field Text Operators.

Specify the Index

Haven OnDemand offers various public text indexes readily available for search, such as Wikipedia in various languages, or news sources. The indexes parameter allows you to restrict your results to these text indexes. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda" AND Elephant&indexes=wiki_eng

You can also search any index that you created with the Create Text Index API by specifying it in the same way.

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda" AND Elephant&indexes=myindex

You can specify multiple indexes by repeating the indexes parameter. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda" AND Elephant&indexes=myindex&indexes=wiki_eng
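One way to produce the repeated indexes parameter from client code is sketched below, using only the Python standard library. The index name myindex is the example name used above.

```python
from urllib.parse import urlencode, parse_qs

params = {
    "text": '"Red Panda" AND Elephant',
    "indexes": ["myindex", "wiki_eng"],  # myindex is the example index name used above
}
# doseq=True expands the list into one indexes=... pair per value.
query_string = urlencode(params, doseq=True)
print(query_string)

# Parsing the string back confirms that both index names survive the round trip.
round_trip = parse_qs(query_string)
print(round_trip["indexes"])
```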

Modify the Output

You can set the print parameter to all to return all the fields for each document.

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda"&indexes=wiki_eng&print=all

The default value for print is fields, which allows you to specify the fields you want to print out in the print_fields parameter. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda"&indexes=wiki_eng&print_fields=WIKIPEDIA_TYPE,CONTENT

This example returns the WIKIPEDIA_TYPE and CONTENT fields for the result documents.

{
...
      "index": "wiki_eng",
      "title": "Red panda",
      "wikipedia_type": [
        "species"
      ],
      "content": "The red panda (Ailurus fulgens), also called lesser panda and red cat-bear, is a small arboreal mammal native to the eastern Himalayas and southwestern China that has been classified as Vulnerable by IUCN as its wild population is estimated at less than 10,000 mature individuals. ..." 
}
...
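The following sketch shows one way to pick the requested fields out of a result document. It assumes, based only on the sample output above, that field names come back in lowercase even when requested in uppercase; pick_fields is a hypothetical helper.

```python
def pick_fields(document, field_names):
    """Return the requested fields from one result document.

    Assumes, based on the sample output above, that field names come back
    in lowercase even when requested in uppercase.
    """
    return {name: document.get(name.lower()) for name in field_names}

# Abbreviated document modelled on the sample response above.
doc = {
    "index": "wiki_eng",
    "title": "Red panda",
    "wikipedia_type": ["species"],
    "content": "The red panda (Ailurus fulgens), also called lesser panda ...",
}
requested = "WIKIPEDIA_TYPE,CONTENT".split(",")
fields = pick_fields(doc, requested)
print(fields["WIKIPEDIA_TYPE"])
```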

Control the Number of Results

By default, Haven OnDemand returns up to six documents for each search. You can use the absolute_max_results parameter to set the maximum number of documents to return. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda"&indexes=wiki_eng&absolute_max_results=100

You can set the total_results parameter to true to return the total number of results that match the query, assuming no maximum restrictions. This option is most useful for pagination.

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda"&indexes=wiki_eng&total_results=true

In this case, the response includes the totalhits property:

{
  "documents": [
...
  ],
  "totalhits": "214"
}
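Because totalhits is most useful for pagination, here is a hedged sketch of turning it into per-page parameters with the start, max_page_results, and absolute_max_results parameters described in the parameter list on this page. How these three parameters interact is an assumption here (1-based start, absolute_max_results reaching the last result of the page), so verify it against the live API.

```python
import math

def page_params(total_hits, page_size):
    """Return one parameter dict per page of results.

    Assumes start is 1-based and that absolute_max_results must reach the
    last result of the requested page; check both against the live API.
    """
    total = int(total_hits)  # totalhits is a string in the sample response
    pages = math.ceil(total / page_size)
    params = []
    for page in range(pages):
        start = page * page_size + 1
        params.append({
            "start": start,
            "max_page_results": page_size,
            "absolute_max_results": start + page_size - 1,
        })
    return params

pages = page_params("214", 50)
print(len(pages), pages[-1])
```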

Field Matching and Operations

The field_text parameter allows for many operations, and capabilities vary based on the Index Field Types of the fields in your index. You can use field_text to match particular values or ranges of values in specified fields, to find documents where a particular field exists, or to bias the relevance of a result according to a value in a field.

You can use field_text expressions only for fields that have a field type that is optimized for a particular operation. For example, you can use the EQUAL operator only on fields with the numeric field type. The operation fails if you use a field type that is not optimized.

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda"&indexes=wiki_eng&field_text=MATCH{species}:WIKIPEDIA_TYPE&print_fields=WIKIPEDIA_TYPE

This example uses the MATCH operator to search for pages that have the field WIKIPEDIA_TYPE with the value species. The WIKIPEDIA_TYPE field in the wiki_eng public text index has the parametric field type, which is optimized for MATCH.

{
  "documents": [
    {
      "reference": "http://en.wikipedia.org/wiki/Red panda",
      "weight": 87,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "wiki_eng",
      "title": "Red panda",
      "wikipedia_type": [
        "species"
      ]
    },
    {
      "reference": "http://en.wikipedia.org/wiki/Giant panda",
      "weight": 86.34,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "wiki_eng",
      "title": "Giant panda",
      "wikipedia_type": [
        "species"
      ]
    },

For more information on the operators, formats, and field type requirements of the field_text parameter, see Field Text Operators. For information about the field types of the fields in the public data sets, see Public Text Indexes. For field types in your private text indexes, see Index Flavors.
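As an illustration of the MATCH example above, the sketch below assembles the field_text value and encodes it for a request. The match_expression helper is hypothetical and covers only the single-value form shown on this page.

```python
from urllib.parse import urlencode

def match_expression(value, field):
    """Build a MATCH expression in the OPERATOR{value}:FIELD form shown above."""
    return "MATCH{" + value + "}:" + field

params = {
    "text": '"Red Panda"',
    "indexes": "wiki_eng",
    "field_text": match_expression("species", "WIKIPEDIA_TYPE"),
    "print_fields": "WIKIPEDIA_TYPE",
}
# urlencode percent-encodes the braces and colon so the URL stays valid.
query_string = urlencode(params)
print(query_string)
```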

Summarization

The summary parameter enables the display of small summaries of the returned documents. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red Panda"&summary=context

This example uses the context option to return summaries that contain the query terms. This option is very useful when dealing with large documents, and where you want to automatically generate informative snippets.


      "title": "Red panda",
      "summary": "The red panda (Ailurus fulgens), also called lesser panda and red cat-bear, is a small arboreal mammal native to the eastern Himalayas and southwestern China that has been classified as Vulnerable by IUCN as its wild population is estimated at less than 10,000 mature individuals...

Note: The Query Text Index API does not create a summary if the document does not have enough text content. For example, the standard Transport data set documents do not have much text, so the API cannot create a summary.

Highlight Query Terms

The highlight parameter allows you to automatically highlight the relevant pieces of information in the result text. For example:

/1/api/[async|sync]/querytextindex/v1?text="Red Panda"&summary=concept&highlight=summary_terms

This example uses the summary_terms option to highlight the query terms in the summary.


     "title": "Red panda",
     "summary": "The red panda (Ailurus fulgens) ...

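The sketch below shows one way a client might pull the highlighted terms back out of a summary, assuming you set the start_tag and end_tag parameters to simple tags of your own. The sample summary is hypothetical, and the exact markup the API returns is an assumption.

```python
import re

START_TAG = "<b>"   # value you would pass as start_tag
END_TAG = "</b>"    # value you would pass as end_tag

def highlighted_terms(text, start_tag=START_TAG, end_tag=END_TAG):
    """Return the substrings wrapped in the highlight tags, in order."""
    pattern = re.escape(start_tag) + r"(.*?)" + re.escape(end_tag)
    return re.findall(pattern, text)

# Hypothetical summary; the exact markup the API returns is an assumption.
summary = "The <b>red</b> <b>panda</b> (Ailurus fulgens) ..."
terms = highlighted_terms(summary)
print(terms)
```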
Sort Results

By default, Haven OnDemand sorts results by order of relevance. You can use the sort parameter to adjust the sort order. For example, when Date Fields are available, you can use the sort parameter to return the results in order of the date field value:

/1/api/[async|sync]/querytextindex/v1?text="Red+Panda"&indexes=news_eng&sort=date&print_fields=DATE

This example orders results by the value in the DATE field, with the most recent document first.

{
  "documents": [
    {
      "reference": "http://feeds.huffingtonpost.com/c/35496/f/677088/s/4a241755/sc/14/l/0L0Shuffingtonpost0N0C20A150C0A90C230Cmeet0Etofu0Ethe0Ebaby0Ered0Epanda0Eand0Ecutest0Elittle0Enugget0Eon0Eearth0In0I81915220Bhtml0Dutm0Ihp0Iref0Fscience0Gir0FScience/story01.htm",
      "weight": 86.92,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "news_eng",
      "title": "Meet Tofu The Baby Red Panda, Who Is The Cutest Little Nugget On Earth",
      "date": [
        "1443121331"
      ]
    },
    {
      "reference": "http://feeds.huffingtonpost.com/c/35496/f/677055/s/4a23d118/sc/14/l/0L0Shuffingtonpost0N0C20A150C0A90C230Cmeet0Etofu0Ethe0Ebaby0Ered0Epanda0Eand0Ecutest0Elittle0Enugget0Eon0Eearth0In0I81915220Bhtml0Dutm0Ihp0Iref0Fdc0Gir0FDC/story01.htm",
      "weight": 86.92,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "news_eng",
      "title": "Meet Tofu The Baby Red Panda, Who Is The Cutest Little Nugget On Earth",
      "date": [
        "1443121331"
      ]
    },
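The date values in the sample above look like Unix epoch seconds returned as strings. Under that assumption, the following sketch converts them to UTC datetimes; document_dates is an illustrative helper, not part of the API.

```python
from datetime import datetime, timezone

def document_dates(document):
    """Convert a document's date values to timezone-aware UTC datetimes.

    Assumes the values are Unix epoch seconds, as the sample suggests.
    """
    return [datetime.fromtimestamp(int(value), tz=timezone.utc)
            for value in document.get("date", [])]

# Abbreviated document modelled on the sample response above.
doc = {"index": "news_eng", "date": ["1443121331"]}
dates = document_dates(doc)
print(dates[0].isoformat())
```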
Synchronous
https://api.havenondemand.com/1/api/sync/querytextindex/v1
Asynchronous
https://api.havenondemand.com/1/api/async/querytextindex/v1
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.
Parameters

This API accepts the following parameters:

Required
Name Type Description
file binary A file containing the query text. Multipart POST only.
reference string A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding query text is passed to the API.
text string The query text.
url string A publicly accessible HTTP URL from which the query text can be retrieved.
Optional
Name Type Description
absolute_max_results number The absolute maximum number of results to return for this query. Default value: 6.
check_spelling enum Whether to check the spelling of the input text. Default value: none.
end_tag string The closing HTML tag to use to highlight a match. If omitted, this is generated automatically from the start_tag.
field_text string The fields that result documents must contain, and the conditions that these fields must meet for the documents to return as results. See Field Text Operators.
highlight enum The highlighting option to use for the result text. Default value: off.
ignore_operators boolean This option disables wildcards, phrase queries, field restriction, and Boolean operations. Default value: false.
indexes array<string> The name of the Haven OnDemand text index that you want to search for results. You can use the public datasets, or your own text indexes. See Public Text Indexes. Default value: [wiki_eng].
max_date string The latest creation date or time that a document can have to return as a result. See Parameter Date Formats.
max_page_results number The maximum number of results to return for this query from the absolute number of results returned. You can use this option with the start parameter to page results. In this case, max_page_results sets the number of results to return in a particular page, while absolute_max_results sets the total maximum number of results the query can return.
min_date string The earliest creation date or time that a document can have to return as a result. See Parameter Date Formats.
min_score number The minimum percentage relevance that results must have to the query to return. Default value: 0.
print enum The types of fields and content to display in the results. Default value: fields.
print_fields string The names of fields to print in the results.
promotion boolean Whether to return the promotion. Default value: false.
query_profile string The name of the query profile that you want to apply. See Query Manipulation.
sort enum The criteria to use for the result display order. By default, results are displayed in order of relevance. Default value: relevance.
start number The number of the first result to display from the total list. You can use this option to return the query results in pages. This value must be at least 1, and smaller than the value of absolute_max_results. Default value: 1.
start_tag string The opening HTML tag to use to highlight a match. Default value: <span style="background-color: yellow">.
summary enum The type of summary to create for result documents. Default value: off.
total_results boolean Set to true to return an estimate of the total number of result documents, and the total number of documents and document sections in the query text indexes. This value is the number of matching documents in the specified text indexes, which might be larger than the value of absolute_max_results, or max_page_results. Default value: false.
Enumeration Types

This API's parameters use the enumerations described below:

check_spelling
Whether to check the spelling of the input text.
none None
Do not spell check content.
suggest Suggest spelling corrections
Suggest alternatives for misspelled terms, but perform the original query as normal.
autocorrect Autocorrect query terms
Automatically update the original query with suggested alternatives for misspelled terms.
highlight
The highlighting option to use for the result text.
off No highlighting.
terms Terms that match the query text.
sentences Sentences that contain query terms.
summary_sentences Summary sentences that contain query terms.
summary_terms Query terms that occur in summary sentences.
print
The types of fields and content to display in the results.
all All fields.
all_sections All fields and all sections.
date Date fields.
fields Print fields listed in the print_fields parameter.
none Do not print content fields.
no_results Do not print results.
parametric Parametric fields.
reference Reference fields.
sort
The criteria to use for the result display order. By default, results are displayed in order of relevance.
relevance Relevance order (most relevant first).
reverse_relevance Relevance order (least relevant first).
date Date order (most recent first).
reverse_date Date order (oldest first).
autn_rank Order by the standard relevance adjustment field.
off No sorting.
summary
The type of summary to create for result documents.
concept Concept Summary
Contains sentences that are typical of the result content. These sentences can be from different parts of the result document.
context Context Summary
Contains sentences that are typical of the result content, biased by the terms in the query text.
quick Quick Summary
The first few sentences of the document.
off No summary
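A client can check enum parameters before sending a request. The sketch below copies the allowed values from the lists above; invalid_enum_params is a hypothetical helper, and the value newest is deliberately invalid.

```python
# Allowed values copied from the enumeration lists above.
ENUM_VALUES = {
    "check_spelling": {"none", "suggest", "autocorrect"},
    "highlight": {"off", "terms", "sentences", "summary_sentences", "summary_terms"},
    "print": {"all", "all_sections", "date", "fields", "none", "no_results",
              "parametric", "reference"},
    "sort": {"relevance", "reverse_relevance", "date", "reverse_date",
             "autn_rank", "off"},
    "summary": {"concept", "context", "quick", "off"},
}

def invalid_enum_params(params):
    """Return {name: value} for enum parameters whose value is not listed."""
    return {name: value for name, value in params.items()
            if name in ENUM_VALUES and value not in ENUM_VALUES[name]}

bad = invalid_enum_params({"text": "Red Panda", "sort": "newest", "summary": "context"})
print(bad)
```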

This API returns a JSON response that is described by the model below. This single model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
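As a sketch of the two job paths above, the following Python builds the status and result URLs for a job ID. It assumes that the job paths live on the same host as the endpoint URLs on this page and that they take the same apikey parameter; the job ID and key shown are placeholders.

```python
from urllib.parse import quote, urlencode

# Host taken from the endpoint URLs on this page; assumed to serve the job paths too.
BASE_URL = "https://api.havenondemand.com"

def job_url(kind, job_id, api_key):
    """Return the /1/job/status/<job-id> or /1/job/result/<job-id> URL."""
    if kind not in ("status", "result"):
        raise ValueError("kind must be 'status' or 'result'")
    path = f"/1/job/{kind}/{quote(job_id, safe='')}"
    # Passing apikey on job requests is an assumption; confirm it for your account.
    return BASE_URL + path + "?" + urlencode({"apikey": api_key})

status_url = job_url("status", "abc-123", "YOUR_API_KEY")
result_url = job_url("result", "abc-123", "YOUR_API_KEY")
print(status_url)
print(result_url)
```

Polling the status URL suits clients that need to stay responsive, while the result URL is simpler when the caller can wait for the job to finish.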

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Query Text Index Response {
documents (array[Documents]) The details of the returned documents.
suggestion (unknown, optional) Suggested spelling, if the check_spelling parameter is set to "suggest". This value is null if the spelling was correct.
auto_correction (unknown, optional) Suggested spelling, if the check_spelling parameter is set to "autocorrect". This value is null if the spelling was correct.
}
Documents {
index (string, optional) The database that the result returned from.
links (array[string], optional) The terms from the query that match in the results document.
reference (string, optional) The reference string that identifies the result document.
summary (string, optional) The summary of the results document.
title (string, optional) The title of the result document.
weight (number, optional) The percentage relevance that the result document has to the original query.
}
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "properties": {
        "documents": {
            "items": {
                "properties": {
                    "index": {
                        "type": "string"
                    },
                    "links": {
                        "items": {
                            "type": "string"
                        },
                        "type": "array"
                    },
                    "reference": {
                        "type": "string"
                    },
                    "summary": {
                        "type": "string"
                    },
                    "title": {
                        "type": "string"
                    },
                    "weight": {
                        "type": "number"
                    }
                },
                "type": "object"
            },
            "type": "array"
        },
        "suggestion": {
            "oneOf": [
                {
                    "type": "null"
                },
                {
                    "type": "object",
                    "properties": {
                        "corrections": {
                            "type": "array",
                            "items": {
                                "type": "string"
                            }
                        },
                        "original_query": {
                            "type": "string"
                        }
                    }
                }
            ]
        },
        "auto_correction": {
            "oneOf": [
                {
                    "type": "null"
                },
                {
                    "type": "object",
                    "properties": {
                        "corrections": {
                            "type": "array",
                            "items": {
                                "type": "string"
                            }
                        },
                        "original_query": {
                            "type": "string"
                        },
                        "corrected_query": {
                            "type": "string"
                        }
                    }
                }
            ]
        }
    },
    "required": [
        "documents"
    ],
    "type": "object"
}
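The schema above allows suggestion and auto_correction to be either null or an object. A minimal sketch of handling both cases follows; the helper names are illustrative and the sample responses are invented to match the schema.

```python
def suggested_corrections(response):
    """Return the list of suggested corrections, or [] when there are none."""
    suggestion = response.get("suggestion")
    return list(suggestion.get("corrections", [])) if suggestion else []

def corrected_query(response, original_text):
    """Return corrected_query when auto_correction supplies one, else the original text."""
    correction = response.get("auto_correction")
    if correction and correction.get("corrected_query"):
        return correction["corrected_query"]
    return original_text

# Hypothetical responses shaped like the schema above; the values are invented.
clean = {"documents": [], "suggestion": None}
fixed = {"documents": [], "auto_correction": {
    "corrections": ["panda"],
    "original_query": "red pnada",
    "corrected_query": "red panda",
}}

print(suggested_corrections(clean))
print(corrected_query(fixed, "red pnada"))
```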

To use this API with your own data and in your own applications, you need an API key. You can create one from the API Keys section of your account page.
