Find Related Concepts

Returns the best terms and phrases in documents that match the specified query.

The Find Related Concepts API returns a list of the best terms and phrases in query result documents. You can use these terms and phrases to provide topic disambiguation, automatic query guidance, or dynamic thesaurus generation.

Quick Start

You provide query text, which defines the set of documents that you want to describe. The response returns a ranked list of other terms and phrases that occur in these documents.

The Find Related Concepts API has many of the same options as the Query Text Index API. You submit the query text directly, or as a file, a reference, or a URL, for analysis against one or more text indexes. You can use the same query syntax as the Query Text Index API, including Boolean and Proximity Operators and Field Text Operators.

Haven OnDemand provides a number of Public Text Indexes that you can query, including Wikipedia and news feeds in multiple languages. You can also use your own indexes.

For more information about query syntax, public datasets, and text indexes, see the Documentation page.

You can use optional parameters to refine the request. These include:

  • min_score: A minimum percentage score for relevance, based on Haven OnDemand's ranking algorithms. This can be between 0 (the default) and 100.
  • sample_size: The number of documents to use to generate the concepts, per index. The default sample size is 250, and the maximum is 2500. The sample size is multiplied by the number of indexes that you search. For example, calling this API on two indexes with a sample size of 250 actually searches 500 documents.
  • Note: If you want to restrict results by date, Haven OnDemand recommends that you use the field_text parameter with the RANGE operator, rather than min_date and max_date.

The requests in these examples search the English language news feeds for concepts related to "mercury". The first request uses the default sample size. The second request expands the sample to 500 documents and sets a minimum relevance score of 85%. Both requests limit the response to the first seven concepts. The two responses follow in the same order, and the discussion after them refers to them as column 1 and column 2.

/1/api/sync/findrelatedconcepts/v1?text=mercury&indexes=news_eng&max_results=7
/1/api/sync/findrelatedconcepts/v1?text=mercury&indexes=news_eng&max_results=7&min_score=85&sample_size=500
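
As a rough sketch, the two request paths above could be assembled in Python as follows. The helper name build_request_path is hypothetical; the endpoint path and parameter names are taken from the example requests.

```python
from urllib.parse import urlencode

# Endpoint path taken from the example requests above.
SYNC_PATH = "/1/api/sync/findrelatedconcepts/v1"


def build_request_path(text, indexes="news_eng", max_results=7, **optional):
    """Return the request path with its query string (hypothetical helper)."""
    params = {"text": text, "indexes": indexes, "max_results": max_results}
    # Append optional parameters such as min_score or sample_size, if given.
    params.update(optional)
    return SYNC_PATH + "?" + urlencode(params)


default_request = build_request_path("mercury")
refined_request = build_request_path("mercury", min_score=85, sample_size=500)
print(default_request)
print(refined_request)
```

The sketch leaves out the host name and the apikey parameter, so the paths are not complete requests on their own.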

{
  "entities": [
    {
      "text": "Mercury",
      "docs_with_phrase": 250,
      "occurrences": 1513,
      "docs_with_all_terms": 250,
      "cluster": -1
    },
    {
      "text": "Mercury Retrograde",
      "docs_with_phrase": 21,
      "occurrences": 193,
      "docs_with_all_terms": 24,
      "cluster": 0
    },
    {
      "text": "Mercury Prize",
      "docs_with_phrase": 26,
      "occurrences": 56,
      "docs_with_all_terms": 28,
      "cluster": 1
    },
    {
      "text": "planet Mercury",
      "docs_with_phrase": 23,
      "occurrences": 32,
      "docs_with_all_terms": 96,
      "cluster": 2
    },
    {
      "text": "Phoenix Mercury",
      "docs_with_phrase": 23,
      "occurrences": 26,
      "docs_with_all_terms": 26,
      "cluster": 3
    },
    {
      "text": "Young Fathers",
      "docs_with_phrase": 16,
      "occurrences": 68,
      "docs_with_all_terms": 17,
      "cluster": 1
    },
    {
      "text": "during Mercury retrograde.",
      "docs_with_phrase": 12,
      "occurrences": 35,
      "docs_with_all_terms": 22,
      "cluster": 0
    }
  ]
}


{
  "entities": [
    {
      "text": "Mercury",
      "docs_with_phrase": 478,
      "occurrences": 2124,
      "docs_with_all_terms": 478,
      "cluster": -1
    },
    {
      "text": "Freddie Mercury",
      "docs_with_phrase": 40,
      "occurrences": 113,
      "docs_with_all_terms": 41,
      "cluster": 0
    },
    {
      "text": "Mercury Prize",
      "docs_with_phrase": 40,
      "occurrences": 88,
      "docs_with_all_terms": 47,
      "cluster": 1
    },
    {
      "text": "Phoenix Mercury",
      "docs_with_phrase": 39,
      "occurrences": 44,
      "docs_with_all_terms": 45,
      "cluster": 2
    },
    {
      "text": "Mercury Retrograde",
      "docs_with_phrase": 24,
      "occurrences": 201,
      "docs_with_all_terms": 28,
      "cluster": 3
    },
    {
      "text": "power plants",
      "docs_with_phrase": 39,
      "occurrences": 119,
      "docs_with_all_terms": 48,
      "cluster": 4
    },
    {
      "text": "Environmental Protection",
      "docs_with_phrase": 28,
      "occurrences": 32,
      "docs_with_all_terms": 33,
      "cluster": 4
    }
  ]
}


In addition to a list of related words or phrases, the response includes:

  • Information on how many documents contain the related concept:
    • docs_with_phrase: The number of documents that contain the related concept in the same form as returned, for example, planet Mercury. For the exact match of the query term at the top of the list, this value relates to the sample_size. In column 1, it is equal to the default sample size of 250. For the more restrictive search in column 2, it is 478, which falls slightly short of the limit of 500 specified in the sample_size parameter.
    • occurrences: The number of times the related concept occurs in the sample overall.
    • docs_with_all_terms: The number of documents that contain all the terms in the related concept phrase, not necessarily in the form given. For example, for planet Mercury, this counts the documents that contain both of the words planet and mercury, anywhere in the document.
  • A cluster number. The cluster number is a grouping value that can help you disambiguate results and identify discrete groups of concepts within them. In the examples, the word mercury evokes several distinct concepts. In column 1, there are two mentions of Mercury Retrograde, an astronomical event, grouped in cluster 0. The Mercury Prize, a music award, is conceptually related to Young Fathers, a band that won the award, in cluster 1. In column 2, power plants and Environmental Protection group together. Mercury Retrograde, the Phoenix Mercury charity fund, the Mercury Prize, and Freddie Mercury all have separate cluster numbers.
    Note: Concepts or phrases that occur so commonly in the sample set that they do not permit differentiation of a new concept receive a negative cluster number. In both columns, this is the case for the search query term itself, Mercury, in cluster -1. Normal cluster numbers start at 0.
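
The cluster values can be used to group the returned concepts in client code. The following Python sketch uses an abbreviated copy of the column 1 response; the helper name group_by_cluster is hypothetical and is not part of the API.

```python
from collections import defaultdict

# Abbreviated entities from the column 1 example response.
entities = [
    {"text": "Mercury", "cluster": -1},
    {"text": "Mercury Retrograde", "cluster": 0},
    {"text": "Mercury Prize", "cluster": 1},
    {"text": "planet Mercury", "cluster": 2},
    {"text": "Phoenix Mercury", "cluster": 3},
    {"text": "Young Fathers", "cluster": 1},
    {"text": "during Mercury retrograde.", "cluster": 0},
]


def group_by_cluster(entities):
    """Return (clusters, too_common) for a list of entities."""
    clusters = defaultdict(list)
    too_common = []
    for entity in entities:
        if entity["cluster"] < 0:
            # A negative cluster marks a concept too common to differentiate.
            too_common.append(entity["text"])
        else:
            clusters[entity["cluster"]].append(entity["text"])
    return dict(clusters), too_common


clusters, too_common = group_by_cluster(entities)
for number in sorted(clusters):
    print(number, clusters[number])
print("too common:", too_common)
```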

Some typical uses of the Find Related Concepts API include:

  • Finding synonyms for your query term. You can use these for Query Manipulation or to build a thesaurus.
  • Extending or refining research.
    • Add related concepts as cross-references or subgroupings for query topics.
    • Follow through on them with more searches to find out more about the subject you are querying.
    • Find terms that are not useful for the search (because they are too frequent or not significant) and add them to a stopword list.
  • Disambiguating. Use cluster groupings to differentiate the various meanings and contexts of a query term.
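
As one possible illustration of these uses, the Python sketch below picks a representative concept from each cluster as a candidate for follow-up searches, and collects the concepts with a negative cluster number as stopword candidates. The selection rule (highest docs_with_phrase in each cluster) and both function names are assumptions made for this example. The data is abbreviated from the column 2 response.

```python
# Abbreviated entities from the column 2 example response.
entities = [
    {"text": "Mercury", "docs_with_phrase": 478, "cluster": -1},
    {"text": "Freddie Mercury", "docs_with_phrase": 40, "cluster": 0},
    {"text": "Mercury Prize", "docs_with_phrase": 40, "cluster": 1},
    {"text": "Phoenix Mercury", "docs_with_phrase": 39, "cluster": 2},
    {"text": "Mercury Retrograde", "docs_with_phrase": 24, "cluster": 3},
    {"text": "power plants", "docs_with_phrase": 39, "cluster": 4},
    {"text": "Environmental Protection", "docs_with_phrase": 28, "cluster": 4},
]


def follow_up_terms(entities):
    """Pick the concept with the most matching documents in each cluster."""
    best = {}
    for entity in entities:
        cluster = entity["cluster"]
        if cluster < 0:
            continue  # skip concepts that are too common to be useful
        current = best.get(cluster)
        if current is None or entity["docs_with_phrase"] > current["docs_with_phrase"]:
            best[cluster] = entity
    return [best[cluster]["text"] for cluster in sorted(best)]


def stopword_candidates(entities):
    """Return concepts with a negative cluster number."""
    return [entity["text"] for entity in entities if entity["cluster"] < 0]


terms = follow_up_terms(entities)
stopwords = stopword_candidates(entities)
print(terms)
print(stopwords)
```

Whether a candidate actually belongs on a stopword list remains a judgment call. In this example the only candidate is the query term itself.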
Synchronous
https://api.havenondemand.com/1/api/sync/findrelatedconcepts/v1
Asynchronous
https://api.havenondemand.com/1/api/async/findrelatedconcepts/v1
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.
Parameters

This API accepts the following parameters:

Required
Name ( Type ) Description
file ( binary ) A file containing the query text. Multi part POST only.
reference ( string ) A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding query text is passed to the API.
text ( string ) The query text.
url ( string ) A publicly accessible HTTP URL from which the query text can be retrieved.
Optional
Name ( Type ) Description
field_text ( string ) The fields that result documents must contain, and the conditions that these fields must meet for the documents to return as results. See Field Text Operators.
indexes ( array<resource> ) Type the name of one or more Haven OnDemand text indexes to return only documents that are stored in these text indexes. You can use the public datasets, or your own text indexes. See Public Text Indexes. Default value: [wiki_eng].
max_date ( string ) The latest creation date or time that a document can have to return as a result. See Parameter Date Formats.
max_results ( number ) The maximum number of related concepts to return. The maximum value is 2500. Default value: 20.
min_date ( string ) The earliest creation date or time that a document can have to return as a result. See Parameter Date Formats.
min_score ( number ) The minimum percentage relevance that results must have to the query to return. Default value: 0.
sample_size ( number ) The maximum number of documents to use to generate concepts. The maximum value is 2500. Default value: 250.
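
A client might check the documented limits before it sends a request. The following Python sketch is only an illustration: the function name and messages are hypothetical, and the limits and defaults are the ones listed above and in the Quick Start (min_score between 0 and 100, a maximum of 2500 for sample_size and max_results, and sample_size applied per index).

```python
MAX_SAMPLE_SIZE = 2500  # documented maximum for sample_size
MAX_RESULTS = 2500      # documented maximum for max_results


def check_optional_parameters(indexes=("wiki_eng",), min_score=0,
                              sample_size=250, max_results=20):
    """Return (problems, documents_sampled) for the given values."""
    problems = []
    if not 0 <= min_score <= 100:
        problems.append("min_score must be between 0 and 100")
    if sample_size > MAX_SAMPLE_SIZE:
        problems.append("sample_size must not exceed 2500")
    if max_results > MAX_RESULTS:
        problems.append("max_results must not exceed 2500")
    # The sample size applies per index, so the total grows with each index.
    documents_sampled = sample_size * len(indexes)
    return problems, documents_sampled


problems, documents_sampled = check_optional_parameters(
    indexes=["news_eng", "wiki_eng"]
)
print(problems, documents_sampled)
```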

This API returns a JSON response that is described by the model below. This single model is presented both as an easy to read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
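
The asynchronous flow might be sketched in Python as follows. Several details here are assumptions that this page does not confirm: that the asynchronous call returns a JSON object with the job identifier in a jobID field, and that the job endpoints accept the same apikey parameter. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_HOST = "https://api.havenondemand.com"
ASYNC_URL = API_HOST + "/1/api/async/findrelatedconcepts/v1"


def job_status_url(job_id, apikey):
    """URL that reports the job status (assumes apikey is accepted here)."""
    return f"{API_HOST}/1/job/status/{job_id}?" + urlencode({"apikey": apikey})


def job_result_url(job_id, apikey):
    """URL that waits for the job to finish and then returns the result."""
    return f"{API_HOST}/1/job/result/{job_id}?" + urlencode({"apikey": apikey})


def fetch_json(url):
    """Send a GET request and parse the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def run_async(text, apikey, fetch=fetch_json):
    """Submit the query asynchronously, then wait for the result."""
    submit_url = ASYNC_URL + "?" + urlencode({"text": text, "apikey": apikey})
    job = fetch(submit_url)
    job_id = job["jobID"]  # assumed field name, not confirmed by this page
    return fetch(job_result_url(job_id, apikey))
```

Passing fetch as a parameter keeps the network call replaceable, which makes the flow easy to try out without a real API key.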

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Find Related Concepts Response {
entities ( array[Entities] ) A result term or phrase identified in the results set.
}
Find Related Concepts Response:Entities {
cluster ( number ) The cluster into which the phrase has been grouped. This value allows you to cluster the elements according to their occurrence.
docs_with_all_terms ( number ) The number of documents of the results set in which all terms of this element appear.
docs_with_phrase ( number ) The number of documents in the result set in which this element appears as a phrase.
occurrences ( number ) The total number of occurrences of this element in the results set.
text ( string ) The text of the identified term or phrase.
}
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "properties": {
        "entities": {
            "items": {
                "properties": {
                    "cluster": {
                        "type": "number"
                    },
                    "docs_with_all_terms": {
                        "type": "number"
                    },
                    "docs_with_phrase": {
                        "type": "number"
                    },
                    "occurrences": {
                        "type": "number"
                    },
                    "text": {
                        "type": "string"
                    }
                },
                "required": [
                    "text",
                    "docs_with_phrase",
                    "occurrences",
                    "docs_with_all_terms",
                    "cluster"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "required": [
        "entities"
    ],
    "type": "object"
}
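
A parsed response can be checked against this schema by hand. The Python sketch below covers only the required properties and their types as listed above. It is a simplified stand-in for a full JSON schema validator, and the function name is hypothetical.

```python
# Required entity properties and their Python types, per the schema above.
REQUIRED_ENTITY_FIELDS = {
    "text": str,
    "docs_with_phrase": (int, float),
    "occurrences": (int, float),
    "docs_with_all_terms": (int, float),
    "cluster": (int, float),
}


def validate_response(response):
    """Return a list of problems found in a parsed response."""
    if not isinstance(response, dict):
        return ["response must be an object"]
    entities = response.get("entities")
    if not isinstance(entities, list):
        return ["'entities' must be an array"]
    errors = []
    for index, entity in enumerate(entities):
        if not isinstance(entity, dict):
            errors.append(f"entities[{index}] must be an object")
            continue
        for name, expected in REQUIRED_ENTITY_FIELDS.items():
            if name not in entity:
                errors.append(f"entities[{index}] is missing '{name}'")
                continue
            value = entity[name]
            # bool is a subclass of int in Python, but is not a JSON number.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"entities[{index}].{name} has the wrong type")
    return errors


sample = {
    "entities": [
        {
            "text": "Mercury",
            "docs_with_phrase": 250,
            "occurrences": 1513,
            "docs_with_all_terms": 250,
            "cluster": -1,
        }
    ]
}
print(validate_response(sample))
```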

To try this API with your own data and use it in your own applications, you need an API Key. You can create an API Key from your account page - API Keys.
