You provide query text, which defines the set of documents that you want to describe. The response returns a ranked list of other terms and phrases that occur in these documents.
The Find Related Concepts API has many of the same options as the Query Text Index API. You submit text or a document, as a file, reference, text, or URL, for analysis against one or more text indexes. You can use the same query syntax as the Query Text Index API, including Boolean and Proximity Operators and Field Text Operators.
Haven OnDemand provides a number of Public Text Indexes that you can query, including Wikipedia and news feeds in multiple languages. You can also use your own indexes.
For more information about query syntax, public datasets, and text indexes, see the Documentation page.
You can use optional parameters to refine the request. These include:
- min_score: A minimum percentage score for relevance, based on Haven OnDemand's ranking algorithms. This can be between 0 (the default) and 100.
- sample_size: The number of documents to use to generate the concepts, per index. The default sample size is 250, and the maximum is 2500. This value is multiplied by the number of indexes that you search. For example, calling this API on two indexes with a sample size of 250 actually samples 500 documents.
- Note: If you want to restrict results by date, Haven OnDemand recommends that you use the field_text parameter with the RANGE operator, rather than min_date and max_date.
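As a rough illustration of min_score and sample_size, the following Python sketch checks values against the limits described above and computes how many documents a request samples in total. The helper names `check_options` and `total_sampled` are hypothetical and not part of any Haven OnDemand client library, and the upper bound of 100 for min_score is an assumption based on it being a percentage.

```python
MAX_SAMPLE_SIZE = 2500      # per-index maximum stated above
DEFAULT_SAMPLE_SIZE = 250   # default stated above


def check_options(min_score=0, sample_size=DEFAULT_SAMPLE_SIZE):
    """Return the options as a dict, or raise ValueError if out of range."""
    if not 0 <= min_score <= 100:  # assumed range for a percentage score
        raise ValueError("min_score must be between 0 and 100")
    if not 1 <= sample_size <= MAX_SAMPLE_SIZE:
        raise ValueError("sample_size must be between 1 and 2500")
    return {"min_score": min_score, "sample_size": sample_size}


def total_sampled(sample_size, indexes):
    """Documents sampled overall: sample_size applies to each index."""
    return sample_size * len(indexes)
```

For instance, `total_sampled(250, ["index_a", "index_b"])` evaluates to 500, which matches the two-index case described above. The index names here are placeholders.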
The requests in these examples search the English language news feeds for concepts related to "mercury". The version on the left uses the default sample size. The version on the right expands the sample to 500 documents and sets a minimum relevance score of 85%. Both requests limit responses to the first seven.
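The two requests described above might be assembled along the following lines. This is a minimal sketch and not a definitive client: the endpoint URL, the `text`, `indexes`, `max_results`, and `apikey` parameter names, and the `news_eng` index name are assumptions to check against the API reference, and `build_url` is a hypothetical helper. Only min_score and sample_size are taken directly from this page.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the current URL in the API reference.
ENDPOINT = "https://api.havenondemand.com/1/api/sync/findrelatedconcepts/v1"


def build_url(text, indexes, api_key, **options):
    """Build a GET URL; each index becomes its own 'indexes' parameter."""
    params = [("text", text)]
    params += [("indexes", name) for name in indexes]
    params += sorted(options.items())   # stable order for optional parameters
    params.append(("apikey", api_key))
    return ENDPOINT + "?" + urlencode(params)


# Left-hand request: default sample size, first seven results.
left = build_url("mercury", ["news_eng"], "YOUR_API_KEY", max_results=7)

# Right-hand request: larger sample and a minimum relevance score.
right = build_url("mercury", ["news_eng"], "YOUR_API_KEY",
                  max_results=7, min_score=85, sample_size=500)
```

Building the URL separately from sending it keeps the sketch free of network calls, so you can inspect `left` and `right` before passing them to an HTTP client of your choice.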
In addition to a list of related words or phrases, the response includes:
- Information on how many documents contain the related concept:
  - docs_with_phrase: The number of documents that contain the related concept in the same form as returned, for example, planet Mercury. For the exact match of the query term at the top of the list, this value relates to the sample_size. In column 1, it equals the default size of 250. For the more restrictive search in column 2, it does not quite reach the 500 limit specified in the sample_size parameter.
  - occurrences: The number of times the related concept occurs in the sample overall.
  - docs_with_all_terms: The number of documents that contain all the terms in the related concept phrase, not necessarily as given. For example, documents that contain both words, planet and mercury, anywhere in the document.
- A cluster number. The cluster number is a grouping value that can help you disambiguate results and identify discrete groups of concepts within them. In the examples, the word mercury evokes several distinct concepts. In column 1, two mentions of Mercury Retrograde, an astronomical event, are grouped in cluster 0. The Mercury Prize, a music award, is conceptually related to Young Fathers, a band that won the award, in cluster 1. In column 2, concepts related to Environmental Protection group together. Mercury Retrograde, the Phoenix Mercury charity fund, the Mercury Prize, and Freddie Mercury all have separate cluster numbers.
Note: Concepts or phrases that occur so commonly in the sample set that they do not permit differentiation of a new concept receive a negative cluster number. In both columns, this is the case for the search query term itself, Mercury, in cluster -1. Normal cluster numbers start at 0.
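To make the cluster behavior concrete, here is a small Python sketch that groups returned concepts by cluster number and sets aside those with a negative cluster. It assumes each returned concept is a dictionary with a `text` key and a `cluster` key; the `text` key name and the `group_by_cluster` helper are assumptions for illustration. The sample entries are modelled on column 1 of the example and are not real API output.

```python
from collections import defaultdict


def group_by_cluster(entities):
    """Split concepts into numbered clusters and too-common terms."""
    clusters = defaultdict(list)
    too_common = []
    for entity in entities:
        if entity["cluster"] < 0:
            # Negative cluster: too frequent to distinguish a new concept.
            too_common.append(entity["text"])
        else:
            clusters[entity["cluster"]].append(entity["text"])
    return dict(clusters), too_common


# Illustrative entries only, modelled on column 1 of the example.
sample = [
    {"text": "Mercury", "cluster": -1},
    {"text": "Mercury Retrograde", "cluster": 0},
    {"text": "Mercury Prize", "cluster": 1},
    {"text": "Young Fathers", "cluster": 1},
]
clusters, too_common = group_by_cluster(sample)
```

With this sample, cluster 1 holds both Mercury Prize and Young Fathers, while the query term Mercury ends up in `too_common`.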
Some typical uses of the Find Related Concepts API include:
- Finding synonyms for your query term. You can use these for Query Manipulation or to build a thesaurus.
- Extending or refining research:
  - Add related concepts as cross-references or subgroupings for query topics.
  - Follow up on them with further searches to learn more about the subject you are querying.
  - Find terms that are not useful for the search (because they are too frequent or not significant) and add them to a stopword list.
- Disambiguating. Use cluster groupings to differentiate the various meanings and contexts of a query term.
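One way to follow through on related concepts is to turn the terms from a single cluster into a new query. The sketch below joins terms with the Boolean OR operator and wraps multi-word phrases in double quotes. The `expansion_query` helper is hypothetical, and the quoting convention is an assumption to verify against the query syntax documentation.

```python
def expansion_query(terms):
    """Join terms into a Boolean OR query, quoting multi-word phrases."""
    parts = []
    for term in terms:
        cleaned = term.replace('"', "").strip()  # drop stray quotes
        if not cleaned:
            continue                             # skip empty terms
        parts.append(f'"{cleaned}"' if " " in cleaned else cleaned)
    return " OR ".join(parts)
```

For example, `expansion_query(["Mercury Prize", "Young Fathers"])` returns `"Mercury Prize" OR "Young Fathers"`, which keeps a follow-up search within the music sense of the term.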