Haven OnDemand Search Functionality
Make the most use of Haven OnDemand Search functionality, including Boolean, Geographical and Faceted Search

Use Haven OnDemand Search Functionality

This page highlights the operations available when you search data with Haven OnDemand. For simplicity and to let you try the queries on Haven OnDemand, most of the examples below query the English Wikipedia public dataset.

Contents

  1. Conceptual Search

    1. Search Operators

    2. Boolean Operators

    3. Proximity and Order Operators

    4. Advanced Search Operations

  2. Faceted Search and Field Matching

  3. Numeric Search

  4. Geographical or Coordinate Search

  5. Date Search

  6. Other Operations

Conceptual Search

The Query Text Index API allows you to search for content in the Haven OnDemand databases. It uses IDOL to return conceptually relevant results based on a statistical understanding of the data and the terms in it, without requiring any training. The Get Parametric Values, Find Related Concepts and Find Similar Documents APIs also use these IDOL conceptual capabilities.

Haven OnDemand indexes assign weights to terms based on their relevance to the dataset. As a simplified example, if every document in your index is about red things, then the term red carries less weight in the following query than the term Panda, because it does little to distinguish one document from another.

/querytextindex/v1?text=Red Panda&indexes=wiki_eng

Tip: You can find the weight of a term in a particular text index by using the Tokenize Text API.
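
The example requests on this page show the path and query string with unencoded spaces and quotes, for readability. In an actual request, the parameter values must be URL-encoded. The following minimal sketch builds an encoded query string with Python's standard urllib.parse module; build_query is a hypothetical helper, and the service host and any authentication parameters your requests need are deliberately left out.

```python
from urllib.parse import quote, urlencode


def build_query(api_path: str, **params: str) -> str:
    """Hypothetical helper: append URL-encoded parameters to an API path.

    quote_via=quote encodes spaces as %20 (rather than +) and also
    encodes quotes, braces and colons used by the query syntax.
    """
    return f"{api_path}?{urlencode(params, quote_via=quote)}"


url = build_query("/querytextindex/v1", text="Red Panda", indexes="wiki_eng")
print(url)  # /querytextindex/v1?text=Red%20Panda&indexes=wiki_eng
```

The same helper works for the other parameters on this page, such as field_text, because every special character in the value is percent-encoded.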

Haven OnDemand ranks query results according to various factors, such as the weights of the terms in the index, the frequency of the terms in the returned documents, and the proximity of the stemmed terms to each other. It does not require a document to match all the query terms, and it generally works better with more query text rather than less. Results are returned according to how closely they relate to the entire query. For example, the Red Panda query above returns results such as:

{
  "documents": [
    {
      "reference": "http://en.wikipedia.org/wiki/Red Panda Adventures",
      "weight": 84.59,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "wiki_eng",
      "title": "Red Panda Adventures",
      "content": {}
    },
    {
      "reference": "http://en.wikipedia.org/wiki/Red panda",
      "weight": 84.59,
      "links": [
        "RED",
        "PAND"
      ],
      "index": "wiki_eng",
      "title": "Red panda",
      "content": {}
    },
    ...
  ]
}

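
Because results are ranked by relevance rather than filtered on matching every term, a client typically reads the weight of each result. The following sketch assumes that the response body is a JSON object with a documents array of result objects like those shown above (the documents key name is an assumption), and uses a trimmed copy of that sample instead of calling the API.

```python
import json

# Trimmed sample response; the "documents" wrapper key is an assumption.
RESPONSE = """
{"documents": [
  {"reference": "http://en.wikipedia.org/wiki/Red Panda Adventures",
   "weight": 84.59, "links": ["RED", "PAND"], "index": "wiki_eng",
   "title": "Red Panda Adventures"},
  {"reference": "http://en.wikipedia.org/wiki/Red panda",
   "weight": 84.59, "links": ["RED", "PAND"], "index": "wiki_eng",
   "title": "Red panda"}
]}
"""


def ranked_titles(body: str) -> list[tuple[str, float]]:
    """Return (title, weight) pairs, highest weight first."""
    docs = json.loads(body)["documents"]
    # sorted() is stable, so results with equal weight keep the server's order.
    return sorted(((d["title"], d["weight"]) for d in docs),
                  key=lambda pair: pair[1], reverse=True)


for title, weight in ranked_titles(RESPONSE):
    print(f"{weight:6.2f}  {title}")
```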
The Text Parameter

The text parameter is common to all the conceptual search APIs, and it is the source of all the conceptual term matching and weighting that Haven OnDemand applies to the results.

Search Operators

Wildcard Search

The asterisk (*) character acts as the wildcard operator. The following example searches for any term that starts with Obam.

/querytextindex/v1?text=Obam*

Tip: Do not use a wildcard with a single letter (for example a*). This wildcard expands to a very large number of values, and the query is likely to be very slow.
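
Following this tip, an application that accepts user queries might reject wildcards with very short prefixes before sending them. The following sketch is a hypothetical client-side check; the two-character minimum is an arbitrary choice for illustration, not a documented Haven OnDemand limit. A bare * on its own, which later examples on this page use as the whole text value, is not flagged.

```python
import re

MIN_PREFIX = 2  # illustrative threshold, not a documented service limit


def short_wildcards(text: str, min_prefix: int = MIN_PREFIX) -> list[str]:
    """Return wildcard terms whose fixed prefix is shorter than min_prefix."""
    # A wildcard term here is one or more word characters followed by '*'.
    terms = re.findall(r"\w+\*", text)
    return [t for t in terms if len(t) - 1 < min_prefix]


print(short_wildcards("a* OR Obam*"))  # ['a*']
```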

Occurrence Search

You can specify the number of times a term must occur in a document by adding a colon-separated (:) range of numbers in square brackets ([]) after the term. For example:

/querytextindex/v1?text=Gene[3:5]

In this example, the query returns documents that contain the term Gene between three and five times.
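
To make the range check concrete, the following sketch counts case-insensitive whole-word occurrences of a term in a local string. It is only an approximation under two assumptions: that the range is inclusive at both ends, and that plain word matching is close enough for illustration. The real index matches stemmed terms and may tokenize text differently.

```python
import re


def occurs_between(text: str, term: str, low: int, high: int) -> bool:
    """Approximate Term[low:high]: does term occur low..high times (inclusive)?"""
    pattern = rf"\b{re.escape(term)}\b"
    count = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return low <= count <= high


doc = "Gene expression depends on the gene. Each gene has a gene product."
print(occurs_between(doc, "Gene", 3, 5))  # True (4 occurrences)
```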

Exact Term Match

You can place a term in quotes ("") to search for only the exact version of the word, and not for all words that share the same stem.

/querytextindex/v1?text="lovely"

This example finds documents that contain the exact word lovely, rather than expanding the query to include other words with the same stem, such as love, loved, and loving.

Case Sensitive Match

To match a term case sensitively, prefix it with a tilde (~) and enclose it in quotes (""). For example:

/querytextindex/v1?text="~Apple"

This example returns only documents where the term Apple is capitalized.

Phrase Search

You can also use quotes ("") to enable exact phrase search. For example:

/querytextindex/v1?text="Red Panda"
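
When you build these forms in code, it is easy to misplace the tilde or the quotes. The following sketch uses hypothetical helper functions that return each form as a string; it only formats text and does not validate anything against the service. The result still needs URL encoding before it goes into a request.

```python
def exact(term: str) -> str:
    """Exact word, no stemming: "lovely"."""
    return f'"{term}"'


def case_sensitive(term: str) -> str:
    """Case-sensitive match: the tilde goes inside the quotes, "~Apple"."""
    return f'"~{term}"'


def phrase(*words: str) -> str:
    """Exact phrase: "Red Panda"."""
    return '"' + " ".join(words) + '"'


print(exact("lovely"), case_sensitive("Apple"), phrase("Red", "Panda"))
# "lovely" "~Apple" "Red Panda"
```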

Boolean Operators

You can use Boolean search or bracketed Boolean search in the text parameter.

Boolean operators must be in capital letters. The following example searches for documents that contain both red pandas and elephants:

/querytextindex/v1?text="Red Panda" AND Elephant

You can use brackets to group elements together and control the order in which the operators apply. The following example returns documents that mention lions and also mention either pandas or elephants:

/querytextindex/v1?text=(Panda OR Elephant) AND Lion

The NOT operator excludes documents that contain certain terms. The following example returns documents about cats that do not contain the term dog.

/querytextindex/v1?text=cat NOT dog

The following example uses the NOT and OR operators to find documents that contain cat or dog, but not both:

/querytextindex/v1?text=(cat NOT dog) OR (dog NOT cat)
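
One way to reason about these operators is as set operations on the matching documents: AND is intersection, OR is union, and cat NOT dog keeps the cat documents that lack dog. The following sketch applies this to hypothetical sets of document IDs and shows that (cat NOT dog) OR (dog NOT cat) is the symmetric difference of the two sets. It ignores relevance ranking entirely.

```python
# Hypothetical IDs of the documents that contain each term.
cat = {1, 2, 3, 4}
dog = {3, 4, 5}

cat_not_dog = cat - dog  # cat NOT dog -> {1, 2}
dog_not_cat = dog - cat  # dog NOT cat -> {5}
either_not_both = cat_not_dog | dog_not_cat  # OR of the two clauses

# The combined query is the symmetric difference (XOR) of the two sets.
assert either_not_both == cat ^ dog
print(sorted(either_not_both))  # [1, 2, 5]
```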

For further information about Boolean Operators, see Boolean and Proximity Operators.

Proximity and Order Operators

Sentence and Paragraph Search

The SENTENCE and PARAGRAPH operators ensure that the two terms you search for occur in the same sentence or the same paragraph, respectively. For example:

/querytextindex/v1?text=cat SENTENCE dog
/querytextindex/v1?text=cat PARAGRAPH dog

Order Operators

The BEFORE and AFTER operators act like the AND operator, in that both terms must occur in each returned document, but they also require the terms to appear in a particular order. For example, the first of the following queries returns documents where cat appears before dog in the text. The second returns documents where cat appears after dog.

/querytextindex/v1?text=cat BEFORE dog
/querytextindex/v1?text=cat AFTER dog

Term Proximity

The NEARN operator lets you specify the maximum number of words between the terms you query for. In the following example, the words monkey and red must be within four words of each other. This operator lets you ensure that the term red is closely associated with the term monkey, so that the query returns documents that contain phrases such as red-tailed monkey or red-faced spider monkey.

/querytextindex/v1?text=(monkey NEAR4 red)

DNEARN is a directed NEARN operator. It ensures that the terms occur in the specified order, like the BEFORE operator, and also specifies the maximum number of words that can appear in between. For example:

/querytextindex/v1?text=(red DNEAR3 monkey)

The NEARN and DNEARN operators are very useful for associating adjectives with nouns. An exact phrase search for red monkey does not return red-tailed monkey, and an unrestricted search for red AND monkey returns any document that includes both terms, including, for example, a document about a red berry that monkeys eat.
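
To see how the order and distance constraints differ, the following sketch approximates BEFORE, NEARN, and DNEARN on a single string. It is a local illustration under stated assumptions, not IDOL's implementation: hyphens and punctuation split words, terms are compared case-insensitively without stemming, and distance is the difference between word positions, so adjacent words are one word apart. The service's own tokenization and distance counting may differ.

```python
import re


def positions(text: str, term: str) -> list[int]:
    """Word positions of term; hyphens and punctuation split words (assumption)."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]


def before(text: str, a: str, b: str) -> bool:
    """a BEFORE b: some occurrence of a precedes some occurrence of b."""
    pa, pb = positions(text, a), positions(text, b)
    return bool(pa and pb) and min(pa) < max(pb)


def near(text: str, a: str, b: str, n: int) -> bool:
    """a NEARn b: the terms are at most n words apart, in either order."""
    return any(abs(i - j) <= n
               for i in positions(text, a) for j in positions(text, b))


def dnear(text: str, a: str, b: str, n: int) -> bool:
    """a DNEARn b: a comes first, and b is at most n words after it."""
    return any(0 < j - i <= n
               for i in positions(text, a) for j in positions(text, b))


print(near("a red-faced spider monkey", "monkey", "red", 4))   # True
print(dnear("a red-tailed monkey", "red", "monkey", 3))         # True
print(near("the monkey ate a red berry", "monkey", "red", 4))   # True
print(dnear("the monkey ate a red berry", "red", "monkey", 3))  # False
```

The last two calls show the difference: the undirected NEAR4 accepts a text where monkey comes before red, while the directed DNEAR3 rejects it.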

Advanced Search Operations

Precedence of Operators

Haven OnDemand queries apply Boolean and proximity operators in the following order:

  1. NOT (applied first)

  2. NEAR; DNEAR

  3. AND; BEFORE; AFTER; SENTENCE; PARAGRAPH

  4. OR (applied last)

Operators that have the same level of precedence have neither left nor right associativity, so use brackets to bind terms together as appropriate. Proximity operators must have terms on either side, and cannot be adjacent to brackets.
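
Because operators at the same level have no associativity, it is safest to bracket queries explicitly. The following sketch adds brackets to a flat, unbracketed query according to the precedence order above, and raises an error when two operators from the same level are chained, which is where you need to add brackets yourself. It is a hypothetical client-side helper that handles only single-word terms separated by spaces, and it treats NOT as a binary operator, as in cat NOT dog.

```python
import re


def level(op: str) -> int:
    """Precedence level from the list above; a higher level binds tighter."""
    if op == "NOT":
        return 4
    if re.fullmatch(r"D?NEAR\d*", op):
        return 3
    if op in {"AND", "BEFORE", "AFTER", "SENTENCE", "PARAGRAPH"}:
        return 2
    if op == "OR":
        return 1
    raise ValueError(f"not an operator: {op}")


def bracket(query: str) -> str:
    """Fully bracket a flat query such as 'cat OR dog AND bird'."""
    tokens = query.split()

    def parse(pos: int, min_level: int) -> tuple[str, int]:
        lhs, pos = tokens[pos], pos + 1
        last = None  # level of the operator most recently applied here
        while pos < len(tokens):
            op = tokens[pos]
            lvl = level(op)
            if lvl < min_level:
                break
            if lvl == last:
                # Same-level chaining has no associativity: refuse to guess.
                raise ValueError(f"ambiguous: add brackets around {op}")
            rhs, pos = parse(pos + 1, lvl + 1)
            lhs, last = f"({lhs} {op} {rhs})", lvl
        return lhs, pos

    result, _ = parse(0, 1)
    return result


print(bracket("cat OR dog AND bird"))       # (cat OR (dog AND bird))
print(bracket("red NEAR4 monkey AND zoo"))  # ((red NEAR4 monkey) AND zoo)
```

The second call shows NEAR4 binding more tightly than AND, as the precedence order requires.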

Search Specific Index Fields

You can restrict a search to the values in a specified field of type Index by adding a colon and the name of the field after the term or bracketed expression. You can also use these field-specific searches in more complex Boolean expressions.

/querytextindex/v1?text=galaxy:DRETITLE
/querytextindex/v1?text=("LA galaxy"):DRETITLE
/querytextindex/v1?text=("LA galaxy"):DRETITLE AND Beckham

For further information about field types, see Index Field Types.

Use Multipliers to Modify Term Weights

You can adjust the weight of a specific term by adding a multiplier, written as an asterisk (*) followed by a number, in square brackets ([]). This syntax multiplies the weight of the term in the index by the specified value, and your query uses the new weight for that term. In the following example, Panda has five times its usual weight:

/querytextindex/v1?text=Red Panda[*5]

This option might be useful if the query returns too many results about red things that are not about red pandas. The exact multiple you use affects the relevance score of particular results in your query, depending on the new weight of the term compared to the other terms. You can find the original weight of a term by using the Tokenize Text API.

Set Manual Term Weights at the Query Level

You can set manual weights for query terms by adding the new weight in square brackets ([]). The following example sets the weight of the term Panda to 20 and the weight of the term Red to 10.

/querytextindex/v1?text=Red[10] Panda[20]

Apply Weights to Bracketed Expressions

You can multiply the weight of any bracketed expression to adjust relevance. The following example assigns triple weight to the terms Spider and Monkey:

/querytextindex/v1?text=(Spider Monkey)[*3] OR Tiger
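
The multiplier and manual forms differ only in what goes inside the square brackets. The following sketch shows the arithmetic as this section describes it: [*m] multiplies the term's index weight and [w] replaces it. The index weights used here are made-up numbers for illustration; in practice you would look them up with the Tokenize Text API.

```python
from typing import Optional


def effective_weight(index_weight: float, modifier: Optional[str]) -> float:
    """Query weight of a term for a modifier such as '*5', '20', or None."""
    if modifier is None:
        return index_weight  # Panda
    if modifier.startswith("*"):
        return index_weight * float(modifier[1:])  # Panda[*5]
    return float(modifier)  # Panda[20]


# Made-up index weights, for illustration only.
weights = {"Red": 40.0, "Panda": 120.0}
print(effective_weight(weights["Panda"], "*5"))  # 600.0
print(effective_weight(weights["Red"], "10"))    # 10.0
```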

Faceted Search and Field Matching

Documents generally contain not only free text but also custom fields, such as category tags, prices, authors, or any other value that might be relevant to associate with the text. This section looks at the different types of values and how you can use them.

The Field_Text Parameter

All the Haven OnDemand APIs that include the text parameter for conceptual search also have a field_text parameter. This parameter allows you to define rules and restrictions to apply to the fields of the documents.

For information on the standard fields that exist in text indexes of each flavor, see Index Flavors.

Note: field names in Haven OnDemand are not case sensitive. For example, the TITLE field name is equivalent to Title or title.

Facets - Parametric Fields

The Wikipedia data contains some fields of type Parametric. The values of these fields can be listed and counted for a query by using the Get Parametric Values API.

The available parametric fields are wikipedia_type, person_profession for people, place_country_code for places, and company_exchange for companies.

The Get Parametric Values API retrieves the unique values that occur in a particular field, which you can then use to provide faceted search.

For example, with a color parametric field, you can use the API to retrieve all the color values that occur in documents, and the corresponding counts. A common use for this information is to provide filters for a search. In this case, the count information represents the number of results available for each value of the filter.

The following example uses the Wikipedia dataset and the WIKIPEDIA_TYPE field to return the total count for each WIKIPEDIA_TYPE value:

/getparametricvalues/v1?index=wiki_eng&field_name=wikipedia_type
{
  "WIKIPEDIA_TYPE": {
    "PERSON": 1126163,
    "PLACE": 515896,
    "MUSICAL ALBUM": 113211,
    "SPECIES": 242170,
    "COMPANY": 77766,
    "FILM": 94852,
    "SONG": 55061,
    "BOOK": 46433,
    "VIDEO GAME": 16559,
    "GEOGRAPHICAL FEATURE": 90180,
    "PLAY": 6670
  }
}
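
Facet lists are usually displayed with the most frequent values first. The following sketch sorts the counts from the response above; it assumes the response is a JSON object keyed by field name, as shown, and hard-codes a copy of that response instead of calling the API.

```python
import json

BODY = """{"WIKIPEDIA_TYPE": {"PERSON": 1126163, "PLACE": 515896,
  "MUSICAL ALBUM": 113211, "SPECIES": 242170, "COMPANY": 77766,
  "FILM": 94852, "SONG": 55061, "BOOK": 46433, "VIDEO GAME": 16559,
  "GEOGRAPHICAL FEATURE": 90180, "PLAY": 6670}}"""


def facets(body: str, field: str) -> list[tuple[str, int]]:
    """Return (value, count) pairs for field, most frequent first."""
    counts = json.loads(body)[field]
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


for value, count in facets(BODY, "WIKIPEDIA_TYPE")[:3]:
    print(f"{value}: {count:,}")
# PERSON: 1,126,163
# PLACE: 515,896
# SPECIES: 242,170
```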

Faceted Search

The Get Parametric Values API offers many of the search capabilities of the Query Text Index API, such as the text and field_text parameters.

For example, the following query returns the WIKIPEDIA_TYPE field values only for documents about cats and dogs:

/getparametricvalues/v1?index=wiki_eng&field_name=wikipedia_type&text=cats AND dogs
{
  "WIKIPEDIA_TYPE": {
    "PERSON": 32738,
    "MUSICAL ALBUM": 4885,
    "BOOK": 3704,
    "FILM": 6012,
    "COMPANY": 3119,
    "SONG": 2079,
    "VIDEO GAME": 1158,
    "PLACE": 6280,
    "GEOGRAPHICAL FEATURE": 1302,
    "SPECIES": 2835,
    "PLAY": 441
  }
}

In a search application, you can use these values to list the facet values available for a particular query, so that users can filter the results by these facets.

Text Match Selectors

Match a Single Value

The previous queries show that PERSON is a value of the WIKIPEDIA_TYPE parametric field. The following query matches documents that relate to painting and that have a type of PERSON:

/querytextindex/v1?text=Painting&field_text=MATCH{PERSON}:WIKIPEDIA_TYPE

This query might not return all the painters, and it can also return people with other professions. To find the list of professions in the documents that the previous query returns, you can run a parametric query for the person_profession field with the same text and field_text values. For example:

/getparametricvalues/v1?index=wiki_eng&field_name=person_profession&text=painting&field_text=MATCH{PERSON}:WIKIPEDIA_TYPE
{
  "PERSON_PROFESSION": {
    "PAINTER": 4409,
    "PHOTOGRAPHER": 52,
    ...
   }
}

This query confirms that PAINTER is the value you need for the PERSON_PROFESSION field, so you can send the following query to return all the relevant results:

/querytextindex/v1?text=Painting&field_text=MATCH{PERSON}:WIKIPEDIA_TYPE AND MATCH{PAINTER}:PERSON_PROFESSION

The first condition might be redundant in this case, because the entries that have a PERSON_PROFESSION are also of type PERSON, but the example demonstrates that you can use Boolean operators in the field_text parameter. This query now retrieves all the painters related to the query text painting.

Match Multiple Values

The MATCHALL operator allows you to ensure that each of the values specified has a match in the documents returned. For example, the following query finds all the people who were both painters and sculptors:

/querytextindex/v1?text=Painting&field_text=MATCHALL{PAINTER,SCULPTOR}:PERSON_PROFESSION

Exclude Matches

You can use the NOT Boolean operator with the MATCH operator to ensure that the specified value does not occur in the specified field. For example, the following query finds results about painting that are not about painters.

/querytextindex/v1?text=Painting&field_text=NOT MATCH{PAINTER}:PERSON_PROFESSION

In this case, the results do not need to contain the PERSON_PROFESSION field at all.

You can also use the NOTMATCH operator to ensure that the specified field exists, and that at least one occurrence of it contains a value other than the specified value. For example:

/querytextindex/v1?text=Painting&field_text=NOTMATCH{PAINTER}:PERSON_PROFESSION

This query finds results about painting that contain a PERSON_PROFESSION field with a value other than PAINTER. If a result contains multiple occurrences of the PERSON_PROFESSION field, at least one of them must contain a different value. For example, it might find non-painters, or painters who are also sculptors.
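
The differences between MATCH, MATCHALL, NOT MATCH, and NOTMATCH matter most when a field occurs several times in one document. The following sketch models a field as the list of its values in a document (an empty list means the field is absent) and applies the rules as this section describes them. It is a local illustration, not the service's implementation, and it compares values exactly, without any case normalization the service might apply.

```python
Field = list[str]  # every occurrence of one field; [] means the field is absent


def match(values: Field, wanted: set[str]) -> bool:
    """MATCH{PAINTER}: at least one occurrence equals a specified value."""
    return any(v in wanted for v in values)


def matchall(values: Field, wanted: set[str]) -> bool:
    """MATCHALL{PAINTER,SCULPTOR}: every specified value occurs."""
    return wanted <= set(values)


def not_match(values: Field, wanted: set[str]) -> bool:
    """NOT MATCH{...}: no occurrence matches; true when the field is absent."""
    return not match(values, wanted)


def notmatch(values: Field, wanted: set[str]) -> bool:
    """NOTMATCH{...}: the field exists and some occurrence has another value."""
    return any(v not in wanted for v in values)


painter_sculptor = ["PAINTER", "SCULPTOR"]
no_profession: Field = []
print(matchall(painter_sculptor, {"PAINTER", "SCULPTOR"}))  # True
print(not_match(painter_sculptor, {"PAINTER"}))             # False
print(notmatch(painter_sculptor, {"PAINTER"}))              # True
print(not_match(no_profession, {"PAINTER"}))                # True
print(notmatch(no_profession, {"PAINTER"}))                 # False
```

The last two calls show the distinction drawn above: NOT MATCH accepts documents without the field, while NOTMATCH requires the field to exist.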

Numeric Search

For Numeric type fields, you can perform many numeric operations using field_text operators. The following queries show some examples.

Search for places with more than a million people:

/querytextindex/v1?text=*&field_text=GREATER{1000000}:PLACE_POPULATION 

Search for places with fewer than 100,000 people:

/querytextindex/v1?text=*&field_text=LESS{100000}:PLACE_POPULATION

Search for documents with population exactly equal to 1061235:

/querytextindex/v1?text=*&field_text=EQUAL{1061235}:PLACE_POPULATION

Search for documents with population between 12 and 26:

/querytextindex/v1?text=*&field_text=NRANGE{12,26}:PLACE_POPULATION
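
If the thresholds come from user input, generating these expressions is less error-prone than concatenating strings by hand. The following sketch uses hypothetical helper functions that produce the expressions shown above; rejecting an inverted NRANGE is this sketch's own choice rather than documented service behavior, and the result still needs URL encoding before it goes into a request.

```python
def greater(n: int, field: str) -> str:
    """GREATER{n}:FIELD"""
    return f"GREATER{{{n}}}:{field}"


def less(n: int, field: str) -> str:
    """LESS{n}:FIELD"""
    return f"LESS{{{n}}}:{field}"


def equal(n: int, field: str) -> str:
    """EQUAL{n}:FIELD"""
    return f"EQUAL{{{n}}}:{field}"


def nrange(low: int, high: int, field: str) -> str:
    """NRANGE{low,high}:FIELD; an inverted range is rejected client-side."""
    if low > high:
        raise ValueError(f"inverted range: {low} > {high}")
    return f"NRANGE{{{low},{high}}}:{field}"


print(greater(1_000_000, "PLACE_POPULATION"))  # GREATER{1000000}:PLACE_POPULATION
print(nrange(12, 26, "PLACE_POPULATION"))      # NRANGE{12,26}:PLACE_POPULATION
```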

Geographical or Coordinate Search

Numeric fields can define many things, such as population or prices. Two numeric fields paired together can indicate coordinates.

Places indexed in the Wikipedia dataset have LAT and LON fields, which indicate their approximate latitude and longitude. You can use the DISTSPHERICAL field_text operator to find documents about places close to a specified location. You specify the coordinates of the center point and a radius.

The DISTSPHERICAL operator treats the coordinates as spherical coordinates, and the radius as a value in kilometers.

/querytextindex/v1?text=*&field_text=DISTSPHERICAL{lat,lon,radius in KM}:LAT:LON

This operator is useful for finding places in the vicinity of another location. For example, the following query finds places within 25 kilometers of latitude 40, longitude -100:

/querytextindex/v1?text=*&field_text=DISTSPHERICAL{40,-100,25}:LAT:LON
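
To check locally whether a place falls inside a DISTSPHERICAL radius, for example to display distances next to results, you can compute the great-circle distance with the haversine formula. The following sketch assumes a spherical Earth with a mean radius of 6371 km; the radius and formula that Haven OnDemand uses internally are not documented here, so distances near the edge of the radius might not agree exactly.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, see above


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Is a point 0.1 degrees north of the example center within 25 km?
d = haversine_km(40, -100, 40.1, -100)
print(round(d, 2), d <= 25)  # 11.12 True
```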

The DISTCARTESIAN operator works in a similar way to DISTSPHERICAL, but rather than using latitude and longitude, it treats the coordinates as points on a two-dimensional plane, and the distance is given in the units of that plane. If you use this operator with latitude and longitude fields, you might not get the expected results over large distances. This operator is most useful if you create your own text index and documents. For example, you might create X and Y coordinate fields that correspond to grid reference coordinates on a small map.

/querytextindex/v1?text=*&field_text=DISTCARTESIAN{x,y,radius}:X:Y

For example:

/querytextindex/v1?text=*&field_text=DISTCARTESIAN{324,236,20}:X:Y
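
On a plane, the corresponding local check is plain Euclidean distance. The following sketch assumes, as the text suggests, that X and Y are grid coordinates in the same units as the radius; it counts a point exactly on the circle as inside, and whether the service treats the boundary the same way is an assumption.

```python
from math import hypot


def within_cartesian(x: float, y: float,
                     cx: float, cy: float, radius: float) -> bool:
    """True if (x, y) is within radius of the center (cx, cy), boundary included."""
    return hypot(x - cx, y - cy) <= radius


print(within_cartesian(330, 244, 324, 236, 20))  # True: distance is 10
print(within_cartesian(350, 236, 324, 236, 20))  # False: distance is 26
```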

Date Search

Date type fields allow for useful date filtering on the results.

The RANGE operator lets you specify an exact time range that the specified date field must match. The following example finds documents where the MODIFIED_DATE is within the last seven days.

/querytextindex/v1?text=*&field_text=RANGE{-7,0}:MODIFIED_DATE

You can also use the following date syntaxes:

  • D+/M+/#YY+, HH:NN:SS D+/M+/#YY+, or HH:NN:SS D+/M+/#YY+ #ADBC. An absolute date in one of these formats.

  • N. A negative or positive number of days from now, as in the previous example.

  • Ne. An epoch time.

  • Ns. A negative or positive number of seconds from now.
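
The epoch and seconds forms are convenient when a range is computed in code. The following sketch produces Ne and Ns bounds for a RANGE expression; the helper names are hypothetical, and it assumes that the e and s suffixes follow the integer directly, as the list above suggests.

```python
from datetime import datetime, timezone


def epoch_bound(moment: datetime) -> str:
    """Absolute bound in the Ne form, e.g. '1577836800e'."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return f"{int(moment.timestamp())}e"


def seconds_bound(offset_seconds: int) -> str:
    """Relative bound in the Ns form: seconds from now, negative for the past."""
    return f"{offset_seconds}s"


def date_range(start: str, end: str, field: str = "MODIFIED_DATE") -> str:
    """RANGE{start,end}:FIELD"""
    return f"RANGE{{{start},{end}}}:{field}"


january = date_range(epoch_bound(datetime(2020, 1, 1, tzinfo=timezone.utc)),
                     epoch_bound(datetime(2020, 2, 1, tzinfo=timezone.utc)))
last_hour = date_range(seconds_bound(-3600), seconds_bound(0))
print(january)    # RANGE{1577836800e,1580515200e}:MODIFIED_DATE
print(last_hour)  # RANGE{-3600s,0s}:MODIFIED_DATE
```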

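The GTNOW operator restricts the results to documents with a date later than the current time (in the future), and the LTNOW operator restricts them to documents with a date earlier than the current time (in the past). For example: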

/querytextindex/v1?text=*&field_text=GTNOW{}:DATE
/querytextindex/v1?text=*&field_text=LTNOW{}:DATE

These options might be useful in your own text indexes to find results with an expiration date in the past (or future).

Other Operations

Boolean Operators in Field_Text

The field_text parameter supports the three basic Boolean operators NOT, AND and OR. For example:

/querytextindex/v1?text=Painting&field_text=MATCH{PERSON}:WIKIPEDIA_TYPE AND MATCH{PAINTER}:PERSON_PROFESSION
/querytextindex/v1?text=*&field_text=GREATER{1000000}:PLACE_POPULATION OR DISTCARTESIAN{50,-10,2}:LAT:LON

Field Existence

The EXISTS operator allows you to ensure that a field is present in the result. The following example returns only documents that have a PLACE_POPULATION field, with any value.

/querytextindex/v1?text=*&field_text=EXISTS{}:PLACE_POPULATION

The EMPTY operator returns only results where the field value is empty, or where the field does not exist:

/querytextindex/v1?text=*&field_text=EMPTY{}:PLACE_POPULATION

Note: An empty value counts as present for the EXISTS operator. If you want to return only documents where the field does not exist, combine the NOT Boolean operator with the EXISTS field_text operator, as NOT EXISTS{} (written NOT+EXISTS{} in a URL).
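
The following sketch restates the three existence checks as predicates on a document represented as a Python dict, treating an empty string as an empty value. This representation is an assumption for illustration, and unlike Haven OnDemand field names, the dict keys here are case sensitive. It shows that EMPTY and NOT EXISTS differ only for documents where the field is present but empty.

```python
Doc = dict[str, str]  # field name -> value; an assumed, simplified document model


def exists(doc: Doc, field: str) -> bool:
    """EXISTS{}: the field is present, even with an empty value."""
    return field in doc


def empty(doc: Doc, field: str) -> bool:
    """EMPTY{}: the field is absent or its value is empty."""
    return doc.get(field, "") == ""


def not_exists(doc: Doc, field: str) -> bool:
    """NOT EXISTS{}: the field is absent."""
    return not exists(doc, field)


docs = {
    "populated": {"PLACE_POPULATION": "1061235"},
    "blank": {"PLACE_POPULATION": ""},
    "missing": {},
}
for name, doc in docs.items():
    print(name, exists(doc, "PLACE_POPULATION"),
          empty(doc, "PLACE_POPULATION"), not_exists(doc, "PLACE_POPULATION"))
# populated True False False
# blank True True False
# missing False True True
```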