Use Parametric (Faceted) Search

Parametric or faceted search is a powerful tool used in many search applications to allow users to filter their results.

For example, you might want to set up an application to:

  • search a product catalog and restrict results by color or model
  • search a news database and filter results by a country or subject

In an application, you might provide a list of filter values (for example, the different colors or subjects), and users can select the one they are most interested in, and view a restricted list of results based on that filter.

In Haven OnDemand, you can create parametric searches by using the Get Parametric Values API.

Key Concepts

This section describes the concepts you need to understand before you set up parametric search.

Text Indexes

Parametric search works with Haven OnDemand text indexes. Before you continue, you might want to look at Text Indexes - Key Concepts.

Parametric Fields

Parametric search uses the values in parametric fields. Haven OnDemand stores the values in these fields to allow fast retrieval. Parametric fields generally contain short, discrete pieces of information, such as a name, location, or color, rather than free-form text or continuous numeric data.

You can retrieve these values by using the Get Parametric Values API.

The public text indexes contain standard parametric fields. For example, in the English Wikipedia text index, the wikipedia_category field is a parametric field, so you can use the values in this field to filter search results.

In your own text indexes, the main search text index flavors all have several standard fields configured as parametric by default, and you can also create your own custom parametric fields when you create the text index.

Field Search

Parametric search is closely related to field search. In a field search, you restrict the results of a Query Text Index or Find Similar query to documents that contain a particular value (or a value within a range) in a particular field. For example, you can match only results that have the value blue in the color field.

For more information, see Use Haven OnDemand Search Functionality.

The difference between field search and parametric search is that parametric search allows you to find the possible values of the field. You can use Get Parametric Values to find all possible values that occur in a specified parametric field, or values that occur in documents that match another query.

For example, you might send a Get Parametric Values API call with a query for shirts and request the values of the color field. The API returns a list of color values that occur in documents about shirts. You can then send a Query Text Index API call with the same query text, and add a field search to find documents that contain your favorite color in the color field.

Plan Your Application

You can use Get Parametric Values on public text indexes, so if you want to test out the API, or create an app that uses this public data, you can skip ahead to Use the Get Parametric Values API.

For other uses, you can set up a text index and index your own data.

Note: This section considers only the parametric search functionality. For more detailed information about choosing a text index flavor, and creating a text index, see Advanced Haven OnDemand Unstructured Text Indexing.

You can create parametric fields when you create the text index, but you cannot add them later. Before you create the index, think about the kind of filters you want to set up, and the fields that contain this data.

The main search text index flavors (Explorer, Standard, Custom_fields, and Jumbo) all have a set of standard parametric fields, such as author and category. For more information about the standard fields, see Index Flavors.

The Explorer, Standard, and Jumbo flavors also allow you to configure up to five custom parametric fields, and the Custom_fields flavor allows you to configure up to 10.

Choose Parametric Fields

You must choose your parametric fields before you create the text index.

If you create the index JSON manually, you can choose the fields from your JSON documents.

If you intend to use a connector, or to add documents directly by using the Add to Text Index API, you can use the Text Extraction API to see which fields Haven OnDemand will extract from a document.

If you use the Add to Text Index API, you can use the additional_metadata parameter to add custom metadata. For more information, see Advanced Haven OnDemand Unstructured Text Indexing. You cannot add additional fields to the JSON when you index by using a connector.

{
   "document" : [
      {
         "title" : "Product Name",
         "reference" : "Product106253849",
         "date" : "2016-04-01",
         "price" : "20.00",
         "currency" : "GBP",
         "content" : "Here is the description of the product",
         "type" : "electrical",
         "color" : [
           "blue",
           "green",
           "red"
         ],
         "manufacturer" : "companyname",
         "model" : "superfast",
         "free_delivery" : "yes",
         "delivery_location" : [
            "UK",
            "USA",
            "France"
         ]
      }
   ]
}

This example document contains a number of fields that you might want to use as filters for your searches. When a user runs a search, they might want to filter by type, color, manufacturer, model, free_delivery, or delivery_location.

When you have only one or two filter criteria in your documents, it is easy to choose your parametric fields. However, if you have more than five (or more than 10 if you create a Custom_fields flavor text index), you might need to choose your fields more carefully, or adjust your documents. For example:

  • You can change a field name to correspond to one of the standard fields in your text index flavor. In this case, you could change type to category, and delivery_location to enriched_place, to index them as standard parametric fields.

  • You can create multiple small text indexes rather than a single large one. For example, if you have several types of products with different filters, you can create several Explorer flavor text indexes, with custom parametric fields appropriate to each type.

    Note: For information about resource costs for text indexes, see API and Resource Unit Consumption.

  • You can use the same field name for different filter types in different subsets of your documents. You can add a field_text restriction to the Get Parametric Values API, so you can use a standard field to differentiate between each set. There is more detail about this later (see Combine Search with Get Parametric Values).
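The first option above, renaming fields so that they match standard parametric fields, can be sketched as a small preprocessing step that you run before indexing. This is a minimal illustration only: the mapping and the function name are hypothetical, and the standard field names (category and enriched_place) are the ones mentioned in the list above.

```python
# Hypothetical mapping from your own field names to the standard
# parametric field names mentioned in the text.
STANDARD_FIELD_NAMES = {
    "type": "category",
    "delivery_location": "enriched_place",
}


def rename_to_standard_fields(document, mapping=STANDARD_FIELD_NAMES):
    """Return a copy of the document with mapped field names replaced."""
    return {mapping.get(name, name): value for name, value in document.items()}


# A shortened version of the example product document.
product = {
    "title": "Product Name",
    "reference": "Product106253849",
    "type": "electrical",
    "color": ["blue", "green", "red"],
    "delivery_location": ["UK", "USA", "France"],
}

renamed = rename_to_standard_fields(product)
print(sorted(renamed))
```

After renaming, only color still needs a custom parametric field in this shortened document.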

Filter by Numeric Values

The examples so far all deal with fields that contain text information. You can also use parametric fields for numeric data, as long as the fields contain a discrete set of values. For example, you could use a parametric field for clothing or shoe sizes.

For continuous numeric data, such as prices, Haven OnDemand recommends that you do not use a parametric field. The Get Parametric Values API returns every value separately, so if you have 100 query results with 100 different prices, your filter list would contain 100 options.

In this case, you can use numeric type fields, and add a field_text restriction to the query, for example using the RANGE field operator to restrict to results where the value in the price field falls within a specified range.
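To judge whether a field is discrete enough to be a useful filter, you can count its distinct values in a sample of your documents before you create the text index. The following sketch uses invented sample data and a hypothetical helper name. It does not call Haven OnDemand.

```python
def distinct_value_count(documents, field):
    """Count how many different values a field takes across documents."""
    values = set()
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        # A field can hold a single value or a list of values.
        items = value if isinstance(value, list) else [value]
        values.update(items)
    return len(values)


# Invented sample: 100 products, each with its own price but one of
# only four sizes.
documents = [
    {"price": f"{10 + i * 0.5:.2f}", "size": ["S", "M", "L", "XL"][i % 4]}
    for i in range(100)
]

print(distinct_value_count(documents, "price"))  # 100
print(distinct_value_count(documents, "size"))   # 4
```

In this sample, size would give four filter options, while price would give one option per product.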

Filter by a Date

The Get Parametric Values API can also retrieve the values of date type fields, to allow you to filter documents by a date. You cannot add custom date fields, but each flavor of text index has some standard date fields available. For more information, see the Index Flavors documentation.

When you use Get Parametric Values to return the values in a date field that contains times as well as dates, the API returns the times in groupings of one hour.

Create the Text Index

You can create the text index by using the Create Text Index wizard on the Text Indexes section of the accounts pages, or by using the Create Text Index API.

You can add custom parametric fields in the wizard on the Flavor Attributes page.

In the Create Text Index API, you add parametric fields by adding the parametric_fields parameter. For example:

https://api.havenondemand.com/1/api/sync/createtextindex/v1?index=parametricindex&flavor=standard&parametric_fields=color&parametric_fields=model

This example creates a standard flavor text index, with the custom parametric fields color and model.
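If you build this call in code, the parametric_fields parameter is repeated once for each field. The following sketch constructs the URL and checks the custom field limits described in Plan Your Application. The function name is hypothetical, the lowercase flavor names other than standard are an assumption, and the sketch omits authentication in the same way as the example URLs in this guide.

```python
from urllib.parse import urlencode

CREATE_INDEX_ENDPOINT = "https://api.havenondemand.com/1/api/sync/createtextindex/v1"

# Custom parametric field limits per flavor, as stated earlier in this guide.
# Assumption: flavor values are written in lowercase, like "standard".
CUSTOM_FIELD_LIMITS = {"explorer": 5, "standard": 5, "jumbo": 5, "custom_fields": 10}


def create_index_url(index, flavor, parametric_fields):
    """Build a Create Text Index URL with one parametric_fields entry per field."""
    limit = CUSTOM_FIELD_LIMITS.get(flavor)
    if limit is not None and len(parametric_fields) > limit:
        raise ValueError(
            f"{flavor} allows at most {limit} custom parametric fields, "
            f"got {len(parametric_fields)}"
        )
    params = [("index", index), ("flavor", flavor)]
    params += [("parametric_fields", name) for name in parametric_fields]
    return f"{CREATE_INDEX_ENDPOINT}?{urlencode(params)}"


url = create_index_url("parametricindex", "standard", ["color", "model"])
print(url)
```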

Index your Content

After you create the text index, you can index your content by using the Add To Text Index API. For more information, see Advanced Haven OnDemand Unstructured Text Indexing and the API documentation.

Find a List of Parametric Fields

In addition to checking the text index flavor documentation, you can retrieve a list of fields from a text index by using the Retrieve Index Fields API. You can use this API for public and private text indexes.

For example:

https://api.havenondemand.com/1/api/sync/retrieveindexfields/v2?field_types=parametric&indexes=parametricindex

This API call returns a response of the following type:

{
   "parametricindex": {
      "total_fields": 3,
      "field_type_counts": {
         "parametric_count": 3
      },
      "parametric_type_fields": [
         "category",
         "color",
         "model"
      ]
   }
}
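A response of this shape can be read with a few lines of code. The following sketch parses the sample response shown above and compares it with the fields that you expect to be configured. The helper names and the extra size field are hypothetical.

```python
import json

# Sample response copied from the example above.
response_text = """
{
   "parametricindex": {
      "total_fields": 3,
      "field_type_counts": {
         "parametric_count": 3
      },
      "parametric_type_fields": [
         "category",
         "color",
         "model"
      ]
   }
}
"""


def parametric_fields(response, index):
    """Return the parametric field names reported for one text index."""
    return response.get(index, {}).get("parametric_type_fields", [])


def missing_fields(expected, reported):
    """Return expected fields that the response does not list."""
    return sorted(set(expected) - set(reported))


reported = parametric_fields(json.loads(response_text), "parametricindex")
print(reported)
# "size" is a hypothetical configured field that no document has used yet.
print(missing_fields(["category", "color", "model", "size"], reported))
```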

Note: The Retrieve Index Fields API returns only fields that exist (or that once existed) in a document in your text index. It does not retrieve a configured field if it has not been used. To ensure that you can retrieve all your custom parametric fields, you can index a test document that contains all the fields, and then delete it.

For example:

{
   "document" : [
      {
         "title" : "test",
         "reference" : "testdocument1",
         "content" : "test",
         "category" : "test",
         "color" : "test",
         "model" : "test"
      }
   ]
}

The following Add to Text Index API call adds this document to your text index:

https://api.havenondemand.com/1/api/sync/addtotextindex/v2?json={"document":[{"reference":"testdocument1","content":"test","category":"test","color":"test","model":"test"}]}&index=parametricindex

You can then delete the document by using the following call:

https://api.havenondemand.com/1/api/sync/deletefromtextindex/v1?index=parametricindex&index_reference=testdocument1

The Retrieve Index Fields API returns the parametric fields that you have indexed, even though the document has been deleted.
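The add-and-delete steps above can be scripted. The following sketch only builds the two URLs and does not send them. The helper names are hypothetical. Note that the JSON is percent-encoded when it is placed in the query string, so the generated URL looks different from the readable form shown above.

```python
import json
from urllib.parse import urlencode

BASE = "https://api.havenondemand.com/1/api/sync"


def placeholder_document(reference, fields):
    """Build a throwaway document that contains every listed field."""
    doc = {"reference": reference, "content": "test"}
    doc.update({name: "test" for name in fields})
    return {"document": [doc]}


def add_url(index, payload):
    """Build the Add to Text Index URL, with the JSON percent-encoded."""
    encoded = json.dumps(payload, separators=(",", ":"))
    return f"{BASE}/addtotextindex/v2?{urlencode({'json': encoded, 'index': index})}"


def delete_url(index, reference):
    """Build the Delete from Text Index URL for one document reference."""
    query = urlencode({"index": index, "index_reference": reference})
    return f"{BASE}/deletefromtextindex/v1?{query}"


payload = placeholder_document("testdocument1", ["category", "color", "model"])
print(add_url("parametricindex", payload))
print(delete_url("parametricindex", "testdocument1"))
```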

Use the Get Parametric Values API

The Get Parametric Values API returns a list of all the values that occur in the parametric or date type fields that you specify. You can specify multiple field names in a comma-separated list. For example:

https://api.havenondemand.com/1/api/sync/getparametricvalues/v1?field_name=color,category&indexes=parametricindex

This API returns a response of the following form:

{
   "color": {
      "GREEN": 18,
      "BLUE": 400,
      "YELLOW": 284,
      "RED": 282,
      "BLACK": 95,
      "PURPLE": 21,
      "GREY": 19,
      "TEAL": 217,
      "WHITE": 13,
      "ORANGE": 27
   },
   "category": {
      "ELECTRICAL": 509,
      "COMPUTING": 347,
      "ENTERTAINMENT": 930,
      "CLOTHING": 715,
      "HARDWARE": 41,
      "FURNITURE": 83
   }
}

For each field, the API returns a list of values, and the number of documents in your text index that the value occurs in. For example, in this case there are 18 documents in the index with the value GREEN in the color field.

You can use the document count to show the number of results for each value. If you do not want to use the document count, you can set the document_count parameter to false.

Note: The document counts for the two fields do not necessarily add up to the same total. There might be documents in the index that have a color field but no category field, or a category field but no color field. Some documents might also have more than one value in a particular field.
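As an illustration of how an application might turn this response into filter labels, the following sketch orders the values of one field by document count and formats each one for display. The function name and the label format are hypothetical. The data is the color part of the sample response above.

```python
def filter_options(values_response, field):
    """Return display labels for one field, most frequent value first."""
    counts = values_response.get(field, {})
    # Sort by descending count, then alphabetically to break ties.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{value.title()} ({count})" for value, count in ordered]


response = {
    "color": {
        "GREEN": 18, "BLUE": 400, "YELLOW": 284, "RED": 282, "BLACK": 95,
        "PURPLE": 21, "GREY": 19, "TEAL": 217, "WHITE": 13, "ORANGE": 27,
    }
}

options = filter_options(response, "color")
print(options[:3])
```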

Combine Search with Get Parametric Values

You can add a query to your Get Parametric Values API call to restrict the set of documents that the API retrieves field values from.

The text and field_text parameters in the Get Parametric Values API have the same syntax as for Query Text Index. You can also set the min_score parameter to the minimum percentage relevance that a document must have to your query to be included.

The API returns the values for your specified field_name that occur in documents that also match your query.

For example:

https://api.havenondemand.com/1/api/sync/getparametricvalues/v1?field_name=color,category&indexes=parametricindex&text=computers

The API returns the same response format as before (that is, it returns a list of field values, rather than any query results). However, the results are different, because it only considers documents that match the query for computers.

{
   "color": {
      "GREEN": 1,
      "BLUE": 4,
      "RED": 6,
      "BLACK": 78,
      "PURPLE": 1,
      "GREY": 17,
      "WHITE": 10
   },
   "category": {
      "ELECTRICAL": 114,
      "COMPUTING": 293,
      "ENTERTAINMENT": 125
   }
}

The query restriction means that there are fewer values, and the document counts are lower. For example, there are no documents that mention computers in the CLOTHING category.
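An application that keeps both responses can work out which filter values disappear under the query, for example to hide or disable those options. The following sketch compares the category values from the two sample responses above. The function name is hypothetical.

```python
def dropped_values(all_values, filtered_values, field):
    """Return values of a field that occur in the index but not in the query results."""
    before = set(all_values.get(field, {}))
    after = set(filtered_values.get(field, {}))
    return sorted(before - after)


# Category counts from the unrestricted call and from the call with text=computers.
unrestricted = {
    "category": {
        "ELECTRICAL": 509, "COMPUTING": 347, "ENTERTAINMENT": 930,
        "CLOTHING": 715, "HARDWARE": 41, "FURNITURE": 83,
    }
}
restricted = {
    "category": {"ELECTRICAL": 114, "COMPUTING": 293, "ENTERTAINMENT": 125}
}

print(dropped_values(unrestricted, restricted, "category"))
```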

The Choose Parametric Fields section mentioned using the search functionality to allow you to differentiate between different sets of documents in your text index. If you have a single text index, and you use a custom parametric field to mean different things for different types of documents, you can use field_text to restrict the Get Parametric Values API results to the appropriate options.

For example, in the parametricindex text index created earlier, there is a custom parametric field, model. For the product catalog examples above, the ELECTRICAL category might use model to mean a type of appliance, while in the CLOTHING category, it might be a style of clothing. You can use field_text to display the relevant values, according to the category:

https://api.havenondemand.com/1/api/sync/getparametricvalues/v1?field_name=model&indexes=parametricindex&field_text=MATCH{clothing}:category

This call returns the values in the model field for documents in the CLOTHING category:

{
   "model": {
      "TSHIRT": 29,
      "SHIRT": 2,
      "ACCESSORIES": 47,
      "SKIRT": 12,
      "SHOES": 3,
      "TROUSERS": 10
   }
}

Changing the category to ELECTRICAL might have very different results:

{
   "model": {
      "TOASTER": 28,
      "MICROWAVE": 132,
      "KETTLE": 96,
      "IRON": 128,
      "WASHING MACHINE": 42
   }
}

The categories might also correspond to functionality in your application that changes the display for the different types of product.
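The category-specific calls above differ only in the field_text value, so you can generate them from the selected category. The following sketch builds the URL and percent-encodes the braces and colon in the MATCH expression. The function name is hypothetical, and the sketch does not send the request.

```python
from urllib.parse import urlencode

VALUES_ENDPOINT = "https://api.havenondemand.com/1/api/sync/getparametricvalues/v1"


def model_values_url(index, category):
    """Build a Get Parametric Values URL for model, limited to one category."""
    params = {
        "field_name": "model",
        "indexes": index,
        # Same MATCH{value}:field form as the example call above.
        "field_text": f"MATCH{{{category}}}:category",
    }
    return f"{VALUES_ENDPOINT}?{urlencode(params)}"


print(model_values_url("parametricindex", "clothing"))
print(model_values_url("parametricindex", "electrical"))
```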

Filter Selection

In your applications, you can use the Get Parametric Values response to produce a set of dynamic filters for user search. If you display the document count as well, the user knows how many results there are for each option.

When a user selects a filter, you can use a Query Text Index call, with the field_text parameter to retrieve the document set that the filter corresponds to. For example, the following call retrieves the documents in the index with the value GREEN in the color field:

https://api.havenondemand.com/1/api/sync/querytextindex/v1?indexes=parametricindex&field_text=MATCH{GREEN}:color

Note: Field names and values are not case sensitive. This field_text query matches the values green, GREEN, or GrEeN, in the field color, COLOR, cOLor, and so on. The Get Parametric Values API converts all field values to upper case in the response, and the field names are given in lower case.

Modify the Parametric Values Response

By default, Haven OnDemand does not order the field values in the Get Parametric Values response, and it returns up to 100 values. You can use the sort and max_values parameters to specify the order in which to return the field values, and how many values to return. For example:

https://api.havenondemand.com/1/api/sync/getparametricvalues/v1?indexes=parametricindex&field_name=color&sort=alphabetical&max_values=5

This example returns a maximum of five values, in alphabetical order:

{
   "color": {
      "BLACK": 95,
      "BLUE": 400,
      "GREEN": 18,
      "GREY": 19,
      "ORANGE": 27
   }
}
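The API applies sort and max_values on the server. To show the effect, the following sketch reproduces it on the client side by using the unsorted color values from earlier in this guide. It assumes that the values are sorted before the list is cut to max_values, which is consistent with the sample response above. The function name is hypothetical.

```python
def alphabetical_top(values, max_values):
    """Sort field values alphabetically and keep the first max_values entries."""
    return dict(sorted(values.items())[:max_values])


colors = {
    "GREEN": 18, "BLUE": 400, "YELLOW": 284, "RED": 282, "BLACK": 95,
    "PURPLE": 21, "GREY": 19, "TEAL": 217, "WHITE": 13, "ORANGE": 27,
}

print(alphabetical_top(colors, 5))
```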