Anomaly Detection

Detects anomalies inside a given set of data.

The Anomaly Detection API analyzes structured data in CSV format and uses a novel anomaly scoring algorithm developed at Hewlett Packard Labs to extract the most anomalous records (rows) in the data.

Quick Start

The API requires one parameter that supplies the input data: file, url, or reference. For example:

/1/api/[async|sync]/anomalydetection/v1?file= or url= or reference=

Note: Because this API typically processes larger files, Haven OnDemand recommends that you use the asynchronous version. Synchronous requests block until processing is complete, which might result in timeouts. See Get the Results of an Asynchronous Request.

In each case, the file that you provide must be in CSV format, with the following structure:

  • The first row must be the comma-separated list of column headers.
  • The second row must list the type of each column. The following types are supported:
    • STRING
    • NUMERIC
  • The rest of the rows contain the data, with one record on each row, and column values that correspond to the specified headers and types.
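If you generate the input programmatically, Python's standard csv module can produce this layout. The following is a minimal sketch, not part of the API: the helper name to_anomaly_csv is hypothetical, and it only checks the rules listed above (one type per header, STRING or NUMERIC types, every record as wide as the header row).

```python
import csv
import io

# Column types supported by the API, as listed above.
ALLOWED_TYPES = {"STRING", "NUMERIC"}


def to_anomaly_csv(headers, types, rows):
    """Return CSV text in the expected layout:
    row 1 = column headers, row 2 = column types, remaining rows = records.
    (Hypothetical helper for illustration.)"""
    if len(headers) != len(types):
        raise ValueError("need exactly one type per column header")
    bad = [t for t in types if t not in ALLOWED_TYPES]
    if bad:
        raise ValueError(f"unsupported column types: {bad}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(headers)
    writer.writerow(types)
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"row has {len(row)} values, expected {len(headers)}")
        writer.writerow(row)
    return buf.getvalue()


print(to_anomaly_csv(["color", "shape"], ["STRING", "STRING"],
                     [["blue", "triangle"], ["red", "square"]]), end="")
```

Writing the file through csv.writer rather than joining strings by hand means values that contain commas or quotes are escaped correctly.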

The following very simple example contains a list of colored shapes:

color,shape
STRING,STRING
blue,triangle
blue,triangle
blue,triangle
blue,square
blue,square
blue,square
blue,pentagon
red,triangle
red,triangle
red,triangle
red,square
yellow,triangle

You can submit this file to the API, and it returns a list of the anomalies it detects.

curl -X POST https://api.havenondemand.com/1/api/sync/anomalydetection/v1 --form "file=@shapes.csv" --form "apikey=YOUR_API_KEY"

An anomaly can be a single value or a combination of values. In this example, pentagon and yellow are anomalous values, because each of them occurs only once. In addition, the combination red, square occurs only once, so it is also anomalous, even though the values red and square each occur in other records.
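To make the difference between single-value and combination anomalies concrete, the Python sketch below counts how often each value and each pair of values occurs in the shapes data. It is not the Hewlett Packard Labs scoring algorithm that the API uses; it is only a frequency-count illustration of the reasoning above. The function name and the "occurs at most once" threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

HEADERS = ("color", "shape")
# The colored-shapes records from the example (header and type rows omitted).
ROWS = [
    ("blue", "triangle"), ("blue", "triangle"), ("blue", "triangle"),
    ("blue", "square"), ("blue", "square"), ("blue", "square"),
    ("blue", "pentagon"),
    ("red", "triangle"), ("red", "triangle"), ("red", "triangle"),
    ("red", "square"),
    ("yellow", "triangle"),
]


def rare_values_and_pairs(headers, rows, threshold=1):
    """Return (rare single values, rare combinations).

    A single value is rare if it occurs at most `threshold` times.
    A combination is reported only if it is rare while neither of its
    values is rare on its own (otherwise the single value explains it).
    Frequency counting only; not the API's scoring algorithm."""
    single = Counter()
    pairs = Counter()
    for row in rows:
        cells = list(zip(headers, row))       # [("color", "blue"), ("shape", "triangle")]
        single.update(cells)
        pairs.update(combinations(cells, 2))  # every pair of cells in the row
    rare_single = {cell for cell, n in single.items() if n <= threshold}
    rare_pairs = {
        pair for pair, n in pairs.items()
        if n <= threshold and not any(cell in rare_single for cell in pair)
    }
    return rare_single, rare_pairs


singles, combos = rare_values_and_pairs(HEADERS, ROWS)
print(sorted(singles))  # [('color', 'yellow'), ('shape', 'pentagon')]
print(sorted(combos))   # [(('color', 'red'), ('shape', 'square'))]
```

Blue pentagon and yellow triangle also occur once, but they are not reported as combinations because a rare single value already accounts for them, which matches the explanation above.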

For more complicated data sets, you can use the optional columns parameter to restrict the analysis to particular columns, for example when you want to detect only anomalies that involve specific fields.

You can also use the max_results parameter to limit the number of anomalies returned (the default is 20).

Get the Results of an Asynchronous Request

The asynchronous mode returns a job ID, which you can then use to retrieve your results. There are two methods for this:

  • Use /1/job/status/ to get the status of the job, including results if the job is finished.
  • Use /1/job/result/, which waits until the job has finished and then returns the result.

    Note: Because /result has to wait for the job to finish before it can return a response, using it for longer operations such as processing a large CSV file can result in an HTTP request timeout response. The /result method returns a response either when the result is available, or after 120 seconds, whichever is sooner. If the job is not complete after 120 seconds, the /result method returns a code 7010 (job result request timeout) response. This means that your asynchronous job is still in progress. To avoid the timeout, use /status instead.
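Following the advice to use /status, a polling loop might look like the Python sketch below. It assumes, without confirmation from this page, that the status response is a JSON object whose status field eventually becomes finished or failed, and that the API key can be passed as an apikey query parameter on the job endpoint. The helper names poll_job and fetch_status_http are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the job status JSON has a "status" field that ends up as one
# of these values. Not confirmed by this page.
TERMINAL_STATES = {"finished", "failed"}

STATUS_URL = "https://api.havenondemand.com/1/job/status/"


def fetch_status_http(job_id, apikey):
    """Call /1/job/status/<job-id> and return the decoded JSON body.
    Assumption: the API key is accepted as an apikey query parameter."""
    query = urllib.parse.urlencode({"apikey": apikey})
    url = STATUS_URL + urllib.parse.quote(job_id) + "?" + query
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def poll_job(job_id, fetch_status, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal state.
    Each status call returns without waiting for the job, so this loop
    never hits the 120-second /1/job/result/ timeout."""
    for attempt in range(max_attempts):
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval)  # back off between status checks
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} status checks")
```

For example, poll_job(job_id, lambda j: fetch_status_http(j, "YOUR_API_KEY")) checks every five seconds for roughly five minutes. Passing fetch_status and sleep in as arguments keeps the loop testable without network access.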

Synchronous: https://api.havenondemand.com/1/api/sync/anomalydetection/v1
Asynchronous: https://api.havenondemand.com/1/api/async/anomalydetection/v1

Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter    Description
apikey       The API key to use to authenticate the API request.

Parameters

This API accepts the following parameters:

Required (provide exactly one of file, reference, or url)
Name         Type            Description
file         binary          A CSV file that contains the data for anomaly detection.
reference    string          A Haven OnDemand reference obtained from either the Expand Container or Store Object API.
url          string          A publicly accessible HTTP URL from which the CSV document to analyze can be retrieved.

Optional
Name         Type            Description
max_results  number          The maximum number of anomalies to return. Default value: 20.
columns      array<string>   The list of columns to analyze. Use this parameter to restrict the analysis to combinations of the columns that you specify. Column names must match the CSV headers exactly. Supply one value per column, for example columns=region, or columns=location&columns=product. By default, the API uses all columns in the CSV file.
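As a sketch of how these parameters combine, the Python helper below builds a request URL that points the API at a CSV file through the url parameter and optionally sets columns and max_results. It assumes that array parameters are sent by repeating the parameter name once per value, and that a GET request with query parameters is accepted, as the Quick Start URL form suggests. The name build_request_url is hypothetical.

```python
from urllib.parse import urlencode

ENDPOINTS = {
    "sync": "https://api.havenondemand.com/1/api/sync/anomalydetection/v1",
    "async": "https://api.havenondemand.com/1/api/async/anomalydetection/v1",
}


def build_request_url(csv_url, apikey, mode="async", columns=None, max_results=None):
    """Build a GET URL that asks the API to analyze the CSV at csv_url.

    Assumption: array parameters such as columns are encoded by repeating
    the name once per value (columns=a&columns=b)."""
    params = [("url", csv_url), ("apikey", apikey)]
    for name in columns or []:
        params.append(("columns", name))  # one entry per column name
    if max_results is not None:
        params.append(("max_results", str(max_results)))
    return ENDPOINTS[mode] + "?" + urlencode(params)
```

Passing the resulting URL to urllib.request.urlopen (or any HTTP client) sends the request. The file parameter instead needs a multipart POST, as in the curl example.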

This API returns a JSON response described by the model below. The model is presented both as an easy-to-read abstract definition and as a formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Anomaly Detection Response {
result ( array[Result] , optional)
}
Anomaly Detection Response:Result {
row ( number , optional) The number of the row in the data.
row_anomaly_score ( number , optional) The anomaly score of the row.
anomalies ( array[Anomalies] , optional) The most important anomalies in the row.
}
Anomaly Detection Response:Result:Anomalies {
type ( enum<Type> , optional) Whether the anomaly is a single value or a combination of two values.
columns ( array[object] , optional) The columns involved in the anomaly, each with its value.
anomaly_score ( number , optional) The anomaly score of this value or combination of values.
}
enum<Anomaly Detection Response:Result:Anomalies:Type> {
'single' , 'combination'
}
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "type": "object",
    "properties": {
        "result": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "row": {
                        "type": "number"
                    },
                    "row_anomaly_score": {
                        "type": "number"
                    },
                    "anomalies": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "type": {
                                    "enum": [
                                        "single",
                                        "combination"
                                    ]
                                },
                                "columns": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "column": {
                                                "type": "string"
                                            },
                                            "value": {
                                                "type": "string"
                                            }
                                        }
                                    }
                                },
                                "anomaly_score": {
                                    "type": "number"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
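A client usually turns this JSON into typed objects before ranking or displaying anomalies. The Python sketch below does so with dataclasses. The class and function names are hypothetical, and because every property in the model is marked optional, missing keys become None or empty lists rather than errors.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Anomaly:
    type: Optional[str]              # "single" or "combination"
    columns: List[dict]              # each item: {"column": ..., "value": ...}
    anomaly_score: Optional[float]


@dataclass
class RowResult:
    row: Optional[float]
    row_anomaly_score: Optional[float]
    anomalies: List[Anomaly] = field(default_factory=list)


def parse_response(text):
    """Parse a response body into RowResult objects, tolerating missing keys."""
    body = json.loads(text)
    results = []
    for item in body.get("result", []):
        anomalies = [
            Anomaly(a.get("type"), a.get("columns", []), a.get("anomaly_score"))
            for a in item.get("anomalies", [])
        ]
        results.append(RowResult(item.get("row"), item.get("row_anomaly_score"), anomalies))
    return results


def describe(anomaly):
    """Format an anomaly as 'type: col=value, ... (score N)'."""
    pairs = ", ".join(f"{c.get('column')}={c.get('value')}" for c in anomaly.columns)
    return f"{anomaly.type}: {pairs} (score {anomaly.anomaly_score})"
```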

To try this API with your own data and use it in your own applications, you need an API key. You can create an API key from your account page (API Keys).
