The API requires one input parameter, which can be file, url, or reference. For example:
/1/api/[async|sync]/anomalydetection/v1?file= or url= or reference=
Note: Because this API typically processes larger files, Haven OnDemand recommends that you use the asynchronous version. Synchronous requests block until processing is complete, which might result in timeouts. See Get the Results of an Asynchronous Request.
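The endpoint pattern above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `anomaly_detection_endpoint` and `select_input` are hypothetical, the host is taken from the curl example later on this page, and the rule that exactly one of file, url, or reference must be set is an assumption based on the "file= or url= or reference=" pattern.

```python
from typing import Optional, Tuple

# Host taken from the curl example on this page.
BASE_URL = "http://api.havenondemand.com"


def anomaly_detection_endpoint(mode: str) -> str:
    """Return the URL for the sync or async variant (hypothetical helper)."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    return f"{BASE_URL}/1/api/{mode}/anomalydetection/v1"


def select_input(file: Optional[str] = None,
                 url: Optional[str] = None,
                 reference: Optional[str] = None) -> Tuple[str, str]:
    """Return (name, value) for the single input parameter that is set.

    Assumption: the service expects exactly one of file, url, or reference.
    """
    provided = [(name, value) for name, value in
                (("file", file), ("url", url), ("reference", reference))
                if value is not None]
    if len(provided) != 1:
        raise ValueError("provide exactly one of file, url, or reference")
    return provided[0]
```

For example, `anomaly_detection_endpoint("async")` gives the asynchronous URL recommended in the note above.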
In each case, the file that you provide must be in CSV format, with the following structure:
- The first row must be the comma-separated list of column headers.
- The second row must be a list of the types for each of the columns. The following types are supported:
- The rest of the rows contain the data, with one record on each row, and column values that correspond to the specified headers and types.
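Before you submit a file, you can check it against the three-part layout described above. The following Python sketch uses the standard csv module to split a document into headers, types, and records. It checks only the shape of the file (one type per header, one value per column in every record), not the type names themselves, and the function name `read_anomaly_csv` is hypothetical.

```python
import csv
import io


def read_anomaly_csv(text: str):
    """Split CSV text into (headers, types, records).

    Row 1 holds the column headers, row 2 the column types, and every
    following row is one data record with one value per column.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        raise ValueError("need at least a header row and a types row")
    headers, types = rows[0], rows[1]
    if len(types) != len(headers):
        raise ValueError("the types row must have one entry per column header")
    records = rows[2:]
    # Report 1-based line numbers: data starts on line 3 of the file.
    for line_no, record in enumerate(records, start=3):
        if len(record) != len(headers):
            raise ValueError(
                f"line {line_no} has {len(record)} values, expected {len(headers)}")
    return headers, types, records
```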
The following very simple example contains a list of colored shapes:
color,shape
STRING,STRING
blue,triangle
blue,triangle
blue,triangle
blue,square
blue,square
blue,square
blue,pentagon
red,triangle
red,triangle
red,triangle
red,square
yellow,triangle
You can submit this file to the API, and it returns a list of the anomalies it detects.
curl -X POST http://api.havenondemand.com/1/api/[async|sync]/anomalydetection/v1 --form "email@example.com"
An anomaly can be a single value or a combination of values. In this example, pentagon and yellow are anomalous values, because each of them occurs only once. In addition, the combination red square occurs only once, so it is also anomalous, even though the values red and square occur in other records.
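To make the reasoning above concrete, the following Python sketch applies a naive frequency heuristic to the colored-shapes data: it flags values that occur exactly once, and pairs of values that occur exactly once where neither value is already rare on its own. This is only an illustration of the red square example, not the algorithm that the service actually uses, and `naive_anomalies` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def naive_anomalies(headers, records):
    """Return (rare_values, rare_pairs) using simple occurrence counts.

    Values are (column, value) tuples. A pair is reported only when it
    occurs once and neither of its values is rare by itself.
    """
    value_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = list(zip(headers, record))
        value_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rare_values = sorted(item for item, n in value_counts.items() if n == 1)
    rare_set = set(rare_values)
    rare_pairs = sorted(pair for pair, n in pair_counts.items()
                        if n == 1 and not (set(pair) & rare_set))
    return rare_values, rare_pairs
```

On the example data, this reports yellow and pentagon as rare values and red square as a rare combination. The pairs blue pentagon and yellow triangle also occur once, but the heuristic skips them because they already contain a rare value.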
For more complicated data sets, you can optionally use the columns parameter to restrict the analysis to particular columns. You can use this option if you want to detect particular types of anomalies in the data.
You can also use the max_results parameter to limit the number of anomalous values or combinations that the API returns.
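The following Python sketch shows one way to encode the input parameter together with the optional columns and max_results parameters as a query string. The encoding of columns is an assumption: this sketch repeats the parameter once per column name, which may not match what the service expects. The function name `build_query` is hypothetical.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_query(input_name: str, input_value: str,
                columns: Optional[Sequence[str]] = None,
                max_results: Optional[int] = None) -> str:
    """Encode request parameters as a URL query string.

    Assumption: list-valued parameters are sent by repeating the
    parameter name, for example columns=color&columns=shape.
    """
    params = [(input_name, input_value)]
    for column in columns or ():
        params.append(("columns", column))
    if max_results is not None:
        params.append(("max_results", str(max_results)))
    return urlencode(params)
```

For example, `build_query("reference", "abc", ["color", "shape"], 5)` restricts the analysis to two columns and asks for at most five results.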
The asynchronous mode returns a job ID, which you can then use to retrieve your results. There are two methods for this:
/1/job/status/, which returns the status of the job, including the results if the job is finished.
/1/job/result/, which waits until the job has finished and then returns the result.
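The following Python sketch polls the /1/job/status/ method until a job finishes. Several details are assumptions rather than documented behavior: the job ID is appended directly after the trailing slash, the response JSON has a status field whose values include "finished" and "failed", and the caller supplies `fetch_json`, a function that performs the HTTP request and returns the decoded body. The helper names `job_url` and `wait_for_job` are hypothetical.

```python
import time
from typing import Callable


def job_url(method: str, job_id: str) -> str:
    """Build a job URL for 'status' or 'result' (assumed path layout)."""
    if method not in ("status", "result"):
        raise ValueError("method must be 'status' or 'result'")
    return f"http://api.havenondemand.com/1/job/{method}/{job_id}"


def wait_for_job(job_id: str,
                 fetch_json: Callable[[str], dict],
                 interval: float = 5.0,
                 max_polls: int = 60) -> dict:
    """Poll the status method until the job finishes, then return the response.

    Assumption: the response has a 'status' field that becomes
    'finished' on success or 'failed' on error.
    """
    url = job_url("status", job_id)
    for _ in range(max_polls):
        response = fetch_json(url)
        status = response.get("status")
        if status == "finished":
            return response  # the finished status response includes the results
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {response}")
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the HTTP call in as a function keeps the sketch independent of any particular HTTP library and makes the polling logic easy to test with canned responses.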
Because /result has to wait for the job to finish before it can return a response, using it for longer operations such as processing a large video file can result in an HTTP request timeout response. The /result method returns a response either when the result is available or after 120 seconds, whichever is sooner. If the job is not complete after 120 seconds, the /result method returns a code 7010 (job result request timeout) response. This means that your asynchronous job is still in progress. To avoid the timeout, use