Train Prediction

Trains a prediction model.

Haven OnDemand contains Predictive Analytics APIs to classify, predict, and analyze data. For more information about Predictive Analytics, see Introduction to Predictive Analytics.

The Train Prediction API creates a prediction model according to a training data set that you provide.

You can create the following types of prediction model:

  • Classification model. A model to predict categorical fields.
  • Regression model. A model to predict numerical values.

The API runs the training data set with multiple prediction algorithms, and different sets of parameters for each algorithm. It tests and compares all the prediction models that it creates, and automatically selects the algorithm that performs best from the results. You can select the measurement criterion to determine which algorithm is the best performing.

The Train Prediction API publishes the best-performing model as a prediction model, with a name that you specify.

Quick Start

The Train Prediction API accepts training data in CSV or JSON format. To create an accurate prediction model, provide as many relevant data features and as much training data as possible. For more information about the training data format, see Introduction to Predictive Analytics.

Note: Due to the runtime of this API, it is available only as an asynchronous version. See Get the Results of an Asynchronous Request.

To run the Train Prediction API, you must provide training data, as JSON, a file, a URL, or an object store reference. You must also set the following parameters:

  • prediction_field. The name of the field in your training data that contains the values that you want to predict. For example, if you want to predict the species of a flower according to various features, you might use the field that contains the species as the prediction_field.

    Note: To train a classification type model, the prediction_field that you specify must contain categorical values. To successfully train a prediction model, the number of different categories in this field must be equal to or less than the square root of the number of lines in your dataset.

  • model_name. The name that you want to use for the prediction model that you want to create.
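
The category limit in the note above can be checked before you upload a data set. The following is a minimal sketch, not part of the API; it assumes that "lines" means data rows (excluding any header row), and the function name category_count_ok is hypothetical.

```python
def category_count_ok(labels):
    """Return True if the number of distinct labels is at most
    the square root of the number of rows.

    ASSUMPTION: `labels` holds one prediction_field value per data row,
    with no header row included.
    """
    rows = len(labels)
    distinct = len(set(labels))
    # distinct <= sqrt(rows) is equivalent to distinct^2 <= rows,
    # which avoids floating-point rounding.
    return distinct * distinct <= rows


nine_rows = ["setosa", "versicolor", "virginica"] * 3  # 9 rows, 3 categories
print(category_count_ok(nine_rows))      # 3 categories, 9 rows
print(category_count_ok(nine_rows[:8]))  # 3 categories, 8 rows
```

With three categories, the data set needs at least nine rows to satisfy the rule.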

By default, the API trains a classification type model. You can set the predictor_type parameter to regression to train a regression model instead.

For example:

/1/api/async/trainpredictor/v2?model_name=my_model&prediction_field=predictionField&file= or url= or reference=

Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.

The Train Prediction API creates a prediction model based on the input data. This example creates a prediction model named my_model. You can use this model name later, in the Predict API.
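
As an illustration, a request URL of this shape could be assembled as follows. This is a sketch under assumptions: it uses the url input option and passes apikey in the query string; the helper name build_train_request_url, the placeholder key, and the example.com data URL are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as listed in this document.
BASE_URL = "https://api.havenondemand.com/1/api/async/trainpredictor/v2"


def build_train_request_url(api_key, model_name, prediction_field,
                            data_url, predictor_type=None):
    """Build a Train Prediction request URL for training data hosted at data_url."""
    params = {
        "apikey": api_key,
        "model_name": model_name,
        "prediction_field": prediction_field,
        "url": data_url,
    }
    # predictor_type is optional; the documented default is classification.
    if predictor_type is not None:
        params["predictor_type"] = predictor_type
    return BASE_URL + "?" + urlencode(params)


# Hypothetical values, for illustration only.
print(build_train_request_url(
    api_key="YOUR_API_KEY",
    model_name="my_model",
    prediction_field="predictionField",
    data_url="https://example.com/training.csv",
))
```

urlencode percent-encodes the data URL, so characters such as : and / do not break the query string.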

The API response contains the model name that it creates, as well as the status of the model. For example:

{
	"model": "my_model",
	"status": "Ready"
}

When the API trains the prediction model, it tests the following algorithms to find the algorithm and parameter set that produces the best results for your data.

For classification models:

  • Decision Tree
  • Logistic Regression
  • Random Forest
  • SVM

For regression models:

  • Decision Tree Regression
  • Random Forest Regression
  • Linear Regression
  • Lasso Regression
  • Ridge Regression

To determine which results are "best", the API uses the measurement criterion you set in the selection_strategy parameter.

For classification models:

  • accuracy, the default value.
  • precision
  • recall
  • f_measure

For regression models:

  • mean_square_error, the default value.
  • root_mean_square_error
  • mean_absolute_error
  • r_square
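
Because each selection_strategy value is valid for only one model type, a client might check the combination before sending a request. The sketch below copies the values from the lists above; the function name resolve_selection_strategy is hypothetical, and how the API itself responds to a mismatched value is not described here.

```python
# Allowed selection_strategy values per predictor_type, copied from the
# lists above. The first entry of each list is the documented default.
STRATEGIES = {
    "classification": ["accuracy", "precision", "recall", "f_measure"],
    "regression": ["mean_square_error", "root_mean_square_error",
                   "mean_absolute_error", "r_square"],
}


def resolve_selection_strategy(predictor_type="classification",
                               selection_strategy=None):
    """Return the strategy to send, applying the documented default.

    Raises ValueError if the strategy does not match the model type.
    """
    allowed = STRATEGIES.get(predictor_type)
    if allowed is None:
        raise ValueError(f"unknown predictor_type: {predictor_type!r}")
    if selection_strategy is None:
        return allowed[0]
    if selection_strategy not in allowed:
        raise ValueError(
            f"{selection_strategy!r} is not valid for {predictor_type} models"
        )
    return selection_strategy


print(resolve_selection_strategy())                        # default for classification
print(resolve_selection_strategy("regression"))            # default for regression
print(resolve_selection_strategy("regression", "r_square"))
```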

You can retrieve the details of the algorithm and parameters that the final prediction model uses by using the Get Prediction Model Details API.

Get the Results of an Asynchronous Request

The asynchronous mode returns a job-id, which you can then use to extract your results. There are two methods for this:

  • Use /1/job/status/ to get the status of the job, including results if the job is finished.
  • Use /1/job/result/, which waits until the job has finished and then returns the result.

    Note: Because /result has to wait for the job to finish before it can return a response, using it for longer operations such as processing a large video file can result in an HTTP request timeout response. The /result method returns a response either when the result is available, or after 120 seconds, whichever is sooner. If the job is not complete after 120 seconds, the /result method returns a code 7010 (job result request timeout) response. This means that your asynchronous job is still in progress. To avoid the timeout, use /status instead.
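
The /status approach can be sketched as a simple polling loop. This is an illustration under assumptions, not a definitive client: the layout of the status response is not described in this document, so the top-level "status" field and its "finished" and "failed" values are assumptions, as is passing apikey on the status request. The names job_status_url, fetch_json, and wait_for_job are hypothetical.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.havenondemand.com"


def job_status_url(job_id, api_key):
    """Build the /1/job/status/<job-id> URL."""
    return f"{API_ROOT}/1/job/status/{job_id}?" + urlencode({"apikey": api_key})


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def wait_for_job(job_id, api_key, fetch=fetch_json, interval=5.0,
                 max_polls=60, sleep=time.sleep):
    """Poll the job status until the job completes or max_polls is reached."""
    for _ in range(max_polls):
        body = fetch(job_status_url(job_id, api_key))
        # ASSUMPTION: the response has a top-level "status" field that
        # reads "finished" or "failed" once the job is done.
        if body.get("status") in ("finished", "failed"):
            return body
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")
```

The fetch and sleep parameters are injectable so that the loop can be exercised without network access.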

Asynchronous
https://api.havenondemand.com/1/api/async/trainpredictor/v2

This API supports only asynchronous invocation.

Authentication

This API requires an authentication token to be supplied in the following parameter:

  • apikey. The API key to use to authenticate the API request.

Parameters

This API accepts the following parameters:

Required

Provide the training data through one of file, json, reference, or url.

  • file (binary). A file that contains the JSON or CSV data to use to train the prediction model.
  • json (json). A JSON object that contains the data to use to train the prediction model.
  • reference (string). A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding JSON document is passed to the API.
  • url (string). A publicly accessible HTTP URL from which the JSON document can be retrieved.
  • prediction_field (string). The name of the field in your data that contains the categories that you want to predict. The prediction model is trained to fill in categories where values are missing.
  • model_name (string). The name of the prediction model to create. This name identifies the model and must be unique.

Optional

  • predictor_type (enum). The type of model to train. Default value: classification.
  • fields (json). The list of fields in the dataset. This parameter is required if the dataset is in JSON format.
  • selection_strategy (enum). The measurement criterion to use to determine which algorithm yields the best results. For classification models, default value: accuracy. For regression models, default value: mean_square_error.

Enumeration Types

This API's parameters use the enumerations described below:

predictor_type

The type of model to train.

  • classification. Classification predictor for categorical values. Creates a classification model to predict the values for categorical fields.
  • regression. Regression predictor for numeric values. Creates a regression model to predict the values for numeric fields.

selection_strategy

The measurement criterion to use to determine which algorithm yields the best results. For classification models, default value: accuracy. For regression models, default value: mean_square_error.

  • accuracy (Accuracy). A statistical measure of how well the classification model performs on the test data. Valid only for classification. For more information see: https://en.wikipedia.org/wiki/Accuracy_and_precision
  • precision (Precision). A measure that indicates the proportion of true positives out of all the results that the model identifies as positive. Valid only for classification. For more information see: https://en.wikipedia.org/wiki/Precision_and_recall#Definition_.28classification_context.29
  • recall (Recall). A measure that indicates the proportion of positives that are correctly identified. Valid only for classification. For more information see: https://en.wikipedia.org/wiki/Precision_and_recall#Definition_.28classification_context.29
  • f_measure (F-Measure). A measure of the accuracy of the model, which uses the harmonic mean of the precision and recall. Valid only for classification. For more information see: https://en.wikipedia.org/wiki/F1_score
  • mean_square_error (Mean-Square-Error). The mean square error of the regression model on the test data. Valid only for regression. For more information see: https://en.wikipedia.org/wiki/Mean_squared_error
  • root_mean_square_error (Root-Mean-Square-Error). The root mean square error of the regression model on the test data. Valid only for regression. For more information see: https://en.wikipedia.org/wiki/Root-mean-square_deviation
  • mean_absolute_error (Mean-Absolute-Error). The mean absolute error of the regression model on the test data. Valid only for regression. For more information see: https://en.wikipedia.org/wiki/Mean_absolute_error
  • r_square (R-Square). The number that indicates the proportion of the variance in the dependent variable that is predictable from the independent variable. Valid only for regression. For more information see: https://en.wikipedia.org/wiki/Coefficient_of_determination
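
For reference, the measures described above can be computed with their standard textbook definitions, as in the following sketch. It covers the binary classification case only; how the API averages these measures across more than two categories, and how it splits off test data, is not described here. The function names are hypothetical.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes the denominators are non-zero (at least one predicted positive
    and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # f_measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def regression_metrics(actual, predicted):
    """Standard regression error measures for equal-length sequences.

    Assumes the actual values are not all identical (otherwise r_square
    is undefined).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {"mean_square_error": mse,
            "root_mean_square_error": math.sqrt(mse),
            "mean_absolute_error": sum(abs(e) for e in errors) / n,
            "r_square": 1 - ss_res / ss_tot}


print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
print(regression_metrics([1, 2, 3], [1, 2, 4]))
```

Note that higher values are better for accuracy, precision, recall, f_measure, and r_square, while lower values are better for the three error measures.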

This API returns a JSON response that is described by the model below. This single model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.

Model

This is an abstract definition of the response that describes each of the properties that might be returned.

Train Prediction Response {
    message ( string ) The status of the model.
    model ( string ) The name of the created model.
}

Model Schema

This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.

{
    "properties": {
        "message": {
            "type": "string"
        },
        "model": {
            "type": "string"
        }
    },
    "required": [
        "model",
        "message"
    ],
    "type": "object"
}
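
A client could apply the same constraints by hand before using a response. The sketch below checks only what the schema above states (an object with required string properties model and message); the function name check_train_response and the sample payload are hypothetical.

```python
import json


def check_train_response(body):
    """Check a decoded response against the schema above.

    Returns the body unchanged, or raises ValueError on a mismatch.
    """
    if not isinstance(body, dict):
        raise ValueError("response must be a JSON object")
    for key in ("model", "message"):
        if key not in body:
            raise ValueError(f"missing required property: {key}")
        if not isinstance(body[key], str):
            raise ValueError(f"property {key} must be a string")
    return body


# Hypothetical payload, for illustration only.
sample = json.loads('{"model": "my_model", "message": "Ready"}')
print(check_train_response(sample)["model"])
```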

To try this API with your own data and use it in your own applications, you need an API Key. You can create an API Key from your account page - API Keys.

Version 2 (2016-07-25)

This page outlines the changes to the Train Prediction API from the previous version.

  • Prediction services are now called prediction models. The service_name parameter has been renamed to model_name. Similarly, service is now named model in the API response.
  • The accepted prediction data format has changed. Fields in the data must now have the type NUMERIC or STRING. For JSON input data, you must set the fields parameter to specify the details of the fields in your input data.
  • Prediction models that you create with the version 2 API are available in the List Resources API.
  • You can now train a regression model for prediction.

