There was something wrong with fetching your inbox. Please contact support here.
Trend Analysis extracts the most important trends from a dataset. It is particularly suited for business intelligence applications, where you want to understand changes between two points in time. Trend Analysis returns a list of the most important trends, and groups the trends into collections of related trends.
You might have a dataset that describes sales for a French supermarket chain in 2014 and 2015. You can use Trend Analysis to find the most significant trends in the data. Maybe sales of cottage cheese in Lyon were significantly lower in 2015 compared to 2014. Maybe shoppers in Paris were responsible for a much greater proportion of all fruit sales in 2015 compared to 2014.
Trend Analysis first identifies the categories that exist in your dataset, and identifies the players within those categories. In the previous example, "cottage cheese" would be a player in the category "Lyon", but "Lyon" would also be a "player" in the category "cottage cheese". Trend Analysis then aggregates data about your categories. For example, your input data might come from a database that contains raw data describing sales or transactions. Haven OnDemand can aggregate data about each category, for example the number of sales or the value of those sales.
The API then runs trend analysis and returns the most significant categorical trends in your dataset.
The Trend Analysis API requires an input file in comma-separated values (CSV) format.
The file must contain text in rows. The first row of the file must be a header that provides column names. The second row must describe the data type of each column (
NUMERIC). The following rows must be data rows. On each row, the values must be separated by commas.
The API expects the data to include the following columns:
|Column Type||Number of columns||Description|
|Group Column||Exactly one.||The group column contains values that are used to split the dataset into two groups that you want to compare. For example, in our sample data the group column could be named "Year" and you could compare the groups 2014 and 2015.|
|Attribute column||Zero or more.||These columns specify information about a record in a specific group, for example "product name", "product type", "country", "city", and so on. If you don't specify any attribute columns, the API uses all columns in the table as attribute columns, with the exception of the group column and the aggregation column.|
|Aggregation column||Zero or one.||Aggregation columns contain values that you want to aggregate, such as "units sold" or "sale/transaction value". The values in this column are used to identify trends and must be of the type
Trend Analysis performs the following operations:
Validate the request. Your request must contain:
Validate the data.
Split the data. The data is split into your selected groups, using the values in the specified group column.
Find all attribute combinations. Trend Analysis inspects the specified attribute columns, and finds all possible combinations of values for those columns. Each unique combination of values is considered a player to be analyzed. Using the previous example, "cottage cheese" would be a player in the category "Lyon", and "Paris" would be a player in the category "fruits".
Compute trends. Taking the first group, Trend Analysis aggregates records for each player in each category, and calculates the percentage (share) each player has in the category. This is then repeated for the second group. The trends are the respective differences between the second-group percentages and the first-group percentages.
Compute scores. For each player in each category, Trend Analysis computes a trend score. The trend score is based on the percentage change of the player from the first group to the second, but is also influenced by the category size and category activity, which reduces the chance of returning big trends in insignificant categories.
Cluster the trends into collections. Similar trends are clustered into collections.
Return the top trend collections. The API returns the collections of trends with the highest scores. If you specified a list of terms that you are interested in, the API only returns trends that are associated to one or more of those terms.
Consider the following sample data, which relates to sales by a Canadian store (truncated - the complete data file is used on the Trend Analysis API page):
Year,Sales,Province,Customer Segment,Product Sub-Category 2011,261.54,Nunavut,Small Business,Eldon Base for stackable storage shelf; platinum 2012,10123.02,Barry French,Consumer,1.7 Cubic Foot Compact "Cube" Office Refrigerators 2012,244.57,Barry French,Consumer,Cardinal Slant-D® Ring Binder; Heavy Gauge Vinyl
The sample data contains the following columns:
The column "Year" can serve as the group column, so that we can compare sales between 2011 and 2012. The column "Sales" can serve as the aggregation column, so that we can see the trend in the value of sales. The remainder of the columns are attribute columns.
The API might return the following trends:
Each of these three top trends are associated to additional similar trends in their corresponding collections. For instance, the second trend in the second collection is: "Customer Segment: Corporate" grew from 35.8% to 50.6% in "Province: British Columbia".
To run trend analysis, send a request to the API with the following parameters: