Data Analysis Definition
Data Analysis Definition is not available in ThingWorx 8.3 and later.
The data analysis definition entity represents information that connects the ThingWorx model to sources that are used for performing analysis activities. The sources are ThingWorx DataConnect and ThingWorx Analytics.
As an example, suppose your model consists of a fleet of tractors and you want to generate prediction models, based on characteristics such as tire pressure and tractor speed, to determine when tire replacement will be necessary. You can create a data analysis definition entity that creates files that can be processed by ThingWorx DataConnect and analyzed by ThingWorx Analytics to create a predictive model.
In order to use the ThingWorx data analysis definition functionality (end-to-end), the following applications must be downloaded and installed. Installation and configuration for each application is not part of this documentation, but can be referenced in the links below.
ThingWorx DataConnect: an application that can automate the transformation and preparation of your source data for consumption by ThingWorx Analytics. Essentially, ThingWorx DataConnect needs a predefined data model and a set of mapped data files that it can process. It applies the necessary data transformations, according to the data model, to generate CSV data files ready for use by ThingWorx Analytics.
ThingWorx Analytics (previously ThingWorx Machine Learning): an embedded advanced analytics tool that automatically does the work of complex data science.
ThingWorx DataConnect and Analytics can be downloaded from the PTC Software Downloads site. Documentation is also available on the PTC Reference Documents site.
End-to-End Process Overview
1. The data analysis definition model is built in ThingWorx.
2. Data is extracted and mapped for each feature in the data analysis definition.
3. When the mapping is complete and executed, an API request is made to DataConnect to begin applying the specified transformations. In this request, the following information must be included:
The location of the data to be processed.
The data analysis definition file in JSON format.
ThingWorx DataConnect requires a URI to identify data analysis definition file locations, and this configuration is performed during DataConnect setup. DataConnect accepts the relative path by using symbolic links. For example, if a DAD file is physically stored in /ThingworxStorage/repository/SystemRepository/, then the path ThingWorx sends to DataConnect is: file:///repository/SystemRepository/
4. ThingWorx DataConnect transforms the data according to the model specified in the JSON file.
5. The transformed data is uploaded to ThingWorx Analytics.
6. Because the data transformation process is asynchronous and possibly long running, the status of the job can be viewed on the ThingWorx Status UI.
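The path example in step 3 can be illustrated with a short Python sketch. It assumes that the symbolic link configured during DataConnect setup maps the /ThingworxStorage directory to the root of the file URI. The function name to_dataconnect_uri and the STORAGE_ROOT constant are hypothetical and are not part of any ThingWorx or DataConnect API.

```python
from pathlib import PurePosixPath

# Assumption: the symbolic link set up for DataConnect exposes everything
# under this directory as the root of the file URI.
STORAGE_ROOT = PurePosixPath("/ThingworxStorage")


def to_dataconnect_uri(physical_dir, storage_root=STORAGE_ROOT):
    """Map a physical directory under the storage root to a file:/// URI."""
    # relative_to raises ValueError when the directory is outside the root.
    relative = PurePosixPath(physical_dir).relative_to(storage_root)
    return f"file:///{relative.as_posix()}/"


print(to_dataconnect_uri("/ThingworxStorage/repository/SystemRepository/"))
```

For the directory used in step 3, the sketch yields the same relative URI that the step describes.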
Configuring the Platform Subsystem
The location, application ID, and application key of ThingWorx DataConnect must be configured in the Platform Subsystem.
ThingWorx DataConnect must be configured before the settings below can be determined.
1. In Composer, select Subsystems and then click Platform Subsystem.
2. Configure the following settings:
ThingWorx DataConnect URL Path: the URL of DataConnect.
ThingWorx DataConnect Application ID: the application ID of DataConnect.
ThingWorx DataConnect Application Key: the application key of DataConnect.
Creating a Data Analysis Definition Entity and Mapping Features
This process assumes that certain aspects of your model have been created in ThingWorx, such as data shapes. A data shape is used to define the Features and Goal (objective) used by ThingWorx Analytics to perform its analysis. The data shape must have exactly one field defined as a key, and that field is the Goal. The other field definitions of the data shape define the Features for ThingWorx Analytics.
1. In the Explorer, click Data Analysis Definitions and click the green plus sign.
2. In the General Information section, provide the following information:
Name: Name of the data analysis definition.
ThingWorx DataConnect cannot read entity types with special characters or spaces in the name.
Description: Optional information about the data analysis definition.
Project: Project(s) that are associated with the data analysis definition.
Tags: Tag(s) that are associated with the data analysis definition.
File Repository: The location the CSV files are written to. If a location is not specified, the default location is /ThingworxStorage/repository/SystemRepository in the location of the ThingWorx installation. For information on configuring the location of /ThingworxStorage, refer to Configuring a Custom File Repository Location.
Time Sampling Interval: The sampling time for the data analysis definition.
ThingWorx DataConnect takes the data (CSV) files from ThingWorx and combines them into a single dense table which is then sent to ThingWorx Analytics for analysis. To create this table, ThingWorx DataConnect converts ThingWorx time series data (which may not be periodic) into a periodic form based on the Time Sampling Interval. For example, there is a Feature named TirePressure with values taken at 1:00 AM, 1:06 AM, 1:12 AM, 1:23 AM and 1:29 AM. If the Time Sampling Interval is five minutes, start date is 1:00 AM, and end date is 1:30 AM, then the data will be transformed so there are entries at 1:00, 1:05, 1:10, 1:15, 1:20, 1:25, and 1:30. If there is an entry that corresponds to those times it is used; otherwise ThingWorx DataConnect uses interpolation to calculate a value for those times.
3. Click Features. Select a data shape to define the features of the data analysis definition.
For each feature that is selected, a CSV file will be generated. For more information about feature rules, see the Feature Rules section below.
Goal: A goal must have a datasource associated with it. Must be a boolean or numeric value.
Feature name: Correlates with the field definitions of the data shape.
Type: The base type of the feature.
Datasource: The datasource can be a service or a property.
For services, the entity and service category must be selected. For each service that is selected, a property must be defined as the returned data. For non-timeseries properties, the ID is returned.
For property mapping, the ID is always mapped to the name property.
For time series properties, the timestamp is returned.
Datasource Type: Service or property.
Transformation: Determines if and how the data is transformed when it is sent to the DataConnect application. The default is Identity, which means that each row in the CSV file is treated as a discrete data point. The possible options include:
Identity: Applies to all types.
Count: Applies to all types.
Min: Applies to numeric base type.
Max: Applies to numeric base type.
Standard Deviation: Applies to numeric base type.
Mean: Applies to numeric base type.
Sum: Applies to numeric base type.
ID: The selected property for service-type datasources.
Timestamp: Timestamp of the datasource service.
4. Click Execution Settings.
5. Select a time range for the analysis.
6. Click Execute. The status of the job can be viewed on the ThingWorx Status UI. See image below. Status codes are as follows:
Processing: Job is processing in ThingWorx.
Queued: Job has been submitted to ThingWorx Analytics and is waiting to be processed.
Running: Job is being processed in ThingWorx Analytics.
Job is complete.
Failed: Job failed in ThingWorx Analytics.
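The Time Sampling Interval example in step 2 can be sketched in Python as follows. This documentation does not state which interpolation method ThingWorx DataConnect uses, so the sketch assumes linear interpolation between the two nearest readings and reuses the nearest reading for grid points outside the sampled range. The resample function, the date, and the TirePressure values are hypothetical.

```python
from datetime import datetime, timedelta


def resample(samples, start, end, interval):
    """Return (time, value) pairs on a regular grid from start to end inclusive.

    samples is a non-empty list of (datetime, float) pairs sorted by time.
    A reading that falls exactly on a grid point is used as-is. Otherwise the
    value is linearly interpolated (an assumption, see the text above).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    result = []
    t = start
    while t <= end:
        before = [s for s in samples if s[0] <= t]
        after = [s for s in samples if s[0] >= t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            if t0 == t1:
                value = v0  # exact match on the grid point
            else:
                fraction = (t - t0) / (t1 - t0)
                value = v0 + fraction * (v1 - v0)
        else:
            # No reading on one side: reuse the nearest available reading.
            value = (before[-1] if before else after[0])[1]
        result.append((t, value))
        t += interval
    return result


base = datetime(2024, 1, 1, 1, 0)  # 1:00 AM on an arbitrary date
readings = [(base + timedelta(minutes=m), p)
            for m, p in [(0, 30.0), (6, 30.6), (12, 31.2), (23, 32.3), (29, 32.9)]]
grid = resample(readings, base, base + timedelta(minutes=30), timedelta(minutes=5))
for moment, pressure in grid:
    print(moment.strftime("%H:%M"), round(pressure, 2))
```

The output contains seven rows, one for each five-minute mark from 1:00 to 1:30, even though only the 1:00 reading falls exactly on a grid point.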
Creating Aliases and Changing Service Input Parameter Default Values
For each feature service parameter, you can change the default values for each datasource. This will change the way the data is output in the CSV files.
1. Assign a service datasource and map the results for id and timestamp by clicking the circle in each column.
The primary key should have its value mapped to ID. In this example, id is the primary key.
2. Click Done.
3. Click the gear icon to the right of the service mapping to map the input parameters of the service.
4. Select the check boxes in the Default Value at Execution column for each input parameter that should be provided at each execution. In the Default Value column, assign aliases for the selected inputs. Parameters that have the Default Value at Execution checkbox selected and a Default Value defined will display on the Execution Settings page.
The same alias can be used for multiple parameters of the same base type.
Aliases are case sensitive. For example, startDate and StartDate are different.
5. To provide default values for property values that will be executed, but not displayed on the Execution Settings page, provide a Default Value name and do not check the Default Value at Execution checkbox.
6. On the Execution Settings page, the aliases display (in this example, MyStartDate and MyEndDate are set).
The Time range fields can be used to link aliases named startDate and endDate. In the example below, there are four aliases defined: startDate, endDate, MaxItems, and OldestFirst. The aliases correspond to the following entry fields on the Execution Settings page:
Output Data from ThingWorx
After a data analysis definition is executed, CSV files are created for each feature. Each CSV file contains the following rows/fields:
Timestamp (optional)
Feature Rules
Time series data is important for providing ThingWorx Analytics with a dense dataset. The following rules determine whether a feature is considered a time series feature:
If a feature has a Timestamp column and an Identity transformation type, it is a time series feature. This applies when the feature base type is Integer or Double, the feature has a timestamp, and the transformation type is Identity. The reason for this rule is that missing values can then be interpolated in ThingWorx DataConnect so that ThingWorx Analytics is provided with a dense dataset.
If a feature does not have a timestamp, or has timestamps with a transformation type other than Identity, it is not a time series feature. For example, if the transformation type is Mean, the data is aggregated and the feature is not a time series feature.
If the dataset contains at least one time series feature, then it is a time series type. Otherwise, it is a standard type.
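The rules above can be expressed as a small Python sketch. The Feature class, its field names, and the type labels are hypothetical and do not reflect the actual ThingWorx data model. The numeric base type check reflects one reading of the first rule, which mentions the Integer and Double base types.

```python
from dataclasses import dataclass

NUMERIC_TYPES = {"INTEGER", "DOUBLE"}  # hypothetical labels for the numeric base types


@dataclass
class Feature:
    name: str
    base_type: str       # for example "INTEGER", "DOUBLE", or "STRING"
    has_timestamp: bool  # True when the feature CSV has a Timestamp column
    transformation: str  # for example "Identity" or "Mean"


def is_time_series(feature):
    """A numeric feature with a timestamp and the Identity transformation."""
    return (feature.has_timestamp
            and feature.transformation == "Identity"
            and feature.base_type in NUMERIC_TYPES)


def dataset_type(features):
    """A dataset is a time series type if any feature is a time series feature."""
    return "time series" if any(is_time_series(f) for f in features) else "standard"


pressure = Feature("TirePressure", "DOUBLE", True, "Identity")
mean_speed = Feature("TractorSpeed", "DOUBLE", True, "Mean")
model = Feature("TractorModel", "STRING", False, "Identity")

print(dataset_type([pressure, mean_speed, model]))
print(dataset_type([mean_speed, model]))
```

In this sketch, only TirePressure qualifies as a time series feature, so a dataset that omits it is treated as a standard type.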
Deleting Data Analysis Definition Jobs
If a data analysis definition is created, executed, and deleted from ThingWorx, it is not cleared from ThingWorx DataConnect or ThingWorx Analytics. Additional steps must be taken to delete it from ThingWorx DataConnect and ThingWorx Analytics. Refer to the applicable documentation for more information.
Error Message Troubleshooting
Error messages are logged in the application log in ThingWorx.
Error Message
Error in transformation/ uploading dataset
If the error message also includes ArrayIndexOutOfBoundsException, the request includes a time sampling frequency that causes the data to be aggregated so that no valid rows remain in the dataset for ThingWorx Analytics to act upon. The job will fail.
If the error message also includes IOException: File Not Found, the data zip file is missing from the location specified in the request. The job will fail.
If the error message also includes NumberFormatException, the datatype for a feature is a number but the CSV file contains a value that is not a valid number. The job will fail.
If the error message also includes Cannot perform transformation with no data in a file, the CSV file is empty.
There must be exactly 1 feature marked as an object feature, 0 found
Error occurs when no goal/objective is specified in the request JSON. The job will fail.
The specified datatype for objective field is invalid. Accepted values are BOOLEAN or CONTINUOUS for standard data and CONTINUOUS timeseries data
Error occurs when the goal/objective specified in the request JSON has a string datatype. The job will fail.
Please provide features[featureName] -> transformation value
Error occurs when no transformation is specified for a feature in the request JSON. The job will fail.
The specified transformation cannot be applied to STRING or BOOLEAN data types
Error occurs when an aggregate transformation (mean, sum, min, max, etc.) is specified in the request JSON for a feature with a STRING or BOOLEAN data type. The job will fail.
MLD of append job does not match with original meta-data specification. Temporal -> time sampling interval does not match.
This error occurs when a user tries to change the time sampling frequency after the dataset is created, as this can cause problems when running analytics on time series datasets.
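Several of the errors above concern the feature list in the request JSON, so they could in principle be caught before a job is executed. The following Python sketch shows one way to do that. The dictionary keys (name, dataType, objective, transformation) are hypothetical and do not reflect the actual request schema, and the messages it returns are not the messages that ThingWorx logs. It does not check the stricter objective rule for time series data.

```python
# Transformations that the Transformation list restricts to numeric base types.
AGGREGATES = {"Min", "Max", "Standard Deviation", "Mean", "Sum"}
OBJECTIVE_TYPES = {"BOOLEAN", "CONTINUOUS"}


def check_features(features):
    """Return a list of problems found in a list of feature dictionaries."""
    problems = []
    objectives = [f for f in features if f.get("objective")]
    if len(objectives) != 1:
        problems.append(f"expected exactly 1 objective feature, found {len(objectives)}")
    for f in objectives:
        if f.get("dataType") not in OBJECTIVE_TYPES:
            problems.append(f"{f['name']}: objective must be BOOLEAN or CONTINUOUS")
    for f in features:
        transformation = f.get("transformation")
        if not transformation:
            problems.append(f"{f['name']}: no transformation specified")
        elif transformation in AGGREGATES and f.get("dataType") in {"STRING", "BOOLEAN"}:
            problems.append(f"{f['name']}: {transformation} cannot be applied to {f['dataType']}")
    return problems


features = [
    {"name": "TireReplaced", "dataType": "BOOLEAN", "objective": True, "transformation": "Identity"},
    {"name": "TirePressure", "dataType": "CONTINUOUS", "objective": False, "transformation": "Mean"},
    {"name": "TractorModel", "dataType": "STRING", "objective": False, "transformation": "Sum"},
]
for problem in check_features(features):
    print(problem)
```

In this example, only the TractorModel feature is reported, because Sum is an aggregate transformation applied to a STRING feature.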