Working with Predictive Models
The Models List
Analytics Builder provides a Models list page that displays all of the existing models for every available dataset. The list includes the following information:
Model Name
The name of a specific model. The name must be unique.
Dataset Name
The name of the dataset the model is based on.
Goal
A list of the variables in the dataset that can serve as goals (also known as dependent variables).
Filter
A filter that was applied to the dataset when the model was created. All models must have at least the all_data filter applied.
State
The current status of the model. When you first create the model, the state is "Queued" and then "Running." When the new model is ready for use, the state changes to "Completed." If the job fails, the state shows "Failed."
Confidence Level %
A percentage used to calculate confidence intervals during the training process. When scoring a predictive model that includes confidence intervals, the confidence level indicates the likelihood that the actual score or prediction falls within a specified range. The default value is 80%. Multiple confidence models can be generated by including multiple Confidence Level values, separated by commas. (A sketch of how a confidence level maps to an interval appears after this list.)
ROC
Represents the area under the ROC (Receiver Operating Characteristic) curve. This statistic indicates how well the model separates positives from negatives: a measurement of the model's ability to correctly classify predictions as true or false across various discrimination thresholds. This value is displayed when either a ROC curve or a confusion matrix graph is shown. (See the AUC sketch after this list.)
Note: If you retrain a model that was first generated in an earlier release of ThingWorx Analytics, the ROC value might be different in the retrained model. This change is the result of an enhancement, made in the 8.2 release, to the calculation of the area under the ROC curve.
RMSE
Root Mean Square Error is a measurement of the difference between values predicted by the model and the values actually observed. A low RMSE value is better than a high value. For CONTINUOUS and ORDINAL goals.
RMSE Normalized
If MAX and MIN values for the goal variable are provided in the dataset metadata, normalization is done by dividing the RMSE value by the difference between them (MAX - MIN). If MAX and MIN are not set in the dataset metadata, normalization is done by dividing the RMSE value by the standard deviation of the goal field over the training set. (See the sketch after this list.)
Note: The dataset metadata can change over time, and the most recent version is used during model training. During evaluation of a predictive model, the dataset metadata in effect during the model's training phase is used. Retrain the model to pick up any changes to the dataset metadata.
Pearson Correlation
A measure of the linear correlation (or dependence) between the predicted and actual results. Values can range between -1 (total negative correlation) and +1 (total positive correlation). For CONTINUOUS and ORDINAL goals.
MCC
Matthews Correlation Coefficient is a measurement of the quality of binary classifications. It takes into account the distribution of true and false positives in comparison with the distribution of true and false negatives. For BOOLEAN or CATEGORICAL goals. (See the sketch after this list.)
Accuracy
A statistical measure, expressed as a percentage, that indicates how well the model predicted both successful and failed outcomes. For BOOLEAN and CATEGORICAL goals.
Number of Records in Validation Set
The number of records used to create the validation set. This value is based on the Validation Holdout % selected when the model was created.
Created Date
The date the model was created.
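The sketch below is a minimal illustration of how a confidence level can translate into a prediction interval. It assumes a normally distributed prediction error with a known standard deviation; the prediction_interval function and its inputs are hypothetical, not the actual ThingWorx Analytics implementation.

```python
# Illustrative only: how a confidence level maps to a prediction interval,
# assuming a normally distributed prediction error.
from statistics import NormalDist

def prediction_interval(prediction: float, error_std: float,
                        confidence_level: float = 80.0) -> tuple[float, float]:
    """Return (lower, upper) bounds expected to contain the actual
    value with the given confidence level (in percent)."""
    # Two-sided z-score for the requested level, e.g. ~1.282 for 80%.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 200.0)
    margin = z * error_std
    return prediction - margin, prediction + margin

print(prediction_interval(100.0, 5.0, 80.0))  # ~ (93.59, 106.41)
print(prediction_interval(100.0, 5.0, 95.0))  # ~ (90.20, 109.80)
```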
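The ROC statistic above is the area under the ROC curve (AUC). One standard way to compute it is the rank-based (Mann-Whitney) formulation sketched below over hypothetical labels and scores; the product's exact calculation (including the 8.2 enhancement noted above) may differ.

```python
# Illustrative only: AUC via the rank-based (Mann-Whitney) formulation.
def roc_auc(labels: list[bool], scores: list[float]) -> float:
    """AUC = probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count half)."""
    # Assign average 1-based ranks to the scores, sharing ranks on ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(roc_auc([True, True, False, False], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

An AUC of 1.0 means the model ranks every positive above every negative; 0.5 is no better than random guessing.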
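For the regression statistics that apply to CONTINUOUS and ORDINAL goals (RMSE, RMSE Normalized, and Pearson Correlation), the formulas can be sketched directly, as below. The goal_min and goal_max parameters stand in for the MAX and MIN values from the dataset metadata, and the use of the population standard deviation in the fallback is an assumption.

```python
# Illustrative only: RMSE, normalized RMSE, and Pearson correlation for a
# CONTINUOUS goal, computed over hypothetical data.
from math import sqrt

def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmse_normalized(actual, predicted, goal_min=None, goal_max=None):
    value = rmse(actual, predicted)
    if goal_min is not None and goal_max is not None:
        return value / (goal_max - goal_min)  # metadata MAX/MIN available
    mean = sum(actual) / len(actual)          # fall back to the goal's std dev
    std = sqrt(sum((a - mean) ** 2 for a in actual) / len(actual))
    return value / std

def pearson(actual, predicted):
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.5, 13.0, 17.0]
print(rmse(actual, predicted))                        # ~0.90
print(rmse_normalized(actual, predicted, 0.0, 20.0))  # RMSE / (MAX - MIN) = ~0.045
print(pearson(actual, predicted))                     # ~0.93, close to +1
```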
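For the classification statistics (Accuracy and MCC), both values follow from the four cells of a confusion matrix, as sketched below with hypothetical counts. Only the two-class BOOLEAN case is shown; multi-class CATEGORICAL goals generalize these formulas.

```python
# Illustrative only: accuracy and Matthews Correlation Coefficient from the
# four cells of a binary confusion matrix.
from math import sqrt

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# 90 true positives, 85 true negatives, 15 false positives, 10 false negatives.
print(accuracy(90, 85, 15, 10))  # 0.875 -> shown as 87.5%
print(mcc(90, 85, 15, 10))       # ~0.75
```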
Model List Options
To create a new model, click New and follow the steps in Creating New Predictive Models.
To work with an existing model, select it in the models list and use one of the following options:
Delete – Removes the selected model.
View – Opens the Model Results page with a detailed view of the model statistics.
Job Details – Opens the Model Job Details page and displays run time information for the model training and validation jobs.
Retrain – Generates a new model using the same configuration as the selected model. A prompt will request a new model name. The old model is retained along with any associated scoring jobs.
Publish – Publishes the completed model to Analytics Manager and opens it there as an analysis model.
Export – Downloads the completed model as a PMML file. Only one model can be downloaded at a time.
Copy Job ID – Automatically copies the Model Job ID to the system clipboard.
Prev/Next – Pages back or forward through the list of models when the list is too long to display all at once.
Filtering the Model List
To filter the list of models displayed, click Add Filter and select from the list of columns (example: Goal). Then identify the conditions for filtering the column (example: Contains "Pump") and click Save. Only models that include "Pump" in the Goal column will be displayed.