Learners and Ensemble Techniques
What Type of Learners Are Available?
The following learner techniques are available for selection when training a model. Learner techniques can be used individually or combined in an ensemble, and each technique suits particular machine learning scenarios.
Each learner technique is listed with a description and its relative model complexity (Simple, Moderate, or High).
Linear Regression
Predicts a continuous goal by assigning a weight to each input feature, choosing the weights that minimize the difference between actual and predicted values. This technique is good for predicting values, quantities, counts, and other continuous outcomes. It is not well suited for Boolean or categorical outcomes.
Example: Predict the number of microns a drill will be off-target based on features such as hours of operation, pump pressure, temperature, and drill bit changes.
Model complexity: Simple
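To make the weight-fitting idea concrete, here is a minimal ordinary least squares sketch in plain Python. It is a textbook illustration, not the ThingWorx Analytics implementation; the helper names (`fit_linear_regression`, `predict`) and the drill readings are hypothetical, with the target generated from known weights so the recovered values can be checked.

```python
# Generic ordinary least squares sketch (not the ThingWorx Analytics implementation).
# Solves the normal equations (X^T X) w = X^T y with Gaussian elimination.

def fit_linear_regression(rows, targets):
    """Return [intercept, w1, ..., wk] minimizing the squared prediction error."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1.0 -> intercept term
    k = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w

def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

# Hypothetical drill data: (hours of operation, pump pressure) -> microns off-target.
rows = [(10, 1.0), (20, 2.0), (30, 1.5), (40, 3.0), (50, 2.5)]
microns = [10.0, 18.0, 21.5, 31.0, 34.5]  # generated as 2 + 0.5*hours + 3*pressure
weights = fit_linear_regression(rows, microns)
print(weights)  # approximately [2.0, 0.5, 3.0]
```

Solving the normal equations directly is fine for a handful of features; larger or poorly conditioned problems usually call for a sturdier solver.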
Logistic Regression
Primarily a classification algorithm that predicts the probability that a goal belongs to a specific category. Tries to assess the weight of each input feature in reducing the number of misclassifications. This technique is best suited to predicting categorical and Boolean outcomes.
Example: Predict a pump failure based on features such as hours of operation, pump pressure, and pump type.
Model complexity: Simple
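Logistic regression can be sketched as batch gradient descent on the log-loss, which moves each feature weight in the direction that reduces misclassification. This sketch assumes a 0/1 goal, a fixed learning rate, and a fixed number of epochs; the pump readings and function names are illustrative only.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps a score to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns [intercept, w1, ..., wk]."""
    k, n = len(rows[0]), len(rows)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

# Hypothetical pump data: (thousands of operating hours, pump pressure) -> failed (1) or not (0).
rows = [(0.5, 1.0), (1.0, 1.2), (1.5, 0.9), (4.0, 2.5), (4.5, 2.8), (5.0, 3.0)]
failed = [0, 0, 0, 1, 1, 1]
w = fit_logistic_regression(rows, failed)
print(predict_proba(w, (0.8, 1.0)), predict_proba(w, (4.8, 2.9)))
```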
Decision Tree
Builds a tree model that predicts a class or value for the goal. Tries to find the best way to partition the data, according to the input features, to improve prediction accuracy. This technique is easy to interpret, can handle both numerical and categorical data, automatically learns feature interactions, and can train quickly even on large datasets. Tree models are most appropriate when a model needs to explain rules such as “If this AND that THEN outcome.”
Model complexity: Simple
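The partitioning step can be illustrated with a small classification tree that, at each node, tries every feature and threshold and keeps the split with the lowest weighted Gini impurity. Gini impurity is one common split criterion and is an assumption here; the data, labels, and helper names are hypothetical. The `rules` helper turns each leaf into an “IF ... THEN ...” rule.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the node is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted_gini, feature, threshold) for the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority  # leaf: predict the most common class
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [y for y, g in zip(labels, go_left) if g], depth + 1, max_depth),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [y for y, g in zip(labels, go_left) if not g], depth + 1, max_depth),
    }

def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def rules(tree, names, path=()):
    """Flatten the tree into readable IF ... AND ... THEN rules."""
    if not isinstance(tree, dict):
        return [("IF " + " AND ".join(path) if path else "ALWAYS") + f" THEN {tree}"]
    name, t = names[tree["feature"]], tree["threshold"]
    return (rules(tree["left"], names, path + (f"{name} <= {t}",))
            + rules(tree["right"], names, path + (f"{name} > {t}",)))

# Hypothetical records: (hours of operation, pump pressure) -> outcome.
rows = [(100, 1.0), (200, 1.1), (300, 2.5), (400, 2.6), (150, 2.7), (350, 1.2)]
labels = ["ok", "ok", "fail", "fail", "fail", "ok"]
tree = build_tree(rows, labels)
for rule in rules(tree, ["hours", "pressure"]):
    print(rule)
```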
Support Vector Machines (SVM)
Can be applied only to a Boolean goal. Builds a model that classifies the goal into one of two categories by finding the boundary with the largest separation (margin) between the categories. Currently this technique performs linear classification only.
An SVM learner can only be used with the Soloist or Majority Vote ensemble technique.
Model complexity: Simple
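A linear SVM can be sketched as subgradient descent on the regularized hinge loss, with the Boolean goal mapped to +1 and -1. The regularization strength, learning rate, epoch count, and data below are illustrative assumptions; this is not how ThingWorx Analytics necessarily trains its SVM learner.

```python
def fit_linear_svm(rows, labels, lam=0.01, lr=0.05, epochs=2000):
    """Full-batch subgradient descent on lam/2*||w||^2 + mean(hinge loss).
    labels must be +1 or -1 (a Boolean goal mapped to two classes)."""
    k, n = len(rows[0]), len(rows)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularization term
        gb = 0.0                     # the bias is not regularized
        for x, y in zip(rows, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:           # inside the margin or misclassified: hinge is active
                for j in range(k):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def classify(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical two-feature records; +1 = "fails", -1 = "does not fail".
rows = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (3.0, 3.0), (3.5, 2.5), (2.5, 3.5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = fit_linear_svm(rows, labels)
print(w, b, classify(w, b, (0.2, 0.3)), classify(w, b, (3.2, 3.1)))
```

Because the model is a single linear boundary `w·x + b = 0`, it can only separate classes that are (roughly) linearly separable, which matches the linear-only restriction described above.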
Neural Network
Uses a set of interconnected nodes and layers to predict a class or value for a goal. Tries to create a network that learns a set of adaptive weights to predict outcomes accurately based on the input features. This technique can handle both numerical and categorical data. It can represent a wide range of linear and non-linear functions, and it can train quickly even on large datasets.
Model complexity: Moderate
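The following sketch trains a tiny network (one hidden layer of tanh nodes feeding a sigmoid output node) with backpropagation on XOR, a goal that no linear model can fit. The layer size, learning rate, epoch count, and seed are assumptions; convergence depends on the random initialization, so the check only confirms that the training loss falls.

```python
import math
import random

def train_mlp(data, hidden=8, lr=0.5, epochs=5000, seed=0):
    """One hidden layer of tanh units and a sigmoid output, trained with
    full-batch gradient descent on the log-loss (backpropagation)."""
    rng = random.Random(seed)
    n_in, n = len(data[0][0]), len(data)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
        z = sum(w * hj for w, hj in zip(W2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gW2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, y in data:
            h, p = forward(x)
            pc = min(max(p, 1e-12), 1 - 1e-12)  # clamp so log() stays finite
            loss -= y * math.log(pc) + (1 - y) * math.log(1 - pc)
            d_out = p - y                         # dLoss/dz for sigmoid + log-loss
            gb2 += d_out
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                d_h = d_out * W2[j] * (1 - h[j] ** 2)  # backpropagate through tanh
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        losses.append(loss / n)
        for j in range(hidden):                   # gradient step on every weight
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
        b2 -= lr * gb2 / n
    return (lambda x: forward(x)[1]), losses

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
predict, losses = train_mlp(xor)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print([round(predict(x), 2) for x, _ in xor])
```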
Random Forest
Generates multiple randomized decision trees and outputs predictions by averaging the results of all the trees. This technique provides an unbiased estimate of model accuracy and can handle both large datasets and large numbers of features.
Model complexity: High
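Here is a minimal random forest sketch for a numeric goal: each tree is trained on a bootstrap sample of the records, each split considers only a random subset of features, and the forest prediction is the average over trees. The tree depth, forest size, `n_features` setting, and data are hypothetical choices for illustration, not ThingWorx Analytics settings.

```python
import random
from statistics import mean

def build_tree(rows, ys, rng, depth=0, max_depth=3, n_features=1):
    """Regression tree; each split considers only a random subset of features."""
    if depth == max_depth or len(set(ys)) == 1:
        return mean(ys)
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({r[f] for r in rows})[:-1]:  # skip the max so both sides are non-empty
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:  # the sampled features are constant in this node
        return mean(ys)
    _, f, t = best
    go_left = [r[f] <= t for r in rows]
    return (f, t,
            build_tree([r for r, g in zip(rows, go_left) if g],
                       [y for y, g in zip(ys, go_left) if g], rng, depth + 1, max_depth, n_features),
            build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(ys, go_left) if not g], rng, depth + 1, max_depth, n_features))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_random_forest(rows, ys, n_trees=25, max_depth=3, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample, with replacement
        forest.append(build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                                 rng, 0, max_depth, n_features))
    return forest

def predict_forest(forest, x):
    return mean(predict_tree(tree, x) for tree in forest)  # average over all trees

# Hypothetical data: the goal depends on feature 0; feature 1 is noise.
rows = [(i, (i * 7) % 5) for i in range(20)]
ys = [2.0 * i for i in range(20)]
forest = fit_random_forest(rows, ys)
print(predict_forest(forest, (2, 0)), predict_forest(forest, (17, 0)))
```

Because the bootstrap samples and feature choices differ, each tree sees a slightly different view of the data; averaging their outputs smooths out the quirks of any single tree.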
Gradient Boost
Generates multiple decision trees sequentially, where each new tree is built to correct the errors that the trees before it still make. The output is a final boosted model that combines the contributions of all the trees. This technique sometimes performs better than other decision tree techniques, but several trials may be needed to choose the best number of trees. Training can be slow, and noisy data can lead to overfitting.
Model complexity: High
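Gradient boosting with a squared-error loss can be sketched with one-split trees (stumps) on a single numeric feature: each new stump is fit to the residuals the current ensemble still leaves, and its output is added with a shrinkage factor. The stump learners, `learning_rate`, and data are assumptions chosen to keep the example short.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """One-split regression tree on a single numeric feature (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gradient_boost(xs, ys, n_trees=50, learning_rate=0.3):
    base = mean(ys)  # stage 0: a constant prediction
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

xs = list(range(10))
ys = [float(x * x) for x in xs]  # hypothetical non-linear goal

def mse(model):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

one_tree = fit_gradient_boost(xs, ys, n_trees=1)
fifty_trees = fit_gradient_boost(xs, ys, n_trees=50)
print(mse(one_tree), mse(fifty_trees))
```

Comparing one stage against fifty on the same data shows the training error dropping as trees are added; on noisy data, that same drive to fit the residuals is what leads to overfitting, which is why the number of trees needs tuning.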
What Ensemble Techniques Are Available?
During predictive model building, learners can be used individually or in combination. A combination of learners is known as an ensemble. Each ensemble technique combines the outputs of its learners in a different way to reduce prediction errors.
Note: Using a machine learning ensemble can increase model complexity.
The following options are available for working with an ensemble of learners:
Average – Each learner scores each record separately and the scores are averaged.
Best – Only the learner that performed best during training is used for scoring.
Elite Average – The learners that performed best during training are selected as elite learners. Each elite learner scores the records separately, and their scores are averaged.
Majority Vote – For Boolean goals only. Each learner scores each record separately and the predicted outcomes are tallied. The outcome with the largest tally is selected.
Soloist – Includes only a single learner.
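The five options can be sketched as small functions over a list of learners. Each learner is represented here as a hypothetical pair of (training performance, scoring function), with higher performance meaning better; the elite-set size `n_elite` is an assumption, since the rule ThingWorx Analytics uses to pick elite learners is not specified above.

```python
from collections import Counter
from statistics import mean

# Hypothetical representation: each learner is (training_performance, score_fn),
# where a higher training_performance means the learner did better in training.

def average(learners, record):
    return mean(score(record) for _, score in learners)

def best(learners, record):
    _, score = max(learners, key=lambda learner: learner[0])
    return score(record)

def elite_average(learners, record, n_elite=2):
    # n_elite is an assumed parameter; the real elite-selection rule may differ.
    elite = sorted(learners, key=lambda learner: learner[0], reverse=True)[:n_elite]
    return mean(score(record) for _, score in elite)

def majority_vote(learners, record):
    # Boolean goals only: count True/False predictions and keep the most common.
    votes = Counter(score(record) for _, score in learners)
    return votes.most_common(1)[0][0]

def soloist(learner, record):
    _, score = learner
    return score(record)

# Stand-in learners that return fixed scores, purely for illustration.
learners = [(0.91, lambda r: 0.80), (0.74, lambda r: 0.40), (0.86, lambda r: 0.60)]
voters = [(0.91, lambda r: True), (0.74, lambda r: False), (0.86, lambda r: True)]
record = {"pump_pressure": 2.7}

print(average(learners, record), best(learners, record), elite_average(learners, record))
print(majority_vote(voters, record), soloist(learners[0], record))
```

Using an odd number of voters avoids ties in the Majority Vote sketch; how ties are broken in practice is not covered here.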