Time Series Predictions
A time series contains a sequence of data points observed at specific intervals over time. A time series prediction uses a model to predict future values based on previously observed values. The natural temporal order of time series data makes analysis of time series different from cross-sectional or spatial data analyses, neither of which depends on a time component.
Time series predictions can be useful in a variety of settings, from processing signal data streaming from a sensor at an industrial site, to monitoring trends in a financial market, or maintaining inventory in a commercial setting. In all of these scenarios, recent data can be used to inform predictions about future goal values.
How Does Time Series Data Differ from Non-Time Series Data?
Non-time series data can originate from many subjects and contains values that either do not vary over time or are measured at only a single point in time. Non-time series data can include data about many features, is not order-dependent, and is ready for analytics as is. Examples:
Given a specific set of characteristics, measured at a single point in time, predict a patient’s chance for contracting heart disease.
For a used car with specific features, predict how much you can expect to sell it for.
By contrast, time series data originates from one or more subjects over a period of time. It is order-dependent, and it must be transformed before machine learning algorithms can use it. Examples:
With knowledge of gross revenue and recent stock prices, predict next week’s stock price.
With knowledge of average global temperature over the last 100 years, predict the average temperature 20 years from now.
Time series data also differs from non-time series data in how predictive models are trained. Training a time series model requires a lookbackSize parameter, which defines the number of recent data points used to predict each future value in the time series. Any value greater than 1 is acceptable, but a power of 2 (2, 4, 8, or 16) is generally used. Larger values can slow performance because more records are used for each prediction. When a value of 0 is specified, auto-windowing takes place: ThingWorx Analytics tries a set of lookback sizes (2, 4, 8, and 16) and selects the size that produces the most accurate results.
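One way to picture auto-windowing is as a search over candidate lookback sizes. The following is a minimal sketch, not ThingWorx Analytics code: the names `holdout_error` and `resolve_lookback` are hypothetical, and a simple moving-average predictor stands in for the real model, whose selection procedure is not described in this topic.

```python
CANDIDATE_LOOKBACKS = (2, 4, 8, 16)  # sizes tried when the lookback size is 0


def holdout_error(series, lookback, start):
    """Mean absolute error of a moving-average stand-in model.

    Each value from index `start` onward is predicted as the mean of the
    `lookback` values that immediately precede it.
    """
    errors = []
    for t in range(start, len(series)):
        window = series[t - lookback:t]
        prediction = sum(window) / lookback
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def resolve_lookback(series, lookback_size):
    """Return the lookback size to use (hypothetical helper).

    A value greater than 1 is used as given. A value of 0 triggers the
    auto-windowing search over CANDIDATE_LOOKBACKS.
    """
    if lookback_size > 1:
        return lookback_size
    if lookback_size != 0:
        raise ValueError("lookback size must be 0 (auto) or greater than 1")
    usable = [c for c in CANDIDATE_LOOKBACKS if c < len(series)]
    if not usable:
        raise ValueError("series is too short for auto-windowing")
    start = max(usable)  # score every candidate on the same data points
    return min(usable, key=lambda c: holdout_error(series, c, start))
```

On a steadily increasing series such as `list(range(40))`, this sketch selects the smallest window, because a moving average over a longer window lags further behind the trend. A real model may favor a different size on the same data.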
Training a time series predictive model also requires a lookahead parameter that indicates the number of time steps ahead to predict. In most cases, the lookahead defaults to 1 (it defaults to 0 if goal history is not in use). A lookahead of 1 means the model can be used to predict one time step ahead. To predict outcomes further ahead, enter any value greater than 1.
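The relationship between the lookback size and the lookahead can be sketched as a small helper that lists the time steps feeding a prediction for time t. The function name `lookback_window` is hypothetical. The offsets match the two cases described in this topic (a lookahead of 1 with goal history, and a lookahead of 0 without it); the behavior for a lookahead greater than 1 is an assumed generalization of those cases.

```python
def lookback_window(t, lookback_size, lookahead):
    """Return the time steps whose values feed the prediction for time t.

    Assumption: the window ends `lookahead` steps before t and extends
    back `lookback_size` steps from there.
    """
    return [t - lookahead - k for k in range(lookback_size)]


# Lookback of 3 with the default lookahead of 1: uses t-1, t-2, t-3.
print(lookback_window(10, 3, 1))

# Lookback of 3 with a lookahead of 0 (no goal history): uses t, t-1, t-2.
print(lookback_window(10, 3, 0))
```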
Time Series Predictions
For non-time series models, once a predictive model is generated, scoring on new records can be performed independently of other records. In contrast, when a prediction model is based on time series data, scoring new records continues to depend on a lookback window of recent records.
In the example shown below, the lookback window is defined as 3 and the default lookahead of 1 is in use. Therefore, predicting the goal at any time t will depend on the feature values and goal values at times t – 1, t – 2, and t – 3.
Because basic machine learning algorithms are not time-aware, ThingWorx Analytics uses history pivoting to transform time series data into non-time series data, so that models can be trained with the same basic algorithms used for non-time series data. During this transformation, the data is grouped and sorted by entity and time, and any necessary interpolations take place to produce Analytics-ready data. The table below shows the history-pivoted data from both sets of time series predictions shown above.
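To make the idea of history pivoting concrete, the following minimal sketch groups records by entity, sorts them by time, and emits one flat row per predictable time step, with lagged feature and goal values as columns. It is an illustration under stated assumptions, not the ThingWorx Analytics implementation: the function name, the record layout (`entity` and `time` keys), and the column naming are hypothetical, and the sketch assumes evenly spaced records with no gaps, so it performs no interpolation.

```python
from collections import defaultdict


def pivot_with_goal_history(records, feature_names, goal_name,
                            lookback, lookahead=1):
    """Flatten time series records into rows of lagged values.

    `records` is a list of dicts, each with "entity" and "time" keys plus
    the feature and goal fields. Each output row holds the goal at time t
    and the feature and goal values from the lookback window before t.
    """
    by_entity = defaultdict(list)
    for record in records:
        by_entity[record["entity"]].append(record)

    pivoted = []
    for entity, series in by_entity.items():
        series.sort(key=lambda r: r["time"])
        first = lookahead + lookback - 1  # earliest index with a full window
        for i in range(first, len(series)):
            row = {
                "entity": entity,
                "time": series[i]["time"],
                goal_name: series[i][goal_name],
            }
            for k in range(lookback):
                lag = lookahead + k
                past = series[i - lag]
                for name in list(feature_names) + [goal_name]:
                    row[f"{name}_t-{lag}"] = past[name]
            pivoted.append(row)
    return pivoted


# Five unsorted readings from one hypothetical pump.
readings = [
    {"entity": "pump1", "time": t, "temp": 10 + t, "goal": 100 + 10 * t}
    for t in (3, 1, 2, 5, 4)
]
for row in pivot_with_goal_history(readings, ["temp"], "goal", lookback=3):
    print(row)
```

With a lookback of 3 and a lookahead of 1, the first three time steps have no complete window, so five readings yield two pivoted rows (for times 4 and 5).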
Goal History and Time Series Predictions
In some time series scenarios, it’s not possible to know the value of the goal variable during predictive scoring, either because the value of the goal feature is not observable during scoring or because measuring it is physically or financially difficult. ThingWorx Analytics can handle this variation of time series predictions by turning off the useGoalHistory parameter during training. Disabling goal history allows you to train a model on time series data when the goal variable is not provided as input during scoring. The goal value is still required during training, but during scoring, these models can be used to predict the value of a time series goal without providing recent values for the goal field.
In the sample below, where the lookback window is defined as 3 and the default lookahead of 0 is in use, predicting the goal at any time t will depend on the feature values at times t – 0, t – 1, and t – 2.
The table below shows the history-pivoted data from both sets of time series predictions shown above. Note that during scoring, the goal column (as shown above) is not available.
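The same pivoting idea can be sketched for the case where goal history is turned off. In this hypothetical helper (the function and column names are illustrative, not part of ThingWorx Analytics), each row contains only feature values at times t, t – 1, and so on. The goal column is attached for training and omitted for scoring. The sketch assumes a single entity whose records are already sorted by time and evenly spaced.

```python
def pivot_without_goal_history(series, feature_names, lookback,
                               goal_name=None):
    """Flatten one entity's sorted records using feature values only.

    With a lookahead of 0, the window for time t covers t, t-1, ...,
    t-(lookback-1). Pass `goal_name` for training rows; leave it as None
    for scoring rows, where the goal is not available.
    """
    rows = []
    for i in range(lookback - 1, len(series)):
        row = {}
        for lag in range(lookback):
            for name in feature_names:
                row[f"{name}_t-{lag}"] = series[i - lag][name]
        if goal_name is not None:
            row[goal_name] = series[i][goal_name]
        rows.append(row)
    return rows


# Training data: the goal ("efficiency") is known.
training_series = [
    {"vibration": 1, "efficiency": 0.9},
    {"vibration": 2, "efficiency": 0.8},
    {"vibration": 3, "efficiency": 0.7},
    {"vibration": 4, "efficiency": 0.6},
]
training_rows = pivot_without_goal_history(
    training_series, ["vibration"], lookback=3, goal_name="efficiency")

# Scoring data: only the feature readings are provided.
scoring_series = [{"vibration": v} for v in (5, 6, 7)]
scoring_rows = pivot_without_goal_history(
    scoring_series, ["vibration"], lookback=3)
print(training_rows[0])
print(scoring_rows[0])
```

Because no lagged goal values appear in the rows, a model trained on this layout can be scored with feature readings alone.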
The following scenarios illustrate some of the common uses for a time series without goal history:
Predicting Time to Failure – The brake pads on a truck need to be maintained such that they can be replaced before they fail. It’s not possible to know the date that any given set of brake pads will fail. However, sensors allow other aspects of the truck’s operation to be monitored. Using machine learning, a model can be generated that will help predict the time to failure of brake pads on a given truck, using data from sensors on the truck.
Creating a Virtual Sensor – A very expensive sensor is added to a pump to measure its efficiency in a controlled environment. The sensor captures running conditions on the pump and readings for the pump’s efficiency. The pump also has several inexpensive sensors that collect data. A model can be trained to emulate the expensive sensor, as a virtual sensor, by predicting its value from the inexpensive sensor values. From there, many pumps can be deployed with only the inexpensive sensors. By using the model created on the first pump, each pump can have a virtual version of the expensive sensor without the cost of deploying the physical sensor on every pump.