The Certainty Parameter

When implementing anomaly detection, there are several factors to consider. At its most basic, ThingWatcher compares two sets of data: a validation set (collected during the Calibrating phase) and a test set (data streaming from a remote device). ThingWatcher estimates the likelihood that the values in the test set come from the same distribution as the values in the validation set. The accuracy of the model plays a large role in this determination, but so does the Certainty parameter, which is used in the statistical analysis of the two data sets.

Certainty is a tunable parameter that you set for each ThingWatcher instance when you create a new anomaly alert. Acceptable values must be greater than 50% and less than 100%. Certainty defines the percentage threshold that ThingWatcher uses to decide whether the comparison between the validation and test data sets indicates anomalous values. For instance, a certainty of 99.99 means that you want to be 99.99% certain before indicating an anomaly. Very high certainty values make ThingWatcher less likely to raise a false positive, while lower certainty values lead to fewer false negatives.
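The threshold behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not ThingWatcher code: the function names and the p_same input (a probability that the two data sets share a distribution) are assumptions made for the example. Only the allowed range and the meaning of the threshold come from the text.

```python
def validate_certainty(certainty: float) -> float:
    """Accept only values greater than 50 and less than 100 (percent)."""
    if not 50.0 < certainty < 100.0:
        raise ValueError("certainty must be greater than 50 and less than 100")
    return certainty


def indicates_anomaly(p_same: float, certainty: float) -> bool:
    """Hypothetical decision rule, assumed for illustration only.

    p_same is an assumed input: the probability that the validation and
    test sets come from the same distribution.
    """
    validate_certainty(certainty)
    confidence_of_difference = (1.0 - p_same) * 100.0
    return confidence_of_difference >= certainty


# With p_same = 0.001 the comparison is 99.9% confident of a difference.
print(indicates_anomaly(0.001, 99.0))   # threshold met
print(indicates_anomaly(0.001, 99.99))  # threshold not met
```

Under this rule, the same comparison result raises an anomaly at a certainty of 99 but not at 99.99, which is why higher values produce fewer alerts.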

For example, the figure below shows two sets of compared distributions. Each graph displays an expected distribution, built from the validation data collected during the Calibrating state, and a test distribution, representing new incoming data. In the graph on the left, the two distributions are obviously different and produce a very low probability that the distributions are equivalent. The graph on the right shows two very similar distributions, so the probability that they are equivalent is much higher. However, the similarity of the distributions does not imply that they are the same. Depending on the Certainty parameter, ThingWatcher could still indicate an anomaly between the distributions on the right. A very high value for Certainty (for example, 99.9999) will not allow the comparison on the right to trigger an anomaly.
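To make the two cases concrete, here is a rough Python sketch that compares a validation sample with a clearly shifted test sample and with a slightly shifted one. The statistical test that ThingWatcher applies is not stated here, so a two-sample Kolmogorov-Smirnov test with an asymptotic p-value stands in as an assumption. The sample sizes, shifts, and function names are likewise hypothetical.

```python
from bisect import bisect_right
from math import exp, sqrt
from statistics import NormalDist


def ks_statistic(validation, test):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(validation), sorted(test)
    return max(
        abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
        for t in a + b
    )


def probability_same(validation, test):
    """Asymptotic two-sample Kolmogorov-Smirnov p-value (an approximation)."""
    n, m = len(validation), len(test)
    root = sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * ks_statistic(validation, test)
    if lam < 0.2:  # the series converges slowly here and its value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def flags_anomaly(p_same, certainty):
    """Assumed rule: flag when confidence of a difference meets the threshold."""
    return (1.0 - p_same) * 100.0 >= certainty


def quantile_sample(mean, n=200):
    """Deterministic stand-in for sensor data: evenly spaced normal quantiles."""
    dist = NormalDist(mean, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


validation = quantile_sample(0.0)
p_left = probability_same(validation, quantile_sample(3.0))   # clearly different
p_right = probability_same(validation, quantile_sample(0.3))  # very similar

for certainty in (80.0, 99.9999):
    print(certainty, flags_anomaly(p_left, certainty),
          flags_anomaly(p_right, certainty))
```

In this sketch the clearly different pair is flagged at both certainty values, while the similar pair is flagged at 80 but not at 99.9999.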

For a ThingWatcher with very high certainty values, the anomaly indicator is weighted to reduce false positives at the expense of true positives. Lower certainty values increase the rate of true positives, but also increase the number of false alarms. Whether to use higher or lower certainty values depends on your environment. If a failure of the monitored device carries great risk (as with a medical device), set lower certainty values so that ThingWatcher detects any possible anomaly. However, if a failure is not mission critical but false positives are costly (such as sending technicians for a false alarm), set higher certainty values so that ThingWatcher indicates anomalies only when they are highly likely.
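The trade-off can be explored with a small simulation. The following Python sketch is an illustration under stated assumptions, not a model of ThingWatcher internals. It assumes healthy readings follow a standard normal distribution, a faulty device shifts the mean by 0.5, and a simple z-test on the window mean stands in for the real comparison. All names and numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def p_same_mean(window, baseline_mean=0.0, baseline_sd=1.0):
    """Two-sided z-test p-value for the window mean against the baseline."""
    z = (fmean(window) - baseline_mean) / (baseline_sd / len(window) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def alert_rates(certainty, healthy_windows, faulty_windows):
    """Return (false positive rate, true positive rate) at one certainty."""
    def flagged(window):
        return (1.0 - p_same_mean(window)) * 100.0 >= certainty

    false_positive = sum(flagged(w) for w in healthy_windows) / len(healthy_windows)
    true_positive = sum(flagged(w) for w in faulty_windows) / len(faulty_windows)
    return false_positive, true_positive


rng = random.Random(42)  # fixed seed so repeated runs match
healthy = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(2000)]
faulty = [[rng.gauss(0.5, 1.0) for _ in range(30)] for _ in range(2000)]

rates = {c: alert_rates(c, healthy, faulty) for c in (90.0, 99.0, 99.99)}
for certainty, (fp, tp) in rates.items():
    print(f"certainty={certainty}: false alarms={fp:.3f}, detections={tp:.3f}")
```

Because every certainty value is applied to the same windows, both rates can only fall or stay the same as certainty rises. Running a simulation like this against representative data is one way to pick a value that matches the cost of a missed failure versus a false alarm.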