ThingWorx Analytics Microservice Architecture
|
In this release, the classic ThingWorx Analytics server monolith has been divided into a series of independent microservices. This new structure groups services around specific elements of functionality (data, training, results). The new architecture provides a more robust and stable deployment. Services are executed by job type so that multiple microservices can run in parallel but problems in one microservice do not interrupt performance in others.
This rearchitecture of the ThingWorx Analytics server maintains feature parity with previous releases, with the following exceptions:
• Prescriptive scoring as an asynchronous (batch) process is not available. It will be available in a subsequent release.
• Distributed server installation is not available. Customers who wish to use a distributed installation can continue to use a pre-8.1 release. This functionality will return in a subsequent release.
• The use of DataConnect is not supported. Customers who need DataConnect functionality can continue to use a pre-8.1 release.
|
Integration with ThingWorx Foundation
|
With this release, ThingWorx Analytics functionality becomes native to the ThingWorx Foundation platform. This restructuring includes an edge agent that acts as an integration point between the ThingWorx Foundation and ThingWorx Analytics servers.
When both servers are running, the edge agent automatically connects to the ThingWorx server and instantiates an AnalyticsServerThing. All of the connected Analytics microservices are represented as Things under the AnalyticsServerThing. The configuration of the AnalyticsServerThing takes place during server installation.
This integration includes the following changes in functionality:
• ThingWorx Analytics APIs are accessible only through the ThingWorx API layer and customers can take advantage of ThingWorx Foundation security and authentication mechanisms.
• Microservice functionality can be accessed via the edge agent, as part of a mashup, or via REST calls.
• A separate Tomcat installation is no longer necessary for the Analytics server.
• ThingWorx Foundation data shape objects can now be leveraged to store and handle data. Two infotable data shapes commonly used in ThingWorx Analytics are AnalyticsDatasetRef and AnalyticsDatasetMetadata.
• ThingWorx Foundation file repositories can now be used to upload/store data files and save/store results.
• The amount of configuration required during installation has been reduced.
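As a rough illustration of the REST access path listed above, the following sketch builds (but does not send) a service request against the ThingWorx API layer. It assumes the general ThingWorx REST pattern of posting to /Thingworx/Things/<Thing>/Services/<Service> with an appKey header. The host, the Thing name, the service name, the parameter, and the key are hypothetical placeholders, not names confirmed by these release notes.

```python
import json
import urllib.request
from urllib.parse import quote


def build_service_request(base_url, thing, service, app_key, params=None):
    """Build a POST request for a ThingWorx service call (nothing is sent here)."""
    url = (
        f"{base_url.rstrip('/')}/Thingworx/Things/"
        f"{quote(thing)}/Services/{quote(service)}"
    )
    body = json.dumps(params or {}).encode("utf-8")
    headers = {
        "appKey": app_key,  # ThingWorx Foundation application key
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_service_request(
    "https://thingworx.example.com",   # hypothetical host
    "AnalyticsServer_TrainingThing",   # hypothetical Thing name
    "ListJobs",                        # hypothetical service name
    "YOUR-APP-KEY",                    # placeholder key
    {"maxItems": 10},                  # hypothetical parameter
)
print(request.get_method(), request.full_url)
```

To actually send the request, it could be passed to urllib.request.urlopen once the placeholders are replaced with values from a real installation.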
|
Additional new functionality in ThingWorx Analytics
|
• Changes to Training – Training no longer performs iterative sampling. Instead, training processes the entire dataset (except any validation holdout). If a validation holdout is specified, the training job initiates a separate validation job. Two sets of results are returned, Model results and Validation results.
• Categorical and Ordinal Predictions – This release provides support for training and scoring with a categorical or an ordinal goal field. A categorical goal is a variable that contains discrete, unordered values. An ordinal goal is a variable that contains discrete values that have an inherent ordering.
When training and scoring with a categorical goal, a model is built and predictions scored for every value category in the variable. For this reason, using a categorical goal with a large number of categories (such as zip code) can greatly affect performance time for both model training and scoring. Limit use to goals with a smaller set of categorical values.
During scoring, customers can choose to limit a categorical goal to the top N values returned or they can specify a list of values to see scores for.
• Virtual Sensors – ThingWorx Analytics virtual sensor functionality provides the capability to train a model on time series data when the goal variable is not available as input during scoring. In contrast to other time series models, which use recent values to predict the future value of a goal variable, a virtual sensor time series model predicts the current value of an unknown or unobservable goal. These models can be useful in scenarios such as:
◦ Predicting time to failure for predictive maintenance purposes
◦ Emulating an expensive sensor in a model that can be widely deployed without multiple expensive sensors
For more information about Time Series and Virtual Sensors, see Time Series Predictions.
• OpType – Dataset metadata now requires both a dataType and an opType to better describe each field. The addition of the opType allows the format of a data field to be separated from its operational characteristics. The dataType identifies the format of the data in a given field (Double, Integer, String, Boolean). The opType indicates how the data behaves (Continuous, Boolean, Ordinal, Categorical, Entity ID, Temporal, Informational).
• Is Static Flag – Fields that do not change over time for a given entity, but are still useful for predictions, can be marked as static. Marking a field as static reduces training time by removing redundant data points for fields that do not change.
• External PMML Models – Customers can upload their own models as long as they conform to PMML 4.3 specifications.
• Data Storage – Data is no longer stored in a database, but rather, is persisted directly to a file system which is optimized for ThingWorx Analytics. When data is uploaded, it’s converted and stored on the file system in a Parquet format. This change removes the PostgreSQL limitations on the number of columns in a dataset.
• Data Loading – Data loading and dataset creation have been streamlined significantly. CSV data and JSON metadata can be uploaded in a single job, and the dataset is optimized automatically when it’s created. When data is appended to the dataset, a new partition is added to the data. Optional services are available to re-optimize the data so that partitions are reorganized.
• List Pagination – Pagination can be used to limit the size of responses by segmenting them into pages.
• Tagging – Tags are string values that can be attached to any persisted entity. The tags can be leveraged for searching and filtering purposes.
• Multi-tenancy – An App ID/Key combination is no longer necessary to secure access to the server. In multi-tenancy scenarios, the ThingWorx Foundation permissions and visibility hierarchy can be used to control access.
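To illustrate the OpType and Is Static Flag items above, the following sketch checks that each field in a metadata description carries both a dataType and an opType drawn from the values listed in these notes. The key names (fieldName, dataType, opType, isStatic), the upper-case spellings, and the sample fields are assumptions made for illustration and may not match the exact JSON format the server expects.

```python
# Allowed values taken from the lists in the OpType item; spelling is assumed.
DATA_TYPES = {"DOUBLE", "INTEGER", "STRING", "BOOLEAN"}
OP_TYPES = {
    "CONTINUOUS", "BOOLEAN", "ORDINAL", "CATEGORICAL",
    "ENTITY_ID", "TEMPORAL", "INFORMATIONAL",
}


def validate_field(field):
    """Return a list of problems found in one metadata field description."""
    problems = []
    for key, allowed in (("dataType", DATA_TYPES), ("opType", OP_TYPES)):
        value = field.get(key)
        if value is None:
            problems.append(f"{key} is required")
        elif value not in allowed:
            problems.append(f"unknown {key}: {value}")
    return problems


# Hypothetical metadata for a small time series dataset.
metadata = [
    {"fieldName": "machine_id", "dataType": "STRING", "opType": "ENTITY_ID"},
    {"fieldName": "timestamp", "dataType": "INTEGER", "opType": "TEMPORAL"},
    {"fieldName": "install_region", "dataType": "STRING",
     "opType": "CATEGORICAL", "isStatic": True},  # does not change per entity
    {"fieldName": "pressure", "dataType": "DOUBLE"},  # opType missing on purpose
]

for field in metadata:
    print(field["fieldName"], validate_field(field) or "ok")
```

The last field is reported as invalid because it has a format (dataType) but no operational description (opType).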
|
ThingWorx Analytics Server Installation
|
The standalone server installation is now provided both via a Docker installer tool and a (non-Docker) Linux installer tool. Both installers bundle the microservices and other libraries into a series of files (container images for the Docker tool, JAR files for the non-Docker tool). When the installation process is launched, each tool unpacks and installs the necessary components.
The Docker installer supports both Windows and Linux environments. The non-Docker installer is currently geared for use only with Linux.
|
New Functionality in Analytics Builder
|
In this release, Analytics Builder has been updated to work with the new ThingWorx Analytics microservice architecture. In addition, this release includes some notable new functionality for Analytics Builder:
• Profiles List Page – A new list page has been added where all of the available profiles are listed in a table and can be viewed on the same page by clicking through the rows in the table. New profiles can be created from this page, without the need to first create a model. Generating profiles without a model provides more flexibility for adding and filtering data. Profiles can still also be generated from the model results page, which offers the convenience of using the same dataset, filters, and exclusions as the selected model.
• Signals List Page – A new list page has been added where all of the available signals are listed in a table and can be viewed on the same page by clicking through the rows in the table. New signals can be identified in the data from this page, without the need to first create a model. Identifying signals without a model provides more flexibility for adding and filtering data. Signals can still be identified from the model results page, which offers the convenience of using the same dataset, filters, and exclusions as the selected model.
• Model Results and Statistics – The metrics reported for model results have been updated to correspond to the specific model type. For a confusion matrix, Accuracy is reported. For a ROC curve, ROC and Pearson Correlation are reported. For a bubble plot, RMSE and Pearson Correlation are reported.
In addition, when a Boolean goal value is selected during model creation, the Results Graph offers the option to view either a ROC curve or a confusion matrix. Both graphs are available on separate tabs that you can toggle between.
• List Pagination – Pagination has been added to the list tables. It limits the size of the tables by segmenting them into pages.
• Setup Configuration – The setup process for Analytics Builder has been simplified. The only setting that requires configuration is the connection to the AnalyticsServer Thing in ThingWorx Foundation.
• Time Series Datasets – Time series predictive models can now be generated in Analytics Builder. As a result, some new parameters have been added to the Create New Model process to accommodate temporal data.
• Upload Thing removed – Now that ThingWorx Analytics functionality is native to ThingWorx Foundation, the Upload Thing is no longer required in order to upload data in Analytics Builder.
|
New Functionality in Analytics Manager
|
With this release, Analytics Manager has the following new functionality:
• Analysis models can accept time series data as input and, after computation, provide either a single result or multiple results. When triggering an analysis event, you can now collect historical data by specifying the time for which you want to collect the data, or the number of data slices that you want to collect. To collect a larger data set, you can specify the frequency at which you want to collect this trailing data.
• Analysis events can be configured to save the job history to the Analysis Jobs page. You can configure the event to save all the jobs, none of the jobs, or only failed jobs. If you want to modify the configuration after the analysis event has been created, you can do that as well.
|
Enhancement Description
|
Reference #
|
ThingWatcher: Accuracy enhancements made.
This release includes enhancements that improve anomaly detection accuracy. As a result of these changes:
• Data collection restart is no longer necessary after a long gap. Long gaps are handled such that no data has to be lost and all collected data can be used.
• The H2 database that installs with the Training Microservice is no longer stored as a persisted file, but rather in memory. This change is also reflected in the corresponding YML file.
|
TW-22411
|
ThingWorx Analytics Server: Update to PMML version 4.3
The Training microservice has been updated to generate models that are compliant with the latest PMML version. In addition, the scoring microservices, as well as Analytics Builder, ThingWatcher, and ThingPredictor, have all been updated to use JPMML libraries that support the latest PMML version.
Models generated with older versions of PMML will continue to be supported.
|
TW-13343
|
ThingWorx Analytics Server: Update clustering functionality
Clustering functionality is no longer goal-centered. It has been updated to perform true, unsupervised clustering, using a k-means clustering algorithm.
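For readers unfamiliar with the technique, the following is a minimal sketch of k-means clustering (Lloyd's algorithm) on a toy dataset. It is not the server's implementation; the initialization scheme, the iteration count, and the sample points are assumptions chosen to keep the example short and deterministic. Note that no goal field is involved: records are grouped only by their distance to each other.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Naive deterministic initialization: evenly spaced points from the input.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Hypothetical two-dimensional records forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```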
|
TW-14508
|
Analytics Manager: Updates to Analysis Replay functionality
With this release, Analytics Manager has the following enhancements:
• Analysis replay can be executed in the single execution mode. In this mode, all replay entries are executed as a single job.
• You can map results of analysis replay execution to the entities that are defined in the selected event result mapping.
|
-
|
ThingPredictor: Support for categorical models
With this release, you can score categorical models by using the ThingPredictor functionality.
|
—
|
Bug Fix Description
|
Reference #
|
Analytics Builder: Dataset unavailable after an additional data upload fails
Previously, when an attempt to upload additional data to a dataset failed, the ability to view the original dataset was suppressed and data had to be reloaded to resolve the issue. In the new architecture of this release, the Data microservice does not require the combination of entry date/time and ID as unique identifier fields. Now that this requirement is removed, the same data can be uploaded multiple times with no problems.
|
LYNX-379
|
Known Issue Description
|
Reference #
|
||
ThingWorx Analytics Server: Docker installation fails when using localhost URI for connection to ThingWorx
When the ThingWorx server is installed on the local server, and localhost is entered for connection purposes during the ThingWorx Analytics Docker installation, the connection validates successfully but the ThingWorx Analytics Server Things are not created in ThingWorx. The Edge Microserver in the Docker container cannot accept localhost as the connection to ThingWorx. The following possible workarounds are available to resolve this issue:
• Use the native Linux standalone installer instead of the Docker installer. (preferred)
• Uninstall and reinstall using the ThingWorx external IP address instead of localhost during ThingWorx Analytics Server Docker installation.
• Update the analytics-server.properties and system-environment-variables.properties files with the correct ThingWorx address. Then restart the ThingWorx Analytics Server. For more detailed information about this option, see Article - CS273311.
|
TA-536
|
||
ThingWorx Analytics Server: Two-at-a-time Signals request does not return MI for all field combinations
When running a request for Signals, and specifying maxAtATime = 2, Mutual Information (MI), in relation to the goal, is not returned for all field combinations as expected. Instead, individual (one-at-a-time) MI scores are returned for all fields and then are filtered down to the top 25% most relevant fields. Two-at-a-time signals are calculated only for those fields. There is currently no way to modify this filtering behavior.
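The filtering behavior described above can be sketched as follows, to show why some field pairs never receive a two-at-a-time score. The MI values are invented for illustration, and rounding the 25% cutoff up to a whole number of fields is an assumption; the notes do not state how the server rounds.

```python
from itertools import combinations
from math import ceil


def pairs_after_filter(mi_scores, keep_fraction=0.25):
    """Keep the top fraction of fields by one-at-a-time MI, then pair them."""
    ranked = sorted(mi_scores, key=mi_scores.get, reverse=True)
    # Assumed rounding: always keep at least one field, round the cutoff up.
    keep = ranked[: max(1, ceil(len(ranked) * keep_fraction))]
    return keep, list(combinations(keep, 2))


# Hypothetical one-at-a-time MI scores in relation to the goal.
mi_scores = {
    "temp": 0.42, "pressure": 0.31, "rpm": 0.12, "humidity": 0.05,
    "voltage": 0.27, "vibration": 0.38, "load": 0.09, "age": 0.02,
}

kept, pairs = pairs_after_filter(mi_scores)
print(kept)   # fields that survive the top 25% filter
print(pairs)  # the only combinations scored two at a time
```

With eight fields, only two survive the filter, so a single pair is scored instead of all 28 possible combinations.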
|
TA-729
|
||
ThingWorx Analytics Server: ROCPair calculations incorrect
The true positive rates (TPR) and false positive rates (FPR) are being calculated incorrectly in validation statistics. This error has been corrected in the 8.2 release so that:
• TPR represents the number of true positives divided by the total number of positives.
• FPR represents the number of false positives divided by the total number of negatives.
However, for Boolean goals in the 8.1 release, these statistics can be manually recalculated from the validation metrics available in the confusion matrix. The matrix displays four quadrants:
• true positive (TP)
• false positive (FP)
• false negative (FN)
• true negative (TN)
To recalculate the TPR and FPR, use the following calculations:
• TPR = TP / (TP + FN)
• FPR = FP / (FP + TN)
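The manual recalculation can be scripted as in the following sketch, which applies the two formulas above to the four confusion matrix counts. The function name and the sample counts are illustrative, and returning 0.0 when a class has no records is an arbitrary choice made here to avoid division by zero.

```python
def roc_rates(tp, fp, fn, tn):
    """Return (TPR, FPR) computed from confusion matrix counts."""
    positives = tp + fn  # all records that are actually positive
    negatives = fp + tn  # all records that are actually negative
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Hypothetical counts read from the confusion matrix of a Boolean goal.
tpr, fpr = roc_rates(tp=40, fp=10, fn=10, tn=40)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```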
|
TA-944
|
||
Analytics Builder: Retraining while the model grid is refreshing
On the Models list page, if you select a model and click the Retrain button, while the model grid is refreshing (the Refresh button is greyed out), the retraining dialog box will open but may select the first model in the list for retraining, instead of the model you selected. The issue seems to correspond to the speed of your connection. With a slower connection, refresh occurs less frequently and the content remains focused on the correct model. With a faster connection, the content refreshes properly and the row is refocused while the retrain dialog box is open.
To avoid the issue, only click Retrain when the grid is not refreshing (the Refresh button is green). To resolve the issue when it occurs, click Cancel and wait for the refresh to complete. Then reselect the row to be retrained and click Retrain. A solution for this scenario will be provided in a future release.
|
LYNX-400
|
||
Analytics Builder: Deleting a filter does not delete all jobs associated with it
When a dataset filter is deleted, all models, signals, profiles, and scoring jobs created using that filter should also be deleted. Currently, these items are not all being removed. New jobs cannot be initiated using the deleted filter, but if a user tries to retrain a model that was based on a deleted filter, the job will run against the entire dataset.
|
TW-24625
|
||
Analytics Builder: Retraining a time series model does not retain the default lookback window size
When a time series model is built using the default lookback window size of 0, the lookback size is not retained when the time series model is retrained. This issue can manifest itself in two scenarios:
• In a time series model with one feature, and a lookback size of 0, a retraining job will fail. It will not be able to run time series training because the default lookback size is not retained, and it will not be able to run standard training because there is not enough data.
• In a time series model with more than one feature, and a lookback size of 0, a retraining job will succeed, but only as a standard training model. It will not be able to run time series training because the default lookback size is not retained. However, it will have enough data to run standard training.
As a workaround, rerun the training as a new model, using the same configuration as the original training job.
|
TW-24654
|
||
Analytics Manager: Failure to deploy model by downloading file from a network URL for ThingPredictor
While creating an analysis model for ThingPredictor, if you specify a network location in the Model File URL field, ThingPredictor cannot download the file and deploy the model.
|
DT-8891
|
||
Analytics Manager: Deployment job status for the Docker deployer agent is shown incorrectly
When the deployer agent deploys an agent on a machine, the corresponding job status is shown as INPROCESS even though the deployment has completed.
|
DT-9912
TW-18763
|
||
Analytics Manager: An existing simulation event fails if it is triggered after server restart
If a simulation event is triggered after server restart, it fails. In this case, when the server restarts, you must restart the agent on which the simulation event is running.
|
—
|
||
Analytics Manager: Current analysis replay framework is not supported by earlier SDK versions
|
—
|
||
ThingWorx Analytics Server: Binned Metrics in Validation are returned in the wrong order
The ContinuousModelEvaluator calculates Pearson Correlation and RMSE in different bins of the validation dataset. However, when the results are returned, their order is swapped and the values are mapped to the wrong keys. As a result, Pearson Correlation values can exceed 1 and RMSE results can be returned as negative values. Because of this incorrect mapping, the normalized RMSE values are actually a normalization of the Pearson Correlation, which is not a relevant result. The incorrect mapping will be resolved in a future release.
|
TA-1628
|