Data Processing

Between ingestion and visualization, the ThingWorx Platform also needs enough system resources to run the application's business logic and data transformations.

This section explains the concepts behind the more significant areas that can affect data processing requirements. Data processing depends heavily on the specific business use case, which makes a standardized calculation less useful.

Once your ingestion and visualization estimates are in place, application load or stress testing is an important step before going live in production. The results could lead you to select a different size system than the baseline if your application requires more resources for complex business logic or for event and alert processing.

Subscriptions and Events

Subscriptions to timers and property change events often constitute most of the business logic in a ThingWorx application. These subscriptions use memory from the Event Processing subsystems, which are not used during ingestion and device connectivity.

Once the system is sized for stable ingestion and device connectivity, load testing with business logic in place is an important step to make sure the system is properly sized to support production use.

File Transfer and Management

Transferring files to and from edge devices is a common requirement in ThingWorx applications, for purposes such as:

• Deploying software updates

• Troubleshooting or accessing log files

• Receiving images, PDFs, or other files to check functionality and performance

If you expect many simultaneous file transfers and/or large files to be transferred, then additional platform memory may be needed to handle this load.

High Availability Requirements for ThingWorx

When ThingWorx is deployed in a high availability configuration (as described in the ThingWorx High Availability section), a high availability relational database configuration is often used.

While this configuration can prevent downtime due to a system failure, it can also cause a slight reduction in write performance.

To address this, if the expected writes per second (WPS) from the data ingestion calculation is close to the threshold for the desired configuration, consider the next-size system configuration.
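
The rule above can be sketched as a small helper. This is only an illustration: the size names, the WPS ceilings, and the 20% headroom are hypothetical placeholders, not figures from this guide, and should be replaced with values from your own ingestion calculation and load tests.

```python
from typing import Optional

# Hypothetical sustained-WPS ceilings per configuration (placeholders only).
SIZE_LIMITS_WPS = [
    ("small", 1_000),
    ("medium", 5_000),
    ("large", 20_000),
]


def pick_size(expected_wps: float, high_availability: bool,
              headroom: float = 0.2) -> Optional[str]:
    """Return the smallest size whose ceiling covers expected_wps.

    With high availability enabled, each ceiling is reduced by `headroom`
    (an assumed fraction) to allow for slower database writes, so a load
    close to a threshold moves up to the next size.
    """
    for name, limit in SIZE_LIMITS_WPS:
        usable = limit * (1 - headroom) if high_availability else limit
        if expected_wps <= usable:
            return name
    return None  # Load exceeds every listed configuration.
```

With these placeholder values, an expected load of 900 WPS fits the "small" configuration without high availability but moves to "medium" once the headroom is applied.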

Tunnels and Third Party Tools

Many business use cases take advantage of tunneling sessions to edge devices, which require WebSocket connections to be open and maintained throughout the sessions.

Similarly, third-party tools (such as SCADA, ERP, or other integrated back-office applications) may also make use of WebSocket connections to the platform.

If those tools access ThingWorx using REST API calls, they increase the number of HTTP requests per second calculated previously for data visualization.

Each WebSocket connection requires memory on the platform, so consider increasing the platform's total memory allocation (or the size or class of the selected VM) if many concurrent tunneling sessions or REST API requests are expected.
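
As a rough sketch of that estimate, the additional memory can be approximated as the number of open connections multiplied by a per-connection cost. The function name and the per-connection figure are assumptions for illustration: this guide does not publish a per-connection memory value, so measure it in your own load tests. The sketch also assumes one WebSocket connection per tunneling session.

```python
def websocket_memory_mb(tunnel_sessions: int, tool_connections: int,
                        mb_per_connection: float) -> float:
    """Estimate extra platform memory (in MB) for open WebSocket connections.

    mb_per_connection is a measured value from your own testing; it is an
    assumed input here, not a figure from the sizing guide.
    """
    if tunnel_sessions < 0 or tool_connections < 0 or mb_per_connection < 0:
        raise ValueError("inputs must be non-negative")
    return (tunnel_sessions + tool_connections) * mb_per_connection
```

For example, 200 tunneling sessions plus 50 third-party connections at an assumed 0.5 MB each would call for about 125 MB of additional memory.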

Data Retention, Aggregation, and Archival

Historical data is often needed both for data processing (how far back does my business logic need to look?) and for data visualization (how far back do users need to look?).

The amount of historical data to be stored on the platform impacts both database and platform system sizing. Larger data sources (streams, data tables, and value streams) will require longer transactions to query from the database. These long transactions can also cause the stream and value stream queues to back up and utilize additional memory.

Aggregating data so that no one data source contains too many entries is highly recommended (see this Best Practice Guide for details). If a large amount of historical data is needed, or large volumes of ingested data must be kept for a lengthy time period, consider a more robust database solution. See details about the database options below:

• H2 – A small, in-memory database ideally suited for development and smaller systems; does not scale well beyond small implementations.

H2 is not recommended for production ThingWorx systems.

• Microsoft SQL Server – A robust, mature relational database that can be used to manage ThingWorx data models, streams, and value streams in development or production systems. It can be scaled for all small, medium, and large implementations.

• PostgreSQL – Similar to SQL Server, PostgreSQL is a relational database that can be used in production environments of all sizes. The choice between SQL Server and PostgreSQL is often dictated by existing IT experience, either with databases or operating systems.

• InfluxDB – A time-series database ideally suited for high-scale ingestion of ThingWorx streams and value streams in development and production systems, alongside a relational database (Microsoft SQL Server or PostgreSQL) managing the ThingWorx data model.

ThingWorx can be used with either the open-source, single server version of InfluxDB, or with an InfluxDB Enterprise cluster for high availability and increased performance. The open-source version was used for the tests in this guide.
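
To illustrate the aggregation recommendation earlier in this section, the following is a minimal sketch of rolling raw entries up into one average per hour before long-term storage. It is plain Python operating on (timestamp, value) pairs and does not use any ThingWorx API; the function name and the hourly bucket size are assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_averages(entries):
    """Roll (timestamp, value) pairs up into one average value per hour."""
    buckets = defaultdict(list)
    for ts, value in entries:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Return the hours in chronological order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


raw = [
    (datetime(2024, 1, 1, 10, 5), 10.0),
    (datetime(2024, 1, 1, 10, 35), 20.0),
    (datetime(2024, 1, 1, 11, 0), 30.0),
]
summary = hourly_averages(raw)
```

Here three raw entries collapse into two hourly entries, which keeps the aggregated data source smaller and its queries shorter than the raw one.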
