ThingWorx High Availability
Overview of ThingWorx High Availability
To reduce the duration of outages for critical Internet of Things (IoT) systems, you can configure ThingWorx to operate in a High Availability (HA) environment. This guide discusses the HA considerations for a ThingWorx system and the components that make up a ThingWorx HA deployment.
All HA deployments require additional resources when compared to a deployment designed only to meet functional and scale requirements. These additional resources are hardware based (such as servers, disks, load balancers, and so forth) and software based (such as synchronization services and load balancers). The additional resources are then configured to ensure there are no single points of failure within the HA deployment.
All HA deployments should be based on a Service Level Agreement (SLA) for which you have analyzed the uptime requirements of the application in your deployment. For example, how many hours per month can the system be offline? Does this allowed downtime cover system failures, application upgrades, or both? The number of additional resources required for an HA system depends on the SLA it is designed to achieve. In general, the stricter the SLA, the more resources are needed to fulfill it.
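To make the SLA question concrete, the downtime budget implied by an availability target can be computed directly. The following is a minimal sketch only: the function name and the 30-day month are assumptions made for illustration and are not part of ThingWorx.

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 30 * 24) -> float:
    """Return the downtime budget, in hours, for one period.

    availability_percent is the SLA target (for example 99.9).
    period_hours defaults to a 30-day month, which is an assumption.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_hours(target) * 60
        print(f"{target}% availability: about {minutes:.1f} minutes offline per month")
```

Each additional "nine" cuts the budget by a factor of ten, which is why stricter SLAs usually call for more redundant resources.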
Definitions
high availability
A system or component that is continuously operational for a desirably long amount of time.
active/active
Multiple instances of the same application that function simultaneously.
active/passive
A configuration in which only one instance of an application functions at a time. Additional instances are available to take over service as needed.
leader or master
The active server in an active/passive HA configuration, to which all traffic is routed.
standby
A server in an active/passive HA configuration that is waiting to take over service in case the current leader fails.
virtual IP address
An IP address that represents an application. Clients that use the virtual IP are usually routed to a load balancer that then directs the request to the server running the application.
load balancer
A device that receives network traffic and distributes it to the application ready to accept it. For an active/passive HA configuration, the load balancer directs traffic to the current leader. For an active/active HA configuration, the load balancer directs traffic to one of many applications.
failover
A backup operational mode in which the functions of a system component (such as a processor, server, network, or database) are assumed by secondary system components when the primary component becomes unavailable due to failure or scheduled down time.
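The difference between active/passive and active/active routing, as defined above, can be illustrated with a small sketch. The class names and server names are hypothetical, and a real load balancer relies on its own health checks rather than a dictionary passed in by the caller.

```python
from typing import Dict, List


class ActivePassiveRouter:
    """Sends every request to the current leader; promotes a standby on failure."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.leader = self.servers[0]

    def route(self, healthy: Dict[str, bool]) -> str:
        if not healthy.get(self.leader, False):
            # Failover: promote the first healthy server to leader.
            candidates = [s for s in self.servers if healthy.get(s, False)]
            if not candidates:
                raise RuntimeError("no healthy server available")
            self.leader = candidates[0]
        return self.leader


class ActiveActiveRouter:
    """Distributes requests round-robin across all healthy servers."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self._next = 0

    def route(self, healthy: Dict[str, bool]) -> str:
        # Try each server at most once per request, skipping failed ones.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if healthy.get(server, False):
                return server
        raise RuntimeError("no healthy server available")
```

In this sketch the active/passive router keeps the promoted server as leader even after the failed server recovers, so the recovered server rejoins as a standby.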
ThingWorx Reference Architecture for High Availability
The following image shows ThingWorx in a high availability configuration.
Following are the components in this configuration and their role in an HA deployment:
Users and Devices- No role in HA functionality. From their perspective nothing changes. They always use the same URLs and IP address even if there is a change in the primary ThingWorx server.
Firewalls- No HA function and can be considered optional. Firewalls are often placed to implement security requirements.
Load Balancers- Load balancers manage a virtual IP address for the application they are supporting. All traffic routed to that virtual IP address is directed to the active application that can receive it.
ThingWorx Connection Servers- Receive WebSocket traffic from assets and route it to the ThingWorx Platform. The Connection Servers can operate in an active/active configuration. Once an asset is directed to a specific Connection Server, it should always use that same Connection Server. If that server goes offline, the asset should be redirected to another available Connection Server.
ThingWorx Foundation- Receives all user and asset traffic. ThingWorx Foundation operates in an active/passive configuration with one leader server and one or more standby servers. The leader server is online and receiving traffic. The standby servers run in a warmed-up state in which the application is running, but they have no active connections to the database and do not receive traffic. A load balancer routes all traffic to the leader. If the leader goes offline, a standby is promoted to leader and traffic is then routed to it.
ThingWorx Repositories- These are required storage locations such as ThingworxPlatform, ThingworxStorage, and ThingworxBackupStorage, and any additional storage locations added to support your implementation. For an HA environment, ThingWorx repositories must exist in a common storage location where all ThingWorx servers (leader and standby) can equally access them.
Apache ZooKeeper- ZooKeeper is a centralized coordination service used by ThingWorx to elect one of the ThingWorx servers as the leader at any given time. A ZooKeeper client is embedded in each ThingWorx server to maintain a heartbeat and react to changes in the configuration, such as failure of the current ThingWorx leader.
PostgreSQL- For an HA configuration, PostgreSQL will operate through two or more server nodes in a hot standby configuration. One node receives all write traffic, and one of the other nodes can receive all read traffic. Streaming replication is activated between all nodes to keep each node up to date.
Pgpool-II- Used only in PostgreSQL HA configurations. Pgpool-II nodes receive the ThingWorx requests (reads and writes) and direct them to the appropriate PostgreSQL node. Pgpool-II also monitors the health of each PostgreSQL node and can initiate failover and retargeting tasks when one of the nodes goes offline.
Microsoft SQL Server (not pictured)- Microsoft Failover is used to ensure at least one MS SQL server is online and available.
DataStax Enterprise (DSE)- A DSE implementation is not required for a ThingWorx HA configuration. If it is needed to meet the ingestion requirements of the implementation, then ensure it is configured for HA. The typical DSE implementation meets most HA requirements. It has multiple Cassandra nodes collecting content and at least two Solr nodes. The DSE design replicates all content to at least one other node.
As of version 8.5.0 of the ThingWorx platform, DataStax Enterprise is no longer for sale and will not be supported in a future release. See the End of Sale article for more information.
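The leader election role that ZooKeeper plays among the ThingWorx servers can be pictured with a toy model. This guide does not describe the election logic ThingWorx actually uses, so the sketch below assumes the widely used ZooKeeper recipe in which each live server holds an ephemeral entry with an increasing sequence number and the lowest number is the leader. The class is a stand-in written for illustration and is not the ZooKeeper API; the server names are made up.

```python
from typing import Dict, Optional


class ElectionRegistry:
    """Toy stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self) -> None:
        self._next_sequence = 0
        self._members: Dict[str, int] = {}  # server name -> sequence number

    def register(self, server: str) -> int:
        """Add a live server; each registration gets a new, higher sequence number."""
        sequence = self._next_sequence
        self._next_sequence += 1
        self._members[server] = sequence
        return sequence

    def expire(self, server: str) -> None:
        """Remove a server whose heartbeat stopped (its entry disappears)."""
        self._members.pop(server, None)

    def leader(self) -> Optional[str]:
        """Return the live server with the lowest sequence number, or None."""
        if not self._members:
            return None
        return min(self._members, key=self._members.__getitem__)


if __name__ == "__main__":
    registry = ElectionRegistry()
    registry.register("twx-1")
    registry.register("twx-2")
    print(registry.leader())    # twx-1 registered first, so it leads
    registry.expire("twx-1")    # heartbeat lost
    print(registry.leader())    # twx-2 takes over
    registry.register("twx-1")  # repaired server re-registers
    print(registry.leader())    # twx-2 remains leader
```

Because a re-registered server receives a higher sequence number, it rejoins as a standby rather than reclaiming leadership.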
Requirements before Installation
Notes and Warnings:
The steps in this HA process should be performed by a database administrator (DBA) with previous experience with the databases in the HA configuration (PostgreSQL, Microsoft SQL Server, and DataStax Enterprise). The required knowledge includes installation, optimization, and high availability clustering.
The guidance provided here is for deploying HA environments. Additional performance tuning in a production environment might be necessary, but is not provided here.
Detailed steps are examples for reference and are intended for a QA or sandbox environment only. Installers may need to edit the commands and settings for optimal performance in a production environment.
All failover configurations must be fully tested and validated before being used in production.
The steps in this process do not discuss failback scenarios, where a failed leader is corrected and then returned to the leader position. They assume the failed component is corrected and returned to service as a non-leader component.
Supported Operating Systems
General HA Requirements
Virtual IP Addresses
Users and assets to Connection Servers (if Connection Servers are used)
Connection Servers to ThingWorx Foundation
ThingWorx Foundation to PostgreSQL HA (if PostgreSQL is used)
ThingWorx Foundation to Microsoft SQL Server HA (if Microsoft SQL Server is used)
Hardware Requirements
The steps provided here assume complete hardware redundancy is used in a ThingWorx HA configuration.
Each instance of an application should be running on separate hardware to avoid single points of failure at the hardware level. For example, ThingWorx servers, whether physical, virtual, or cloud-based, should not operate on the same physical hardware.
This requirement applies to all applications in the ThingWorx HA configuration (ThingWorx, PostgreSQL, DataStax Enterprise, ZooKeeper, and so forth) to mitigate the risk of hardware failures.
The process provided here assumes redundant routers, switches, power supplies, and so forth.
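One way to verify the separate-hardware requirement is to check a deployment inventory for instances of the same application that share a physical host. The sketch below is illustrative only; the inventory format, function name, and host names are assumptions and do not come from any ThingWorx tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (application, instance name, physical host); all values are made-up examples.
Placement = Tuple[str, str, str]


def find_shared_hosts(placements: List[Placement]) -> Dict[Tuple[str, str], List[str]]:
    """Return {(application, host): [instances]} wherever one application
    has two or more instances on the same physical host."""
    by_app_host: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for application, instance, host in placements:
        by_app_host[(application, host)].append(instance)
    return {key: names for key, names in by_app_host.items() if len(names) > 1}


if __name__ == "__main__":
    inventory = [
        ("ThingWorx", "twx-1", "rack1-host1"),
        ("ThingWorx", "twx-2", "rack1-host1"),  # violates the requirement
        ("ZooKeeper", "zk-1", "rack1-host1"),
        ("ZooKeeper", "zk-2", "rack2-host1"),
    ]
    for (application, host), names in find_shared_hosts(inventory).items():
        print(f"{application}: {', '.join(names)} share physical host {host}")
```

For virtual or cloud-based servers, the host value should identify the underlying physical machine (or failure domain), not the virtual machine itself.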
ThingWorx Properties in an HA Configuration
ThingWorx properties should be set to persistent to prevent data loss in the event of a failover. If they are not persistent, a failover from the leader to a standby server would clear the in-memory values.
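The effect of persistence on failover can be shown with a toy model. This is not ThingWorx code: the class, the property names, and the dictionary standing in for the shared database are all hypothetical, and the sketch only illustrates why values held solely in one server's memory do not survive a change of leader.

```python
from typing import Any, Dict


class ToyServer:
    """Hypothetical server that keeps property values in memory and, when a
    property is marked persistent, also writes them to a shared store."""

    def __init__(self, shared_store: Dict[str, Any]) -> None:
        self.shared_store = shared_store   # stands in for the shared database
        self.memory: Dict[str, Any] = {}   # lost when this server fails

    def set_property(self, name: str, value: Any, persistent: bool) -> None:
        self.memory[name] = value
        if persistent:
            self.shared_store[name] = value

    def get_property(self, name: str) -> Any:
        # Fall back to the shared store for values this server never held.
        if name in self.memory:
            return self.memory[name]
        return self.shared_store.get(name)


if __name__ == "__main__":
    store: Dict[str, Any] = {}
    leader = ToyServer(store)
    leader.set_property("temperature", 21.5, persistent=True)
    leader.set_property("lastOperator", "alice", persistent=False)

    standby = ToyServer(store)  # promoted after the leader fails
    print(standby.get_property("temperature"))   # value survives the failover
    print(standby.get_property("lastOperator"))  # value is gone
```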
PostgreSQL Requirements
Pgpool-II and PostgreSQL DB installed in RHEL or Ubuntu environments.
Minimum of two DB host servers running a supported version of PostgreSQL. Three is recommended.
Two servers running Pgpool-II 3.7.<latest> with watchdog configured. This is the typical setup and the example given here, but other HA configurations that do not use Pgpool-II are also options.
Microsoft SQL Server Requirements
Minimum of two DB host servers running a supported version of Microsoft SQL Server.
Microsoft SQL Server is configured to operate through one of the following Microsoft HA methodologies:
Always On Failover Cluster Instances
Always On Availability Groups
DataStax Enterprise Requirements
Minimum of five nodes for a DataStax Enterprise cluster:
Three Cassandra nodes
Two Solr nodes
(optional) One DSE OpsCenter node for administrative work, as OpsCenter is not critical to the operation and does not require a high availability configuration.
InfluxDB Requirements
Minimum of two meta nodes; three are recommended for most use cases
Minimum of two data nodes; an even number of data nodes is recommended
A typical deployment has three meta nodes and an even number of data nodes
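The minimum node counts in the database requirements above can be captured in a simple pre-installation check. The sketch below is illustrative: the component keys and the function name are assumptions, and only the numeric minimums and the even-data-node recommendation come from this guide.

```python
from typing import Dict, List

# Minimum node counts taken from the requirements in this guide.
# The dictionary keys are made-up labels for this example.
MINIMUM_NODES: Dict[str, int] = {
    "postgresql": 2,
    "mssql": 2,
    "dse_cassandra": 3,
    "dse_solr": 2,
    "influx_meta": 2,
    "influx_data": 2,
}


def check_node_counts(planned: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means the plan meets the
    minimums for every component it includes."""
    problems: List[str] = []
    for component, count in planned.items():
        minimum = MINIMUM_NODES.get(component)
        if minimum is None:
            problems.append(f"{component}: unknown component")
        elif count < minimum:
            problems.append(f"{component}: {count} planned, minimum is {minimum}")
    # An even number of InfluxDB data nodes is recommended, not required.
    if planned.get("influx_data", 0) % 2 == 1:
        problems.append("influx_data: an even number of data nodes is recommended")
    return problems


if __name__ == "__main__":
    plan = {"postgresql": 3, "influx_meta": 3, "influx_data": 3}
    for problem in check_node_counts(plan):
        print(problem)
```

Only the components present in the plan are checked, so a deployment that does not use DataStax Enterprise or InfluxDB can simply omit those keys.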