Eventual Consistency in ThingWorx HA
When running ThingWorx in cluster mode, model changes are made eventually consistent across the cluster. Changes on server A take a short time to sync to servers B, C, and so on. To synchronize changes, each change is logged in the synch_log table maintained in the ThingWorx model persistence provider. Each server in the cluster runs a change watcher service at a configurable frequency (100 ms by default) that watches for entity changes in the synch_log. The change watcher unloads and reloads any changed entities on that server, treating the database as the source of truth. This eventual consistency only applies to model and configuration changes; state is consistent immediately.
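The change watcher itself is internal to ThingWorx, but the polling pattern it describes can be sketched as follows. This is a minimal illustration only; the SyncLogRepository and EntityManager interfaces, method names, and schema are assumptions, not the actual ThingWorx APIs.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a change watcher: each node polls the synch_log at a
// fixed interval and reloads any entity that changed elsewhere in the cluster.
public class ChangeWatcherSketch {

    // Hypothetical collaborators; the real internal APIs differ.
    interface SyncLogRepository {
        long latestModelVersion();
        List<String> findEntityNamesChangedAfter(long modelVersion);
    }
    interface EntityManager {
        void unloadAndReload(String entityName); // the database is the source of truth
    }

    private final SyncLogRepository syncLog;
    private final EntityManager entities;
    private volatile long localModelVersion = 0;

    ChangeWatcherSketch(SyncLogRepository syncLog, EntityManager entities) {
        this.syncLog = syncLog;
        this.entities = entities;
    }

    // Default polling frequency is 100 ms (configurable).
    void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, 100, TimeUnit.MILLISECONDS);
    }

    private void pollOnce() {
        long latest = syncLog.latestModelVersion();
        if (latest <= localModelVersion) {
            return; // nothing new in the synch_log
        }
        for (String name : syncLog.findEntityNamesChangedAfter(localModelVersion)) {
            entities.unloadAndReload(name); // pick up the change from the database
        }
        localModelVersion = latest;
    }
}
```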
HTTP - HTTP uses sticky sessions so that an individual user is tied to a single server. This ensures that a user sees their own changes immediately. Changes made by other users on other servers are eventually consistent.
WebSocket - When a model change is made through a WebSocket, the current model version is stored in the WebSocket session. WebSocket requests are always distributed round-robin to help balance load. The next request made on that WebSocket session pauses until the server it reaches is at least at the version stored in the session. If the server cannot sync within approximately one second, the request times out.
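Conceptually, the guard applied to each WebSocket request looks like the sketch below. The class and parameter names are illustrative, not ThingWorx APIs; the roughly one-second timeout reflects the behavior described above.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.LongSupplier;

// Illustrative guard: block an incoming WebSocket request until this server's
// model version has caught up to the version recorded in the session.
public class ModelVersionGuard {

    private static final long TIMEOUT_MS = 1_000; // roughly one second, per the description
    private static final long POLL_MS = 10;

    public static void awaitModelVersion(long sessionModelVersion,
                                         LongSupplier currentModelVersion)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + TIMEOUT_MS;
        while (currentModelVersion.getAsLong() < sessionModelVersion) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException(
                        "Server did not sync to model version " + sessionModelVersion);
            }
            Thread.sleep(POLL_MS); // wait for the change watcher to catch up
        }
        // Safe to process the request: this node has seen at least the session's version.
    }
}
```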
The following scenarios describe how the impact to users is reduced:
Scenario 1: HTTP
Device > HAProxy > Platform1..N
In this scenario, you connect to platform 1 and make model changes. If you then connect a different user to platform 2 and check for the changes, they will show up eventually. The cluster is eventually consistent for model changes through the sync process. If this is your use case, build in a retry to wait for the changes to become available, as in the sketch below.
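The following is a minimal client-side retry sketch using Java's built-in HttpClient. The URL, the expectedMarker check, and the way the change is detected are placeholders for your own logic; the appKey header is the usual ThingWorx REST authentication mechanism.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Client-side retry: poll until the model change is visible on whichever
// platform node the load balancer routes this request to.
public class RetryUntilConsistent {

    public static String fetchWithRetry(String url, String appKey, String expectedMarker)
            throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("appKey", appKey)             // ThingWorx application key
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();

        for (int attempt = 1; attempt <= 10; attempt++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Placeholder check: replace with whatever indicates the change has synced.
            if (response.statusCode() == 200 && response.body().contains(expectedMarker)) {
                return response.body();
            }
            Thread.sleep(200L * attempt); // simple backoff while the cluster converges
        }
        throw new IllegalStateException("Change not visible after retries");
    }
}
```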
Scenario 2: WebSocket
Device > HAProxy > CXServer1..N => Platform1..N
In this scenario, the device is point-to-point with a Connection Server. Platform requests are round-robined. If the device makes a model change and then makes another request that lands on a different server, the system ensures the first changes are present before the new request is processed. The request is delayed until the server's model version syncs to at least the version at which the change was made. If the server cannot sync within that period, the request times out. Therefore, this device can talk to any server and will get the proper response.
If you connect a second device to have it look at the changes made by the first device, eventual consistency applies to the second device as well. The system is only guaranteed to wait for changes across a single connection.
Scenario 3: HTTP request to change the model triggers a notification to a connected device
Device > HAProxy > CXServer1..N => Platform1..N
A user makes a model change and the device is notified of the change so it can re-pull definitions. Previously, the device could pull the data from a server that had not yet seen the change. Now, the model version is injected into the WebSocket session, so that if the device sends a request to a server with a lower model version, the request waits for that server to sync up to at least that model version before returning. If the server cannot sync in time, the request times out.
To support this, a new postCommitEventQueue was added. Any queued events, such as property changes, are fired after the full transaction commits and the new model version is known. When the device is notified, the model version is injected into its WebSocket session to ensure that future requests from that device wait for the server to be consistent with the original changes.
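A simplified version of this post-commit pattern is sketched below: notifications collected during a transaction are dispatched only after the commit succeeds and the new model version exists. The class and method names are illustrative and do not reflect the actual ThingWorx implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Illustrative post-commit event queue: notifications collected during a
// transaction are fired only after the commit, when the new model version is known.
public class PostCommitEventQueueSketch {

    private final List<LongConsumer> pending = new ArrayList<>();

    // Called while the transaction is still open, e.g. for a property change event.
    public void enqueue(LongConsumer notification) {
        pending.add(notification);
    }

    // Called once the full transaction has committed and the model version is known.
    // Each notification can stamp that version into the device's WebSocket session,
    // so later requests from that device wait for servers to reach it.
    public void fireAll(long committedModelVersion) {
        for (LongConsumer notification : pending) {
            notification.accept(committedModelVersion);
        }
        pending.clear();
    }
}
```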
Purging synch_log
As changes are made to the Thing model, the synch_log grows continually until old entries are purged. Once model changes have been synchronized, the associated entries in the synch_log can be purged, reducing its size and improving the performance and stability of the cluster.
Synch_log purging is configured in the Clustering subsystem on the Configuration tab. To enable automatic purging, configure the purge schedule and batch size.
Setting               | Default        | Description
Enable                | True (Enabled) | Enables or disables automatic purging.
Schedule              | 0 30 0 * * ?   | Sets the synch_log purge schedule.
Batch Size For Purge  | 10000          | The number of records to delete from the synch_log per purge run, starting with the oldest.
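The default schedule appears to be a Quartz-style cron expression (seconds, minutes, hours, day of month, month, day of week), so 0 30 0 * * ? fires once a day at 00:30 server time. If you want to sanity-check a custom schedule before saving it, a quick check with the Quartz library looks like the sketch below; the dependency and the schedule value used here are assumptions for the example.

```java
import java.util.Date;
import org.quartz.CronExpression; // from the org.quartz-scheduler:quartz dependency

// Quick sanity check of a purge schedule: print the next few fire times.
public class PurgeScheduleCheck {
    public static void main(String[] args) throws Exception {
        CronExpression schedule = new CronExpression("0 30 0 * * ?"); // default purge schedule
        Date next = new Date();
        for (int i = 0; i < 3; i++) {
            next = schedule.getNextValidTimeAfter(next);
            System.out.println("Next purge: " + next); // expected: daily at 00:30
        }
    }
}
```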