Batch Processing with Durable Queues
When ThingWorx processes an event, persistent property, or value stream from the ThingWorx nodes using Apache Kafka or Azure Event Hubs, it fetches a batch of messages from the respective durable queue and caches the batch in internal memory for processing. With at-most-once (AMO) delivery, the next batch is not fetched from the durable queue until processing of the current batch is complete. In AMO delivery, messages are marked as processed as soon as they are fetched. Therefore, at any given time, only a limited number of messages (those in the current batch) are held in the ThingWorx node. If the ThingWorx node goes down, the messages that have not yet been read are preserved in the durable queue, which limits data loss.
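The fetch-then-process cycle described above can be illustrated with a small simulation. This is only a sketch of at-most-once semantics, not ThingWorx code: the `DurableQueue` class, the `run_node` function, and the `crash_after` parameter are hypothetical names invented for this example.

```python
class DurableQueue:
    """Toy stand-in for a Kafka topic or Event Hubs partition (hypothetical)."""

    def __init__(self, messages):
        self._log = list(messages)
        self.committed = 0  # index of the first message not yet marked as processed

    def fetch_batch(self, max_batch):
        """Return up to max_batch messages and mark them as processed at once (AMO)."""
        batch = self._log[self.committed:self.committed + max_batch]
        self.committed += len(batch)
        return batch


def run_node(queue, max_batch, crash_after=None):
    """Process one batch at a time; optionally simulate a crash after N messages."""
    processed = []
    while True:
        # The next batch is fetched only after the previous one is fully processed.
        batch = queue.fetch_batch(max_batch)
        if not batch:
            return processed
        for msg in batch:
            if crash_after is not None and len(processed) == crash_after:
                # Node goes down: the rest of the in-memory batch is lost,
                # but unread messages stay in the durable queue.
                return processed
            processed.append(msg)


queue = DurableQueue(range(10))
before_crash = run_node(queue, max_batch=4, crash_after=2)
after_restart = run_node(queue, max_batch=4)
lost = sorted(set(range(10)) - set(before_crash) - set(after_restart))
```

In this run the node crashes partway through its first batch of four messages. The two unprocessed messages from that batch are lost, and the remaining six messages are read normally after the restart, so the loss never exceeds one batch.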
When a message fails to be sent to Kafka, the send is retried. If the message still cannot be sent after the configured number of retries, it is lost.
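The bounded-retry behavior can be sketched as follows. The function `send_with_retries`, its `send` callback, and the use of `ConnectionError` as the failure signal are assumptions made for illustration; they do not reflect the actual ThingWorx or Kafka client API.

```python
def send_with_retries(send, message, max_retries):
    """Attempt one send plus up to max_retries retries; return True if delivered."""
    for _attempt in range(max_retries + 1):
        try:
            send(message)
            return True
        except ConnectionError:
            continue  # try again until the retry budget is used up
    return False  # retries exhausted: the message is dropped


def make_flaky_sender(failures_before_success):
    """Hypothetical sender that fails a fixed number of times, then succeeds."""
    state = {"calls": 0, "delivered": []}

    def send(message):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("broker unavailable")
        state["delivered"].append(message)

    return send, state


send_ok, state_ok = make_flaky_sender(failures_before_success=2)
delivered = send_with_retries(send_ok, "event-1", max_retries=3)

send_bad, state_bad = make_flaky_sender(failures_before_success=10)
dropped = not send_with_retries(send_bad, "event-2", max_retries=3)
```

The first message is delivered on the third attempt. The second is dropped after four attempts (one initial send plus three retries).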
The maximum size of a message that you can send from ThingWorx to Event Hubs is 1 MB. For more information, see the Event Hubs frequently asked questions on the Microsoft Azure website. For Kafka, the default message size is 1 MB and is configurable. For more information, see the Apache Kafka website.
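A sender could guard against oversized messages with a check along these lines. This is a rough sketch: the constant `MAX_MESSAGE_BYTES` and the function `fits_in_message` are hypothetical, and the example assumes that 1 MB means 1024 * 1024 bytes and that only the payload counts toward the limit. Real brokers also count headers and protocol overhead, so the usable payload size is somewhat smaller.

```python
MAX_MESSAGE_BYTES = 1024 * 1024  # assumption: "1 MB" interpreted as 1 MiB


def fits_in_message(payload: str, limit: int = MAX_MESSAGE_BYTES) -> bool:
    """Return True if the UTF-8 encoded payload is within the size limit."""
    return len(payload.encode("utf-8")) <= limit


small_ok = fits_in_message("x" * 1000)
too_big = fits_in_message("x" * (MAX_MESSAGE_BYTES + 1))
# Non-ASCII characters take more than one byte each when encoded as UTF-8.
multibyte_ok = fits_in_message("é" * (MAX_MESSAGE_BYTES // 2))
```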
To monitor metrics in Kafka, see Monitoring Apache Kafka. For information about monitoring metrics in Event Hubs, see the PTC Community article Durable Queues Are Here.
Batch Size of Cached Data in Durable Queues
ThingWorx Platform fetches data from Apache Kafka in batches to scale to the necessary throughput. The batch size affects processing performance: a larger batch size gives better performance but also increases the risk of data loss. After a batch is pulled from Kafka, the ThingWorx node may go down before the batch is processed. In this case, the cached data in the batch is not processed and is considered lost.
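The trade-off can be made concrete with a little arithmetic. The sketch below uses a hypothetical helper, `batch_tradeoff`, and assumes a simplified model: one fetch per batch, and a single node failure that loses at most the batch cached at that moment. It does not measure actual ThingWorx performance.

```python
import math


def batch_tradeoff(total_messages, batch_size):
    """Return (number of fetches, worst-case messages lost in one node failure)."""
    fetches = math.ceil(total_messages / batch_size)
    worst_case_loss = min(batch_size, total_messages)
    return fetches, worst_case_loss


small_batches = batch_tradeoff(10_000, 100)
large_batches = batch_tradeoff(10_000, 1_000)
```

Under this model, raising the batch size from 100 to 1,000 cuts the number of fetches for 10,000 messages from 100 to 10. It also raises the worst-case loss from a single failure from 100 messages to 1,000.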
For events, you can configure the Max Cached Data for Durable Events in Composer under EventProcessingSubsystem > Configuration. For more information about this setting, see Event Processing Subsystem.
For persistent and logged properties, you can configure the maximum cached data in the platform-settings.json file. For more information, see Queue Provider Configuration Settings.