Database Availability Loss and Its Implications
The DPM solution uses two databases: the DPM database that is created when the solution is initially deployed, and the ThingWorx database. A database can become unavailable for many reasons, such as regular database maintenance, loss of network connection, or disk failure. If either database becomes unavailable, data incoming through data automation can be lost or duplicated.
When lost or duplicated data includes job order, material master, or target quantity events, any subsequent events and counts can be recorded against the incorrect job orders or materials.
When lost or duplicated data includes production count or scrap count events, the value for the production and scrap counters can become incorrect. For rollover counters, this has a negative ripple effect. For more information on rollover counters, see Setting Up Data Automation.
Both DPM and ThingWorx have processes in place to avoid data loss or duplicated data in most scenarios when a database becomes unavailable:
If the DPM database becomes unavailable at any point during batch processing, the batch processing quits at that point. The PTCLastProcessedEventTimestamp property is set to the timestamp of the last successfully processed event. The next time the automation event timer is triggered, the value stream is queried for unprocessed event data. All events with a timestamp that occurs after the current PTCLastProcessedEventTimestamp property value are processed, including the events that were not yet processed when the batch processing previously quit.
If the ThingWorx database becomes unavailable, event data is written to the value stream queue. ThingWorx has a retry mechanism to reestablish the connection to the database. When the database becomes available, entries from the value stream queue are added to the value stream.
This topic describes the uncommon scenarios in which database unavailability can result in lost or duplicated data, explains how to identify when these scenarios have occurred, and describes how to address the resulting data issues.
Data Automation Flow
To better understand these scenarios, it is useful to understand the high-level flow for processing event data from data automation. Every pacemaker that is configured for data automation follows this flow:
1. Event data is received through data automation. Good quality data is written to the value stream queue. Entries from the queue are added to the value stream.
2. The automation event timer is triggered, and batch processing begins:
a. The value stream is queried for unprocessed events. Unprocessed events are those events which have a timestamp that occurs after the current value of the PTCLastProcessedEventTimestamp property.
b. The queried events are processed for each timestamp, in order.
c. Production counts and scrap counts are buffered to consolidate the counts into groups for each counter.
d. The grouped counts are inserted into the DPM database.
3. The PTCLastProcessedEventTimestamp property is updated after each event is successfully processed. For production counts and scrap counts, the PTCLastProcessedEventTimestamp property is not updated until all of the grouped counts have been inserted into the database. The latest timestamp from all of the buffered groups is set as the PTCLastProcessedEventTimestamp property value.
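The flow above can be sketched in Python. This is a simplified illustration, not the actual DPM implementation; the function name, data shapes, and in-memory stand-ins for the value stream and database are all hypothetical:

```python
from collections import defaultdict

def process_batch(value_stream, last_processed, counters, db):
    """Sketch of one automation-event batch (illustrative names only).

    value_stream: list of (timestamp, kind, payload) tuples, ordered by time.
    last_processed: current PTCLastProcessedEventTimestamp value.
    counters: dict of counter name -> running total (DPM database stand-in).
    Returns the updated watermark.
    """
    # a. Query unprocessed events: strictly AFTER the watermark.
    pending = [e for e in value_stream if e[0] > last_processed]

    buffered = defaultdict(int)  # c. consolidate counts per counter
    latest_count_ts = last_processed
    for ts, kind, payload in pending:  # b. process per timestamp, in order
        if kind in ("production_count", "scrap_count"):
            buffered[payload["counter"]] += payload["delta"]
            latest_count_ts = max(latest_count_ts, ts)
        else:
            db.append((ts, kind, payload))  # other events go straight in
            last_processed = ts             # watermark advances per event

    # d. Insert the grouped counts, then advance the watermark once,
    # using the latest timestamp from all buffered groups.
    for counter, delta in buffered.items():
        counters[counter] += delta
    if buffered:
        last_processed = max(last_processed, latest_count_ts)
    return last_processed
```

The key detail for the scenarios below is the watermark: counts are inserted as a group before the watermark advances, and the query on the next timer trigger is strictly greater-than.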
Scenarios
The following uncommon scenarios can result in lost or duplicated data. Keep in mind that the data automation flow runs on every pacemaker that is configured for data automation. Depending on when a scenario occurs, it can impact some pacemakers and not others.
Scenario: The DPM database becomes unavailable while processing multiple events that all have the same timestamp.
Description: Every event received through data automation has a timestamp. It is possible, though uncommon, for multiple events to have the same timestamp. When this occurs, events with the same timestamp are processed in the following order: job order, material master, target quantity, availability, and finally production counts and scrap counts. The PTCLastProcessedEventTimestamp property is updated after the first event with that timestamp is processed. If the DPM database becomes unavailable before all events with that timestamp have been successfully processed, the remaining events are never picked up again, because the next time the automation event timer is triggered, the batch processing queries only for events with timestamps that occur after the current PTCLastProcessedEventTimestamp property value.
Result: Any unprocessed events that share the timestamp stored in the PTCLastProcessedEventTimestamp property are lost, including production count and scrap count events.
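The loss comes from the strictly greater-than query, which can be demonstrated with a minimal sketch (the function and event shapes are illustrative, not DPM code):

```python
def query_unprocessed(events, watermark):
    """Events are (timestamp, name) pairs; the query is strictly greater-than."""
    return [e for e in events if e[0] > watermark]

# Three events share timestamp 100. The first is processed and the
# watermark is set to 100, then the database becomes unavailable.
events = [(100, "job_order"), (100, "production_count"), (100, "scrap_count")]
watermark = 100  # updated after the first event at this timestamp

# On the next timer trigger, the remaining two events are never returned:
survivors = query_unprocessed(events, watermark)
```

Because `survivors` is empty, the production count and scrap count events at timestamp 100 are silently skipped.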
Scenario: The ThingWorx database is unavailable, and the value stream queue becomes full.
Description: When the ThingWorx database becomes unavailable, events coming from data automation continue to be added to the value stream queue until its maximum queue size is reached. When the database becomes available again, the value stream queue is processed to add entries to the value stream, where they are queried during batch processing the next time the automation event timer is triggered. If the value stream queue becomes full, any new events coming from data automation are rejected.
Result: Rejected events are lost.
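The bounded-queue behavior can be modeled with a small sketch (class and method names are hypothetical, not the ThingWorx API):

```python
from collections import deque

class ValueStreamQueue:
    """Toy bounded queue standing in for the value stream queue."""

    def __init__(self, max_size):
        self.entries = deque()
        self.max_size = max_size
        self.rejected = 0

    def offer(self, event):
        """Accept the event if there is room; otherwise reject it permanently."""
        if len(self.entries) >= self.max_size:
            self.rejected += 1  # the event is lost; there is no retry
            return False
        self.entries.append(event)
        return True
```

Once the database recovers, only the accepted entries are drained into the value stream; rejected events were never stored anywhere and cannot be recovered.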
Scenario: The ThingWorx database is unavailable, and the ThingWorx server is restarted.
Description: When the ThingWorx server is restarted, all contents of the value stream queue and persistence queue are lost. Entries that were already added to the value stream, but have not yet been processed, are retained.
Result: All data in the value stream queue and persistence queue is lost.
Scenario: The ThingWorx database is unavailable, the connection retry count has been exhausted, and ThingWorx is shut down.
Description: When the number of retry attempts configured for the ThingWorx database retry mechanism is exhausted, the ThingWorx server is shut down. For more information on the ThingWorx retry mechanism, see Storing Data in ThingWorx in the ThingWorx Help Center.
Result: All data in the value stream queue and persistence queue is lost. Any new events coming from data automation while ThingWorx is shut down are also lost.
Scenario: The ThingWorx database becomes unavailable before the persistence queue is processed to update the PTCLastProcessedEventTimestamp property, and the ThingWorx server is restarted.
Description: When the server restarts, the contents of the persistence queue are lost and the PTCLastProcessedEventTimestamp property retains its previous value. Events with timestamps after that value which have already been processed and added to the DPM database are reprocessed and added to the database again.
Result: Reprocessed events create duplicate data.
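The duplication side of the stale watermark can be shown with the same kind of sketch (names and data are illustrative):

```python
def reprocess(events, stale_watermark, db_rows):
    """Replays events newer than the (stale) watermark into the database."""
    for ts, payload in events:
        if ts > stale_watermark:
            db_rows.append((ts, payload))
    return db_rows

events = [(10, "count+5"), (20, "count+3")]

# Both events were already processed and inserted before the restart,
# but the watermark update to 20 was lost with the persistence queue,
# so the watermark is still 10.
db_rows = [(10, "count+5"), (20, "count+3")]
reprocess(events, 10, db_rows)
```

After the replay, the event at timestamp 20 appears in the database twice, which is exactly the duplicate data this scenario produces.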
Identifying Database Availability Incidents
You can identify when a database availability event has resulted in data loss or data duplication in two ways:
Monitor the Production Dashboard during production. Operators of automated pacemakers can identify when incorrect data is being shown, both in the current production blocks and in the expanded event logs.
Review the ThingWorx logs for database unavailability errors, particularly when you are aware of database maintenance or other actions occurring that can impact database availability. Identify the impacted pacemakers and check the data in Production Dashboard to see if there is missing or duplicated data.
Addressing Data Issues
While there is no guaranteed way to avoid these rare scenarios, you can take the following actions to address the results when they occur.
Reset incorrect production and scrap counters for individual pacemakers:
1. Stop the Kepware server.
2. In ThingWorx Composer, navigate to the pacemaker Thing.
3. Under Properties, add a row to the PTCLastAutomationProcessedValues property infotable with the following information for each counter that you are resetting:
propertyName—For production counters, enter PTCProductionCount. For scrap counters, enter the name of the scrap property.
value—The value to which you want to reset the counter.
jobOrderUid—This field is ignored by DPM.
4. Start the Kepware server.
For more information on production counters and scrap counters, see Setting Up Data Automation.
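As an illustration of the row described in step 3, the fields might be filled in as follows. The values here are examples only, not values from your system:

```python
# Hypothetical PTCLastAutomationProcessedValues row for resetting a
# production counter; the actual reset value depends on your data.
reset_row = {
    "propertyName": "PTCProductionCount",  # or the scrap property name
    "value": 1250,                         # value to reset the counter to
    "jobOrderUid": "",                     # ignored by DPM
}
```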
Manually enter missing data for individual pacemakers using the Production Dashboard. Loss events, production counts, and scrap counts that occurred during the last 24 hours can be manually entered for each pacemaker. Additionally, historical scrap events can be added for up to one week.
Manually remove duplicated production count for individual pacemakers using the Production Dashboard. For the current production block, you can remove production count from the Production Entry pane of the Production Dashboard. To remove production count from the last 24 hours, you can use the Time Loss Accounting window.