Manually Recovering From a Failure
Servigistics InService is designed to recover on its own following an abrupt termination. Below are some of the key points in an automatic recovery:
1. Servigistics InService can roll back to a consistent state during startup. If a server goes down, then on the next start the system cleans its state and aborts tasks that were in the Paused, Failed, or In Progress state. During that recovery period, Servigistics InService does not allow an administrator to log in to the Task Manager application. Instead, a "System is in recovery mode" error message appears.
Note: Do not stop the servers during the recovery period. To monitor the recovery progress, review the enigma.log file on the Viewer, which is updated while the recovery runs. This log file is located at <Installation>/log/System/Log/enigma.log
2. If a task is in the ABORTED state, Servigistics InService recovers to its last known good state. All changes made during the execution of this task are rolled back.
However, under some circumstances Servigistics InService may not be able to recover automatically. This can happen if a task goes to a Failed state, or if the servers were stopped while the startup recovery was in progress. In this case, you need to recover the system manually using the following procedures.
Using the Task Manager, note all tasks that were in progress at the time of the failure.
Note:
1. This document applies to the Servigistics InService 6.3 release.
2. The name of the segment referred to in the following examples is "TestNew".
3. The name of each repository folder contains a timestamp, so the actual value depends on your environment and on the time at which the repositories were created. The values below are only examples.
4. The process outlined below covers the recovery steps for the vast majority of cases. However, additional recovery steps may be needed, depending on the cause of the failure. Contact PTC Technical Support with any questions.
Prerequisites
Confirm the following before beginning this process:
1. You have access to the Publisher and the Viewer.
2. No Servigistics InService task is in a Running, Paused, or Failed state. This must be verified from the Task Manager > Task Monitor tab before performing the steps that follow.
Note: The servers need to be running in order to collect information from the Task Monitor tab. After confirming the status of the tasks, stop the servers and proceed as described below. The servers need to be restarted once all recovery steps have been completed.
Manually Recovering
Database failures cause exception errors to appear in task logs or the enigma.log file. For example:
java.lang.RuntimeException: java.sql.SQLRecoverableException: No more data to read from socket
If these errors appear, use the following procedure to manually recover your data.
1. If they are not already running, start the database and then the application servers.
2. After the servers restart, the application rolls back the processes that were still pending when the database operation was interrupted. The time this takes depends on the amount of data.
3. Check the recovery status:
On the Publisher, log in to the Task Manager. If you are able to log in, the recovery process is complete. A "System is in recovery mode" message indicates that the recovery process is still in progress.
On the Viewer, refer to the following file: <Installation>/log/System/Log/enigma.log. Once the servers restart, scan the log for the following message: Action RollbackTasksCommand has finished successfully. Note the timestamp to ensure that the message is current, and not from a previous restart of the Viewer.
4. Once the recovery has completed, ensure that the following directories are empty in all segments:
<Installation>\data\Titles\<SegmentName>-20141229124706\Data\Core\Tr
<Installation>\data\Titles\<SegmentName>-20141229124706\Data\Core\Delta
If these directories are not empty, contact PTC Technical Support.
5. After all of the above checks are complete, create and run the XAWorkflow.sql script on the E3C schema of both the Publisher and the Viewer. This script can be downloaded from the PTC Technical Support site:
https://support.ptc.com/appserver/cs/portal/
6. Restart all servers.
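Steps 3 and 4 above can be partly scripted. The following is a minimal sketch, assuming Python is available on the Viewer machine. The function names are hypothetical; only the log message text and the Data/Core/Tr and Data/Core/Delta layout come from this procedure. The script does not interpret timestamps, because the layout of enigma.log lines is not specified here.

```python
from pathlib import Path

# Message text quoted from step 3 of this procedure.
ROLLBACK_DONE = "Action RollbackTasksCommand has finished successfully"


def rollback_completion_lines(log_lines):
    """Return every log line that reports a completed rollback, in file order."""
    return [line.rstrip("\n") for line in log_lines if ROLLBACK_DONE in line]


def non_empty_dirs(titles_root, subdirs=("Tr", "Delta")):
    """Return the Data/Core/<subdir> directories under titles_root that contain entries.

    Assumes the layout <titles_root>/<SegmentName>-<timestamp>/Data/Core/<subdir>
    shown in step 4.
    """
    offenders = []
    for segment in sorted(Path(titles_root).iterdir()):
        if not segment.is_dir():
            continue
        for name in subdirs:
            candidate = segment / "Data" / "Core" / name
            # A directory counts as non-empty if it has at least one entry.
            if candidate.is_dir() and any(candidate.iterdir()):
                offenders.append(candidate)
    return offenders
```

Pass the opened enigma.log file to rollback_completion_lines and read the timestamp on the last returned line yourself to confirm that it belongs to the current restart. An empty list from non_empty_dirs means that every Tr and Delta directory it found is empty.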
Verification Points
On the Viewer machine:
File or Directory: Config/System/Config/Core/coreCMIPublications.xml
Verification Point: <Publication ID="TestNew" Version="Version3" Preload="NO">
Information: Check the preload flag; this should have the value Preload="NO" for all publications EXCEPT CoreUtils. If Preload="YES", change it manually to "NO".

File or Directory: Config/System/Config/Core/corePublications.xml
Verification Point: <Publication ID="SHARED" Version="Version3" Preload="YES">
Information: Check the preload flag; this should have the value Preload="YES" for all publications. If you see Preload="NO", change it manually to "YES". Ensure the MaxDeltaLevel value is 1.

File or Directory: Config/Titles/<SegmentName>/Config/BusinessLogic/runtimeProperties.conf.xml
Verification Point:
<Property Name="RepositoryFlipStatus" Value="Regular"/>
<Property Name="ActiveConnectionManager" Value="IConnectionManager"/>
Information: For all segments these properties should have the exact values shown; if any other values appear, such as "Flipped" or "IConnectionManager2", correct them to "Regular" and "IConnectionManager".

File or Directory: Config/Titles/<SegmentName>/Config/Presentation/runtimeProperties.conf.xml
Verification Point:
<Property Name="ActiveConnectionManager" Value="IConnectionManager"/>
<Property Name="DataSource_<SegmentName>_Read_FS" Value="Regular"/>
Information: For all segments these properties should have the exact values shown.

File or Directory: Tr directories
Verification Point: <Installation>\data\Titles\<SegmentName>-20141229124706\Data\Core\Tr
Information: Should be empty for all segments. If NOT, contact PTC Technical Support.

File or Directory: ../Data/Titles/<SEGMENT_NAME>-20160212161907/Data/Core/Delta
Information: This directory should be empty. If not, delete all contents inside this directory.
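The XML checks in the table above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a supported tool: the function names are hypothetical, and the files are assumed to contain plain Publication and Property elements with no XML namespace, as in the fragments shown in the table. The sketch only reports mismatches; it does not edit any file.

```python
import xml.etree.ElementTree as ET


def preload_violations(xml_text, expected, exempt_ids=()):
    """Return the IDs of <Publication> elements whose Preload attribute differs from expected."""
    root = ET.fromstring(xml_text)
    return [
        pub.get("ID")
        for pub in root.iter("Publication")
        if pub.get("ID") not in exempt_ids and pub.get("Preload") != expected
    ]


def property_violations(xml_text, expected):
    """Return {name: actual value} for each expected <Property> that is missing or different.

    A missing property is reported with the value None.
    """
    root = ET.fromstring(xml_text)
    actual = {p.get("Name"): p.get("Value") for p in root.iter("Property")}
    return {
        name: actual.get(name)
        for name, value in expected.items()
        if actual.get(name) != value
    }
```

For coreCMIPublications.xml, call preload_violations with expected "NO" and exempt_ids containing "CoreUtils"; for corePublications.xml, call it with expected "YES". For the runtimeProperties.conf.xml files, pass the property names and values listed in the table as the expected dictionary.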
Database Recovery
Before you begin, ensure the following:
You have appropriate access rights on the database.
You know the database, the account usernames, and the passwords beforehand.
You see backup tables with a suffix ID, such as TN_CM_ADMIN_TREE_<AXD3ER> (this suffix is variable).
Note: If you do not see any backup tables, there is no need to execute the queries below. Either the system performed a database rollback itself, or the failure did not occur during a database transaction.
If backup tables exist for any of the following tables, see Recovery Queries.
Publisher Schema
E3C Schema:
If a backup exists, there will be backup tables for:
TN_CM_ADMIN_TREE
TN_CM_SOURCES
TN_CM_FOLDERS
TN_CM_FEED
TN_CM_SHARED
Viewer Schema
1. CMI Schema. Copy all tables from backup. Four backup tables exist:
TN_CM_ADMIN_TREE
TN_CM_SOURCES
TN_CM_FOLDERS
TN_CM_SHARED
2. TitanDB and TitanDB2
Copy the data of all tables from the TitanDB schema to the TitanDB2 schema. The TitanDB and TitanDB2 user accounts have access to each other's tables, so you can copy data from one schema to the other.
The schema on TitanDB2 does not have to be used. Instead, create a table using TitanDB.
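Spotting backup tables in a long table listing can be sketched as follows. This is a hypothetical helper, assuming you have already exported the table names of a schema with your own database tool and that backup tables are named <table>_<suffix>, as in the TN_CM_ADMIN_TREE_<AXD3ER> example above. Any other table whose name happens to start with one of these prefixes would also be reported, so review the result by hand.

```python
# Base tables for which backups may exist, as listed in this section.
BASE_TABLES = (
    "TN_CM_ADMIN_TREE",
    "TN_CM_SOURCES",
    "TN_CM_FOLDERS",
    "TN_CM_FEED",
    "TN_CM_SHARED",
)


def find_backup_tables(table_names, base_tables=BASE_TABLES):
    """Map each base table to the names of the form <base>_<suffix> found in table_names."""
    found = {}
    for name in table_names:
        upper = name.upper()  # compare case-insensitively
        for base in base_tables:
            prefix = base + "_"
            # Require at least one character after the underscore.
            if upper.startswith(prefix) and len(upper) > len(prefix):
                found.setdefault(base, []).append(upper)
    return found
```

An empty dictionary corresponds to the case described in the note above: no backup tables, so no recovery queries are needed for that schema.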
Recovery Queries
Publisher: E3C Schema
Note: CollectionName is a placeholder for the name of the collection for which the task was running when the failure occurred.
Replace each BACKUP table name with the name of the backup table actually created in your environment.
1. DELETE FROM TN_CM_FOLDERS WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName)
2. DELETE FROM TN_CM_SOURCES WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName)
3. DELETE FROM TN_CM_SOURCES WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SOURCES_BACKUP)
4. DELETE FROM TN_CM_ADMIN_TREE WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName)
5. DELETE FROM TN_CM_ADMIN_TREE WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_ADMIN_TREE_BACKUP)
6. DELETE FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName
7. DELETE FROM TN_CM_FEED WHERE FEED_ID IN (SELECT FEED_ID FROM TN_CM_FEED_BACKUP)
8. INSERT INTO TN_CM_ADMIN_TREE SELECT * FROM TN_CM_ADMIN_TREE_BACKUP
9. INSERT INTO TN_CM_FOLDERS SELECT * FROM TN_CM_FOLDERS_BACKUP
10. INSERT INTO TN_CM_SOURCES SELECT * FROM TN_CM_SOURCES_BACKUP
11. INSERT INTO TN_CM_FEED SELECT * FROM TN_CM_FEED_BACKUP
12. INSERT INTO TN_CM_SHARED SELECT * FROM TN_CM_SHARED_BACKUP
13. DELETE FROM TN_CM_FEED WHERE FEED_ID NOT IN (SELECT NODE_ID FROM TN_CM_SOURCES)
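Filling in the placeholders by hand across thirteen statements is error-prone, so the substitution can be sketched in code. The helper below is hypothetical and rests on two assumptions that you should check against your environment: every backup table is named <table>_<suffix> with the same suffix, and COLL_NAME is a character column, so the collection name is written as a quoted string literal. It only builds the statement text and does not connect to a database.

```python
def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def publisher_recovery_statements(collection_name, backup_suffix):
    """Return the 13 Publisher E3C statements, in order, with placeholders filled in."""
    if not backup_suffix.isalnum():
        raise ValueError("backup suffix must be alphanumeric")
    coll = sql_literal(collection_name)

    def backup(table):
        # Assumed naming convention: <table>_<suffix>.
        return f"{table}_{backup_suffix}"

    shared = f"(SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = {coll})"
    restore_order = ("TN_CM_ADMIN_TREE", "TN_CM_FOLDERS", "TN_CM_SOURCES",
                     "TN_CM_FEED", "TN_CM_SHARED")
    statements = [
        f"DELETE FROM TN_CM_FOLDERS WHERE NODE_ID IN {shared}",
        f"DELETE FROM TN_CM_SOURCES WHERE NODE_ID IN {shared}",
        f"DELETE FROM TN_CM_SOURCES WHERE NODE_ID IN (SELECT NODE_ID FROM {backup('TN_CM_SOURCES')})",
        f"DELETE FROM TN_CM_ADMIN_TREE WHERE NODE_ID IN {shared}",
        f"DELETE FROM TN_CM_ADMIN_TREE WHERE NODE_ID IN (SELECT NODE_ID FROM {backup('TN_CM_ADMIN_TREE')})",
        f"DELETE FROM TN_CM_SHARED WHERE COLL_NAME = {coll}",
        f"DELETE FROM TN_CM_FEED WHERE FEED_ID IN (SELECT FEED_ID FROM {backup('TN_CM_FEED')})",
    ]
    # Statements 8 to 12: restore each table from its backup.
    statements += [f"INSERT INTO {t} SELECT * FROM {backup(t)}" for t in restore_order]
    statements.append(
        "DELETE FROM TN_CM_FEED WHERE FEED_ID NOT IN (SELECT NODE_ID FROM TN_CM_SOURCES)"
    )
    return statements
```

Print the returned list and compare it with the numbered statements above before running anything against the E3C schema.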
Viewer CMI Schema
Note: CollectionName is a placeholder for the name of the collection for which a task was running when the failure occurred.
1. DELETE FROM TN_CM_FOLDERS WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName)
2. DELETE FROM TN_CM_SOURCES WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName)
3. DELETE FROM TN_CM_SOURCES WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SOURCES_BACKUP)
4. DELETE FROM TN_CM_ADMIN_TREE WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName)
5. DELETE FROM TN_CM_ADMIN_TREE WHERE NODE_ID IN (SELECT NODE_ID FROM TN_CM_ADMIN_TREE_BACKUP)
6. DELETE FROM TN_CM_SHARED WHERE COLL_NAME = CollectionName
7. INSERT INTO TN_CM_ADMIN_TREE SELECT * FROM TN_CM_ADMIN_TREE_BACKUP
8. INSERT INTO TN_CM_FOLDERS SELECT * FROM TN_CM_FOLDERS_BACKUP
9. INSERT INTO TN_CM_SOURCES SELECT * FROM TN_CM_SOURCES_BACKUP
10. INSERT INTO TN_CM_SHARED SELECT * FROM TN_CM_SHARED_BACKUP
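Because each list above deletes rows before restoring them from the backup tables, a failure partway through could leave the tables half restored. One way to guard against that, offered here as a suggestion and not as part of the documented procedure, is to run the statements in order inside a single transaction. The sketch below assumes a Python driver that follows the DB-API 2.0 interface with autocommit turned off; the function name is hypothetical.

```python
def run_in_one_transaction(conn, statements):
    """Execute statements in order on a DB-API connection; commit only if all succeed."""
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
    except Exception:
        # Undo every statement executed so far, then report the failure.
        conn.rollback()
        raise
    else:
        conn.commit()
    finally:
        cur.close()
```

If any statement raises an error, the connection is rolled back and the error propagates, so the tables are left as they were before the first DELETE. Whether this holds in practice depends on the database and driver settings, so test it on a non-production schema first.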