How does the TPAM HA functionality work?
When an HA pair and the HA module of TPAM are purchased, customers can configure TPAM for High Availability failover. If the Primary appliance becomes uncontactable, the secondary (Replica) appliance can take over the servicing of passwords.
The information below details the HA capabilities of the TPAM solution.
The HA pair is intended to function as a pair when shipped. The Primary and Replica are labeled depending on the role they hold. These are not to be changed.
The Primary is the appliance that holds all configuration and is the appliance that users access for /PAR and the admins access via /Paradmin and /Parconfig. All password management is handled on the Primary.
When HA is configured via the /Paradmin - HA settings page, the two appliances communicate over ports 443 and 22, pinging each other to confirm that the other appliance is available. The Primary has two options, "Standalone" and "Replicating". Standalone, as the name suggests, works independently. Replicating passes backups to the Replica on a schedule. On the HA settings tab you will see the option "Send updates to Replica(s) every X mins". This setting controls how often incremental backups are sent from the Primary to the Replica. Traffic is one way only, from Primary to Replica.
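As a rough illustration of the connectivity requirement described above, the following Python sketch checks whether TCP ports 443 and 22 on a peer appliance accept connections from a given host. It is not TPAM's own health check. The function names and the example hostname are hypothetical, and a plain TCP connect is only an assumption about what "available" means here.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unresolvable)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ha_ports(host, ports=(443, 22), timeout=3.0):
    """Check each HA port on the peer and return a {port: reachable} mapping."""
    return {port: port_reachable(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical hostname; replace with the address of the peer appliance.
    print(check_ha_ports("tpam-replica.example.com"))
```

Running a check like this from the network segment of each appliance can help confirm that a firewall is not blocking either port before HA is enabled.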
The Replica has the following options: "Replica", "Test Mode" and "Primary".
Replica - Allows the Replica to accept the backups from the Primary.
Test Mode - Puts the database in read-only mode to allow mode changes from Replica to Primary or vice versa.
Primary - Allows the Replica to service passwords (with subsequent manual steps).
You will see the setting "Automatically failover if Primary is unreachable for X Minutes".
If a problem is detected with the Primary, or the connection is severed, a countdown begins based on the above setting. If the connection to the Primary is not reestablished before the countdown ends, the failover occurs. Emails and alerts are triggered throughout the process. The expectation is that issues with the Primary are resolved during this countdown, with failover to the Replica acting as a safety net.
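The countdown behavior described above can be modeled with a small Python sketch. This is only an illustration of the logic, not TPAM's implementation. The class name, the state strings, and the assumption that the countdown resets as soon as the Primary is reachable again are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverMonitor:
    """Tracks how long the Primary has been unreachable (illustrative only)."""

    threshold_minutes: float                   # the "X Minutes" setting
    unreachable_since: Optional[float] = None  # timestamp in seconds, or None

    def observe(self, primary_reachable: bool, now: float) -> str:
        """Record one health-check result taken at time `now` (seconds)."""
        if primary_reachable:
            # Assumption: contact with the Primary cancels the countdown.
            self.unreachable_since = None
            return "ok"
        if self.unreachable_since is None:
            # First failed check starts the countdown.
            self.unreachable_since = now
        elapsed_minutes = (now - self.unreachable_since) / 60
        if elapsed_minutes >= self.threshold_minutes:
            return "failover"
        return "countdown"
```

With a threshold of 10 minutes, a failed check at time 0 followed by another at 600 seconds would return "countdown" and then "failover", while a successful check at any point in between would reset the timer.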
If the failover happens, the Replica changes its mode to Primary. At this point, however, it is not managing or servicing password requests until a DNS change is made or users are told to point to the Replica IP address. A load balancer or DNS may handle this redirection.
The Replica still needs another manual step before it can fully service password changes: the Automation Engine must be enabled on the Replica under /PARADMIN - Automation Engine - Auto Mgt Agent. Keeping this step manual ensures that the two appliances are NOT servicing password requests at the same time, which would cause an out-of-sync situation.
To return the Replica to its role as the secondary appliance, a manual restore is required. As mentioned previously, traffic is one way, so any password requests or changes made on the Replica will be unknown to the Primary. When the connection to the Primary is restored and admins want to put traffic back through the Primary, a backup must be taken from the Replica and restored to the Primary. The following solution steps you through this restore process: