Use the Cluster settings to create a clustered environment, to monitor the health of the cluster and its members, and to define managed networks for high availability and load distribution.
It is the responsibility of the Appliance Administrator or the Operations Administrator to create a cluster, monitor the status of the cluster, and define managed networks.
Before creating a Safeguard for Privileged Passwords cluster, become familiar with the Disaster recovery and clusters chapter.
Go to the following:
- web client: Navigate to Cluster.
Cluster Management allows you to create and diagnose clusters.
When using Cluster Management from the web client, performing operations against other members of the cluster will incur a Cross-Origin Resource Sharing (CORS) HTTP request. This may require you to change the Trusted Servers, CORS, and Redirects setting to allow the specific host name being used in your web browser.
Navigate to Cluster > Cluster Management.
Cluster Management grid
- Health indicators: Health indicators display in the first column in the Cluster Management grid. Cluster members periodically query other appliances in the cluster to obtain their health information. Cluster member information and health information are cached in memory, with the most recent results displayed.
The health indicators on the nodes indicate if cluster members are in any of these states:
- error: Indicates a definite problem impacting the functionality of the cluster
- warning: Indicates a potential issue with the cluster
- locked: Indicates the cluster is locked
- healthy (green): Indicates a healthy state
Expand the View More section to see more details.
- Name: The name of the appliance.
- Network Address: The IPv4 address (or IPv6 address) of the appliance configuration interface. You can modify the appliance IP address. For more information, see How do I modify the appliance configuration settings.
- Primary: Displays Yes if the appliance is the primary.
- Appliance State: Indicates the appliance state. For a list of available states, see Appliance states.
When you select an appliance, the details for the appliance display on the right. The grid information displays: name, network address, primary, and state. This additional information is available:
- Disk Space: The amount of used and free disk space.
- Version: The appliance version number.
- Last Health Check: Last date and time the selected appliance's information was obtained.
- Uptime: The amount of time (days, hours, and minutes) the appliance has been running.
- If the replica is selected, this additional information displays for the Primary:
- Network Address: The network DNS name or the IP address of the primary appliance in the cluster
- MAC Address: The media access control address (MAC address), a unique identifier assigned to the network interface for communications
- Link Present: Displays either Yes or No to indicate if there is an open communication link
- Link Latency: The amount of time (in milliseconds) it takes for the primary to communicate with the replica. Network latency is an expression of how much time it takes for a packet of data to get from one designated point to another. Ideally, latency is as close to zero as possible.
- Errors and warnings are reported:
- Errors: For example, if an appliance is disconnected from the primary (no quorum), an error message may be: Request Workflow: Cluster configuration database health could not be determined.
- Warnings: For example, if an appliance is disconnected from the primary (no quorum), a warning message may be: Policy Data: There is a problem replicating policy data. Details: Policy database slave IO is not running. The Safeguard primary may be inaccessible from this appliance.
Unlocking a locked cluster
In order to maintain consistency and stability, only one cluster operation can run at a time. To ensure this, Safeguard for Privileged Passwords locks the cluster while a cluster operation is running, such as enroll, unjoin, failover, patch, reset, session module join, update IP, and audit log maintenance. While the cluster is locked, changes to the cluster configuration are not allowed until the operation completes.
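The one-operation-at-a-time rule described above can be pictured with a small sketch. This is only an illustrative model, not how Safeguard for Privileged Passwords implements its lock: the `ClusterLock` class and its methods are hypothetical names, and the operation names are taken from the list above.

```python
import threading


class ClusterLock:
    """Toy model of a cluster-wide lock: one cluster operation at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._current = None  # name of the running operation, if any

    def begin(self, operation):
        """Try to start an operation; return False if the cluster is locked."""
        with self._guard:
            if self._current is not None:
                return False
            self._current = operation
            return True

    def end(self, operation):
        """Release the lock when the named operation completes."""
        with self._guard:
            if self._current == operation:
                self._current = None

    @property
    def locked_by(self):
        """Name of the operation holding the lock, or None if unlocked."""
        return self._current


lock = ClusterLock()
started_patch = lock.begin("patch")    # True: the cluster was unlocked
started_enroll = lock.begin("enroll")  # False: the patch still holds the lock
lock.end("patch")                      # the patch completes, lock released
started_later = lock.begin("enroll")   # True: enroll can now run
```

In this model, a second operation is refused rather than queued while the lock is held, which mirrors the statement that configuration changes are not allowed until the running operation completes.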
The lock notification displays as follows:
- web client: The Appliance State will show a red lock icon.
You should never cancel the cluster lock for an SPP unjoin, failover, cluster reset, restore, patch, or IP address update. Other considerations:
- If an SPP join (enroll) is taking a long time, you may cancel it during the streaming audit data step.
- If a patch distribution is taking a long time, you may cancel it and upload the patch to the replicas directly.
- If an audit log synchronize operation is taking a long time, or you have reason to believe it will not complete due to a down appliance in the cluster, you may cancel it. Canceling this operation requires monitoring as detailed in Cancel Audit Log Maintenance from the Audit Log Maintenance page.
- If an audit log archive or purge operation is taking a long time, or you have reason to believe it will not complete due to a down appliance in the cluster, you may cancel it. Canceling this operation requires monitoring as detailed in Cancel Audit Log Maintenance from the Audit Log Maintenance page.
To unlock a locked cluster
- Go to Cluster Management:
- web client: Navigate to Cluster > Cluster Management.
- Click the lock icon in the upper right corner of the warning banner.
- In the Unlock Cluster confirmation dialog, enter Unlock Cluster and click OK.
This will release the cluster lock that was placed on all of the appliances in the cluster and close the operation.
IMPORTANT: Care should be taken when unlocking a locked cluster. It should only be used when you are sure that one or more appliances in the cluster are offline and will not finish the current operation. If you force the cluster unlock, you may cause instability on an appliance, requiring a factory reset and possibly the need to rebuild the cluster. If you are unsure about the operation in progress, do NOT unlock the cluster.
Task delegation in a cluster
A Safeguard for Privileged Passwords cluster delegates platform management tasks (such as password and SSH key check and password and SSH key change) to appliances based on platform task load. The primary appliance performs delegation and evaluates cluster member suitability using an internal fitness score that is calculated by dividing the number of in-use platform task threads by the maximum number of allowed platform task threads.
The maximum number of allowed platform task threads can be adjusted using the Appliance/Settings API and adjusting the MaxPlatformTaskThreads value. By adjusting this number, you can tune task distribution.
IMPORTANT: Adjusting the MaxPlatformTaskThreads will impact SPP's available resources for handling access requests and may impact user experience. Best practice is to engage Professional Services if the value may need to be changed.
Increasing the maximum number of allowed platform task threads will decrease the fitness score, thus increasing the number of tasks passed to that appliance.
The fitness score is cached and is recalculated in 8-minute intervals when the scheduler is not busy. When the scheduler is running tasks, the fitness score is calculated more frequently so the scheduler can dynamically adjust.
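The platform task fitness score described above is a simple ratio, and a short sketch may make the effect of raising the thread maximum easier to see. The function names, appliance names, and thread counts below are hypothetical, and picking the member with the lowest score is an assumption inferred from the statement that a lower score results in more tasks being passed to an appliance.

```python
def fitness_score(in_use_threads, max_threads):
    """In-use platform task threads divided by the maximum allowed threads."""
    if max_threads <= 0:
        raise ValueError("max_threads must be positive")
    return in_use_threads / max_threads


def pick_appliance(appliances):
    """Return the member with the lowest score (assumed selection rule)."""
    return min(appliances, key=lambda name: fitness_score(*appliances[name]))


# Hypothetical members: name -> (in-use threads, maximum allowed threads)
members = {
    "primary": (6, 12),
    "replica-1": (6, 24),  # same load, but a higher thread maximum
    "replica-2": (9, 12),
}

scores = {name: fitness_score(*load) for name, load in members.items()}
chosen = pick_appliance(members)
```

Here `replica-1` runs the same number of threads as `primary`, but its larger maximum halves its score, so under the assumed rule it is the member chosen for the next task.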
The selection of a Safeguard for Privileged Sessions (SPS) appliance is primarily dependent on managed network rules. However, if there aren't any managed network rules, or if the managed network rules result in more than one SPS appliance being selected, a fitness score is used as the tiebreaker. The fitness score is calculated based on the percentage of disk available minus the overall load average of the SPS appliance. (Load average is a Linux metric which provides a numerical indication of the overall resource capacity in use on the server.) The higher the fitness score, the more likely that the corresponding appliance will be selected.
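As a rough illustration of the SPS tiebreaker, the sketch below computes the score as the percentage of disk available minus the load average. The function names, appliance names, and figures are hypothetical. The text only says a higher score makes selection more likely, so always choosing the highest score is a simplifying assumption.

```python
def sps_fitness(disk_available_percent, load_average):
    """Percentage of disk available minus the overall load average."""
    return disk_available_percent - load_average


def break_tie(candidates):
    """Return the candidate with the highest score (simplifying assumption)."""
    return max(candidates, key=lambda name: sps_fitness(*candidates[name]))


# Hypothetical candidates left after managed network rules are applied:
# name -> (percent of disk available, load average)
candidates = {
    "sps-a": (40.0, 1.5),
    "sps-b": (55.0, 3.0),
}

sps_scores = {name: sps_fitness(*vals) for name, vals in candidates.items()}
winner = break_tie(candidates)
```

In this example `sps-b` carries a higher load average, but its larger share of free disk outweighs that, so it ends up with the higher score.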