To ensure password and SSH key consistency and individual accountability for privileged accounts, when an appliance loses consensus in the cluster, access requests are disabled. In the event of an extended network partition, the Appliance Administrator can either automatically or manually place an appliance in Offline Workflow Mode to run access request workflow on that appliance in isolation from the rest of the cluster. When the network issues are resolved and connectivity is reestablished, the Appliance Administrator can either automatically or manually resume online operations to merge audit logs, drop any in-flight access requests, and return the appliance to full participation in the cluster.

Offline workflow considerations
  • In Offline Workflow Mode, an appliance functions apart from the other members of the cluster. Users can request passwords and sessions.
  • Settings for Offline Workflow are set on an individual appliance.
  • Suspend/Restore account does not work in Offline Workflow mode.

Passwords and SSH keys in Offline Workflow Mode

  • In Offline Workflow Mode, the appliance is enabled to request, approve, and release passwords, SSH keys, and sessions without a quorum, using cached policy data.
  • In Offline Workflow Mode, when policy requires change after check-in, the requirement is bypassed to allow for subsequent check out. In this case, an Access Request Password or SSH Key Reset By-passed Event is generated, stating: An access request subsequent check out is available as password [or SSH key] reset was by-passed.

  • Password and SSH key changes are rescheduled and may complete once network connectivity is restored, even while the appliance is still in Offline Workflow Mode.

  • Users may still request a password or SSH key from the primary or another replica on the cluster with consensus; password and SSH key check and change work as usual. The result is that passwords or SSH keys may get out of sync on the appliance running Offline Workflow Mode. This is expected behavior, and the password and SSH key will remain out of sync until the partition is healed.
  • On a network partition where one or more appliances are in Offline Workflow Mode, it is possible for two individuals to have the same password and SSH key at the same time. In that case, actions cannot be tied back to a single responsible individual, although it is still possible to identify each person who had access to the password and SSH key at the time.

Policies in Offline Workflow Mode

  • Policy will be enforced as it existed at the time the appliance, now in Offline Workflow Mode, lost network connectivity to the rest of the cluster.

  • Policy requiring a password and SSH key change after check-in is bypassed and subsequent check-out from the appliance in Offline Workflow Mode is allowed.

  • Policy is read-only. Therefore, update and delete configuration operations are not allowed on the appliance in Offline Workflow Mode.
  • Policy changes are only allowed if directed at an online primary within the cluster. Policy changes on the online primary do not affect the appliance in Offline Workflow Mode. Once the offline workflow appliance has resumed online operations, the policy changes will be distributed.

Workflow in Offline Workflow Mode

  • Regular workflow approval rules apply.
  • Time-based constraints and emergency access apply.
  • For the few minutes the appliance is switching to or from Offline Workflow Mode, Application to Application and any command line password or SSH key-fetching operations will be suspended.
  • Platform tasks (including Suspend and Restore Accounts) are disabled in Offline Workflow Mode.

User experience: Enable Offline Workflow Mode

Users that are requesting a password and SSH key in Safeguard are returned to the Home page. Password and SSH key requests prior to the switch to Offline Workflow Mode are not displayed.

  • When the switch to Offline Workflow Mode starts, this message displays: Safeguard is switching to Offline Workflow Mode. Please wait until this process is complete before proceeding with any current work. The bottom of the Home page displays this information: (Switching to Offline Workflow Mode...) and Disconnected. If the user clicks Refresh, the banner is replaced with: The service is unavailable.
  • When the switch to Offline Workflow Mode is complete, a banner with this information is displayed: Safeguard is currently in Offline Workflow Mode. Previous access requests are temporarily unavailable. You may submit new requests to continue working in Offline Workflow Mode. The bottom of the Home page displays these messages: (Offline Workflow Mode) and the connection status: Connecting then Connected.

Administrators can view the workflow status on the Cluster View pane where a message like this displays: Offline Workflow Enabled (This appliance is running access workflow in isolation from the cluster.) For more information, see Cluster Management.

User experience: Resume Online Operations

When the switch to Resume Online Operations has begun, this message displays: Safeguard is returning to normal operations. Please wait until this process is complete before proceeding with any current work. The bottom of the Home page displays this information: (Returning to normal operations) and Disconnected.

Once online operations are restored, the bottom of the Home page displays this information: Connected.

Notifications

  • The Appliance Administrator is notified when an appliance has lost consensus (quorum) via the ApplianceStateChangedEvent.

    • A primary will change from Online to PrimaryNoQuorum.
    • A replica will change from Online to one of the following:
      • ReplicaNoQuorum (connected to primary, does not have quorum)
      • ReplicaDisconnected (disconnected from primary, does not have quorum)

      • ReplicaWithQuorum (disconnected from primary, has quorum)

      For more information, see Appliance states.

  • The following events can be configured for email notifications and are written to the audit log:
    • ClusterPrimaryQuorumLostEvent

    • ClusterPrimaryQuorumRestoredEvent

    • ClusterReplicaQuorumLostEvent

    • ClusterReplicaQuorumRestoredEvent

  • All access request notifications are still generated.
  • The Notification service identifies whether access workflow is available on an appliance via the IsPasswordRequestAvailable, IsSSHKeyRequestAvailable, and IsSessionsRequestAvailable properties. The following API endpoint can be used to make this determination:

    https://<hostname or IP>/service/notification/v4/Status/Availability
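
The availability check described above can be scripted. The following is a minimal Python sketch, not an official client. It assumes the endpoint returns a JSON object whose keys match the property names mentioned in the text and that it can be read with a plain GET request. The exact spelling and casing of the properties, the function names, and the choice to treat a missing property as unavailable are all assumptions made for illustration and should be verified against the appliance's API documentation.

```python
import json
import urllib.request

# Property names as given in the text above. The exact spelling and casing
# returned by a real appliance is an assumption and should be verified.
DEFAULT_PROPERTIES = (
    "IsPasswordRequestAvailable",
    "IsSSHKeyRequestAvailable",
    "IsSessionsRequestAvailable",
)


def fetch_availability(host, context=None, timeout=10):
    """GET the Status/Availability endpoint and return the decoded JSON body.

    `host` is the appliance hostname or IP. `context` is an optional
    ssl.SSLContext (for example, one that trusts the appliance certificate).
    """
    url = f"https://{host}/service/notification/v4/Status/Availability"
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return json.load(resp)


def summarize_availability(status, properties=DEFAULT_PROPERTIES):
    """Map each property name to True/False; a missing property counts as False."""
    return {name: bool(status.get(name, False)) for name in properties}


def access_workflow_available(status, properties=DEFAULT_PROPERTIES):
    """Return True if at least one kind of access request is reported available."""
    return any(summarize_availability(status, properties).values())
```

A monitoring script could call fetch_availability("<hostname or IP>") periodically and pass the result to access_workflow_available to decide whether to direct users to a different appliance.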

Audit logs in Offline Workflow Mode

  • Prior to network connectivity being restored, everything that happens on the appliance running in Offline Workflow Mode is only audited on that appliance.

  • The audit logs merge when network connectivity is restored between the offline member and any other member in the cluster, even while in Offline Workflow Mode.
  • The audit data on any cluster member operating in Offline Workflow Mode will be lost unless the appliance is returned to the cluster using the resume online operations steps.
  • All cluster members that were capable of processing access and session requests must have network connectivity restored to the remainder of the cluster to ensure the cluster-wide audit history is maintained.

Avoid modifications to the cluster configuration

  • It is recommended that no changes to cluster membership are made while an appliance is in Offline Workflow Mode. The online operations must be automatically or manually resumed before adding or removing other nodes to ensure the appliance can seamlessly reintegrate with the cluster.

    The Appliance Administrator is advised to resume the online operations as soon as possible for individual password or SSH key accountability, policy adherence, and audit integrity.

Cluster patching is not allowed

During a cluster patch, Offline Workflow Mode cannot be triggered manually or automatically on any of the clustered appliances.

Considerations to resume online operations

  • The network partition must be corrected before resuming online operations with full functionality.
  • You can resume online operations on an appliance in Offline Workflow Mode without a quorum. However, it is highly recommended that network connectivity first be restored between a majority of the cluster members, including the member in Offline Workflow Mode.

  • When resuming online operations, any access requests that are in flight on the appliance that is running in Offline Workflow Mode will be dropped.

  • While it is possible to resume online operations when the appliance is not connected to the rest of the cluster, access requests will no longer be available on that appliance.
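
To make the "majority of the cluster members" recommendation concrete, the following Python sketch shows the arithmetic of a strict majority. It is only an illustration of what a majority means for a given cluster size, not a description of how Safeguard computes consensus internally. The function name and parameters are hypothetical.

```python
def has_majority(reachable_members, cluster_size):
    """Return True if the reachable members form a strict majority of the cluster.

    `reachable_members` counts the appliance itself plus every cluster member
    it can currently reach. `cluster_size` is the total number of members.
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 <= reachable_members <= cluster_size:
        raise ValueError("reachable_members must be between 0 and cluster_size")
    # A strict majority means more than half of the members.
    return reachable_members > cluster_size // 2
```

For example, in a five-member cluster, an appliance that can reach two other members (three in total) is part of a majority, while an appliance that can reach only one other member is not.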

Automatic versus manual workflow