syslog-ng Store Box 6.0.5 - Administration Guide

Managing a high availability SSB cluster

High availability (HA) clusters can stretch across long distances, such as nodes across buildings, cities or even continents. The goal of HA clusters is to support enterprise business continuity by providing location-independent failover and recovery.

To set up a high availability cluster, connect two SSB units with identical configurations in high availability mode. This creates a primary-secondary (active-backup, sometimes called master-slave) node pair. Should the primary node stop functioning, the secondary node takes over the functionality of the primary node. This way, the SSB servers are continuously accessible.

NOTE:

To use the management interface and high availability mode together, connect the management interface of both SSB nodes to the network, otherwise you will not be able to access SSB remotely when a takeover occurs.

The primary node shares all data with the secondary node using the HA network interface (labeled as 4 or HA on the SSB appliance). The disks of the primary and the secondary node must be synchronized for the HA support to operate correctly. Interrupting the connection between running nodes (unplugging the Ethernet cables, rebooting a switch or a router between the nodes, or disabling the HA interface) disables data synchronization and forces the secondary node to become active. This might result in data loss. You can find instructions to resolve such problems and recover an SSB cluster in Troubleshooting an SSB cluster.

NOTE:

HA functionality was designed for physical SSB units. If SSB is used in a virtual environment, use the fallback functionalities provided by the virtualization service instead.

On virtual SSB appliances, or on physical SSB appliances purchased without the high availability license option, the Basic Settings > High Availability menu item is not displayed.

The Basic Settings > High Availability page provides information about the status of the HA cluster and its nodes.

Figure 55: Basic Settings > High Availability — Managing a high availability cluster

The following information is available about the cluster:

  • Status: Indicates whether the SSB nodes recognize each other properly and whether they are configured to operate in high availability mode.

    You can find the description of each HA status in Understanding SSB cluster statuses.

  • Current master: The MAC address of the high availability interface (4 or HA) of the node.

  • HA UUID: A unique identifier of the HA cluster. Only available in High Availability mode.

  • DRBD status: Indicates whether the latest data (including SSB configuration, log files, and so on) is available on both SSB nodes.

    You can find the description of each DRBD status in Understanding SSB cluster statuses.

  • DRBD sync rate limit: The maximum allowed synchronization speed between the master and the slave node.

    You can find more information about configuring the DRBD sync rate limit in Adjusting the synchronization speed.

The active (primary) SSB node is labeled as This node. This unit receives the incoming log messages and provides the web interface. The SSB unit labeled as Other node is the secondary node that is activated if the primary node becomes unavailable.

The following information is available about each node:

  • Node ID: The universally unique identifier (UUID) of the physical or virtual machine.

    NOTE:

    Due to backward compatibility, in the case of upgrades, the Node ID is the MAC address of the node's HA interface.

    For SSB clusters, the IDs of both nodes are included in the internal log messages of SSB.

  • Node HA state: Indicates whether the SSB nodes recognize each other properly and whether they are configured to operate in high availability mode.

    You can find the description of each HA status in Understanding SSB cluster statuses.

  • Node HA UUID: A unique identifier of the cluster. It is a software-generated identifier. Only available in High Availability mode.

  • DRBD status: The status of data synchronization between the nodes.

    You can find the description of each DRBD status in Understanding SSB cluster statuses.

  • Raid status: The status of the RAID device of the node.

  • Boot firmware version: Version number of the boot firmware.

    You can find more information about the boot firmware in Firmware in SSB.

  • HA link speed: The maximum allowed speed between the master and the slave node. The speed of the HA link must exceed the DRBD sync rate limit; otherwise, the web UI might become unresponsive and data loss can occur.

    Leave this field on Auto negotiation unless specifically requested by the support team.

  • Interfaces for Heartbeat: Virtual interface used only to detect that the other node is still available. It is not used to synchronize data between the nodes (only heartbeat messages are transferred).

    You can find more information about configuring redundant heartbeat interfaces in Redundant heartbeat interfaces.

  • HA (Fix current): The IP address of the high availability (HA) interface. Clicking Fix current will set the IP address in question as a permanent IP address. This can be useful when automatic configuration is slow or fails to function properly for some reason.

    NOTE:

    When both nodes of a cluster boot up in parallel, the node with the 1.2.4.1 HA IP address will become the master node.

  • Next hop monitoring: IP addresses (usually next hop routers) to continuously monitor from both the primary and the secondary nodes using ICMP echo (ping) messages. If any of the monitored addresses becomes unreachable from the primary node while being reachable from the secondary node (in other words, more monitored addresses are accessible from the secondary node) then it is assumed that the primary node is unreachable and a forced takeover occurs – even if the primary node is otherwise functional.

    You can find more information about configuring next-hop monitoring in Next-hop router monitoring.
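
The takeover rule described for next hop monitoring can be illustrated with a minimal sketch. It is not SSB's implementation: the function name, the dictionary layout, and the example addresses are assumptions made for illustration, and the comparison follows the parenthetical wording above (more monitored addresses are accessible from the secondary node).

```python
from typing import Dict


def should_force_takeover(primary: Dict[str, bool], secondary: Dict[str, bool]) -> bool:
    """Return True when the secondary node reaches more monitored addresses
    than the primary node (hypothetical model of the documented rule)."""
    primary_ok = sum(1 for reachable in primary.values() if reachable)
    secondary_ok = sum(1 for reachable in secondary.values() if reachable)
    return secondary_ok > primary_ok


# Hypothetical ping results per monitored address (True = reachable).
primary_view = {"192.0.2.1": False, "192.0.2.254": True}
secondary_view = {"192.0.2.1": True, "192.0.2.254": True}

print(should_force_takeover(primary_view, secondary_view))  # True
```

In this example the primary node has lost one next hop router that the secondary node can still reach, so the model reports a forced takeover even though the primary node itself is running.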

The following configuration and management options are available for HA clusters:

  • Set up a high availability cluster: You can find detailed instructions for setting up an HA cluster in "Installing two SSB units in HA mode" in the Installation Guide.

  • Adjust the DRBD (master-slave) synchronization speed: You can change the limit of the DRBD synchronization rate.

    You can find more information about configuring the DRBD synchronization speed in Adjusting the synchronization speed.

  • Enable asynchronous data replication: You can compensate for high network latency and bursts of high activity by enabling asynchronous data replication between the master and the slave node with the DRBD asynchronous mode option.

    You can find more information about configuring asynchronous data replication in Asynchronous data replication.

  • Configure redundant heartbeat interfaces: You can configure virtual interfaces for each HA node to monitor the availability of the other node.

    You can find more information about configuring redundant heartbeat interfaces in Redundant heartbeat interfaces.

  • Configure next-hop monitoring: You can provide IP addresses (usually next hop routers) to continuously monitor from both the primary and the secondary nodes using ICMP echo (ping) messages. If any of the monitored addresses becomes unreachable from the primary node while being reachable from the secondary node (in other words, more monitored addresses are accessible from the secondary node) then it is assumed that the primary node is unreachable and a forced takeover occurs – even if the primary node is otherwise functional.

    You can find more information about configuring next-hop monitoring in Next-hop router monitoring.

  • Reboot the HA cluster: To reboot both nodes, click Reboot Cluster. To prevent takeover, a token is placed on the secondary node. While this token persists, the secondary node halts its boot process to make sure that the primary node boots first. Following reboot, the primary removes this token from the secondary node, allowing it to continue with the boot process.

    If the token still persists on the secondary node following reboot, the Unblock Slave Node button is displayed. Clicking the button removes the token, and reboots the secondary node.

  • Reboot a node: Reboots the selected node.

    When rebooting the nodes of a cluster, reboot the other (secondary) node first to avoid unnecessary takeovers.

  • Shut down a node: Forces the selected node to shut down.

    When shutting down the nodes of a cluster, shut down the other (secondary) node first. When powering on the nodes, start the primary node first to avoid unnecessary takeovers.

  • Manual takeover: To activate the other node and disable the currently active node, click Activate slave.

    Activating the secondary node terminates all connections of SSB and might result in data loss. The secondary node becomes active after about 60 seconds, during which SSB cannot accept incoming messages. Enable disk-buffering on your syslog-ng clients and relays to prevent data loss in such cases.

Adjusting the synchronization speed

When operating two SSB units in High Availability mode, all incoming data is copied from the master (active) node to the slave (passive) node. Because synchronizing data can consume significant system resources, the maximum synchronization speed is limited to 10 Mbps by default. However, this means that synchronizing a large amount of data can take a very long time, so it is useful to increase the synchronization speed in certain situations, for example, when synchronizing the disks after converting a single node to a high availability cluster.
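
To get a feel for how long a full synchronization can take at a given limit, the following back-of-the-envelope sketch may help. It assumes that Mbps means megabits per second and that 1 GB is 10^9 bytes, and it ignores protocol overhead and concurrent log traffic. The function name and the 500 GB figure are hypothetical.

```python
def sync_duration_hours(data_gb: float, rate_mbps: float) -> float:
    """Estimate the hours needed to copy data_gb gigabytes at rate_mbps.

    Assumptions: 1 GB = 10**9 bytes, Mbps = 10**6 bits per second,
    no protocol overhead, no competing traffic.
    """
    bits_to_send = data_gb * 1e9 * 8
    seconds = bits_to_send / (rate_mbps * 1e6)
    return seconds / 3600


# Hypothetical example: 500 GB of data at the default limit and at a tenfold limit.
print(round(sync_duration_hours(500, 10), 1))   # 111.1
print(round(sync_duration_hours(500, 100), 1))  # 11.1
```

Under these assumptions, 500 GB needs more than four days at the default limit, which is why raising the limit temporarily can be worthwhile after converting a single node to a cluster.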

The Basic Settings > High Availability > DRBD status field indicates whether the latest data (including SSB configuration, log files, and so on) is available on both SSB nodes. For a description of each possible status, see Understanding SSB cluster statuses.

To change the limit of the DRBD synchronization rate, navigate to Basic Settings > High Availability, select DRBD sync rate limit, and select the desired value.

Set the sync rate carefully. A high value is not recommended if the load of SSB is very high, as increasing the resources used by the synchronization process may degrade the general performance of SSB. On the other hand, the speed of the HA link must exceed the speed of the incoming logs; otherwise, the web UI might become unresponsive and data loss can occur.

If you experience bursts of high activity, consider turning on asynchronous data replication.

Asynchronous data replication

When a high availability SSB cluster operates in a high-latency environment or during brief periods of high load, there is a risk of slowness, latency, or packet loss. You can compensate for this with asynchronous data replication.

Asynchronous data replication is a method where local write operations on the primary node are considered complete when the local disk write is finished and the replication packet is placed in the local TCP send buffer. It does not impact application performance, and tolerates network latency, allowing the use of physically distant storage nodes. However, because data is replicated at some point after local acknowledgement, the remote storage nodes are slightly out of step: if the local node at the primary data center breaks down, data loss occurs.
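
The trade-off described above can be illustrated with a toy model. This is a conceptual sketch only, not how SSB or DRBD is implemented: the class, its attributes, and the message names are hypothetical, and the send buffer is modeled as a simple in-memory queue.

```python
from collections import deque


class AsyncReplicatedLog:
    """Toy model: a write is acknowledged once it is on the local disk
    and queued in the send buffer, before the remote node has it."""

    def __init__(self):
        self.local_disk = []
        self.send_buffer = deque()
        self.remote_disk = []

    def write(self, record):
        self.local_disk.append(record)
        self.send_buffer.append(record)
        return True  # acknowledged without waiting for the remote node

    def drain(self, count):
        """Simulate the network delivering up to `count` queued records."""
        for _ in range(min(count, len(self.send_buffer))):
            self.remote_disk.append(self.send_buffer.popleft())

    def lost_if_primary_fails(self):
        """Records acknowledged locally but not yet on the remote node."""
        return list(self.send_buffer)


log = AsyncReplicatedLog()
for i in range(5):
    log.write(f"msg-{i}")
log.drain(3)  # only three records reached the secondary node so far

print(log.lost_if_primary_fails())  # ['msg-3', 'msg-4']
```

The two records still waiting in the buffer are the ones that would be missing on the secondary node if the primary node failed at this moment.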

To turn on asynchronous data replication, navigate to Basic Settings > High Availability, and enable DRBD asynchronous mode. You have to reboot the cluster (click Reboot Cluster) for the change to take effect.

Under prolonged heavy load, asynchronous data replication might not be able to compensate for latency or for high packet loss ratio (over 1%). In this situation, stopping the slave machine is recommended to avoid data loss at the temporary expense of redundancy.

Redundant heartbeat interfaces

To avoid unnecessary takeovers and to minimize the chance of split-brain situations, you can configure additional heartbeat interfaces in SSB. These interfaces are used only to detect that the other node is still available. They are not used to synchronize data between the nodes (only heartbeat messages are transferred). For example, if the main HA interface breaks down or is accidentally unplugged, and the nodes can still access each other on the redundant HA interface, no takeover occurs, but no data is synchronized to the slave node until the main HA link is restored. Similarly, if the connection on the redundant heartbeat interface is lost, but the main HA connection is available, no takeover occurs.

If a redundant heartbeat interface is configured, its status is displayed in the Basic Settings > High Availability > Redundant Heartbeat status field, and also in the HA > Redundant field of the System monitor. For a description of each possible status, see Understanding SSB cluster statuses.

The redundant heartbeat interface is a virtual interface with a virtual MAC address that uses an existing interface of SSB (for example, the external or the management interface). The MAC address of the virtual redundant heartbeat interface is displayed as HA MAC.

The MAC address of the redundant heartbeat interface is generated in a way that it cannot interfere with the MAC addresses of physical interfaces. Similarly, the HA traffic on the redundant heartbeat interface cannot interfere with any other traffic on the interface used.

If the nodes lose connection on the main HA interface, and after a time the connection is lost on the redundant heartbeat interfaces as well, the slave node becomes active. However, as the master node was active for a time when no data synchronization was possible between the nodes, this results in a split-brain situation which must be resolved before the HA functionality can be restored. For details, see Recovering from a split brain situation.
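
The link scenarios described in this section can be summarized in a short sketch. It models only the behavior stated above (takeover when every heartbeat path is lost, data synchronization only over the main HA link). The function and type names are hypothetical and do not correspond to any SSB API.

```python
from typing import List, NamedTuple


class ClusterOutcome(NamedTuple):
    takeover: bool      # the slave node becomes active
    data_synced: bool   # data is still replicated to the slave node


def evaluate_links(main_ha_up: bool, redundant_up: List[bool]) -> ClusterOutcome:
    """Derive the documented outcome from the state of the heartbeat links."""
    any_heartbeat = main_ha_up or any(redundant_up)
    return ClusterOutcome(takeover=not any_heartbeat, data_synced=main_ha_up)


# Main HA link down, redundant link up: no takeover, but no synchronization.
print(evaluate_links(False, [True]))
# Redundant link down, main HA link up: no takeover, synchronization continues.
print(evaluate_links(True, [False]))
# All links down: the slave node becomes active, which can lead to split brain.
print(evaluate_links(False, [False]))
```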

NOTE:

Even if redundant HA links are configured, if the dedicated HA link fails, the slave node will not be visible on the High Availability page anymore.

SSB nodes use UDP port 694 to send each other heartbeat signals.

This section describes how to configure a redundant heartbeat interface.

To configure a redundant heartbeat interface

  1. Navigate to Basic Settings > High Availability > Interfaces for Heartbeat.

  2. Select the interface you want to use as redundant heartbeat interface (for example External). Using an interface as a redundant heartbeat interface does not affect the original traffic of the interface.

    Figure 56: Basic Settings > High Availability > Interfaces for Heartbeat — Configuring redundant heartbeat interfaces

  3. Enter an IP address into the This node > Interface IP field of the selected interface. Note the following:

    • The two nodes must have different Interface IP addresses.

    • If you do not use next hop monitoring on the redundant interface, you can use any Interface IP address, even if it does not otherwise exist on that network.

    • If you use next hop monitoring on the redundant interface, the Interface IP address must be a real IP address that is visible from the other node.

    • If you use next hop monitoring on the redundant interface, the Interface IP must be accessible from the next-hop address, and vice-versa. For details on next hop monitoring, see Next-hop router monitoring.

  4. Enter an IP address into the Other node > Interface IP field of the selected interface. Note the following:

    • The two nodes must have different Interface IP addresses.

    • If you do not use next hop monitoring on the redundant interface, you can use any Interface IP address, even if it does not otherwise exist on that network.

    • If you use next hop monitoring on the redundant interface, the Interface IP address must be a real IP address that is visible from the other node.

    • If you use next hop monitoring on the redundant interface, the Interface IP must be accessible from the next-hop address, and vice-versa. For details on next hop monitoring, see Next-hop router monitoring.

  5. Repeat the previous steps to add additional redundant heartbeat interfaces if needed.

  6. Click Commit.

  7. Restart the nodes for the changes to take effect: click Reboot Cluster.
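
Before entering the addresses in steps 3 and 4, you may want to sanity-check them. The following sketch uses Python's standard ipaddress module to verify that both values are valid IP addresses and that they differ. The function name and the example addresses are hypothetical, and the check does not cover reachability from the next-hop address.

```python
import ipaddress
from typing import List


def validate_heartbeat_ips(this_node_ip: str, other_node_ip: str) -> List[str]:
    """Return a list of problems found; an empty list means both values pass."""
    errors = []
    parsed = []
    for label, value in (("This node", this_node_ip), ("Other node", other_node_ip)):
        try:
            parsed.append(ipaddress.ip_address(value))
        except ValueError:
            errors.append(f"{label}: '{value}' is not a valid IP address")
    # The two nodes must have different Interface IP addresses.
    if len(parsed) == 2 and parsed[0] == parsed[1]:
        errors.append("The two nodes must have different Interface IP addresses")
    return errors


print(validate_heartbeat_ips("192.0.2.10", "192.0.2.11"))  # []
print(validate_heartbeat_ips("192.0.2.10", "192.0.2.10"))
```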
