
syslog-ng Store Box 7.2.0 - Administration Guide


Monitoring SSB's I/O

Disk I/O per partition
SNMP object: UCD-DISKIO-MIB::diskIOTable

Community (v2c) / Context (v3): Data and system
  • sda

    If the 15-minute load (for details, see Monitoring CPU load averages) is getting close to 90%, your system does not have enough resources and you probably need to purchase more syslog-ng Store Box (SSB) appliances. For assistance, contact our Support Team.

  • sdb

    NOTE: This is only available on SSB T1 appliances.

    If the 15-minute load (for details, see Monitoring CPU load averages) is getting close to 90%, your system does not have enough resources and you probably need to purchase more SSB appliances. For assistance, contact our Support Team.

For which systems and configurations is it applicable? Applicable for all configurations and systems.
Value change frequency: Its value changes quite often, depending on the I/O load and its type.
Related issues and issue indicators: When I/O load is too high, the system slows down.

Solution:

  • Reconsider your configuration settings.
  • Purchase a new SSB appliance.

  • For technical assistance, contact our Support Team.
Interfaces I/O by interface name
SNMP object: RFC1213-MIB::ifTable

Community (v2c) / Context (v3): Data and system

The following interfaces can be monitored (the type of traffic that can affect the load):

  • eth0 - external (network, redundant HA, next hop monitoring)

  • eth1 - management (network, redundant HA, next hop monitoring)

  • eth2 - internal (redundant, next hop monitoring)

  • eth3 - HA

If the load on an interface seems to be too high, check whether you have configured SSB in a way that affects that node. For example, if you do not use a management interface, the load on the external interface can be higher. Or, configuring next hop monitoring can also increase the load on an interface.
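
To judge whether the load on an interface is "too high", it can help to turn raw counter values into a rate. The following is a minimal sketch, assuming you poll the standard ifInOctets (or ifOutOctets) and ifSpeed columns of RFC1213-MIB::ifTable yourself and pass the numbers in. The function names are hypothetical and not part of SSB. In RFC1213-MIB the octet counters are 32-bit, so the sketch accounts for a single wraparound between two samples.

```python
COUNTER32_MODULUS = 2 ** 32  # RFC1213-MIB octet counters are 32-bit


def octet_rate(prev_octets: int, curr_octets: int, interval_seconds: float) -> float:
    """Return bytes per second between two samples of a 32-bit octet counter.

    Assumes the counter wrapped at most once between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # The modulo handles the case where the counter wrapped past 2**32.
    delta = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta / interval_seconds


def utilization_percent(rate_bytes_per_second: float, link_speed_bits_per_second: int) -> float:
    """Return link utilization in percent; ifSpeed is reported in bits per second."""
    if link_speed_bits_per_second <= 0:
        raise ValueError("link_speed_bits_per_second must be positive")
    return 100.0 * rate_bytes_per_second * 8 / link_speed_bits_per_second
```

For example, two ifInOctets samples taken 60 seconds apart can be passed to octet_rate, and the result to utilization_percent together with the ifSpeed value of the same interface. On fast links a 32-bit counter can wrap more than once between polls, so a short polling interval is advisable.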

RFC1213-MIB::ifTable
For which systems and configurations is it applicable? Applicable for all configurations and systems.
Value change frequency: Its value is continuously changing, depending on incoming logs and DRBD sync.
Related issues and issue indicators: I/O load may become too high on network interfaces, which may result in log loss, slow sync, and HA in degraded mode.

Solution:

  • Reconsider your configuration settings.
  • Purchase a new SSB appliance.

  • For technical assistance, contact our Support Team.
RFC1213-MIB::ifTable - ETH0, ETH3
For which systems and configurations is it applicable? Applicable for all configurations and systems.
Value change frequency: Its value is continuously changing, depending on the number of incoming logs.
Related issues and issue indicators: If the I/O load is too high, the network will not handle it, which may result in log loss.

Caution:

Hazard of data loss: If the I/O load becomes too high for the network to handle, it may cause log loss. To avoid log loss, reconsider your configuration settings. Alternatively, consider upgrading the capacity of your SSB appliance, purchasing more SSB appliances, or contacting our Support Team.

Solution:

RFC1213-MIB::ifTable - ETH3
For which systems and configurations is it applicable? Only applicable for HA clusters.
Value change frequency: Its value is continuously changing, depending on the number of incoming logs.
Related issues and issue indicators: The network traffic load is too high for the NIC to handle (this rarely happens).

Solution:

Monitoring SSB statistics

SSB's version number
SNMP object: SSB-SNMP-MIB::ssbFirmwareVersion
Type: String

Community (v2c) / Context (v3): Data
Short description: Current version of the syslog-ng Store Box (SSB).

Description: The current version number of SSB. This always changes after a successful upgrade.

Number of session files on SSB
SNMP object: SSB-SNMP-MIB::ssbHTTPSessions
Type: Integer32

Community (v2c) / Context (v3): Data
Short description: Number of recently active HTTP-based connections to SSB.

Description: The number of session files on SSB. These are generated as a result of the following events:

  • Accessing the web user interface of SSB.
  • Accessing a remote logspace.
  • Performing an RPC API call.
For which systems and configurations is it applicable? Applicable for all configurations and systems.
Value change frequency: Its value is continuously changing, depending on the number of active connections.
Related issues and issue indicators: If the returned value changes too often within a short period of time, it can indicate a brute force attack.

Solution:

  • Cooperate with your network administrator to fend off the external brute force attack.
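
What counts as "too often" depends on your environment. The following minimal sketch shows one way to quantify it, assuming you collect a series of ssbHTTPSessions values at a fixed polling interval. The function names and the threshold are hypothetical and are not defined by SSB.

```python
from typing import Sequence


def count_changes(samples: Sequence[int]) -> int:
    """Count how many consecutive sample pairs have different values."""
    return sum(1 for prev, curr in zip(samples, samples[1:]) if prev != curr)


def changes_too_often(samples: Sequence[int], max_changes: int) -> bool:
    """Return True if the value changed more than max_changes times in the window.

    max_changes is a site-specific threshold chosen by the operator.
    """
    return count_changes(samples) > max_changes
```

A window of samples that trips the threshold is only a hint for further investigation; legitimate bursts of web interface or RPC API activity can produce the same pattern.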

Number of core files on SSB
SNMP object: SSB-SNMP-MIB::ssbCoreFiles
Type: Integer (number of)

Community (v2c) / Context (v3): Data
Short description: The number of core files in SSB's core firmware.

Description: If the value of this parameter is larger than 0, contact our Support Team.

For which systems and configurations is it applicable? Applicable for all configurations and systems, but unless a core file is generated, its returned value is 0.
Value change frequency: Its value does not change often, only when a core file is generated.
Related issues and issue indicators: Even a single core file indicates an issue. More than one core file indicates a more serious issue.

Solution:

  • Check the state of syslog-ng/indexer.

  • Restart your syslog-ng application.

  • For technical assistance, contact our Support Team.
Available free space on SSB
SNMP object: SSB-SNMP-MIB::ssbUnusedLogStorageCapacity
Type: Integer (% percent)

Community (v2c) / Context (v3): Data
Short description: Ratio of free space on SSB compared to the Disk space fill up prevention limit.

Description: The available free space on SSB.

Caution:

Hazard of data loss: If the value of this parameter is constantly close to 0%, fine-tune your configuration or purchase more SSB appliances. For assistance, contact our Support Team.

If the value of this parameter reaches 0%, SSB will stop receiving logs.

If you have an Archive Policy configured, archiving will start after the value of this parameter reaches 0%. Therefore, SSB might start receiving logs again after some time has passed.

Make sure that you always have enough free space.

The definition of "enough" varies based on your specific configuration settings, for example:

  • The disk size of your SSB appliance.
  • The size, number and frequency of your incoming logs.
  • Your Policies > Backup & Archive/Cleanup settings configuration.
  • Your Basic Settings > Management > Disk space fill up prevention limit configuration. For details, see Preventing disk space fill up.
  • and so on
Example: Available free space on SSB

To calculate the available free space on SSB, the following formula is used:

[Disk capacity of the core partition] - [The free space above the Basic Settings > Management > Disk space fill up prevention limit] - [The space that is already in use].

For example:

  • Disk capacity of the core partition: This is always 100%
  • The free space above the Basic Settings > Management > Disk space fill up prevention limit: If SSB is configured to Disconnect clients when disks are 90 percent used, this value is 100% - 90% = 10%
  • The space that is already in use: 35%

Available free space on SSB = 100% - 10% - 35% = 55%
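
The same arithmetic can be sketched in a few lines of code. This is only an illustration of the formula above, not SSB's internal implementation; the function and parameter names are hypothetical.

```python
def available_free_space(fill_up_limit_percent: float, used_percent: float) -> float:
    """Apply the formula above; all values are percentages of the core partition.

    fill_up_limit_percent: the "Disconnect clients when disks are N percent used" setting.
    used_percent: the space that is already in use.
    """
    capacity = 100.0  # the disk capacity of the core partition is always 100%
    reserved_above_limit = capacity - fill_up_limit_percent
    return capacity - reserved_above_limit - used_percent
```

With a limit of 90 and 35 percent in use, the function returns 55, matching the example. When the used space reaches the limit (90 percent in this case), the result is 0, which is the point where SSB stops receiving logs.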

For which systems and configurations is it applicable? Applicable for all configurations and systems.
Value change frequency: Its value is continuously decreasing, depending on available log storage capacity.
Related issues and issue indicators: As the returned value approaches 0, the available log storage capacity is continuously decreasing.

Solution:

  • Archive your logs or store them in some other way (for example, forward your log messages to different logstores).
  • Consider upgrading the capacity of your SSB appliance or purchasing more SSB appliances (for more information, see Hardware specifications or contact our Sales Team).

Monitoring the HA cluster

The status of the HA cluster
SNMP object: SSB-SNMP-MIB::ssbHAClusterStatus
Type: String

Community (v2c) / Context (v3): Data
Short description: Status of the HA cluster.

Description: The status of the syslog-ng Store Box (SSB) cluster. For details, see Status.

For which systems and configurations is it applicable? Only applicable for HA clusters.
Value change frequency: When the HA cluster functions properly, this SNMP object should return the ha status most of the time. Other status values (for example, degraded) may occur occasionally, but the ha status should be dominant.
Related issues and issue indicators: If the returned status of an HA cluster is neither ha nor sync, the HA cluster is in degraded mode.

Solution:

  • Check your HA network.
  • Reboot the Secondary node.
  • Reboot the HA cluster.
  • For technical assistance, contact our Support Team.
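
The rule described for this object (any status other than ha or sync means degraded mode, and ha should dominate over time) can be sketched as a small check. The function names are hypothetical, and the sketch assumes that status strings are compared case-insensitively after trimming whitespace, which the source does not specify.

```python
from typing import Sequence

# Statuses that, per the description above, do not indicate degraded mode.
HEALTHY_STATUSES = {"ha", "sync"}


def _normalize(status: str) -> str:
    # Assumption: case and surrounding whitespace are not significant.
    return status.strip().lower()


def is_degraded(status: str) -> bool:
    """Return True if the status is neither ha nor sync."""
    return _normalize(status) not in HEALTHY_STATUSES


def ha_fraction(statuses: Sequence[str]) -> float:
    """Return the share of polled samples that reported the ha status."""
    if not statuses:
        raise ValueError("statuses must not be empty")
    return sum(1 for s in statuses if _normalize(s) == "ha") / len(statuses)
```

A monitoring job could alert when is_degraded returns True for the latest sample, or when ha_fraction over a recent window falls below a threshold you choose.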
The status of the Redundant Heartbeat interface
SNMP object: SSB-SNMP-MIB::ssbHARedundantHeartbeatStatus
Type: String

Community (v2c) / Context (v3): Data
Short description: Status of the Redundant Heartbeat interface.

Description: The status of the Redundant Heartbeat interface. For details, see Redundant Heartbeat status.

For which systems and configurations is it applicable? Only applicable for HA clusters, but it only has a returned value if Redundant HA is configured.
Value change frequency: When the cluster functions properly, this SNMP object should return the ok status most of the time. Other status values (for example, degraded) may occur occasionally, but the ok status should be dominant.
Related issues and issue indicators: Sometimes this SNMP object has an ok status, but the HAClusterStatus is not ok. The HA cluster will function properly in this case, too.
The synchronization progress of HA nodes
SNMP object: SSB-SNMP-MIB::ssbHASynchronizationProgress
Type: Integer32 (0..100 %)

Community (v2c) / Context (v3): Data
Short description: HA cluster synchronization progress (in percent). 100%, if the cluster is fully synchronized.

Description: This value can be important in the following cases:

  • When enabling HA mode for the first time, after navigating to Basic Settings > High Availability and clicking Convert to Cluster, the synchronization process starts. This value will start at 0% and will gradually increase to 100%. When it reaches 100%, it means that the conversion has finished and the nodes are now in HA status.

  • If one of your nodes becomes unavailable and you decide to reinstall SSB, you will have to rejoin your cluster by navigating to Basic Settings > High Availability and clicking Join HA. This will start the synchronization process from 0% again, and the value will gradually increase to 100%. When it reaches 100%, it means that the join process has finished and the nodes are now in HA status again.

  • If a node becomes unavailable for a longer period and then gets joined again, the configuration of the two nodes may have become different. In this case, the two nodes start the synchronization process again so that the new changes are transferred to the previously unavailable node. This does not necessarily mean that the synchronization value will start at 0%; it may start from a number somewhere between 0% and 100%.

For which systems and configurations is it applicable? Only applicable for HA clusters.
Value change frequency: Following a conversion to an HA cluster, its returned value continuously increases until it reaches 100%. After reaching 100%, its returned value rarely changes, or does not change at all.
Related issues and issue indicators: The value has not yet reached 100%, but it has not changed for a long time.

Solution:

  • Check your HA network.
  • Reboot the Secondary node.
  • Reboot the HA cluster.
  • For technical assistance, contact our Support Team.
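
The issue indicator for this object (progress below 100% that stops changing) can be detected from a series of polled values. The following is a minimal sketch with a hypothetical function name; how many unchanged samples count as "a long time" depends on your polling interval and is an assumption you need to supply.

```python
from typing import Sequence


def is_sync_stalled(progress_samples: Sequence[int], min_unchanged_samples: int) -> bool:
    """Return True if progress is below 100 and the most recent samples are identical.

    progress_samples: ssbHASynchronizationProgress values, oldest first.
    min_unchanged_samples: how many trailing samples must be equal (at least 2).
    """
    if min_unchanged_samples < 2:
        raise ValueError("min_unchanged_samples must be at least 2")
    if len(progress_samples) < min_unchanged_samples:
        return False  # not enough history to decide
    tail = progress_samples[-min_unchanged_samples:]
    return tail[-1] < 100 and len(set(tail)) == 1
```

A value that stays at 100 is never reported as stalled, because a fully synchronized cluster is expected to remain at 100%.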
Determining whether the HA node is the primary node
SNMP object: SSB-SNMP-MIB::ssbHAIsPrimary
Type: TruthValue (SNMP boolean value)

Community (v2c) / Context (v3): System
Short description: The current HA node is the primary node.

Description: This information is only supplied on HA-cluster nodes and it is available on the SNMP community provided by the boot-firmwares (the ID-based communities on the Basic Settings > Monitoring > SNMP agent settings page).

You can monitor which node is the primary HA node, that is, which node is responsible for SSB's business logic, for example, HTTP configuration, log management (syslog-ng, archive, backup), and so on.

Monitoring hardware RAID

SNMP object: SSB-SNMP-MIB::ssbHardwareRaid
Type: This is a grouping node

Community (v2c) / Context (v3): System
Short description: Detailed information about hardware RAID devices.

Description: Monitor the hardware RAID of syslog-ng Store Box (SSB), which is responsible for providing disk availability (https://en.wikipedia.org/wiki/RAID#Hardware-based), for example, if a disk fails. It is used to monitor the status of the disks in an SSB appliance.

Available on SSB appliances (except T1), on the SNMP community provided by the boot-firmwares, that is, the ID-based communities on the Basic Settings > Monitoring > SNMP agent settings page.

RAID controller battery state
SNMP object: SSB-SNMP-MIB::ssbHardwareRaidBatteryState
Type: string

Community (v2c) / Context (v3): System
Short description: The battery state of the RAID controller. The value of State from the Cachevault_Info or BBU_Info table of StorCLI.
For which systems and configurations is it applicable? When this particular hardware supports this particular (T4, T10, S, M) RAID type.
Value change frequency: Its returned value is not supposed to change. When it does, it indicates an issue.
Related issues and issue indicators: The issue generally occurs due to a power outage or a hardware error (for example, natural battery aging).

Solution:

  • For technical assistance, contact our Support Team.
  • Check power supply (for example, check if the power cord is damaged, or if the machine is running from battery, and so on).
RAID controller firmware version
SNMP object: SSB-SNMP-MIB::ssbHardwareRaidControllerFirmwareVersion
Type: string

Community (v2c) / Context (v3): System
Short description: Version of the controller's firmware. This value is reported by StorCLI.
For which systems and configurations is it applicable? When this particular hardware supports this particular (T4, T10, S, M) RAID type.
Value change frequency: Its value does not change often, only when the RAID firmware is updated.
Hardware RAID status
SNMP object: SSB-SNMP-MIB::ssbHardwareRaidStatus
Type: string

Community (v2c) / Context (v3): System
Short description: Status of the hardware RAID.
For which systems and configurations is it applicable? When this particular hardware supports this particular (T4, T10, S, M) RAID type.
Value change frequency: Its value does not change too often.
Related issues and issue indicators: A status returned value other than the optimal active indicates that the RAID is in degraded mode.
Hardware RAID synchronization progress
SNMP object: SSB-SNMP-MIB::ssbHardwareRaidSyncRate
Type: Integer32

Community (v2c) / Context (v3): System
Short description: Progress of hardware RAID synchronization (in percent).
For which systems and configurations is it applicable? When this particular hardware supports this particular (T4, T10, S, M) RAID type.
Value change frequency: When not syncing, it has no returned value. Otherwise, its value is continuously changing and may drop when resyncing.
Related issues and issue indicators: When the progress does not change for a longer period of time, it indicates an issue.

Solution:
