syslog-ng Premium Edition 7.0.31 - Administration Guide

You can create a failure script that is executed when syslog-ng PE terminates abnormally, that is, when it exits with a non-zero exit code. For example, you can use this script to send an automatic email notification.

Prerequisites

The failure script must be the following file: /opt/syslog-ng/sbin/syslog-ng-failure, and must be executable.

To create a sample failure script

Create a file named /opt/syslog-ng/sbin/syslog-ng-failure with the following content:

```
#!/usr/bin/env bash
cat >>/tmp/test.txt <<EOF
$(date)
Name............$1
Chroot dir......$2
Pid file dir....$3
Pid file........$4
Cwd.............$5
Caps............$6
Reason..........$7
Argbuf..........$8
Restarting......$9

EOF
```

Make the file executable: chmod +x /opt/syslog-ng/sbin/syslog-ng-failure
Run the following command in the /opt/syslog-ng/sbin directory: ./syslog-ng --process-mode=safe-background; sleep 0.5; ps aux | grep './syslog-ng' | grep -v grep | awk '{print $2}' | xargs kill -KILL; sleep 0.5; cat /tmp/test.txt

The command starts syslog-ng PE in safe-background mode (which is needed to use the failure script) and then kills it. You should see that the relevant information is written into the /tmp/test.txt file, for example:
```
Thu May 18 12:08:58 UTC 2017
Name............syslog-ng
Chroot dir......NULL
Pid file dir....NULL
Pid file........NULL
Cwd.............NULL
Caps............NULL
Reason..........signalled
Argbuf..........9
Restarting......not-restarting
```

You should also see messages similar to the following in the system syslog. The exact message depends on the signal (or on the reason why syslog-ng PE stopped):

May 18 13:56:09 myhost supervise/syslog-ng[10820]: Daemon exited gracefully, not restarting; exitcode='0'
May 18 13:57:01 myhost supervise/syslog-ng[10996]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='131'
May 18 13:57:37 myhost supervise/syslog-ng[11480]: Daemon was killed, not restarting; exitcode='9'

The failure script should run on every non-zero exit event.
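The email notification mentioned at the beginning of this section could be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes that a mail(1)-compatible command is installed, the recipient address admin@example.com is a placeholder, and the function names format_failure_report and send_failure_report are hypothetical. The nine positional arguments are used in the same order as in the sample script above.

```shell
#!/usr/bin/env bash
# Hypothetical recipient; override it through the environment or edit it here.
NOTIFY_ADDRESS="${NOTIFY_ADDRESS:-admin@example.com}"

# Format the nine positional arguments into a readable report.
format_failure_report() {
  cat <<EOF
$(date)
Name............$1
Chroot dir......$2
Pid file dir....$3
Pid file........$4
Cwd.............$5
Caps............$6
Reason..........$7
Argbuf..........$8
Restarting......$9
EOF
}

# Send the report by email (assumes a mail(1)-compatible command is available).
send_failure_report() {
  format_failure_report "$@" |
    mail -s "syslog-ng PE failure on $(hostname): $7" "$NOTIFY_ADDRESS"
}

# Only send mail when the script is called with all nine arguments.
if [ "$#" -ge 9 ]; then
  send_failure_report "$@"
fi
```

To try it, save the sketch as /opt/syslog-ng/sbin/syslog-ng-failure and make it executable, as described in the procedure above.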

This section describes the most common error messages.

Destination queue full

Error message:

Destination queue full, dropping messages; queue_len='10000', log_fifo_size='10000', count='4', persist_name='afsocket_dd_qfile(stream,serverdown:514)'

Description:

This message indicates message loss.

Flow-control must be enabled in the log path. When flow-control is enabled, syslog-ng will stop reading messages from the sources of the log statement if the destinations are not able to process the messages at the required speed.

If flow-control is enabled, syslog-ng will only drop messages if the destination queues/window sizes are improperly sized.

Solution:

Enable flow-control in the log path.
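As an illustration, the following sketch prints a small example configuration in which flow-control is enabled with the flags(flow-control) option of the log statement. The object names s_net and d_remote, the listening port 5140, and the @version line are assumptions made for this example. The destination host (serverdown), its port, and the queue size are taken from the sample error message above.

```shell
#!/usr/bin/env bash
# Print an example configuration with flow-control enabled in the log path.
# Object names, the source port, and the version line are illustrative only.
print_flow_control_example() {
  cat <<'EOF'
@version: 7.0

source s_net {
    network(transport("tcp") port(5140));
};

destination d_remote {
    network("serverdown" transport("tcp") port(514) log-fifo-size(10000));
};

log {
    source(s_net);
    destination(d_remote);
    flags(flow-control);
};
EOF
}
```

One possible way to use it is to redirect the output into a temporary file and check the syntax before merging anything into the real configuration, for example with /opt/syslog-ng/sbin/syslog-ng --syntax-only --cfgfile <file>.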

If flow-control is disabled, syslog-ng will drop messages if the destination queues are full. Note that syslog-ng will drop messages even if the server is alive. If the remote server accepts logs at a slower rate than the sender syslog-ng receives them, the sender syslog-ng will fill up the destination queue, then drop the newer messages. Sometimes this error occurs only during a specific time interval, for example, only between 7:00 AM and 8:00 AM or between 4:00 PM and 5:00 PM, when your users log in or log off and generate a lot of messages within a short interval.
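To find out whether the drops cluster in a specific time interval, the count values of the error messages can be summed per hour. The following is a rough sketch under stated assumptions: the function name drops_per_hour is hypothetical, each message is expected on a single line, and the log lines are expected to start with a BSD-style timestamp (for example, May 18 07:15:01), so that the third whitespace-separated field is the time of day.

```shell
#!/usr/bin/env bash
# Sum the count='N' values of "Destination queue full" messages per hour.
# Reads log lines on standard input.
drops_per_hour() {
  awk '
    /Destination queue full/ {
      hour = substr($3, 1, 2)
      # \047 is the octal escape for a single quote
      if (match($0, /count=\047[0-9]+\047/)) {
        # Skip the 7 characters of the count= prefix and drop the closing quote
        total[hour] += substr($0, RSTART + 7, RLENGTH - 8)
      }
    }
    END {
      for (h in total) printf "%s:00 %d\n", h, total[h]
    }
  ' | sort
}
```

For example, drops_per_hour < /var/log/messages prints one line per hour with the number of dropped messages. The log file path is an assumption and depends on your system.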

For more information, see Managing incoming and outgoing messages with flow-control.

Alert unknown CA

Error message:

SSL error while writing stream; tls_error='SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca'

Description:

This message indicates that the other (remote) side could not verify the certificate sent by syslog-ng.

Solution:

Check the logs on the remote side and identify why the receiving syslog-ng could not find the CA certificate that signed this certificate.
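When investigating on the receiving host, it can help to check which CA issued the client certificate and whether the certificate validates against the CA file used by the receiver. The following sketch uses the standard openssl x509 and openssl verify commands. The function names and the file names in the usage note are hypothetical, and copying the client certificate to the receiving host is an assumption of this example.

```shell
#!/usr/bin/env bash
# Print the subject and issuer of a certificate, so that the issuer can be
# compared with the CA certificates available on the receiving side.
show_cert_issuer() {
  openssl x509 -noout -subject -issuer -in "$1"
}

# Check whether a certificate validates against the given CA file.
# Usage: verify_against_ca CA_FILE CERT_FILE
verify_against_ca() {
  openssl verify -CAfile "$1" "$2"
}
```

For example, verify_against_ca ca.pem client-cert.pem reports OK if the certificate was signed by a CA contained in ca.pem, and prints a verification error otherwise.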

PEM routines:PEM_read_bio:no start line

Error message:

testuser@thor-x1:~/cert_no_start_line/certs$ openssl x509 -in cert.pem -text
unable to load certificate
140178126276248:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE

Description:

The error message is displayed when using Transport Layer Security (TLS). The syslog-ng application uses OpenSSL for TLS and this message indicates that the certificate contains characters that OpenSSL cannot process.

The error occurs when the certificate comes from Windows and you want to use it on a Linux-based computer. On Windows, the end of line (EOL) character is different (\r\n) compared to Linux (\n).

To verify this, open the certificate in a text editor, for example, MCEdit. Notice the ^M characters as shown in the image below:

Figure 45: Example of OpenSSL character processing error

Solution:

On Windows, save the certificate using UTF-8 encoding without a byte order mark (BOM), for example, using Notepad++.

NOTE: Windows Notepad is not able to save the file in plain UTF-8 (without a BOM), even if you select it.

1. In Notepad++, from the menu, select Encoding.
2. Change the value from UTF-8-BOM to UTF-8.
3. Save.

On Linux, run dos2unix cert.pem. This converts the file to the Linux line-ending style (\n).

Alternatively, replace the EOL characters in the file manually.
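If dos2unix is not available, the check and the manual replacement can be sketched with standard tools. This is a minimal sketch, assuming the certificate is a plain PEM text file. The function names has_crlf and normalize_pem are hypothetical. Besides the carriage return characters, the second function also strips a leading UTF-8 byte order mark, which is an addition of this example.

```shell
#!/usr/bin/env bash
# Return success if the file contains at least one carriage return (\r).
has_crlf() {
  LC_ALL=C grep -q "$(printf '\r')" "$1"
}

# Read a PEM file on standard input and write a cleaned copy to standard output:
# all carriage returns are removed, and a UTF-8 byte order mark at the start
# of the first line (bytes EF BB BF) is stripped if present.
normalize_pem() {
  LC_ALL=C tr -d '\r' |
    LC_ALL=C awk 'NR == 1 { sub(/^\357\273\277/, "") } { print }'
}
```

For example, run has_crlf cert.pem to test the file, then normalize_pem < cert.pem > cert-unix.pem to write a converted copy. The output file name is only an example.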

TID is already used

Error message:

TID is already used; proto='0x202c6c0', TID='61b6456d2f02052780d0d8930cbd043857c2463fcb6014b748b1450595a682', client='10.140.35.9'
Syslog connection closed;

Description:

When a client using Advanced Log Transport Protocol (ALTP) connects to the server for the first time, it generates a persistent ID and sends it to the server during the handshake process. This is the TID.

If the client loses the connection to the server silently, for example, the UTP cable is pulled from the host or other network issues happen, the server is unable to detect the connection loss.

If the client tries to reconnect within a short time interval, it will send the same TID. However, the server allows only one connection with the same TID. As the server “thinks” that it already has a live connection with this TID, it drops the new connection due to the duplicated TID.

Solution:

This error is eliminated automatically because the ALTP server will close the connection if there were no new messages from the client within the timeout frame. Once the timeout period of the ALTP server has passed, the client will be able to reconnect to the server (when the time_reopen() of the client has elapsed).

If this error message appears regularly, it means that your network may be unstable, and sometimes the client loses the connection to the server in an abnormal way.
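To judge how regularly the error appears and which clients are affected, the messages can be counted per client address. The following is a rough sketch: the function name tid_conflicts_per_client is hypothetical, and it assumes that each error message is written on a single log line that contains the client='...' field, as in the sample message above.

```shell
#!/usr/bin/env bash
# Count "TID is already used" messages per client address.
# Reads log lines on standard input; prints "COUNT CLIENT", most frequent first.
tid_conflicts_per_client() {
  awk '
    /TID is already used/ {
      # \047 is the octal escape for a single quote
      if (match($0, /client=\047[^\047]+\047/)) {
        # Skip the 8 characters of the client= prefix and drop the closing quote
        seen[substr($0, RSTART + 8, RLENGTH - 9)]++
      }
    }
    END {
      for (c in seen) printf "%d %s\n", seen[c], c
    }
  ' | sort -rn
}
```

For example, tid_conflicts_per_client < /var/log/messages lists the affected clients. The log file path is an assumption and depends on where the server writes its internal messages.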
