When syslog-ng crashes for some reason, it can create a core file that contains important troubleshooting information.
To enable core files
-
Core files are produced only if the maximum core file size ulimit is set to a high value in the init script of syslog-ng.Add the following line to the init script of syslog-ng:
ulimit -c unlimited
-
Verify that syslog-ng has permissions to write the directory it is started from, for example, /opt/syslog-ng/sbin/.
-
If syslog-ng crashes, it will create a core file in the directory syslog-ng was started from.
-
To test that syslog-ng can create a core file, you can create a crash manually. For this, determine the PID of syslog-ng (for example, using the ps -All|grep syslog-ng command), then issue the following command: kill -ABRT <syslog-ng pid>
This should create a core file in the current working directory.
To properly troubleshoot certain situations, it can be useful to trace which system calls syslog-ng PE performs. How this is performed depends on the platform running syslog-ng PE. In general, note the following points:
-
When syslog-ng PE is started, a supervisor process might stay in the foreground, while the actual syslog-ng daemon goes to the background. Always trace the background process.
-
Apart from the system calls, the time between two system calls can be important as well. Make sure that your tracing tool records the time information as well. For details on how to do that, refer to the manual page of your specific tool (for example, strace on Linux, or truss on Solaris and BSD).
-
Run your tracing tool in verbose mode, and if possible, set it to print long output strings, so the messages are not truncated.
-
When using strace, also record the output of lsof to see which files are accessed.
The following are examples for tracing system calls of syslog-ng on some platforms. The output is saved into the /tmp/syslog-ng-trace.txt file, sufficed with the PID of the related syslog-ng process.The path of the syslog-ng binary assumes that you have installed syslog-ng PE from the official syslog-ng PE binaries available at the One Identity website — native distribution-specific packages may use different paths.
-
Linux: strace -o /tmp/trace.txt -s256 -ff -ttT /opt/syslog-ng/sbin/syslog-ng -f /opt/syslog-ng/etc/syslog-ng.conf -Fdv
-
HP-UX: tusc -f -o /tmp/syslog-ng-trace.txt -T /opt/syslog-ng/sbin/syslog-ng
-
IBM AIX and Solaris: truss -f -o /tmp/syslog-ng-trace.txt -r all -w all -u libc:: /opt/syslog-ng/sbin/syslog-ng -d -d -d
TIP: To execute these commands on an already running syslog-ng PE process, use the -p <pid_of_syslog-ng> parameter.
You can create a failure script that is executed when syslog-ng PE terminates abnormally, that is, when it exits with a non-zero exit code. For example, you can use this script to send an automatic email notification.
Prerequisites
The failure script must be the following file: /opt/syslog-ng/sbin/syslog-ng-failure, and must be executable.
To create a sample failure script
-
Create a file named /opt/syslog-ng/sbin/syslog-ng-failure with the following content:
#!/usr/bin/env bash
cat >>/tmp/test.txt <<EOF
$(date)
Name............$1
Chroot dir......$2
Pid file dir....$3
Pid file........$4
Cwd.............$5
Caps............$6
Reason..........$7
Argbuf..........$8
Restarting......$9
EOF
-
Make the file executable: chmod +x /opt/syslog-ng/sbin/syslog-ng-failure
-
Run the following command in the /opt/syslog-ng/sbin directory: ./syslog-ng --process-mode=safe-background; sleep 0.5; ps aux | grep './syslog-ng' | grep -v grep | awk '{print $2}' | xargs kill -KILL; sleep 0.5; cat /tmp/test.txt
The command starts syslog-ng PE in safe-background mode (which is needed to use the failure script) and then kills it. You should see that the relevant information is written into the /tmp/test.txt file, for example:
Thu May 18 12:08:58 UTC 2017
Name............syslog-ng
Chroot dir......NULL
Pid file dir....NULL
Pid file........NULL
Cwd.............NULL
Caps............NULL
Reason..........signalled
Argbuf..........9
Restarting......not-restarting
-
You should also see messages similar to the following in system syslog. The exact message depends on the signal (or the reason why syslog-ng PE stopped):
May 18 13:56:09 myhost supervise/syslog-ng[10820]: Daemon exited gracefully, not restarting; exitcode='0'
May 18 13:57:01 myhost supervise/syslog-ng[10996]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='131'
May 18 13:57:37 myhost supervise/syslog-ng[11480]: Daemon was killed, not restarting; exitcode='9'
The failure script should run on every non-zero exit event.
To avoid problems, always use the init scripts to stop syslog-ng (/etc/init.d/syslog-ng stop), instead of using the kill command. This is especially true on Solaris and HP-UX systems, here use /etc/init.d/syslog stop.