Chat now with support
Chat with Support

syslog-ng Open Source Edition 3.33 - Administration Guide

Preface Introduction to syslog-ng The concepts of syslog-ng Installing syslog-ng The syslog-ng OSE quick-start guide The syslog-ng OSE configuration file source: Read, receive, and collect log messages
How sources work default-network-drivers: Receive and parse common syslog messages internal: Collecting internal messages file: Collecting messages from text files wildcard-file: Collecting messages from multiple text files linux-audit: Collecting messages from Linux audit logs network: Collecting messages using the RFC3164 protocol (network() driver) nodejs: Receiving JSON messages from nodejs applications mbox: Converting local email messages to log messages osquery: Collect and parse osquery result logs pipe: Collecting messages from named pipes pacct: Collecting process accounting logs on Linux program: Receiving messages from external applications python: writing server-style Python sources python-fetcher: writing fetcher-style Python sources snmptrap: Read Net-SNMP traps sun-streams: Collecting messages on Sun Solaris syslog: Collecting messages using the IETF syslog protocol (syslog() driver) system: Collecting the system-specific log messages of a platform systemd-journal: Collecting messages from the systemd-journal system log storage systemd-syslog: Collecting systemd messages using a socket tcp, tcp6, udp, udp6: Collecting messages from remote hosts using the BSD syslog protocol— OBSOLETE unix-stream, unix-dgram: Collecting messages from UNIX domain sockets stdin: Collecting messages from the standard input stream
destination: Forward, send, and store log messages
amqp: Publishing messages using AMQP collectd: sending metrics to collectd elasticsearch2: Sending messages directly to Elasticsearch version 2.0 or higher (DEPRECATED) elasticsearch-http: Sending messages to Elasticsearch HTTP Bulk API file: Storing messages in plain-text files graphite: Sending metrics to Graphite Sending logs to Graylog hdfs: Storing messages on the Hadoop Distributed File System (HDFS) Posting messages over HTTP http: Posting messages over HTTP without Java kafka: Publishing messages to Apache Kafka (Java implementation) kafka(): Publishing messages to Apache Kafka (C implementation, using the librdkafka client) loggly: Using Loggly logmatic: Using Logmatic.io mongodb: Storing messages in a MongoDB database mqtt() destination: sending messages from a local network to an MQTT broker network: Sending messages to a remote log server using the RFC3164 protocol (network() driver) osquery: Sending log messages to osquery's syslog table pipe: Sending messages to named pipes program: Sending messages to external applications pseudofile() python: writing custom Python destinations redis: Storing name-value pairs in Redis riemann: Monitoring your data with Riemann slack: Sending alerts and notifications to a Slack channel smtp: Generating SMTP messages (email) from logs snmp: Sending SNMP traps Splunk: Sending log messages to Splunk sql: Storing messages in an SQL database stomp: Publishing messages using STOMP Sumo Logic destinations: sumologic-http() and sumologic-syslog() syslog: Sending messages to a remote logserver using the IETF-syslog protocol syslog-ng(): Forward logs to another syslog-ng node tcp, tcp6, udp, udp6: Sending messages to a remote log server using the legacy BSD-syslog protocol (tcp(), udp() drivers) Telegram: Sending messages to Telegram unix-stream, unix-dgram: Sending messages to UNIX domain sockets usertty: Sending messages to a user terminal: usertty() destination Write your own custom destination in Java or Python Client-side failover
log: Filter and route log messages using log paths, flags, and filters Global options of syslog-ng OSE TLS-encrypted message transfer template and rewrite: Format, modify, and manipulate log messages parser: Parse and segment structured messages db-parser: Process message content with a pattern database (patterndb) Correlating log messages Enriching log messages with external data Statistics of syslog-ng Multithreading and scaling in syslog-ng OSE Troubleshooting syslog-ng Best practices and examples The syslog-ng manual pages Creative Commons Attribution Non-commercial No Derivatives (by-nc-nd) License Glossary

hdfs: Storing messages on the Hadoop Distributed File System (HDFS)

Starting with version 3.7, syslog-ng OSE can send plain-text log files to the Hadoop Distributed File System (HDFS), allowing you to store your log data on a distributed, scalable file system. This is especially useful if you have huge amounts of log messages that would be difficult to store otherwise, or if you want to process your messages using Hadoop tools (for example, Apache Pig).

For more information about the benefits of using syslog-ng as a data collection, processing, and filtering tool in a Hadoop environment, see the blog post Filling your data lake with log messages: the syslog-ng Hadoop (HDFS) destination.

Note the following limitations when using the syslog-ng OSE hdfs destination:

  • This destination is only supported on the Linux platform.

  • Since syslog-ng OSE uses the official Java HDFS client, the hdfs destination has significant memory usage (about 400MB).

  • You cannot set when log messages are flushed. Hadoop performs this action automatically, depending on its configured block size, and the amount of data received. There is no way for the syslog-ng OSE application to influence when the messages are actually written to disk. This means that syslog-ng OSE cannot guarantee that a message sent to HDFS is actually written to disk. When using flow-control, syslog-ng OSE acknowledges a message as written to disk when it passes the message to the HDFS client. This method is as reliable as your HDFS environment.

  • The log messages of the underlying client libraries are available in the internal() source of syslog-ng OSE.

Declaration:
@include "scl.conf"

hdfs(
    client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:<path-to-preinstalled-hadoop-libraries>")
    hdfs-uri("hdfs://NameNode:8020")
    hdfs-file("<path-to-logfile>")
);
Example: Storing logfiles on HDFS

The following example defines an hdfs destination using only the required parameters.

@include "scl.conf"

destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
    );
};

The hdfs() driver is actually a reusable configuration snippet configured to receive log messages using the Java language-binding of syslog-ng OSE. For details on using or writing such configuration snippets, see Reusing configuration blocks. You can find the source of the hdfs configuration snippet on GitHub. For details on extending syslog-ng OSE in Java, see the Getting started with syslog-ng development guide.

NOTE: If you delete all Java destinations from your configuration and reload syslog-ng, the JVM is not used anymore, but it is still running. If you want to stop JVM, stop syslog-ng and then start syslog-ng again.

Prerequisites

To send messages from syslog-ng OSE to HDFS, complete the following steps.

Steps:
  1. If you want to use the Java-based modules of syslog-ng OSE (for example, the Elasticsearch, HDFS, or Kafka destinations), you must compile syslog-ng OSE with Java support.

    • Download and install the Java Runtime Environment (JRE), 1.7 (or newer). You can use OpenJDK or Oracle JDK, other implementations are not tested.

    • Install gradle version 2.2.1 or newer.

    • Set LD_LIBRARY_PATH to include the libjvm.so file, for example:LD_LIBRARY_PATH=/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server:$LD_LIBRARY_PATH

      Note that many platforms have a simplified links for Java libraries. Use the simplified path if available. If you use a startup script to start syslog-ng OSE set LD_LIBRARY_PATH in the script as well.

    • If you are behind an HTTP proxy, create a gradle.properties under the modules/java-modules/ directory. Set the proxy parameters in the file. For details, see The Gradle User Guide.

  2. Download the Hadoop Distributed File System (HDFS) libraries (version 2.x) from http://hadoop.apache.org/releases.html.

  3. Extract the HDFS libraries into a temporary directory, then collect the various .jar files into a single directory (for example, /opt/hadoop/lib/) where syslog-ng OSE can access them. You must specify this directory in the syslog-ng OSE configuration file. The files are located in the various lib directories under the share/ directory of the Hadoop release package. (For example, in Hadoop 2.7, required files are common/hadoop-common-2.7.0.jar, common/libs/*.jar, hdfs/hadoop-hdfs-2.7.0.jar, hdfs/lib/*, but this may change between Hadoop releases, so it is easier to copy every .jar file into a single directory.

How syslog-ng OSE interacts with HDFS

The syslog-ng OSE application sends the log messages to the official HDFS client library, which forwards the data to the HDFS nodes. The way syslog-ng OSE interacts with HDFS is described in the following steps.

  1. After syslog-ng OSE is started and the first message arrives to the hdfs destination, the hdfs destination tries to connect to the HDFS NameNode. If the connection fails, syslog-ng OSE will repeatedly attempt to connect again after the period set in time-reopen() expires.

  2. syslog-ng OSE checks if the path to the logfile exists. If a directory does not exist syslog-ng OSE automatically creates it. syslog-ng OSE creates the destination file (using the filename set in the syslog-ng OSE configuration file, with a UUID suffix to make it unique, for example, /usr/hadoop/logfile.txt.3dc1c59e-ab3b-4b71-9e81-93db477ed9d9) and writes the message into the file. After the file is created, syslog-ng OSE will write all incoming messages into the hdfs destination.

    NOTE: When the hdfs-append-enabled() option is set to true, syslog-ng OSE will not assign a new UUID suffix to an existing file, because it is then possible to open a closed file and append data to that.

    NOTE:

    You cannot set when log messages are flushed. Hadoop performs this action automatically, depending on its configured block size, and the amount of data received. There is no way for the syslog-ng OSE application to influence when the messages are actually written to disk. This means that syslog-ng OSE cannot guarantee that a message sent to HDFS is actually written to disk. When using flow-control, syslog-ng OSE acknowledges a message as written to disk when it passes the message to the HDFS client. This method is as reliable as your HDFS environment.

  3. If the HDFS client returns an error, syslog-ng OSE attempts to close the file, then opens a new file and repeats sending the message (trying to connect to HDFS and send the message), as set in the retries() parameter. If sending the message fails for retries() times, syslog-ng OSE drops the message.

  4. The syslog-ng OSE application closes the destination file in the following cases:

    • syslog-ng OSE is reloaded

    • syslog-ng OSE is restarted

    • The HDFS client returns an error.

  5. If the file is closed and you have set an archive directory, syslog-ng OSE moves the file to this directory. If syslog-ng OSE cannot move the file for some reason (for example, syslog-ng OSE cannot connect to the HDFS NameNode), the file remains at its original location, syslog-ng OSE will not try to move it again.

Storing messages with MapR-FS

The syslog-ng OSE application is also compatible with MapR File System (MapR-FS). MapR-FS provides better performance, reliability, efficiency, maintainability, and ease of use compared to the default Hadoop Distributed Files System (HDFS). To use MapR-FS with syslog-ng OSE, complete the following steps:

  1. Install MapR libraries. Instead of the official Apache HDFS libraries, MapR uses different libraries. The supported version is MapR 4.x.

    1. Download the libraries from the Maven Repository and Artifacts for MapR or get it from an already existing MapR installation.

    2. Install MapR. If you do not know how to install MapR, follow the instructions on the MapR website.

  2. In a default MapR installation, the required libraries are installed in the following path: /opt/mapr/lib.

    Enter the path where MapR was installed in the class-path option of the hdfs destination, for example:

    class-path("/opt/mapr/lib/")

    If the libraries were downloaded from the Maven Repository, the following additional libraries will be requiered. Note that the version numbers in the filenames can be different in the various Hadoop releases:commons-collections-3.2.1.jar, commons-logging-1.1.3.jar, hadoop-auth-2.5.1.jar, log4j-1.2.15.jar, slf4j-api-1.7.5.jar, commons-configuration-1.6.jar, guava-13.0.1.jar, hadoop-common-2.5.1.jar, maprfs-4.0.2-mapr.jar, slf4j-log4j12-1.7.5.jar, commons-lang-2.5.jar, hadoop-0.20.2-dev-core.jar, json-20080701.jar, protobuf-java-2.5.0.jar, zookeeper-3.4.5-mapr-1406.jar.

  3. Configure the hdfs destination in syslog-ng OSE.

    Example: Storing logfiles with MapR-FS

    The following example defines an hdfs destination for MapR-FS using only the required parameters.

    @include "scl.conf"
    
    destination d_mapr {
        hdfs(
            client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/mapr/lib/")
            hdfs-uri("maprfs://10.140.32.80")
            hdfs-file("/user/log/logfile.txt")
        );
    };
    
Related Documents

The document was helpful.

Select Rating

I easily found the information I needed.

Select Rating