Starting with version 5.3, syslog-ng PE can send plain-text log files to the Hadoop Distributed File System (HDFS), allowing you to store your log data on a distributed, scalable file system. This is especially useful if you have a huge amount of log messages that would be difficult to store otherwise, or if you want to process your messages using Hadoop tools (for example, Apache Pig).
NOTE:
In order to use this destination, syslog-ng Premium Edition must run in server mode. Typically, only the central syslog-ng Premium Edition server uses this destination. For details on the server mode, see the section called “Server mode”.
Note the following limitations when using the syslog-ng PE hdfs destination:
This destination is only supported on the Linux platforms that use the linux glibc2.11 installer, including: Red Hat ES 7, Ubuntu 14.04 (Trusty Tahr).
Since syslog-ng PE uses the official Java HDFS client, the hdfs destination has significant memory usage (about 400MB).
The syslog-ng PE application always creates a new file if the previous one has been closed. Appending data to existing files is not supported.
Macros are not supported in the file path and the filename. You can use only simple file paths for your log files, for example, /usr/hadoop/logfile.txt.
You cannot set when log messages are flushed. Hadoop performs this action automatically, depending on its configured block size and the amount of data received. There is no way for the syslog-ng PE application to influence when the messages are actually written to disk. This means that syslog-ng PE cannot guarantee that a message sent to HDFS is actually written to disk. When using flow-control or RLTP™, syslog-ng PE acknowledges a message as written to disk when it passes the message to the HDFS client. This method is as reliable as your HDFS environment.
The log messages of the underlying client libraries are available in the internal() source of syslog-ng PE.
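Because the messages of the HDFS client libraries arrive through the internal() source, one way to inspect them is to route that source into a local file. The following is a minimal sketch, not an official example: the names s_internal and d_internal and the path /var/log/syslog-ng-internal.log are hypothetical and should be adapted to your environment.

```
# Hypothetical names and path; adjust them to your setup.
# internal() collects the messages generated by syslog-ng PE itself,
# including those of the underlying HDFS client libraries.
source s_internal {
    internal();
};

# A plain local file keeps these messages off HDFS,
# so they remain readable even if the HDFS connection fails.
destination d_internal {
    file("/var/log/syslog-ng-internal.log");
};

log {
    source(s_internal);
    destination(d_internal);
};
```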
Declaration:
@module mod-java
@include "scl.conf"

hdfs(
    client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:<path-to-preinstalled-hadoop-libraries>")
    hdfs_uri("hdfs://NameNode:8020")
    hdfs_file("<path-to-logfile>")
);
Example 7.10. Storing logfiles on HDFS
The following example defines an hdfs destination using only the required parameters.
@module mod-java
@include "scl.conf"

destination d_hdfs {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
    );
};
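A destination only receives messages once it is referenced in a log path. The following sketch shows one possible way to wire the d_hdfs destination into a complete log path with flow-control enabled. The source name s_net, the listening address, and port 601 are assumptions chosen for illustration, and the fragment assumes that the version declaration of your configuration file is already in place.

```
@module mod-java
@include "scl.conf"

# Hypothetical source: receives messages over TCP on port 601.
source s_net {
    network(ip("0.0.0.0") port(601) transport("tcp"));
};

# Same destination as in the example, using only the required parameters.
destination d_hdfs {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
    );
};

# With flow-control, a message counts as delivered when it is
# passed to the HDFS client, not when HDFS flushes it to disk.
log {
    source(s_net);
    destination(d_hdfs);
    flags(flow-control);
};
```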
To install the software required for the hdfs destination, see Procedure 7.3, “Prerequisites”.
For details on how the hdfs destination works, see Procedure 7.4, “How syslog-ng PE interacts with HDFS”.
For details on using MapR-FS, see Procedure 7.5, “Storing messages with MapR-FS”.
For the list of options, see the section called “HDFS destination options”.
Procedure 7.3. Prerequisites
To send messages from syslog-ng PE to HDFS, complete the following steps.
Steps:
If you want to use the Java-based modules of syslog-ng PE (for example, the Elasticsearch, HDFS, or Kafka destinations), download and install the Java Runtime Environment (JRE), version 1.7 or newer.
The Java-based modules of syslog-ng PE are tested and supported when using the Oracle implementation of Java. Other implementations are untested and unsupported; they may or may not work as expected.
Download the Hadoop Distributed File System (HDFS) libraries (version 2.x) from http://hadoop.apache.org/releases.html.
Extract the HDFS libraries into a temporary directory, then collect the various .jar files into a single directory (for example, /opt/hadoop/lib/) where syslog-ng PE can access them. You must specify this directory in the syslog-ng PE configuration file. The files are located in the various lib directories under the share/ directory of the Hadoop release package. (For example, in Hadoop 2.7, the required files are common/hadoop-common-2.7.0.jar, common/libs/*.jar, hdfs/hadoop-hdfs-2.7.0.jar, and hdfs/lib/*, but this may change between Hadoop releases, so it is easier to copy every .jar file into a single directory.)
Procedure 7.4. How syslog-ng PE interacts with HDFS
The syslog-ng PE application sends the log messages to the official HDFS client library, which forwards the data to the HDFS nodes. The following steps describe how syslog-ng PE interacts with HDFS.
After syslog-ng PE is started and the first message arrives at the hdfs destination, the hdfs destination tries to connect to the HDFS NameNode. If the connection fails, syslog-ng PE will repeatedly attempt to connect again after the period set in time-reopen() expires.
syslog-ng PE checks if the path to the logfile exists. If a directory does not exist, syslog-ng PE automatically creates it. syslog-ng PE creates the destination file (using the filename set in the syslog-ng PE configuration file, with a UUID suffix to make it unique, for example, /usr/hadoop/logfile.txt.3dc1c59e-ab3b-4b71-9e81-93db477ed9d9) and writes the message into the file. After the file is created, syslog-ng PE will write all incoming messages into the hdfs destination.
NOTE:
You cannot set when log messages are flushed. Hadoop performs this action automatically, depending on its configured block size, and the amount of data received. There is no way for the syslog-ng PE application to influence when the messages are actually written to disk. This means that syslog-ng PE cannot guarantee that a message sent to HDFS is actually written to disk. When using flow-control or RLTP™, syslog-ng PE acknowledges a message as written to disk when it passes the message to the HDFS client. This method is as reliable as your HDFS environment.
If the HDFS client returns an error, syslog-ng PE attempts to close the file, then opens a new file and repeats sending the message (trying to connect to HDFS and send the message), as set in the retries() parameter. If sending the message fails for retries() times, syslog-ng PE drops the message.
The syslog-ng PE application closes the destination file in the following cases:
syslog-ng PE is reloaded
syslog-ng PE is restarted
The HDFS client returns an error.
If the file is closed and you have set an archive directory, syslog-ng PE moves the file to this directory. If syslog-ng PE cannot move the file for some reason (for example, syslog-ng PE cannot connect to the HDFS NameNode), the file remains at its original location, and syslog-ng PE will not try to move it again.
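The settings that influence the steps above can be combined as in the following sketch. It assumes the library paths and NameNode address of Example 7.10; the values 30 and 5 and the archive path /user/log/archive/ are illustrative assumptions, not recommended settings.

```
# Global option: wait 30 seconds before trying to reconnect
# to the HDFS NameNode after a failed connection attempt.
options {
    time-reopen(30);
};

destination d_hdfs {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
        # Closed files are moved here (hypothetical path). If the move
        # fails, the file stays at its original location.
        hdfs_archive_dir("/user/log/archive/")
        # Try to send each message up to 5 times, then drop it.
        retries(5)
    );
};
```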
Procedure 7.5. Storing messages with MapR-FS
The syslog-ng PE application is also compatible with MapR File System (MapR-FS). Starting from version 5.4, syslog-ng Premium Edition is MapR certified. MapR-FS provides better performance, reliability, efficiency, maintainability, and ease of use compared to the default Hadoop Distributed File System (HDFS). To use MapR-FS with syslog-ng PE, complete the following steps:
Install MapR libraries. MapR uses its own libraries instead of the official Apache HDFS libraries. The supported version is MapR 4.x.
Download the libraries from the Maven Repository and Artifacts for MapR, or get them from an already existing MapR installation.
Install MapR. If you do not know how to install MapR, follow the instructions on the MapR website.
In a default MapR installation, the required libraries are installed in the following path: /opt/mapr/lib.
Enter the path where MapR was installed in the class-path option of the hdfs destination, for example:
class_path("/opt/mapr/lib/")
If the libraries were downloaded from the Maven Repository, the following additional libraries will be required. Note that the version numbers in the filenames can be different in the various Hadoop releases: commons-collections-3.2.1.jar, commons-logging-1.1.3.jar, hadoop-auth-2.5.1.jar, log4j-1.2.15.jar, slf4j-api-1.7.5.jar, commons-configuration-1.6.jar, guava-13.0.1.jar, hadoop-common-2.5.1.jar, maprfs-4.0.2-mapr.jar, slf4j-log4j12-1.7.5.jar, commons-lang-2.5.jar, hadoop-0.20.2-dev-core.jar, json-20080701.jar, protobuf-java-2.5.0.jar, zookeeper-3.4.5-mapr-1406.jar.
Configure the hdfs destination in syslog-ng PE.
Example 7.11. Storing logfiles with MapR-FS
The following example defines an hdfs destination for MapR-FS using only the required parameters.
@module mod-java
@include "scl.conf"

destination d_mapr {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/mapr/lib/")
        hdfs_uri("maprfs://10.140.32.80")
        hdfs_file("/user/log/logfile.txt")
    );
};
The hdfs destination stores the log messages in files on the Hadoop Distributed File System (HDFS). The hdfs destination has the following options.
The following options are required: hdfs_file(), hdfs_uri(). Note that to use hdfs, you must add the following lines to the beginning of your syslog-ng PE configuration:
@module mod-java
@include "scl.conf"
client_lib_dir()
Type: string
Default: N/A
Description: Include the path to the directory where you copied the required libraries (see Procedure 7.3, “Prerequisites”), for example, client_lib_dir("/user/share/hadoop/lib").
disk-buffer()
Description: This option enables putting outgoing messages into the disk buffer of the destination to avoid message loss in case of a system failure on the destination side. It has the following options:
reliable()
Type: yes|no
Default: no
Description: If set to
disk-buf-size()
Type: number (bytes)
Default:
Description: This is a required option. The maximum size of the disk-buffer in bytes. The minimum value is 1048576 bytes. If you set a smaller value, the minimum value will be used automatically. It replaces the old log-disk-fifo-size() option.
mem-buf-length()
Type: number (messages)
Default: 10000
Description: Use this option if the option reliable() is set to no. This option contains the number of messages stored in the overflow queue. It replaces the old log-fifo-size() option. It inherits the value of the global log-fifo-size() option if provided. If it is not provided, the default value is 10000 messages. Note that this option will be ignored if the option reliable() is set to yes.
mem-buf-size()
Type: number (bytes)
Default: 163840000
Description: Use this option if the option reliable() is set to yes. This option contains the size of the messages in bytes that is used in the memory part of the disk buffer. It replaces the old log-fifo-size() option. It does not inherit the value of the global log-fifo-size() option, even if it is provided. Note that this option will be ignored if the option reliable() is set to no.
qout-size()
Type: number (messages)
Default: 64
Description: The number of messages stored in the output buffer of the destination.
Options reliable() and disk-buf-size() are required options.
Example 7.12. Examples for using disk-buffer()
In the following case reliable disk-buffer() is used.
destination d_demo {
    network(
        "127.0.0.1"
        port(3333)
        disk-buffer(
            mem-buf-size(10000)
            disk-buf-size(2000000)
            reliable(yes)
            dir("/tmp/disk-buffer")
        )
    );
};
In the following case normal disk-buffer() is used.
destination d_demo {
    network(
        "127.0.0.1"
        port(3333)
        disk-buffer(
            mem-buf-length(10000)
            disk-buf-size(2000000)
            reliable(no)
            dir("/tmp/disk-buffer")
        )
    );
};
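The examples above use a network() destination. As a sketch of how the same option might look inside an hdfs destination, the following fragment reuses the reliable disk-buffer values from the example together with the paths and address of Example 7.10. The destination name d_hdfs_buffered is hypothetical.

```
destination d_hdfs_buffered {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
        # Reliable disk-buffer: mem-buf-size() applies,
        # mem-buf-length() would be ignored.
        disk-buffer(
            mem-buf-size(10000)
            disk-buf-size(2000000)
            reliable(yes)
            dir("/tmp/disk-buffer")
        )
    );
};
```

The disk-buffer protects messages that syslog-ng PE has not yet handed to the HDFS client. It does not change when Hadoop flushes data to disk.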
frac-digits()
Type: number (digits of fractions of a second)
Default: Value of the global option (which defaults to 0)
Description: The syslog-ng application can store fractions of a second in the timestamps according to the ISO8601 format. The frac-digits() parameter specifies the number of digits stored. The digits storing the fractions are padded by zeros if the original timestamp of the message specifies only seconds. Fractions can always be stored for the time the message was received. Note that syslog-ng can add the fractions to non-ISO8601 timestamps as well.
hdfs_archive_dir()
Type: string
Default: N/A
Description: The path where syslog-ng PE will move the closed log files. If syslog-ng PE cannot move the file for some reason (for example, syslog-ng PE cannot connect to the HDFS NameNode), the file remains at its original location. For example, hdfs_archive_dir("/usr/hdfs/archive/").
hdfs_file()
Type: string
Default: N/A
Description: The path and name of the log file. For example, hdfs_file("/usr/hdfs/mylogfile.txt"). syslog-ng PE checks if the path to the logfile exists. If a directory does not exist, syslog-ng PE automatically creates it.
Macros are not supported in the file path and the filename. You can use only simple file paths for your log files, for example, /usr/hadoop/logfile.txt.
hdfs_max_filename_length()
Type: number
Default: 255
Description: The maximum length of the filename. This filename (including the UUID that syslog-ng PE appends to it) cannot be longer than what the file system permits. If the filename is longer than the value of hdfs_max_filename_length, syslog-ng PE will automatically truncate the filename. For example, hdfs_max_filename_length("255").
hdfs_resources()
Type: string
Default: N/A
Description: The list of Hadoop resources to load, separated by semicolons. For example, hdfs_resources("/home/user/hadoop/core-site.xml;/home/user/hadoop/hdfs-site.xml").
hdfs_uri()
Type: string
Default: N/A
Description: The URI of the HDFS NameNode in hdfs://IPaddress:port or hdfs://hostname:port format. When using MapR-FS, the URI of the MapR-FS NameNode is in maprfs://IPaddress or maprfs://hostname format, for example: maprfs://10.140.32.80. The IP address of the node can be IPv4 or IPv6. For example, hdfs_uri("hdfs://10.140.32.80:8020"). The IPv6 address must be enclosed in square brackets ([]) as specified by RFC 2732, for example, hdfs_uri("hdfs://[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]:8020").
log-fifo-size()
Type: number (messages)
Default: Use global setting.
Description: The number of messages that the output queue can store.
on-error()
Accepted values: drop-message|drop-property|fallback-to-string|silently-drop-message|silently-drop-property|silently-fallback-to-string
Default: Use the global setting (which defaults to drop-message)
Description: Controls what happens when type-casting fails and syslog-ng PE cannot convert some data to the specified type. By default, syslog-ng PE drops the entire message and logs the error. Currently the value-pairs() option uses the settings of on-error().
drop-message: Drop the entire message and log an error message to the internal() source. This is the default behavior of syslog-ng PE.
drop-property: Omit the affected property (macro, template, or message-field) from the log message and log an error message to the internal() source.
fallback-to-string: Convert the property to string and log an error message to the internal() source.
silently-drop-message: Drop the entire message silently, without logging the error.
silently-drop-property: Omit the affected property (macro, template, or message-field) silently, without logging the error.
silently-fallback-to-string: Convert the property to string silently, without logging the error.
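As an illustration of how one of the values listed above could be set for a single destination, the following sketch overrides the global setting with fallback-to-string. It reuses the paths and address of Example 7.10. The choice of fallback-to-string is only an example, not a recommendation.

```
destination d_hdfs {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
        # If type-casting fails, keep the message, convert the affected
        # property to string, and log an error to the internal() source.
        on-error("fallback-to-string")
    );
};
```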
retries()
Type: number (of attempts)
Default: 3
Description: The number of times syslog-ng PE attempts to send a message to this destination. If syslog-ng PE could not send a message, it will try again until the number of attempts reaches the value of retries(), then drops the message.
template()
Type: string
Default: A format conforming to the default logfile format.
Description: Specifies a template defining the log format to be used in the destination. Macros are described in the section called “Macros of syslog-ng PE”. Please note that for network destinations it might not be appropriate to change the template, as it changes the on-wire format of the syslog protocol, which might not be tolerated by stock syslog receivers (like syslogd or syslog-ng itself). For network destinations, make sure the receiver can cope with the custom format defined.
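A custom template for the hdfs destination might look like the following sketch. The macros ISODATE, HOST, MSGHDR, and MSG are standard syslog-ng macros that are not named in this section; the particular layout is an assumption chosen for illustration. Note that macros can be used in the template, but not in the file path or the filename.

```
destination d_hdfs {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
        # One line per message: ISO timestamp, host, program header, message text.
        template("${ISODATE} ${HOST} ${MSGHDR}${MSG}\n")
    );
};
```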
throttle()
Type: number (messages per second)
Default: 0
Description: Sets the maximum number of messages sent to the destination per second. Use this output-rate-limiting functionality only when using disk-buffer as well to avoid the risk of losing messages. Specifying 0 or a lower value sets the output limit to unlimited.
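Since rate limiting is only advised together with a disk-buffer, the two options would typically appear side by side. The following sketch assumes a limit of 1000 messages per second and reuses the normal disk-buffer values from Example 7.12; both the limit and the destination name d_hdfs_limited are hypothetical.

```
destination d_hdfs_limited {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
        # Send at most 1000 messages per second to HDFS.
        throttle(1000)
        # Messages that exceed the rate limit wait in the disk-buffer
        # instead of being lost.
        disk-buffer(
            mem-buf-length(10000)
            disk-buf-size(2000000)
            reliable(no)
            dir("/tmp/disk-buffer")
        )
    );
};
```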
time-zone()
Type: name of the timezone, or the timezone offset
Default: unspecified
Description: Convert timestamps to the timezone specified by this option. If this option is not set, then the original timezone information in the message is used. Converting the timezone changes the values of all date-related macros derived from the timestamp, for example, HOUR. For the complete list of such macros, see the section called “Date-related macros”.
The timezone can be specified by using the name of the timezone (for example, time-zone("Europe/Budapest")), or as the timezone offset in +/-HH:MM format (for example, +01:00). On Linux and UNIX platforms, the valid timezone names are listed under the /usr/share/zoneinfo directory.
ts-format()
Type: rfc3164, bsd, rfc3339, iso
Default: Use the global option (which defaults to rfc3164)
Description: Override the global timestamp format (set in the global ts-format() parameter) for the specific destination. For details, see the section called “A note on timezones and timestamps”.
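The timestamp-related options described above can be set together on one destination. The following sketch assumes ISO timestamps with millisecond precision converted to the Europe/Budapest timezone; the specific values are illustrative only.

```
destination d_hdfs {
    hdfs(
        client_lib_dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs_uri("hdfs://10.140.32.80:8020")
        hdfs_file("/user/log/logfile.txt")
        # Override the global timestamp format for this destination only.
        ts-format(iso)
        # Store three digits of fractions of a second (padded with
        # zeros if the original timestamp has only seconds).
        frac-digits(3)
        # Convert timestamps to this timezone before writing.
        time-zone("Europe/Budapest")
    );
};
```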