To send messages from syslog-ng OSE to HDFS, complete the following steps.
If you want to use the Java-based modules of syslog-ng OSE (for example, the Elasticsearch, HDFS, or Kafka destinations), you must compile syslog-ng OSE with Java support.
Download and install the Java Runtime Environment (JRE), 1.7 (or newer).
Install gradle version 2.2.1 or newer.
Set LD_LIBRARY_PATH to include the libjvm.so file, for example:LD_LIBRARY_PATH=/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server:$LD_LIBRARY_PATH
Note that many platforms have a simplified links for Java libraries. Use the simplified path if available. If you use a startup script to start syslog-ng OSE set LD_LIBRARY_PATH in the script as well.
If you are behind an HTTP proxy, create a gradle.properties under the modules/java-modules/ directory. Set the proxy parameters in the file. For details, see The Gradle User Guide.
Download the Hadoop Distributed File System (HDFS) libraries (version 2.x) from http://hadoop.apache.org/releases.html.
Extract the HDFS libraries into a temporary directory, then collect the various .jar files into a single directory (for example, /opt/hadoop/lib/) where syslog-ng OSE can access them. You must specify this directory in the syslog-ng OSE configuration file. The files are located in the various lib directories under the share/ directory of the Hadoop release package. (For example, in Hadoop 2.7, required files are common/hadoop-common-2.7.0.jar, common/libs/*.jar, hdfs/hadoop-hdfs-2.7.0.jar, hdfs/lib/*, but this may change between Hadoop releases, so it is easier to copy every .jar file into a single directory.
The syslog-ng OSE application sends the log messages to the official HDFS client library, which forwards the data to the HDFS nodes. The way syslog-ng OSE interacts with HDFS is described in the following steps.
After syslog-ng OSE is started and the first message arrives to the hdfs destination, the hdfs destination tries to connect to the HDFS NameNode. If the connection fails, syslog-ng OSE will repeatedly attempt to connect again after the period set in time-reopen() expires.
syslog-ng OSE checks if the path to the logfile exists. If a directory does not exist syslog-ng OSE automatically creates it. syslog-ng OSE creates the destination file (using the filename set in the syslog-ng OSE configuration file, with a UUID suffix to make it unique, for example, /usr/hadoop/logfile.txt.3dc1c59e-ab3b-4b71-9e81-93db477ed9d9) and writes the message into the file. After the file is created, syslog-ng OSE will write all incoming messages into the hdfs destination.
|
NOTE:
When the hdfs-append-enabled() option is set to true, syslog-ng OSE will not assign a new UUID suffix to an existing file, because it is then possible to open a closed file and append data to that. |
|
NOTE:
You cannot set when log messages are flushed. Hadoop performs this action automatically, depending on its configured block size, and the amount of data received. There is no way for the syslog-ng OSE application to influence when the messages are actually written to disk. This means that syslog-ng OSE cannot guarantee that a message sent to HDFS is actually written to disk. When using flow-control, syslog-ng OSE acknowledges a message as written to disk when it passes the message to the HDFS client. This method is as reliable as your HDFS environment. |
If the HDFS client returns an error, syslog-ng OSE attempts to close the file, then opens a new file and repeats sending the message (trying to connect to HDFS and send the message), as set in the retries() parameter. If sending the message fails for retries() times, syslog-ng OSE drops the message.
The syslog-ng OSE application closes the destination file in the following cases:
syslog-ng OSE is reloaded
syslog-ng OSE is restarted
The HDFS client returns an error.
If the file is closed and you have set an archive directory, syslog-ng OSE moves the file to this directory. If syslog-ng OSE cannot move the file for some reason (for example, syslog-ng OSE cannot connect to the HDFS NameNode), the file remains at its original location, syslog-ng OSE will not try to move it again.
The syslog-ng OSE application is also compatible with MapR File System (MapR-FS). MapR-FS provides better performance, reliability, efficiency, maintainability, and ease of use compared to the default Hadoop Distributed Files System (HDFS). To use MapR-FS with syslog-ng OSE, complete the following steps:
Install MapR libraries. Instead of the official Apache HDFS libraries, MapR uses different libraries. The supported version is MapR 4.x.
Download the libraries from the Maven Repository and Artifacts for MapR or get it from an already existing MapR installation.
Install MapR. If you do not know how to install MapR, follow the instructions on the MapR website.
In a default MapR installation, the required libraries are installed in the following path: /opt/mapr/lib.
Enter the path where MapR was installed in the class-path option of the hdfs destination, for example:
class-path("/opt/mapr/lib/")
If the libraries were downloaded from the Maven Repository, the following additional libraries will be requiered. Note that the version numbers in the filenames can be different in the various Hadoop releases:commons-collections-3.2.1.jar, commons-logging-1.1.3.jar, hadoop-auth-2.5.1.jar, log4j-1.2.15.jar, slf4j-api-1.7.5.jar, commons-configuration-1.6.jar, guava-13.0.1.jar, hadoop-common-2.5.1.jar, maprfs-4.0.2-mapr.jar, slf4j-log4j12-1.7.5.jar, commons-lang-2.5.jar, hadoop-0.20.2-dev-core.jar, json-20080701.jar, protobuf-java-2.5.0.jar, zookeeper-3.4.5-mapr-1406.jar.
Configure the hdfs destination in syslog-ng OSE.
The following example defines an hdfs destination for MapR-FS using only the required parameters.
@module mod-java @include "scl.conf" destination d_mapr { hdfs( client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/mapr/lib/") hdfs-uri("maprfs://10.140.32.80") hdfs-file("/user/log/logfile.txt") ); };
Version
|
NOTE:
If you configure Kerberos authentication for a hdfs() destination, it affects all hdfs() destinations. Kerberos and non-Kerberos hdfs() destinations cannot be mixed in a syslog-ng OSE configuration. This means that if one hdfs() destination uses Kerberos authentication, you have to configure all other hdfs() destinations to use Kerberos authentication too. Failing to do so results in non-Kerberos hdfs() destinations being unable to authenticate to the HDFS server. |
|
NOTE:
If you want to configure your hdfs() destination to stop using Kerberos authentication, namely, to remove Kerberos-related options from the hdfs() destination configuration, make sure to restart syslog-ng OSE for the changes to take effect. |
You have configured your Hadoop infrastructure to use Kerberos authentication.
You have a keytab file and a principal for the host running syslog-ng OSE. For details, see the Kerberos documentation.
You have installed and configured the Kerberos client packages on the host running syslog-ng OSE. (That is, Kerberos authentication works for the host, for example, from the command line using the kinit user@REALM -k -t <keytab_file> command.)
destination d_hdfs { hdfs(client-lib-dir("/hdfs-libs/lib") hdfs-uri("hdfs://hdp-kerberos.syslog-ng.example:8020") kerberos-keytab-file("/opt/syslog-ng/etc/hdfs.headless.keytab") kerberos-principal("hdfs-hdpkerberos@MYREALM") hdfs-file("/var/hdfs/test.log")); };
© 2024 One Identity LLC. ALL RIGHTS RESERVED. Terms of Use Privacy Cookie Preference Center