parser: Parse and segment structured messages
The filters and default macros of syslog-ng work well on the headers and metainformation of the log messages, but are rather limited when processing the content of the messages. Parsers can segment the content of the messages into name-value pairs, and these names can be used as user-defined macros. Subsequent filtering or other types of processing can use these custom macros to refer to parts of the message. Parsers are global objects, most often used together with filters and rewrite rules.
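As a minimal sketch of this idea, the following configuration segments key=value pairs and then filters on one of the resulting names. The object names (s_local, d_root_logins, p_kv, f_root_user), the kv. prefix, and the user key are hypothetical and serve only as illustration; adjust them to your own messages.

```conf
parser p_kv {
    # Store every parsed key under the "kv." prefix, for example ${kv.user}
    kv-parser(prefix("kv."));
};

filter f_root_user {
    # Match on the name-value pair created by the parser
    match("^root$" value("kv.user"));
};

log {
    source(s_local);             # assumed to be defined elsewhere
    parser(p_kv);
    filter(f_root_user);
    destination(d_root_logins);  # assumed to be defined elsewhere
};
```

The parser has to precede the filter in the log path, because the kv.user name exists only after the message has been parsed.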
The syslog-ng OSE application provides the following possibilities to parse the messages, or parts of the messages:
-
By default, syslog-ng OSE parses every message as a syslog message. To disable message parsing, use the flags(no-parse) option of the source. To explicitly parse a message as a syslog message, use the syslog parser. For details, see Parsing syslog messages.
-
To segment a message into columns using a CSV-parser, see Parsing messages with comma-separated and similar values.
-
To segment a message consisting of whitespace or comma-separated key=value pairs (for example, Postfix log messages), see Parsing key=value pairs.
-
To parse JSON-formatted messages, see JSON parser.
-
To parse XML-formatted messages, see XML parser.
-
To identify and parse the messages using a pattern database, see db-parser: Process message content with a pattern database (patterndb).
-
To parse a specially-formatted date or timestamp, see Parsing dates and timestamps.
-
To write a custom parser in Python or Hy, see Python parser.
-
To parse the tags sent by another syslog-ng host, see Parsing tags.
The syslog-ng OSE application provides built-in parsers for the following application logs:
-
Apache HTTP server access logs. For details, see Apache access log parser.
-
Cisco devices. For details, see Cisco parser.
-
Messages formatted using the enterprise-wide message model (EWMM) of syslog-ng OSE. For details, see Parsing enterprise-wide message model (EWMM) messages.
-
Iptables logs. For details, see iptables parser.
-
Linux Audit (auditd) logs. For details, see Linux audit parser.
-
Netskope log messages. For details, see Netskope parser.
-
osquery result logs. For details, see osquery: Collect and parse osquery result logs.
-
SNMP traps of the Net-SNMP's snmptrapd application. For details, see snmptrap: Read Net-SNMP traps.
-
sudo logs. For details, see Sudo parser.
-
Websense Content Gateway (Raytheon|Websense, now Forcepoint) log messages. For details, see Websense parser.
By default, syslog-ng OSE parses every message as a syslog message using the syslog-parser, and fills the macros with the values of the message. The syslog-parser does not discard messages: if the message cannot be parsed as a syslog message, the entire message (including its header) is stored in the $MSG macro. If you do not want to parse the message as a syslog message, use the flags(no-parse) option of the source.
You can also use the syslog-parser to explicitly parse a message, or a part of a message as a syslog message (for example, after rewriting the beginning of a message that does not comply with the syslog standards).
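The following sketch shows one way this could look: the source keeps the whole line unparsed, a rewrite rule removes a leading string that breaks the syslog header, and syslog-parser() then parses the corrected text. The source name, the port, the vendor-prefix string, and d_file are assumptions made for illustration.

```conf
source s_net_unparsed {
    # Hypothetical source: keep the whole incoming line in ${MESSAGE}
    network(port(5514) flags(no-parse));
};

rewrite r_strip_prefix {
    # Remove an assumed vendor-specific prefix in front of the syslog header
    subst("^vendor-prefix: ", "", value("MESSAGE"));
};

log {
    source(s_net_unparsed);
    rewrite(r_strip_prefix);
    parser { syslog-parser(); };   # parse the corrected ${MESSAGE} as syslog
    destination(d_file);           # assumed to be defined elsewhere
};
```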
Example: Using junctions
For example, suppose that you have a single network source that receives log messages from different devices, and some devices send messages that are not RFC-compliant (some routers are notorious for that). To solve this problem in earlier versions of syslog-ng OSE, you had to create two different network sources using different IP addresses or ports: one that received the RFC-compliant messages, and one that received the improperly formatted messages (for example, using the flags(no-parse) option). Using junctions, this becomes much simpler: you can use a single network source to receive every message, then use a junction and two channels. The first channel processes the RFC-compliant messages, the second processes everything else. At the end, every message is stored in a single file. The filters used in the example can be host() filters (if you have a list of the IP addresses of the devices sending non-compliant messages), but that depends on your environment.
log {
    source {
        syslog(
            ip(10.1.2.3)
            transport("tcp")
            flags(no-parse)
        );
    };
    junction {
        channel {
            filter(f_compliant_hosts);
            parser {
                syslog-parser();
            };
        };
        channel {
            filter(f_noncompliant_hosts);
        };
    };
    destination {
        file("/var/log/messages");
    };
};
Since every channel receives every message that reaches the junction, use the flags(final) option in the channels to avoid unnecessarily processing the messages multiple times:
log {
    source {
        syslog(
            ip(10.1.2.3)
            transport("tcp")
            flags(no-parse)
        );
    };
    junction {
        channel {
            filter(f_compliant_hosts);
            parser {
                syslog-parser();
            };
            flags(final);
        };
        channel {
            filter(f_noncompliant_hosts);
            flags(final);
        };
    };
    destination {
        file("/var/log/messages");
    };
};
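The two filters referenced in the junction are not defined in the example. A possible definition is sketched below; it uses netmask() filters rather than host() filters, because netmask() matches the sender's IP address directly. The addresses are hypothetical placeholders.

```conf
filter f_noncompliant_hosts {
    # Hypothetical addresses of devices known to send malformed messages
    netmask(192.168.1.10/32) or netmask(192.168.1.11/32);
};

filter f_compliant_hosts {
    # Everything that is not in the list above
    not filter(f_noncompliant_hosts);
};
```

Defining one filter as the negation of the other keeps the two channels mutually exclusive, so the list of addresses has to be maintained in only one place.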
Note that syslog-ng OSE has several parsers that you can use to parse non-compliant messages. You can even write a custom syslog-ng parser in Python. For details, see parser: Parse and segment structured messages.
Note that by default, the syslog-parser attempts to parse the message as an RFC3164-formatted (BSD-syslog) message. To parse the message as an RFC5424-formatted message, use the flags(syslog-protocol) option in the parser.
syslog-parser(flags(syslog-protocol));
The syslog-parser() has the following options:
default-facility()
Type: facility string
Default: kern
Description: This parameter assigns a facility value to the messages received from the file source if the message does not specify one.
default-priority()
Type: priority string
Default:
Description: This parameter assigns a priority (severity) level to the messages received from the file source if the message does not specify one. For example, default-priority(warning).
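A short sketch of how the two default options might be combined in one parser definition. The parser name and the chosen values (local0, warning) are illustrative assumptions.

```conf
parser p_syslog_defaults {
    syslog-parser(
        default-facility(local0)     # applied only if the message has no facility
        default-priority(warning)    # applied only if the message has no priority
    );
};
```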
drop-invalid()
Type: yes or no
Values: yes | no
Default: no
Description: This option determines how the syslog-parser() affects messages when parsing fails.
If you set drop-invalid() to yes, syslog-parser() will drop the message if the parsing fails.
If you set drop-invalid() to no, the parsing error triggers syslog-parser() to rewrite and extend the original log message with the following additional information:
- It prepends the following message to the contents of the $MESSAGE field: Error processing log message.
- It sets the contents of the $PROGRAM field to syslog-ng.
- It sets the contents of the facility field to syslog.
- It sets the contents of the severity field to error.
NOTE: With the drop-invalid(no) option syslog-parser() will work in the same way as the sources which receive syslog-protocol/BSD-format messages.
Example: enabling the drop-invalid() option
parser p_syslog { syslog-parser(drop-invalid(yes)); };
flags()
Type: assume-utf8, empty-lines, expect-hostname, kernel, no-hostname, no-multi-line, no-parse, sanitize-utf8, store-legacy-msghdr, store-raw-message, syslog-protocol, validate-utf8
Default: empty set
Description: Specifies the log parsing options of the source.
-
assume-utf8: The assume-utf8 flag assumes that the incoming messages are UTF-8 encoded, but does not verify the encoding. If you explicitly want to validate the UTF-8 encoding of the incoming message, use the validate-utf8 flag.
-
empty-lines: Use the empty-lines flag to keep the empty lines of the messages. By default, syslog-ng OSE removes empty lines automatically.
-
expect-hostname: If the expect-hostname flag is enabled, syslog-ng OSE will assume that the log message contains a hostname and parse the message accordingly. This is the default behavior for TCP sources. Note that pipe sources use the no-hostname flag by default.
-
guess-timezone: Attempt to guess the timezone of the message if this information is not available in the message. Works when the incoming message stream is close to real time, and the timezone information is missing from the timestamp.
-
kernel: The kernel flag makes the source default to the LOG_KERN | LOG_NOTICE priority if not specified otherwise.
-
no-header: The no-header flag triggers syslog-ng OSE to parse only the PRI field of incoming messages, and put the rest of the message contents into $MSG.
Its functionality is similar to that of the no-parse flag, except the no-header flag does not skip the PRI field.
NOTE: Essentially, the no-header flag signals syslog-ng OSE that the syslog header is not present (or does not adhere to the conventions / RFCs), so the entire message (except for the PRI field) is put into $MSG.
Example: using the no-header flag with the syslog-parser() parser
The following example illustrates using the no-header flag with the syslog-parser() parser:
parser p_syslog {
    syslog-parser(
        flags(no-header)
    );
};
-
no-hostname: Enable the no-hostname flag if the log message does not include the hostname of the sender host. That way syslog-ng OSE assumes that the first part of the message header is ${PROGRAM} instead of ${HOST}. For example:
source s_dell {
    network(
        port(2000)
        flags(no-hostname)
    );
};
-
no-multi-line: The no-multi-line flag disables line-breaking in the messages: the entire message is converted to a single line. Note that this happens only if the underlying transport method actually supports multi-line messages. Currently the file() and pipe() drivers support multi-line messages.
-
no-parse: By default, syslog-ng OSE parses incoming messages as syslog messages. The no-parse flag completely disables syslog message parsing and treats the complete incoming line as the message part of a syslog message (available using the ${MESSAGE} macro). In this case, syslog-ng OSE generates a new syslog header (timestamp, host, and so on) automatically. This flag is useful for processing messages that do not comply with the syslog format.
Note that even though flags(no-parse) disables message parsing, some flags can still be used, for example, the no-multi-line flag.
-
dont-store-legacy-msghdr: By default, syslog-ng stores the original incoming header of the log message. This is useful if the original format of a non-syslog-compliant message must be retained (syslog-ng automatically corrects minor header errors, for example, adds a whitespace before msg in the following message: Jan 22 10:06:11 host program:msg). If you do not want to store the original header of the message, enable the dont-store-legacy-msghdr flag.
-
sanitize-utf8: When using the sanitize-utf8 flag, syslog-ng OSE converts non-UTF-8 input to an escaped form, which is valid UTF-8.
-
store-raw-message: Save the original message as received from the client in the ${RAWMSG} macro. You can forward this raw message in its original form to another syslog-ng node using the syslog-ng() destination, or to a SIEM system, ensuring that the SIEM can process it. Available only in 3.16 and later.
-
syslog-protocol: The syslog-protocol flag specifies that incoming messages are expected to be formatted according to the new IETF syslog protocol standard (RFC5424), but without the frame header. Note that this flag is not needed for the syslog driver, which handles only messages that have a frame header.
-
validate-utf8: The validate-utf8 flag enables encoding-verification for messages formatted according to the new IETF syslog standard (for details, see IETF-syslog messages). If the BOM character is missing, but the message is otherwise UTF-8 compliant, syslog-ng automatically adds the BOM character to the message.
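As an illustration of one of the flags above, the following sketch enables store-raw-message on a source and writes the unmodified input to a file using the ${RAWMSG} macro. The object names, the port, and the file path are hypothetical.

```conf
source s_net_raw {
    network(
        port(2000)                  # hypothetical port
        flags(store-raw-message)    # keep the original input in ${RAWMSG}
    );
};

destination d_raw {
    # Write the message exactly as it was received, one message per line
    file("/var/log/raw-messages" template("${RAWMSG}\n"));
};

log {
    source(s_net_raw);
    destination(d_raw);
};
```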
template()
Synopsis: template("${<macroname>}")
Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, the parser processes the entire message (${MESSAGE}).
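The sketch below illustrates the template() option with a macro created by a previous parser: a JSON wrapper is parsed first, and the syslog-parser() then processes only the field that holds the embedded syslog message. The wrapper format, the json. prefix, the payload key, and the object names are assumptions for illustration.

```conf
parser p_unwrap {
    # Assumed input: {"payload": "<13>Jan 22 10:06:11 host program: msg"}
    json-parser(prefix("json."));
};

parser p_embedded_syslog {
    # Parse the value of ${json.payload} instead of the entire ${MESSAGE}
    syslog-parser(template("${json.payload}"));
};

log {
    source(s_local);                # assumed to be defined elsewhere
    parser(p_unwrap);
    parser(p_embedded_syslog);
    destination(d_file);            # assumed to be defined elsewhere
};
```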
The syslog-ng OSE application can separate parts of log messages (that is, the contents of the ${MESSAGE} macro) at delimiter characters or strings into named fields (columns). One way to achieve this is to use a csv (comma-separated-values) parser (for other methods and possibilities, see the other sections of parser: Parse and segment structured messages). The parsed fields act as user-defined macros that can be referenced in message templates, file and table names, and so on.
Parsers are similar to filters: they must be defined in the syslog-ng OSE configuration file and used in the log statement. You can also define the parser inline in the log path.
NOTE: The order of filters, rewriting rules, and parsers in the log statement is important, as they are processed sequentially.
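A minimal sketch of an inline parser definition, which also shows why the order of the elements matters. The column names, the delimiter, and the referenced s_local and d_file objects are hypothetical.

```conf
log {
    source(s_local);                # assumed to be defined elsewhere

    # Inline parser: no separate "parser <name> { ... };" block is needed
    parser {
        csv-parser(
            columns("EXAMPLE.FIRST", "EXAMPLE.SECOND")
            delimiters(";")
        );
    };

    # This filter comes after the parser, so ${EXAMPLE.FIRST} already exists
    filter { match("^error$" value("EXAMPLE.FIRST")); };

    destination(d_file);            # assumed to be defined elsewhere
};
```

If the filter were placed before the parser, EXAMPLE.FIRST would still be empty when the filter is evaluated.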
To create a csv-parser(), you have to define the columns of the message, the separator characters or strings (also called delimiters, for example, a semicolon or a tab), and optionally the characters that are used to escape the delimiter characters (quote-pairs()).
Declaration:
parser <parser_name> {
    csv-parser(
        columns(column1, column2, ...)
        delimiters(chars("<delimiter_characters>"), strings("<delimiter_strings>"))
    );
};
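For instance, the declaration could be filled in as follows. This is only a sketch: the parser name, the column names, and the chosen delimiters are assumptions.

```conf
parser p_fields {
    csv-parser(
        columns("FIELD.ONE", "FIELD.TWO", "FIELD.THREE")
        # Split at a semicolon or at the two-character string "||"
        delimiters(chars(";"), strings("||"))
        # Delimiters between double quotes are not treated as separators
        quote-pairs('""')
    );
};
```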
Column names work like macros.
Names starting with a dot (for example, .example) are reserved for use by syslog-ng OSE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see Hard versus soft macros for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.)
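A sketch of the prefix() approach, using hypothetical names: with the prefix below, the columns become available as ${my-parsed-data.NAME} and ${my-parsed-data.ID}, so they cannot collide with macros of syslog-ng OSE.

```conf
parser p_hostname_prefixed {
    csv-parser(
        prefix("my-parsed-data.")    # prepended to every column name
        columns("NAME", "ID")
        delimiters("-")
        template("${HOST}")
    );
};

destination d_prefixed {
    # Reference the parsed value together with its prefix
    file("/var/log/messages-${my-parsed-data.NAME}");
};
```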
Example: Segmenting hostnames separated with a dash
The following example separates hostnames like example-1 and example-2 into two parts.
parser p_hostname_segmentation {
    csv-parser(
        columns("HOSTNAME.NAME", "HOSTNAME.ID")
        delimiters("-")
        flags(escape-none)
        template("${HOST}")
    );
};
destination d_file {
    file("/var/log/messages-${HOSTNAME.NAME:-examplehost}");
};
log {
    source(s_local);
    parser(p_hostname_segmentation);
    destination(d_file);
};
Example: Parsing Apache log files
The following parser processes the log of Apache web servers and separates them into different fields. Apache log messages can be formatted like:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"
Here is a sample message:
192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i486-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.mycompany
To parse such logs, the delimiter character is set to a single whitespace (delimiters(" ")). Whitespaces between quotes and brackets are ignored (quote-pairs('""[]')).
parser p_apache {
    csv-parser(
        columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
                "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
                "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT",
                "APACHE.PROCESS_TIME", "APACHE.SERVER_NAME")
        flags(escape-double-char,strip-whitespace)
        delimiters(" ")
        quote-pairs('""[]')
    );
};
The results can be used, for example, to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the name nouser is used instead.
log {
    source(s_local);
    parser(p_apache);
    destination(d_file);
};
destination d_file {
    file("/var/log/messages-${APACHE.USER_NAME:-nouser}");
};
Example: Segmenting a part of a message
Multiple parsers can be used to split a part of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields.
parser p_apache_timestamp {
    csv-parser(
        columns("APACHE.TIMESTAMP.DAY", "APACHE.TIMESTAMP.MONTH", "APACHE.TIMESTAMP.YEAR",
                "APACHE.TIMESTAMP.HOUR", "APACHE.TIMESTAMP.MIN", "APACHE.TIMESTAMP.SEC",
                "APACHE.TIMESTAMP.ZONE")
        delimiters("/: ")
        flags(escape-none)
        template("${APACHE.TIMESTAMP}")
    );
};
log {
    source(s_local);
    parser(p_apache);
    parser(p_apache_timestamp);
    destination(d_file);
};
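Once the timestamp is segmented, the individual fields can be used like any other macro. The following sketch sorts the messages into one file per year and month. The destination name and the path are hypothetical, and the fallback value unknown is an assumption for messages where parsing did not produce the field.

```conf
destination d_apache_by_month {
    file(
        # Fall back to "unknown" if a timestamp field is empty
        "/var/log/apache/${APACHE.TIMESTAMP.YEAR:-unknown}-${APACHE.TIMESTAMP.MONTH:-unknown}.log"
        create-dirs(yes)    # create /var/log/apache if it does not exist
    );
};
```

To use it, reference destination(d_apache_by_month) in the log path after both parsers.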