Filters and substitution rewrite rules can use regular expressions. In regular expressions, the characters ()[].*?+^$|\
are used as special symbols. Depending on how you want to use these characters and which quotation mark you use, these characters must be used differently, as summarized below.
Strings between single quotes ('string') are treated literally and are not interpreted at all, so you do not have to escape special characters. For example, the output of '\x41' is \x41 (four characters: backslash, the letter x, the number 4, and the number 1). This makes regular expressions much simpler to write and read, so it is recommended to use single quotes when writing regular expressions.
When enclosing strings between double-quotes ("string"), the string is interpreted and you have to escape special characters, that is, precede them with a backslash (\) character if they are meant literally. For example, the output of "\x41" is simply the letter A. Therefore special characters like \ (backslash) or " (quotation mark) must be escaped (\\ and \"). The following expressions are interpreted: \a, \n, \r, \t, \v. For example, the \$40 expression matches the $40 string. Backslashes have to be escaped as well if they are meant literally, for example, the \\d expression matches the \d string.
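The difference between the two quoting styles can be illustrated with an analogy in Python. This is only an analogy and not syslog-ng code: Python raw strings are used here as a stand-in for single-quoted strings, and ordinary Python string literals as a stand-in for double-quoted strings.

```python
# Analogy only: a raw string keeps every character, like single quotes.
single_quoted = r'\x41'

# An ordinary literal interprets the escape, like double quotes.
double_quoted = "\x41"

# To get a literal backslash in the interpreted form, it must be doubled.
escaped_backslash = "\\d"

print(list(single_quoted))
print(double_quoted)
print(escaped_backslash == r'\d')
```

The raw string holds four characters, while the interpreted literal holds the single character with code 0x41.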
TIP:
If you use single quotes, you do not need to escape the backslash.
Enclosing alphanumeric strings between double-quotes ("string") is not necessary, you can just omit the double-quotes. For example, when writing filters, match("sometext") and match(sometext) will both match the sometext string.
NOTE:
Only strings containing alphanumerical characters can be used without quotes or double quotes. If the string contains whitespace or any special characters, enclose it in quotes or double quotes.
By default, all regular expressions are case sensitive. To disable the case sensitivity of the expression, add the flags(ignore-case)
option to the regular expression.
filter demo_regexp_insensitive { host("system" flags(ignore-case)); };
The regular expressions can use up to 255 regexp matches (${1} ... ${255}
), but only from the last filter and only if the flags("store-matches")
flag was set for the filter. For case-insensitive searches, use the flags("ignore-case")
option.
By default, syslog-ng uses POSIX-style regular expressions. To use other expression types, add the type()
option after the regular expression.
The syslog-ng PE application supports the following expression types:
Description: Use POSIX regular expressions. If the type()
parameter is not specified, syslog-ng uses POSIX regular expressions by default.
Posix regular expressions have the following flag options:
global: Usable only in rewrite rules: match for every occurrence of the expression, not only the first one.
ignore-case: Disable case-sensitivity.
store-matches: Store the matches of the regular expression into the $0, ... $255
variables. The $0
stores the entire match, $1
is the first group of the match (parentheses), and so on. Matches from the last filter expression can be referenced in regular expressions.
Example 14.19. Using Posix regular expressions
filter f_message { message("keyword" flags("utf8" "ignore-case") ); };
Description: Use Perl Compatible Regular Expressions (PCRE). Starting with syslog-ng PE version 3.1, PCRE expressions are supported on every platform.
PCRE regular expressions have the following flag options:
global: Usable only in rewrite rules: match for every occurrence of the expression, not only the first one.
ignore-case: Disable case-sensitivity.
store-matches: Store the matches of the regular expression into the $0, ... $255
variables. The $0
stores the entire match, $1
is the first group of the match (parentheses), and so on. Named matches (also called named subpatterns), for example (?<name>...)
, are stored as well. Matches from the last filter expression can be referenced in regular expressions.
unicode: Use Unicode support for UTF-8 matches: UTF-8 character sequences are handled as single characters.
utf8: An alias for the unicode
flag.
Example 14.20. Using PCRE regular expressions
rewrite r_rewrite_subst {subst("a*", "?", value("MESSAGE") type("pcre") flags("utf8" "global")); };
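The store-matches and global flags can be illustrated with Python's re module. This is a sketch of the concepts only: Python's regular expression engine is neither the POSIX nor the PCRE engine used by syslog-ng, the sample message is invented, and Python spells named groups as (?P<name>...).

```python
import re

# Hypothetical sample message, used only for illustration.
message = "user=alice action=login user=bob action=logout"

# re.IGNORECASE plays the role of the ignore-case flag.
pattern = re.compile(r"USER=(?P<name>\w+)", re.IGNORECASE)

match = pattern.search(message)
whole_match = match.group(0)       # comparable to $0: the entire match
first_group = match.group(1)       # comparable to $1: the first group
named_group = match.group("name")  # a named match (named subpattern)

# Without a global flag only the first occurrence is rewritten;
# with it, every occurrence is rewritten.
first_only = pattern.sub("user=?", message, count=1)
every_match = pattern.sub("user=?", message)

print(whole_match, first_group, named_group)
print(first_only)
print(every_match)
```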
Description: Match the strings literally, without regular expression support. By default, only identical strings are matched. For partial matches, use the flags("prefix")
or the flags("substring")
flags.
Description: Match the strings against a pattern containing '*' and '?' wildcards, without regular expression and character range support. The advantage of glob patterns over regular expressions is that globs can be processed much faster.
*: matches an arbitrary string, including an empty string
?: matches an arbitrary character
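The wildcard semantics described above can be sketched in Python. This is a minimal illustration and not the matcher syslog-ng uses. It assumes that the pattern has to cover the whole string, and the helper names glob_to_regex and glob_match are hypothetical. Every character other than * and ? is taken literally, so character ranges such as [ab] have no special meaning.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob with only '*' and '?' wildcards into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # arbitrary string, may be empty
        elif ch == "?":
            parts.append(".")            # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def glob_match(pattern, text):
    """Return True if the whole text matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(text) is not None

print(glob_match("host?", "host1"))
print(glob_match("host*", "host"))
print(glob_match("[ab]", "a"))
```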
The host(), match(), and program() filter functions and some other syslog-ng objects accept regular expressions as parameters. However, evaluating general regular expressions puts a high load on the CPU, which can cause problems when the message traffic is very high. Often the regular expression can be replaced with simple filter functions and logical operators, which achieve the same effect at a much lower CPU load.
Example 14.21. Optimizing regular expressions in filters
Suppose you need a filter that matches the following error message logged by the xntpd
NTP daemon:
xntpd[1567]: time error -1159.777379 is too large (set clock manually);
The following filter uses regular expressions and matches every instance and variant of this message.
filter f_demo_regexp { program("demo_program") and match("time error .* is too large .*set clock manually"); };
Segmenting the match()
part of this filter into separate match()
functions greatly improves the performance of the filter.
filter f_demo_optimized_regexp { program("demo_program") and match("time error") and match("is too large") and match("set clock manually"); };
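The relationship between the single expression and the segmented version can be sketched in Python. This sketch compares only the matching logic and makes no performance measurement. Plain substring tests stand in for the three simple match() calls. The pattern used here has no space before "set", so that it matches the sample message, in which "set" directly follows an opening parenthesis.

```python
import re

sample = "time error -1159.777379 is too large (set clock manually)"

# One expression that enforces the order of the three fragments.
single_regex = re.compile(r"time error .* is too large .*set clock manually")

# Three independent fragments, combined with a logical AND.
fragments = ("time error", "is too large", "set clock manually")

def match_single(message):
    return single_regex.search(message) is not None

def match_segmented(message):
    return all(fragment in message for fragment in fragments)

print(match_single(sample), match_segmented(sample))

# The segmented form does not enforce the order of the fragments.
reordered = "set clock manually: is too large, time error"
print(match_single(reordered), match_segmented(reordered))
```

The segmented filter is therefore slightly more permissive than the single expression. For log messages with a fixed wording this usually makes no practical difference.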
The filters and default macros of syslog-ng work well on the headers and metainformation of the log messages, but are rather limited when processing the content of the messages. Parsers can segment the content of the messages into name-value pairs, and these names can be used as user-defined macros. Subsequent filtering or other type of processing of the message can use these custom macros to refer to parts of the message. Parsers are global objects most often used together with filters and rewrite rules.
syslog-ng PE provides the following possibilities to parse the messages, or parts of the messages:
To segment a message into columns using a CSV-parser, see the section called “Parsing messages with comma-separated and similar values”.
To segment a message consisting of whitespace or comma-separated key=value
pairs (for example, Postfix log messages), see the section called “Parsing key=value
pairs”.
To parse JSON-formatted messages, see the section called “The JSON parser”.
To identify and parse the messages using a pattern database, see Chapter 16, Processing message content with a pattern database.
The syslog-ng PE application can separate parts of log messages (that is, the contents of the ${MSG} macro) at delimiter characters or strings to named fields (columns). One way to achieve this is to use a csv (comma-separated-values) parser (for other methods and possibilities, see the other sections of Chapter 15, Parsing and segmenting structured messages). The parsed fields act as user-defined macros that can be referenced in message templates, file- and tablenames, and so on.
Parsers are similar to filters: they must be defined in the syslog-ng PE configuration file and used in the log statement. You can also define the parser inline in the log path.
NOTE:
The order of filters, rewriting rules, and parsers in the log statement is important, as they are processed sequentially.
To create a csv-parser()
, you have to define the columns of the message, the separator characters or strings (also called delimiters, for example, semicolon or tabulator), and optionally the characters that are used to escape the delimiter characters (quote-pairs()
).
Declaration:
parser <parser_name> { csv-parser( columns(column1, column2, ...) delimiters(chars("<delimiter_characters>"), strings("<delimiter_string1>")) ); };
Column names work like macros.
Names starting with a dot (for example, .example
) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details).
Example 15.1. Segmenting hostnames separated with a dash
The following example separates hostnames like example-1
and example-2
into two parts.
parser p_hostname_segmentation {
    csv-parser(columns("HOSTNAME.NAME", "HOSTNAME.ID")
        delimiters("-")
        flags(escape-none)
        template("${HOST}"));
};
destination d_file { file("/var/log/messages-${HOSTNAME.NAME:-examplehost}"); };
log { source(s_local); parser(p_hostname_segmentation); destination(d_file); };
Example 15.2. Parsing Apache log files
The following parser processes the log of Apache web servers and separates them into different fields. Apache log messages can be formatted like:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"
Here is a sample message:
192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i486-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.balabit
To parse such logs, the delimiter character is set to a single whitespace (delimiters(" ")
). Whitespaces between quotes and brackets are ignored (quote-pairs('""[]')
).
parser p_apache { csv-parser(columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME", "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS", "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT", "APACHE.PROCESS_TIME", "APACHE.SERVER_NAME") flags(escape-double-char,strip-whitespace) delimiters(" ") quote-pairs('""[]') ); };
The results can be used for example to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the nouser
name is assigned.
log { source(s_local); parser(p_apache); destination(d_file); };
destination d_file { file("/var/log/messages-${APACHE.USER_NAME:-nouser}"); };
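How a single-space delimiter interacts with quote-pairs can be sketched in Python. This is an illustration of the idea and not the syslog-ng implementation. The function name split_with_quote_pairs is hypothetical, the sample is a shortened version of the Apache message above, and the sketch assumes that the quote characters themselves are not kept in the column values.

```python
def split_with_quote_pairs(text, delimiter=" ", quote_pairs='""[]'):
    """Split text at the delimiter, ignoring delimiters inside quote pairs."""
    # Map each opening quote character to its closing counterpart.
    closers = {quote_pairs[i]: quote_pairs[i + 1]
               for i in range(0, len(quote_pairs), 2)}
    columns, current, closing = [], [], None
    for ch in text:
        if closing is not None:
            if ch == closing:
                closing = None      # end of the quoted section
            else:
                current.append(ch)  # delimiters are ignored inside quotes
        elif ch in closers:
            closing = closers[ch]   # start of a quoted section
        elif ch == delimiter:
            columns.append("".join(current))
            current = []
        else:
            current.append(ch)
    columns.append("".join(current))
    return columns

sample = ('192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] '
          '"GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" '
          '"curl/7.15.5" 2 example.balabit')
columns = split_with_quote_pairs(sample)
print(len(columns))
print(columns[3])
print(columns[4])
```

The timestamp and the request line each stay in one column even though they contain spaces, because they are enclosed in [] and "" respectively.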
Example 15.3. Segmenting a part of a message
Multiple parsers can be used to split a part of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields.
parser p_apache_timestamp {
    csv-parser(columns("APACHE.TIMESTAMP.DAY", "APACHE.TIMESTAMP.MONTH", "APACHE.TIMESTAMP.YEAR", "APACHE.TIMESTAMP.HOUR", "APACHE.TIMESTAMP.MIN", "APACHE.TIMESTAMP.SEC", "APACHE.TIMESTAMP.ZONE")
        delimiters("/: ")
        flags(escape-none)
        template("${APACHE.TIMESTAMP}"));
};
log { source(s_local); parser(p_apache); parser(p_apache_timestamp); destination(d_file); };
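The effect of the three delimiter characters on the Apache timestamp can be checked with a short Python sketch. The field names are shortened, hypothetical labels, and re.split is only a stand-in for the csv-parser.

```python
import re

timestamp = "31/Dec/2007:00:17:10 +0100"

# Every character in the set "/: " acts as a delimiter.
fields = re.split(r"[/: ]", timestamp)

# Hypothetical short labels for the seven resulting fields.
names = ["DAY", "MONTH", "YEAR", "HOUR", "MIN", "SEC", "ZONE"]
parsed = dict(zip(names, fields))
print(parsed)
```

The timestamp splits into seven parts: day, month, year, hour, minute, second, and time zone.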
Further examples:
For an example on using the greedy
option, see Example 15.4, “Adding the end of the message to the last column”.
Synopsis: columns("PARSER.COLUMN1", "PARSER.COLUMN2", ...)
Description: Specifies the names of the columns into which the messages are separated. These names will be automatically available as macros. The values of these macros do not include the delimiters.
Description: The delimiter is the character or string that separates the columns in the message. If you specify multiple characters using the delimiters(chars("<delimiter_characters>")) option, every character will be treated as a delimiter. To separate the columns at the tabulator (tab character), specify \t. For example, to separate the text at every hyphen (-) and colon (:) character, use delimiters(chars("-:")). Note that the delimiters will not be included in the column values.
If you have to use a string as a delimiter, list your string delimiters in the delimiters(strings("<delimiter_string1>", "<delimiter_string2>", ...)) format.
If you use more than one delimiter, note the following points:
syslog-ng PE will split the message at the nearest possible delimiter. The order of the delimiters in the configuration file does not matter.
You can use both string delimiters and character delimiters in a parser.
The string delimiters can include characters that are also used as character delimiters.
If a string delimiter and a character delimiter both match at the same position of the message, syslog-ng PE uses the string delimiter.
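The rules for combining character and string delimiters can be sketched in Python. This is a minimal model of the rules listed above and not the syslog-ng implementation. The function name split_columns is hypothetical, and the sketch assumes that when several string delimiters match at the same position, the first one listed is used.

```python
def split_columns(text, chars="", strings=()):
    """Split text at the nearest delimiter; strings win over characters."""
    strings = [s for s in strings if s]  # ignore empty string delimiters
    columns, current, i = [], [], 0
    while i < len(text):
        # A string delimiter matching at this position takes precedence.
        matched = next((s for s in strings if text.startswith(s, i)), None)
        if matched is not None:
            columns.append("".join(current))
            current = []
            i += len(matched)
        elif text[i] in chars:
            columns.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    columns.append("".join(current))
    return columns

print(split_columns("a-b::c:d", chars="-:", strings=("::",)))
print(split_columns("a-b::c:d", chars="-:"))
```

With the "::" string delimiter, the double colon is treated as one separator. Without it, each colon separates on its own and an empty column appears between them.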
Synopsis: drop-invalid, escape-none, escape-backslash, escape-double-char, greedy, strip-whitespace
Description: Specifies various options for parsing the message. The following flags are available:
drop-invalid: When the drop-invalid option is set, the parser does not process messages that do not match the parser. For example, a message does not match the parser if it has fewer columns than specified in the parser, or it has more columns but the greedy flag is not enabled. Using the drop-invalid option practically turns the parser into a special filter that matches messages that have the predefined number of columns (using the specified delimiters).
|
TIP:
Messages dropped as invalid can be processed by a |
escape-backslash: The parsed message uses the backslash (\
) character to escape quote characters.
escape-double-char: The parsed message repeats the quote character when the quote character is used literally. For example, to escape a comma (,
), the message contains two commas (,,
).
escape-none: The parsed message does not use any escaping for using the quote character literally.
greedy: The greedy
option assigns the remainder of the message to the last column, regardless of the delimiter characters set. You can use this option to process messages where the number of columns varies.
Example 15.4. Adding the end of the message to the last column
If the greedy
option is enabled, the syslog-ng application adds the not-yet-parsed part of the message to the last column, ignoring any delimiter characters that may appear in this part of the message.
For example, you receive the following comma-separated message: example1, example2, example3, and you segment it with the following parser:
csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(","));
The COLUMN1, COLUMN2, and COLUMN3 variables will contain the strings example1, example2, and example3, respectively. If the message looks like example1, example2, example3, some more information, then any text appearing after the third comma (that is, some more information) is not parsed, and is possibly lost if you use only the variables to reconstruct the message (for example, to send it to different columns of an SQL table).
Using the greedy
flag will assign the remainder of the message to the last column, so that the COLUMN1
, COLUMN2
, and COLUMN3
variables will contain the strings example1
, example2
, and example3, some more information
.
csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(",") flags(greedy));
strip-whitespace: The strip-whitespace
flag removes leading and trailing whitespaces from all columns.
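The interaction of the greedy, strip-whitespace, and drop-invalid flags described above can be sketched in Python. This is a simplified model and not the syslog-ng implementation. The function parse_csv and its keyword arguments are hypothetical, and escaping and quote-pairs are left out.

```python
def parse_csv(message, columns, delimiter=",", greedy=False,
              strip_whitespace=False, drop_invalid=False):
    """Split a message into named columns, modelling three parser flags."""
    if greedy:
        # The remainder of the message goes into the last column.
        parts = message.split(delimiter, len(columns) - 1)
    else:
        parts = message.split(delimiter)
    if drop_invalid and len(parts) != len(columns):
        return None  # the message does not match the parser
    if strip_whitespace:
        parts = [part.strip() for part in parts]
    # Without greedy, parts beyond the last column are not assigned.
    return dict(zip(columns, parts))

names = ["COLUMN1", "COLUMN2", "COLUMN3"]
message = "example1, example2, example3, some more information"

plain = parse_csv(message, names, strip_whitespace=True)
greedy = parse_csv(message, names, greedy=True, strip_whitespace=True)
dropped = parse_csv(message, names, drop_invalid=True)

print(plain["COLUMN3"])
print(greedy["COLUMN3"])
print(dropped)
```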
Synopsis: quote-pairs('<quote_pairs>')
Description: List quote-pairs between single quotes. Delimiter characters or strings enclosed between quote characters are ignored. Note that the beginning and ending quote characters do not have to be identical, for example [} can also be a quote-pair. For an example of using quote-pairs() to parse Apache log files, see Example 15.2, “Parsing Apache log files”.
Synopsis: template("${<macroname>}")
Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, this is empty and the parser processes the entire message (${MESSAGE}
). For examples, see Example 15.1, “Segmenting hostnames separated with a dash” and Example 15.3, “Segmenting a part of a message”.
The syslog-ng PE application can separate a message consisting of whitespace or comma-separated key=value pairs (for example, Postfix log messages) into name-value pairs. You can also specify another separator character instead of the equal sign, for example, colon (:) to parse MySQL log messages. For details on using value-pairs in syslog-ng PE, see the section called “Structuring macros, metadata, and other value-pairs”.
You can refer to the separated parts of the message using the key of the value as a macro. For example, if the message contains KEY1=value1,KEY2=value2
, you can refer to the values as ${KEY1}
and ${KEY2}
.
|
NOTE:
If a log message contains the same key multiple times (for example, |
Caution:
If the names of keys in the message are the same as the names of syslog-ng PE soft macros, the value from the parsed message will overwrite the value of the macro. Hard macros cannot be modified, so they will not be overwritten. For details on the macro types, see the section called “Hard vs. soft macros”. The parser discards message sections that are not key=value pairs. The names of the keys can contain only the following characters: numbers (0-9), letters (a-z, A-Z), underscore (_), dot (.), hyphen (-). Other special characters are not permitted.
To parse key=value
pairs, define a parser that has the kv-parser()
option. Defining the prefix is optional. By default, the parser will process the ${MESSAGE}
part of the log message. You can also define the parser inline in the log path.
Declaration:
parser parser_name { kv-parser( prefix() ); };
Example 15.5. Using a key=value
parser
In the following example, the source is a log message consisting of comma-separated key=value
pairs, for example, a Postfix log message:
Jun 20 12:05:12 mail.example.com <info> postfix/qmgr[35789]: EC2AC1947DA: from=<me@example.com>, size=807, nrcpt=1 (queue active)
The kv-parser inserts the ".kv." prefix before all extracted name-value pairs. The destination is a file that uses the format-json template function. Every name-value pair that begins with a dot (".") character will be written to the file (dot-nv-pairs). The log line connects the source, the destination and the parser.
source s_kv { network(port(21514)); }; destination d_json { file("/tmp/test.json" template("$(format-json --scope dot-nv-pairs)\n")); }; parser p_kv { kv-parser (prefix(".kv.")); }; log { source(s_kv); parser(p_kv); destination(d_json); };
You can also define the parser inline in the log path.
source s_kv { network(port(21514)); }; destination d_json { file("/tmp/test.json" template("$(format-json --scope dot-nv-pairs)\n")); }; log { source(s_kv); parser { kv-parser (prefix(".kv.")); }; destination(d_json); };
You can set the separator character between the key and the value to parse for example key:value
pairs, like MySQL logs:
Mar 7 12:39:25 myhost MysqlClient[20824]: SYSTEM_USER:'oscar', MYSQL_USER:'my_oscar', CONNECTION_ID:23, DB_SERVER:'127.0.0.1', DB:'--', QUERY:'USE test;'
parser p_mysql { kv-parser(value-separator(":") prefix(".mysql.")); };
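What the prefix and the value separator do to the extracted pairs can be sketched in Python. This is a rough approximation and not the kv-parser itself. The function kv_parse is hypothetical, it is applied only to the message part (without the syslog header), and removing the surrounding quotes from values is an assumption of the sketch.

```python
import re

# Key characters: numbers, letters, underscore, dot, hyphen.
KEY = r"[0-9A-Za-z_.-]+"
# A value is either quoted or runs until the next comma or whitespace.
VALUE = r"""'[^']*'|"[^"]*"|[^,\s]*"""

def kv_parse(message, value_separator="=", prefix=""):
    """Extract key/value pairs and put the prefix before every key."""
    pattern = re.compile(
        "(" + KEY + ")" + re.escape(value_separator) + "(" + VALUE + ")")
    pairs = {}
    for name, value in pattern.findall(message):
        # Assumption of this sketch: surrounding quotes are removed.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        pairs[prefix + name] = value
    return pairs

postfix = "EC2AC1947DA: from=<me@example.com>, size=807, nrcpt=1 (queue active)"
mysql = "SYSTEM_USER:'oscar', CONNECTION_ID:23, DB:'--', QUERY:'USE test;'"

print(kv_parse(postfix, prefix=".kv."))
print(kv_parse(mysql, value_separator=":", prefix=".mysql."))
```

Sections of the message that are not key/value pairs, such as the queue ID and the trailing "(queue active)", produce no pairs in this sketch.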
The kv-parser
has the following options.
Synopsis: prefix()
Description: Insert a prefix before the name part of the parsed name-value pairs to help further processing. For example:
To insert the my-parsed-data.
prefix, use the prefix(my-parsed-data.)
option.
To refer to a particular value that has a prefix, use the prefix in the name of the macro, for example, ${my-parsed-data.name}.
If you forward the parsed messages using the IETF-syslog protocol, you can insert all the parsed data into the SDATA part of the message using the prefix(.SDATA.my-parsed-data.)
option.
Names starting with a dot (for example, .example) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.).
For example, to insert the .postfix. prefix when parsing Postfix log messages, use the prefix(.postfix.) option.
Synopsis: template("${<macroname>}")
Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, the parser processes the entire message (${MESSAGE}
).
JavaScript Object Notation (JSON) is a text-based open standard designed for human-readable data interchange. It is used primarily to transmit data between a server and web application, serving as an alternative to XML. It is described in RFC 4627. The syslog-ng PE application can separate parts of incoming JSON-encoded log messages to name-value pairs. For details on using value-pairs in syslog-ng PE see the section called “Structuring macros, metadata, and other value-pairs”.
You can refer to the separated parts of the JSON message using the key of the JSON object as a macro. For example, if the JSON contains {"KEY1":"value1","KEY2":"value2"}
, you can refer to the values as ${KEY1}
and ${KEY2}
. If the JSON content is structured, syslog-ng PE converts it to dot-notation-format. For example, to access the value of the following structure {"KEY1": {"KEY2": "VALUE"}}
, use the ${KEY1.KEY2}
macro.
Caution:
If the names of keys in the JSON content are the same as the names of syslog-ng PE soft macros, the value from the JSON content will overwrite the value of the macro. Hard macros cannot be modified, so they will not be overwritten. For details on the macro types, see the section called “Hard vs. soft macros”.
NOTE:
The JSON parser currently supports only integer, double and string values when interpreting JSON structures. As syslog-ng does not handle different data types internally, the JSON parser converts all JSON data to string values. In case of boolean types, the value is converted to 'TRUE' or 'FALSE' as their string representation. The JSON parser discards messages if it cannot parse them as JSON messages, so it acts as a JSON-filter as well.
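The conversion of nested JSON content into dot-notation name-value pairs can be sketched in Python. This is a simplified model and not the json-parser implementation. The functions flatten and json_parse are hypothetical, and only objects, strings, numbers, and booleans are considered.

```python
import json

def flatten(node, prefix=""):
    """Turn a nested JSON object into dot-notation name-value pairs."""
    pairs = {}
    for key, value in node.items():
        name = prefix + key
        if isinstance(value, dict):
            pairs.update(flatten(value, name + "."))
        elif isinstance(value, bool):
            pairs[name] = "TRUE" if value else "FALSE"
        else:
            pairs[name] = str(value)  # every value becomes a string
    return pairs

def json_parse(message, prefix=""):
    """Return the name-value pairs, or None if the message is not JSON."""
    try:
        document = json.loads(message)
    except json.JSONDecodeError:
        return None  # acts like a filter: the message is discarded
    if not isinstance(document, dict):
        return None
    return flatten(document, prefix)

print(json_parse('{"KEY1": {"KEY2": "VALUE"}}'))
print(json_parse('{"a": 1, "b": 2.5, "c": true}', prefix=".json."))
print(json_parse("not a JSON message"))
```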
To create a JSON parser, define a parser that has the json-parser()
option. Defining the prefix and the marker are optional. By default, the parser will process the ${MESSAGE}
part of the log message. To process other parts of a log message with the JSON parser, use the template()
option. You can also define the parser inline in the log path.
Declaration:
parser parser_name { json-parser( marker() prefix() ); };
Example 15.6. Using a JSON parser
In the following example, the source is a JSON encoded log message. The syslog parser is disabled, so that syslog-ng PE does not parse the message: flags(no-parse). The json-parser inserts the ".json." prefix before all extracted name-value pairs. The destination is a file that uses the format-json template function. Every name-value pair that begins with a dot (".") character will be written to the file (dot-nv-pairs). The log line connects the source, the destination and the parser.
source s_json { network(port(21514) flags(no-parse)); }; destination d_json { file("/tmp/test.json" template("$(format-json --scope dot-nv-pairs)\n")); }; parser p_json { json-parser (prefix(".json.")); }; log { source(s_json); parser(p_json); destination(d_json); };
You can also define the parser inline in the log path.
source s_json { network(port(21514) flags(no-parse)); };
destination d_json { file("/tmp/test.json" template("$(format-json --scope dot-nv-pairs)\n")); };
log { source(s_json); parser { json-parser (prefix(".json.")); }; destination(d_json); };
The JSON parser has the following options.
Synopsis: extract-prefix()
Description: Extract only the specified subtree from the JSON message. Use the dot-notation to specify the subtree. The rest of the message will be ignored. For example, assuming that the incoming object is named msg, the json-parser(extract-prefix("foo.bar[5]")); syslog-ng PE parser is equivalent to the msg.foo.bar[5] JavaScript code. Note that the resulting expression must be a JSON object, so that syslog-ng PE can extract its members into name-value pairs.
This feature also works when the top-level object is an array, because you can use an array index at the first indirection level, for example: json-parser(extract-prefix("[5]"))
, which is equivalent to msg[5]
.
Synopsis: marker()
Description: Use a marker in case of mixed log messages, to identify JSON encoded messages for the parser.
Some logging implementations require a marker to be set before the JSON payload. The JSON parser is able to find these markers and parse the message only if it is present.
Example 15.7. Using the marker option in JSON parser
This JSON parser parses log messages which use the "@cee:" marker in front of the JSON payload. It inserts ".cee." in front of the name of name-value pairs, so later on it is easier to find name-value pairs that were parsed using this parser. (For details on selecting name-value pairs, see the section called “value-pairs()”.)
parser { json-parser( marker("@cee:") prefix(".cee.") ); };
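The marker() and extract-prefix() options can be modelled together in a short Python sketch. This is an illustration under stated assumptions and not the json-parser implementation. The function name extract is hypothetical, the marker is assumed to stand at the very beginning of the message, and the path syntax is limited to dot-separated names and [index] steps.

```python
import json
import re

def extract(message, marker=None, extract_prefix=None):
    """Return the selected JSON object, or None if nothing can be parsed."""
    if marker is not None:
        # Assumption: the marker must be the first thing in the message.
        if not message.startswith(marker):
            return None
        message = message[len(marker):]
    try:
        node = json.loads(message)
    except json.JSONDecodeError:
        return None
    # Walk the path: names are separated by dots, indexes are in brackets.
    steps = re.findall(r"([^.\[\]]+)|\[(\d+)\]", extract_prefix or "")
    for name, index in steps:
        try:
            node = node[int(index)] if index else node[name]
        except (KeyError, IndexError, TypeError):
            return None
    # The result has to be an object so that it has members to extract.
    return node if isinstance(node, dict) else None

message = '@cee:{"foo": {"bar": [0, 1, 2, 3, 4, {"k": "v"}]}}'
print(extract(message, marker="@cee:", extract_prefix="foo.bar[5]"))
print(extract('[{"a": 1}]', extract_prefix="[0]"))
print(extract('{"a": 1}', marker="@cee:"))
```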
Synopsis: prefix()
Description: Insert a prefix before the name part of the name-value pairs to help further processing. For example, if you forward the parsed messages using the IETF-syslog protocol, you can insert all the parsed data into the SDATA part of the message using the prefix(.SDATA.json.) option.
Names starting with a dot (for example, .example) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.).
© 2024 One Identity LLC. ALL RIGHTS RESERVED.