Filters and substitution rewrite rules can use regular expressions. In regular expressions, the characters ()[].*?+^$|\ have special meanings. How you must write these characters depends on two things: whether you want them to keep their special meaning or to match literally, and which type of quotation marks encloses the expression. The rules are summarized below.
-
Strings between single quotes ('string') are treated literally and are not interpreted at all, so you do not have to escape special characters. For example, the output of '\x41' is \x41, which consists of four characters: a backslash, the letter x, and the digits 4 and 1. This makes regular expressions much easier to write and read, so using single quotes for regular expressions is recommended.
-
When a string is enclosed in double quotes ("string"), the string is interpreted. You therefore have to escape special characters that are meant literally, that is, precede them with a backslash (\) character. For example, the output of "\x41" is simply the letter A. Special characters like \ (backslash) or " (quotation mark) must be escaped as \\ and \". The following escape sequences are interpreted: \a, \n, \r, \t, \v. Special characters of the regular expression must also be escaped if they are meant literally. For example, the \$40 expression matches the $40 string. Backslashes have to be escaped as well if they are meant literally. For example, the \\d expression matches the \d string.
TIP: If you use single quotes, you do not need to escape the backslash. For example, match("\\.") is equivalent to match('\.').
-
You do not need to enclose alphanumeric strings in double quotes ("string"). You can simply omit the quotes. For example, when writing filters, match("sometext") and match(sometext) both match the sometext string.
NOTE: Only strings that contain nothing but alphanumeric characters can be used without single or double quotes. If the string contains whitespace or any special characters (()[].*?+^$|\ or ;:#), you must enclose it in single or double quotes.
The ;:# characters require quotes, but they do not need to be escaped.
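The following filters are a minimal sketch of the quoting rules above. All filter names, and the error: disk full text, are hypothetical. The first two filters are intended to be equivalent. In the double-quoted version, "\\." is first interpreted as \., so the regular expression engine sees the same escaped, literal period as in the single-quoted version.

```
# Single quotes: the string reaches the regular expression engine unchanged,
# so \. matches a literal period.
filter f_dotlog_single { match('\.log'); };

# Double quotes: "\\." is interpreted as \. first,
# so the regular expression engine sees the same escaped period.
filter f_dotlog_double { match("\\.log"); };

# Alphanumeric strings do not need quotes; these two filters are equivalent.
filter f_sometext_quoted   { match("sometext"); };
filter f_sometext_unquoted { match(sometext); };

# Whitespace and the : character require quotes, but : is not escaped.
filter f_disk_error { match("error: disk full"); };
```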
By default, all regular expressions are case sensitive. To disable the case sensitivity of the expression, add the flags(ignore-case) option to the regular expression.
filter demo_regexp_insensitive { host("system" flags(ignore-case)); };
A regular expression can store up to 255 matches (${1} ... ${255}). These matches are available only from the last filter, and only if the flags("store-matches") flag was set for that filter.
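The following configuration is a sketch, not a tested setup. It shows how a stored match might be reused in a template. It assumes that a source called s_local is defined elsewhere in the configuration. The filter name, the destination name, and the log file path are hypothetical. The regular expression captures the clock offset from the xntpd message used in the optimization example of this section.

```
# Capture the clock offset from messages like
# "time error -1159.777379 is too large (set clock manually)".
filter f_ntp_time_error {
    match("time error (-?[0-9.]+) is too large" flags("store-matches"));
};

# ${1} refers to the first parenthesized group of the last filter.
destination d_ntp_errors {
    file("/var/log/ntp-time-errors.log" template("offset=${1} ${MESSAGE}\n"));
};

log {
    source(s_local);
    filter(f_ntp_time_error);
    destination(d_ntp_errors);
};
```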
By default, syslog-ng uses PCRE-style regular expressions. To use other expression types, add the type() option after the regular expression.
The syslog-ng PE application supports the following expression types:
pcre
Description: Use Perl Compatible Regular Expressions (PCRE). Starting with syslog-ng PE version 3.1, PCRE expressions are supported on every platform. If the type() parameter is not specified, syslog-ng uses PCRE regular expressions by default.
PCRE regular expressions have the following flag options:
global
Usable only in rewrite rules: match for every occurrence of the expression, not only the first one.
ignore-case
Disable case-sensitivity.
store-matches
Store the matches of the regular expression into the $0, ... $255 variables. $0 stores the entire match, $1 stores the first group of the match (parentheses), and so on. Named matches (also called named subpatterns), for example (?<name>...), are stored as well. Matches from the last filter expression can be referenced in regular expressions.
unicode
Use Unicode support for UTF-8 matches: UTF-8 character sequences are handled as single characters.
utf8
An alias for the unicode flag.
Example: Using PCRE regular expressions
rewrite r_rewrite_subst {
    subst("a*", "?", value("MESSAGE") flags("utf8" "global"));
};
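Named matches are stored when store-matches is set. The following sketch assumes that a named group can then be referenced by its name in a template, as ${ntp_offset}. The group name ntp_offset, the filter and destination names, and the file path are hypothetical. To take effect, the filter and destination still have to be connected in a log path, as in any other configuration.

```
# The named subpattern (?<ntp_offset>...) is stored under its name
# because the store-matches flag is set.
filter f_ntp_offset_named {
    match("time error (?<ntp_offset>-?[0-9.]+)" flags("store-matches"));
};

# Reference the named match by name in a template.
destination d_ntp_offsets {
    file("/var/log/ntp-offsets.log" template("${HOST} ntp_offset=${ntp_offset}\n"));
};
```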
string
Description: Match the strings literally, without regular expression support. By default, only identical strings are matched. For partial matches, use the flags("prefix") or the flags("substring") flags.
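The following filters are a minimal sketch of literal matching with the string type. The filter names and the matched texts are hypothetical. The flags("prefix") variant is intended to match program names such as postfix/smtpd or postfix/qmgr. The flags("substring") variant matches the text anywhere in the message. Neither requires a regular expression to be compiled or evaluated.

```
# Literal prefix match on the program name.
filter f_postfix_programs { program("postfix" type("string") flags("prefix")); };

# Literal substring match anywhere in the text.
filter f_disk_full { match("disk full" type("string") flags("substring")); };
```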
glob
Description: Match the strings against a pattern containing '*' and '?' wildcards, without regular expression and character range support. The advantage of glob patterns over regular expressions is that globs can be processed much faster.
-
* matches an arbitrary string, including an empty string
-
? matches an arbitrary character
-
The wildcards can match the / character.
-
You cannot use the * and ? literally in the pattern.
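The following host filters are a minimal sketch of the two wildcards. The hostnames and filter names are hypothetical. The comments assume that a glob pattern has to match the whole hostname, not just a part of it.

```
# * matches any string: web-01 and web-frontend match, webserver does not.
filter f_web_hosts { host("web-*" type("glob")); };

# ? matches exactly one character: db1 and db2 match, db10 does not.
filter f_db_hosts { host("db?" type("glob")); };
```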
The host(), match(), and program() filter functions and some other syslog-ng objects accept regular expressions as parameters. However, evaluating general regular expressions puts a high load on the CPU, which can cause problems when the message traffic is very high. Often the regular expression can be replaced with simple filter functions and logical operators, which achieve the same effect at a much lower CPU load.
Example: Optimizing regular expressions in filters
Suppose you need a filter that matches the following error message logged by the xntpd NTP daemon:
xntpd[1567]: time error -1159.777379 is too large (set clock manually);
The following filter uses regular expressions and matches every instance and variant of this message.
filter f_demo_regexp {
    program("xntpd") and
    match("time error .* is too large .*set clock manually"); };
Segmenting the match() part of this filter into separate match() functions greatly improves the performance of the filter. Note that the segmented filter does not check the order of the three fragments, so it is slightly more permissive than the single regular expression.
filter f_demo_optimized_regexp {
    program("xntpd") and
    match("time error") and
    match("is too large") and
    match("set clock manually"); };
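The fragments of the optimized filter contain no regular expression special characters. It may therefore be possible to go one step further and match them literally with the string expression type, so that no regular expression is evaluated at all. The following filter is an untested sketch of that idea, and its name is hypothetical. It uses flags("substring") because each fragment can appear anywhere in the message. The program() call uses the default identical-string match, which assumes that the program name is exactly xntpd.

```
# Each fragment is a literal string, so no regular expression needs
# to be evaluated; flags("substring") allows the match anywhere in the text.
filter f_demo_string_match {
    program("xntpd" type("string")) and
    match("time error" type("string") flags("substring")) and
    match("is too large" type("string") flags("substring")) and
    match("set clock manually" type("string") flags("substring"));
};
```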
parser: Parse and segment structured messages
The filters and default macros of syslog-ng work well on the headers and metainformation of the log messages, but are rather limited when processing the content of the messages. Parsers can segment the content of the messages into name-value pairs, and these names can be used as user-defined macros. Subsequent filtering or other types of processing can use these custom macros to refer to parts of the message. Parsers are global objects, most often used together with filters and rewrite rules.
The syslog-ng PE application provides the following possibilities to parse the messages, or parts of the messages:
-
By default, syslog-ng PE parses every message as a syslog message. To disable message parsing, use the flags(no-parse) option of the source. To explicitly parse a message as a syslog message, use the syslog parser. For details, see Parsing syslog messages.
-
To segment a message into columns using a CSV-parser, see Parsing messages with comma-separated and similar values.
-
To segment a message consisting of whitespace or comma-separated key=value pairs (for example, Postfix log messages), see Parsing key=value pairs.
-
To parse JSON-formatted messages, see JSON parser.
-
To parse XML-formatted messages, see XML parser.
-
To identify and parse the messages using a pattern database, see Processing message content with a pattern database.
-
To parse a specially-formatted date or timestamp, see Parsing dates and timestamps.
-
To write a custom parser in Python or Hy, see Python parser.
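The following configuration is a minimal sketch of how parsed name-value pairs can feed a filter and a template. It uses a CSV parser. It assumes that a source called s_local is defined elsewhere. The DEMO.* column names, the parser, filter, and destination names, and the file path are hypothetical. For the exact parser options, see the sections referenced above.

```
# Split the message text at commas into three name-value pairs.
parser p_demo_csv {
    csv-parser(
        columns("DEMO.USER", "DEMO.ACTION", "DEMO.RESULT")
        delimiters(",")
        template("${MESSAGE}")
    );
};

# The parsed columns can be used like any other macro.
filter f_demo_failed { match("failed" value("DEMO.RESULT")); };

destination d_demo {
    file("/var/log/demo-failed.log" template("${DEMO.USER} ${DEMO.ACTION}\n"));
};

log {
    source(s_local);
    parser(p_demo_csv);
    filter(f_demo_failed);
    destination(d_demo);
};
```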
The syslog-ng PE application provides built-in parsers for the following application logs: