Chat now with support
Chat with Support

We are currently preforming website maintenance, any feature requiring sign-in is temporarily unavailable, if you have an issue requiring immediate assistance please call Technical Support.

syslog-ng Premium Edition 6.0.17 - Administration Guide

Preface Chapter 1. Introduction to syslog-ng Chapter 2. The concepts of syslog-ng Chapter 3. Installing syslog-ng Chapter 4. The syslog-ng PE quick-start guide Chapter 5. The syslog-ng PE configuration file Chapter 6. Collecting log messages — sources and source drivers Chapter 7. Sending and storing log messages — destinations and destination drivers Chapter 8. Routing messages: log paths, reliability, and filters Chapter 9. Global options of syslog-ng PE Chapter 10. TLS-encrypted message transfer Chapter 11. FIPS-compliant syslog-ng Chapter 12.  Reliable Log Transfer Protocol™ Chapter 13. Reliability and minimizing the loss of log messages Chapter 14. Manipulating messages Chapter 15. Parsing and segmenting structured messages Chapter 16. Processing message content with a pattern database Chapter 17. Statistics and metrics of syslog-ng Chapter 18. Multithreading and scaling in syslog-ng PE Chapter 19. Troubleshooting syslog-ng Chapter 20. Best practices and examples

Parsing key=value pairs

Parsing key=value pairs

The syslog-ng PE application can separate a message consisting of whitespace or comma-separated key=value pairs (for example, Postfix log messages) into name-value pairs. You can also specify other separator character instead of the equal sign, for example, colon (:) to parse MySQL log messages. For details on using value-pairs in syslog-ng PE see the section called “Structuring macros, metadata, and other value-pairs”.

You can refer to the separated parts of the message using the key of the value as a macro. For example, if the message contains KEY1=value1,KEY2=value2, you can refer to the values as ${KEY1} and ${KEY2}.

NOTE:

If a log message contains the same key multiple times (for example, key1=value1, key2=value2, key1=value3, key3=value4, key1=value5), then syslog-ng PE stores only the last (rightmost) value for the key. Using the previous example, syslog-ng PE will store the following pairs: key1=value5, key2=value2, key3=value4..

Caution:

If the names of keys in the message is the same as the names of syslog-ng PE soft macros, the value from the parsed message will overwrite the value of the macro. For example, the PROGRAM=value1, MESSAGE=value2 content will overwrite the ${PROGRAM} and ${MESSAGE} macros. To avoid overwriting such macros, use the prefix() option.

Hard macros cannot be modified, so they will not be overwritten. For details on the macro types, see the section called “Hard vs. soft macros”.

The parser discards message sections that are not key=value pairs, even if they appear between key=value pairs that can be parsed.

The names of the keys can contain only the following characters: numbers (0-9), letters (a-z,A-Z), underscore (_), dot (.), hyphen (-). Other special characters are not permitted.

To parse key=value pairs, define a parser that has the kv-parser() option. Defining the prefix is optional. By default, the parser will process the ${MESSAGE} part of the log message. You can also define the parser inline in the log path.

Declaration: 

parser parser_name {
    kv-parser(
        prefix()
    );
};

Example 15.5. Using a key=value parser

In the following example, the source is a log message consisting of comma-separated key=value pairs, for example, a Postfix log message:

Jun 20 12:05:12 mail.example.com <info> postfix/qmgr[35789]: EC2AC1947DA: from=<me@example.com>, size=807, nrcpt=1 (queue active)

The kv-parser inserts the ".kv." prefix before all extracted name-value pairs. The destination is a file, that uses the format-json template function. Every name-value pair that begins with a dot (".") character will be written to the file (dot-nv-pairs). The log line connects the source, the destination and the parser.

source s_kv {
    network(port(21514));
};

destination d_json {
    file("/tmp/test.json"
        template("$(format-json --scope dot-nv-pairs)\n"));
};

parser p_kv {
    kv-parser (prefix(".kv."));
};

log {
    source(s_kv);
    parser(p_kv);
    destination(d_json);
};

You can also define the parser inline in the log path.

source s_kv {
    network(port(21514));
};

destination d_json {
    file("/tmp/test.json"
        template("$(format-json --scope dot-nv-pairs)\n"));
};

log {
    source(s_kv);
    parser {
        kv-parser (prefix(".kv."));
    };
    destination(d_json);
};

You can set the separator character between the key and the value to parse for example key:value pairs, like MySQL logs:

Mar  7 12:39:25 myhost MysqlClient[20824]: SYSTEM_USER:'oscar', MYSQL_USER:'my_oscar', CONNECTION_ID:23, DB_SERVER:'127.0.0.1', DB:'--', QUERY:'USE test;'
parser p_mysql { kv-parser(value-separator(":") prefix(".mysql."));

Options of key=value parsers

The kv-parser has the following options.

prefix()
Synopsis: prefix()

Description: Insert a prefix before the name part of the parsed name-value pairs to help further processing. For example:

  • To insert the my-parsed-data. prefix, use the prefix(my-parsed-data.) option.

  • To refer to a particular data that has a prefix, use the prefix in the name of the macro, for example, ${my-parsed-data.name} .

  • If you forward the parsed messages using the IETF-syslog protocol, you can insert all the parsed data into the SDATA part of the message using the prefix(.SDATA.my-parsed-data.) option.

Names starting with a dot (for example, .example) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.)

For example, to insert the postfix prefix when parsing Postfix log messages, use the prefix(.postfix.) option.

template()
Synopsis: template("${<macroname>}")

Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, the parser processes the entire message (${MESSAGE}).

value-separator()
Synopsis: value-separator()

Description: Specifies the character that separates the keys from the values. Default value: =

For example, to parse key:value pairs, use kv-parser(value-separator(":"));

The JSON parser

JavaScript Object Notation (JSON) is a text-based open standard designed for human-readable data interchange. It is used primarily to transmit data between a server and web application, serving as an alternative to XML. It is described in RFC 4627. The syslog-ng PE application can separate parts of incoming JSON-encoded log messages to name-value pairs. For details on using value-pairs in syslog-ng PE see the section called “Structuring macros, metadata, and other value-pairs”.

You can refer to the separated parts of the JSON message using the key of the JSON object as a macro. For example, if the JSON contains {"KEY1":"value1","KEY2":"value2"}, you can refer to the values as ${KEY1} and ${KEY2}. If the JSON content is structured, syslog-ng PE converts it to dot-notation-format. For example, to access the value of the following structure {"KEY1": {"KEY2": "VALUE"}}, use the ${KEY1.KEY2} macro.

Caution:

If the names of keys in the JSON content are the same as the names of syslog-ng PE soft macros, the value from the JSON content will overwrite the value of the macro. For example, the {"PROGRAM":"value1","MESSAGE":"value2"} JSON content will overwrite the ${PROGRAM} and ${MESSAGE} macros. To avoid overwriting such macros, use the prefix() option.

Hard macros cannot be modified, so they will not be overwritten. For details on the macro types, see the section called “Hard vs. soft macros”.

NOTE:

The JSON parser currently supports only integer, double and string values when interpreting JSON structures. As syslog-ng does not handle different data types internally, the JSON parser converts all JSON data to string values. In case of boolean types, the value is converted to 'TRUE' or 'FALSE' as their string representation.

The JSON parser discards messages if it cannot parse them as JSON messages, so it acts as a JSON-filter as well.

To create a JSON parser, define a parser that has the json-parser() option. Defining the prefix and the marker are optional. By default, the parser will process the ${MESSAGE} part of the log message. To process other parts of a log message with the JSON parser, use the template() option. You can also define the parser inline in the log path.

Declaration: 

parser parser_name {
    json-parser(
    marker()
    prefix()
    );
};

Example 15.6. Using a JSON parser

In the following example, the source is a JSON encoded log message. The syslog parser is disabled, so that syslog-ng PE does not parse the message: flags(no-parse). The json-parser inserts ".json." prefix before all extracted name-value pairs. The destination is a file, that uses the format-json template function. Every name-value pair that begins with a dot (".") character will be written to the file (dot-nv-pairs). The log line connects the source, the destination and the parser.

source s_json {
    network(port(21514) flags(no-parse));
};

destination d_json {
    file("/tmp/test.json"
        template("$(format-json --scope dot-nv-pairs)\n"));
};

parser p_json {
    json-parser (prefix(".json."));
};

log {
    source(s_json);
    parser(p_json);
    destination(d_json);
};

You can also define the parser inline in the log path.

source s_json {
    network(port(21514) flags(no-parse));
};

destination d_json {
    file("/tmp/test.json"
        template("$(format-json --scope dot-nv-pairs)\n"));
};

log {
    source(s_json);
    parser p_json {
        json-parser (prefix(".json."));
    };
    destination(d_json);
};

Options of JSON parsers

The JSON parser has the following options.

extract-prefix()
Synopsis: extract-prefix()

Description: Extract only the specified subtree from the JSON message. Use the dot-notation to specify the subtree. The rest of the message will be ignored. For example, assuming that the incoming object is named msg, the json-parser(extract-prefix("foo.bar[5]"));syslog-ng PE parser is equivalent to the msg.foo.bar[5] javascript code. Note that the resulting expression must be a JSON object, so that syslog-ng PE can extract its members into name-value pairs.

This feature also works when the top-level object is an array, because you can use an array index at the first indirection level, for example: json-parser(extract-prefix("[5]")), which is equivalent to msg[5].

marker()
Synopsis: marker()

Description: Use a marker in case of mixed log messages, to identify JSON encoded messages for the parser.

Some logging implementations require a marker to be set before the JSON payload. The JSON parser is able to find these markers and parse the message only if it is present.

Example 15.7. Using the marker option in JSON parser

This json parser parses log messages which use the "@cee:" marker in front of the json payload. It inserts ".cee." in front of the name of name-value pairs, so later on it is easier to find name-value pairs that were parsed using this parser. (For details on selecting name-value pairs, see the section called “value-pairs()”.)

parser {
        json-parser(
            marker("@cee:")
            prefix(".cee.")
        );
    };

prefix()
Synopsis: prefix()

Description: Insert a prefix before the name part of the name-value pairs to help further processing. For example, if you forward the parsed messages using the IETF-syslog protocol, you can insert all the parsed data into the SDATA part of the message using the prefix(.SDATA.json.) .

Names starting with a dot (for example, .example) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.)

template()
Synopsis: template("${<macroname>}")

Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, this is empty and the parser processes the entire message (${MESSAGE}).

Chapter 16. Processing message content with a pattern database

Classifying log messages

The syslog-ng application can compare the contents of the received log messages to predefined message patterns. By comparing the messages to the known patterns, syslog-ng is able to identify the exact type of the messages, and sort them into message classes. The message classes can be used to classify the type of the event described in the log message. The message classes can be customized, and for example can label the messages as user login, application crash, file transfer, and so on events.

To find the pattern that matches a particular message, syslog-ng uses a method called longest prefix match radix tree. This means that syslog-ng creates a tree structure of the available patterns, where the different characters available in the patterns for a given position are the branches of the tree.

To classify a message, syslog-ng selects the first character of the message (the text of message, not the header), and selects the patterns starting with this character, other patterns are ignored for the rest of the process. After that, the second character of the message is compared to the second character of the selected patterns. Again, matching patterns are selected, and the others discarded. This process is repeated until a single pattern completely matches the message, or no match is found. In the latter case, the message is classified as unknown, otherwise the class of the matching pattern is assigned to the message.

To make the message classification more flexible and robust, the patterns can contain pattern parsers: elements that match on a set of characters. For example, the NUMBER parser matches on any integer or hexadecimal number (for example 1, 123, 894054, 0xFFFF, and so on). Other pattern parsers match on various strings and IP addresses. For the details of available pattern parsers, see the section called “Using pattern parsers”.

The functionality of the pattern database is similar to that of the logcheck project, but it is much easier to write and maintain the patterns used by syslog-ng, than the regular expressions used by logcheck. Also, it is much easier to understand syslog-ng pattens than regular expressions.

Pattern matching based on regular expressions is computationally very intensive, especially when the number of patterns increases. The solution used by syslog-ng can be performed real-time, and is independent from the number of patterns, so it scales much better. The following patterns describe the same message: Accepted password for bazsi from 10.50.0.247 port 42156 ssh2

A regular expression matching this message from the logcheck project: Accepted (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) for [^[:space:]]+ from [^[:space:]]+ port [0-9]+( (ssh|ssh2))?

A syslog-ng database pattern for this message: Accepted @QSTRING:auth_method: @ for@QSTRING:username: @from @QSTRING:client_addr: @port @NUMBER:port:@ ssh2

For details on using pattern databases to classify log messages, see the section called “Using pattern databases”.

The structure of the pattern database

The pattern database is organized as follows:

Figure 16.1. The structure of the pattern database

The structure of the pattern database

  • The pattern database consists of rulesets. A ruleset consists of a Program Pattern and a set of rules: the rules of a ruleset are applied to log messages if the name of the application that sent the message matches the Program Pattern of the ruleset. The name of the application (the content of the ${PROGRAM} macro) is compared to the Program Patterns of the available rulesets, and then the rules of the matching rulesets are applied to the message.

  • The Program Pattern can be a string that specifies the name of the appliation or the beginning of its name (for example, to match for sendmail, the program pattern can be sendmail, or just send), and the Program Pattern can contain pattern parsers. Note that pattern parsers are completely independent from the syslog-ng parsers used to segment messages. Additionally, every rule has a unique identifier: if a message matches a rule, the identifier of the rule is stored together with the message.

  • Rules consist of a message pattern and a class. The Message Pattern is similar to the Program Pattern, but is applied to the message part of the log message (the content of the ${MESSAGE} macro). If a message pattern matches the message, the class of the rule is assigned to the message (for example, Security, Violation, and so on).

  • Rules can also contain additional information about the matching messages, such as the description of the rule, an URL, name-value pairs, or free-form tags.

  • Patterns can consist of literals (keywords, or rather, keycharacters) and pattern parsers.

    NOTE:

    If the ${PROGRAM} part of a message is empty, rules with an empty Program Pattern are used to classify the message.

    If the same Program Pattern is used in multiple rulesets, the rules of these rulesets are merged, and every rule is used to classify the message. Note that message patterns must be unique within the merged rulesets, but the currently only one ruleset is checked for uniqueness.

How pattern matching works

Figure 16.2. Applying patterns

Applying patterns

The followings describe how patterns work. This information applies to program patterns and message patterns alike, even though message patterns are used to illustrate the procedure.

Patterns can consist of literals (keywords, or rather, keycharacters) and pattern parsers. Pattern parsers attempt to parse a sequence of characters according to certain rules.

NOTE:

Wildcards and regular expressions cannot be used in patterns. The @ character must be escaped, that is, to match for this character, you have to write @@ in your pattern. This is required because pattern parsers of syslog-ng are enclosed between @ characters.

When a new message arrives, syslog-ng attempts to classify it using the pattern database. The available patterns are organized alphabetically into a tree, and syslog-ng inspects the message character-by-character, starting from the beginning. This approach ensures that only a small subset of the rules must be evaluated at any given step, resulting in high processing speed. Note that the speed of classifying messages is practically independent from the total number of rules.

For example, if the message begins with the Apple string, only patterns beginning with the character A are considered. In the next step, syslog-ng selects the patterns that start with Ap, and so on, until there is no more specific pattern left.

Note that literal matches take precedence over pattern parser matches: if at a step there is a pattern that matches the next character with a literal, and another pattern that would match it with a parser, the pattern with the literal match is selected. Using the previous example, if at the third step there is the literal pattern Apport and a pattern parser Ap@STRING@, the Apport pattern is matched. If the literal does not match the incoming string (for example, Apple), syslog-ng attempts to match the pattern with the parser. However, if there are two or more parsers on the same level, only the first one will be applied, even if it does not perfectly match the message.

If there are two parsers at the same level (for example, Ap@STRING@ and Ap@QSTRING@), it is random which pattern is applied (technically, the one that is loaded first). However, if the selected parser cannot parse at least one character of the message, the other parser is used. But having two different parsers at the same level is extremely rare, so the impact of this limitation is much less than it appears.

Artificial ignorance

Artificial ignorance is a method to detect anomalies. When applied to log analysis, it means that you ignore the regular, common log messages - these are the result of the regular behavior of your system, and therefore are not too interesting. However, new messages that have not appeared in the logs before can sign important events, and should be therefore investigated. "By definition, something we have never seen before is anomalous" (Marcus J. Ranum). 

The syslog-ng application can classify messages using a pattern database: messages that do not match any pattern are classified as unknown. This provides a way to use artificial ignorance to review your log messages. You can periodically review the unknown messages — syslog-ng can send them to a separate destination, and add patterns for them to the pattern database. By reviewing and manually classifying the unknown messages, you can iteratively classify more and more messages, until only the really anomalous messages show up as unknown.

Obviously, for this to work, a large number of message patterns are required. The radix-tree matching method used for message classification is very effective, can be performed very fast, and scales very well. Basically the time required to perform a pattern matching is independent from the number of patterns in the database. For sample pattern databases, see the section called “Downloading sample pattern databases”.

Using pattern databases

To classify messages using a pattern database, include a db-parser() statement in your syslog-ng configuration file using the following syntax:

Declaration: 

parser <identifier> {db-parser(file("<database_filename>"));};

Note that using the parser in a log statement only performs the classification, but does not automatically do anything with the results of the classification.

Example 16.1. Defining pattern databases

The following statement uses the database located at /opt/syslog-ng/var/db/patterndb.xml.

parser pattern_db {
            db-parser(
                file("/opt/syslog-ng/var/db/patterndb.xml")
            );
            };

To apply the patterns on the incoming messages, include the parser in a log statement:

log {
        source(s_all);
        parser(pattern_db);
        destination( di_messages_class);
        };

NOTE:

The default location of the pattern database file is /opt/syslog-ng/var/run/patterndb.xml. The file option of the db-parser() statement can be used to specify a different file, thus different db-parser statements can use different pattern databases. Later versions of syslog-ng will be able to dynamically generate a main database from separate pattern database files.

Example 16.2. Using classification results

The following destination separates the log messages into different files based on the class assigned to the pattern that matches the message (for example Violation and Security type messages are stored in a separate file), and also adds the ID of the matching rule to the message:

destination di_messages_class {
        file("/var/log/messages-${.classifier.class}"
        template("${.classifier.rule_id};${S_UNIXTIME};${SOURCEIP};${HOST};${PROGRAM};${PID};${MSG}\n")
        template-escape(no)
    );
};

For details on how to create your own pattern databases see the section called “The syslog-ng pattern database format”.

Using parser results in filters and templates

The results of message classification and parsing can be used in custom filters and templates, for example, in file and database templates. The following built-in macros allow you to use the results of the classification:

  • The .classifier.class macro contains the class assigned to the message (for example violation, security, or unknown).

  • The .classifier.rule_id macro contains the identifier of the message pattern that matched the message.

Example 16.3. Using classification results for filtering messages

To filter on a specific message class, create a filter that checks the .classifier_class macro, and use this filter in a log statement.

filter fi_class_violation {
                    match("violation"
                    value(".classifier.class")
                    type("string")
                    );
                    };
log {
                    source(s_all);
                    parser(pattern_db);
                    filter(fi_class_violation);
                    destination(di_class_violation);
                    };

Filtering on the unknown class selects messages that did not match any rule of the pattern database. Routing these messages into a separate file allows you to periodically review new or unknown messages.

To filter on messages matching a specific classification rule, create a filter that checks the .classifier.rule_id macro. The unique identifier of the rule (for example e1e9c0d8-13bb-11de-8293-000c2922ed0a) is the id attribute of the rule in the XML database.

filter fi_class_rule {
                    match("e1e9c0d8-13bb-11de-8293-000c2922ed0a"
                    value(".classifier.rule_id")
                    type("string")
                    );
                    };

Pattern database rules can assign tags to messages. These tags can be used to select tagged messages using the tags() filter function.

NOTE:

The syslog-ng PE application automatically adds the class of the message as a tag using the .classifier.<message-class> format. For example, messages classified as "system" receive the .classifier.system tag. Use the tags() filter function to select messages of a specific class.

filter f_tag_filter {tags(".classifier.system");};

The message-segments parsed by the pattern parsers can also be used as macros as well. To accomplish this, you have to add a name to the parser, and then you can use this name as a macro that refers to the parsed value of the message.

Example 16.4. Using pattern parsers as macros

For example, you want to parse messages of an application that look like "Transaction: <type>.", where <type> is a string that has different values (for example refused, accepted, incomplete, and so on). To parse these messages, you can use the following pattern:

'Transaction: @ESTRING::.@'

Here the @ESTRING@ parser parses the message until the next full stop character. To use the results in a filter or a filename template, include a name in the parser of the pattern, for example:

'Transaction: @ESTRING:TRANSACTIONTYPE:.@'

After that, add a custom template to the log path that uses this template. For example, to select every accepted transaction, use the following custom filter in the log path:

match("accepted" value("TRANSACTIONTYPE"));

NOTE:

The above macros can be used in database columns and filename templates as well, if you create custom templates for the destination or logspace.

Use a consistent naming scheme for your macros, for example, APPLICATIONNAME_MACRONAME.

Downloading sample pattern databases

To simplify the building of pattern databases, Balabit has released (and will continue to release) sample databases. You can download sample pattern databases from the PatternDB GitHub page.

Note that these pattern databases are only samples and experimental databases. They are not officially supported, and may or may not work in your environment.

The syslog-ng pattern databases are available under the Creative Commons Attribution-Share Alike 3.0 (CC by-SA) license. This includes every pattern database written by community contributors or the Balabit staff. It means that:

  • You are free to use and modify the patterns for your needs.

  • If you redistribute the pattern databases, you must distribute your modifications under the same license.

  • If you redistribute the pattern databases, you must make it obvious that the source of the original syslog-ng pattern databases is the PatternDb GitHub page.

For legal details, the full text of the license is available here.

If you create patterns that are not available in the GitHub repository, consider sharing them with us and the syslog-ng community, and send them to the syslog-ng mailing list, or to the following e-mail address:

Related Documents