Chat now with support
Chat with Support

We are currently preforming website maintenance, any feature requiring sign-in is temporarily unavailable, if you have an issue requiring immediate assistance please call Technical Support.

syslog-ng Premium Edition 6.0.17 - Administration Guide

Preface Chapter 1. Introduction to syslog-ng Chapter 2. The concepts of syslog-ng Chapter 3. Installing syslog-ng Chapter 4. The syslog-ng PE quick-start guide Chapter 5. The syslog-ng PE configuration file Chapter 6. Collecting log messages — sources and source drivers Chapter 7. Sending and storing log messages — destinations and destination drivers Chapter 8. Routing messages: log paths, reliability, and filters Chapter 9. Global options of syslog-ng PE Chapter 10. TLS-encrypted message transfer Chapter 11. FIPS-compliant syslog-ng Chapter 12.  Reliable Log Transfer Protocol™ Chapter 13. Reliability and minimizing the loss of log messages Chapter 14. Manipulating messages Chapter 15. Parsing and segmenting structured messages Chapter 16. Processing message content with a pattern database Chapter 17. Statistics and metrics of syslog-ng Chapter 18. Multithreading and scaling in syslog-ng PE Chapter 19. Troubleshooting syslog-ng Chapter 20. Best practices and examples

Chapter 14. Manipulating messages

This chapter explains the methods that you can use to customize, reformat, and modify log messages using syslog-ng Premium Edition.

Customizing message format

The following sections describe how to customize the names of logfiles, and also how to use templates, macros, and template functions.

Formatting messages, filenames, directories, and tablenames

The syslog-ng PE application can dynamically create filenames, directories, or names of database tables using macros that help you organize your log messages. Macros refer to a property or a part of the log message, for example, the ${HOST} macro refers to the name or IP address of the client that sent the log message, while ${DAY} is the day of the month when syslog-ng has received the message. Using these macros in the path of the destination log files allows you for example to collect the logs of every host into separate files for every day.

A set of macros can be defined as a template object and used in multiple destinations.

Another use of macros and templates is to customize the format of the syslog message, for example, to add elements of the message header to the message text.

NOTE:

If a message uses the IETF-syslog format (RFC5424), only the text of the message can be customized (that is, the $MESSAGE part of the log), the structure of the header is fixed.

Templates and macros

The syslog-ng PE application allows you to define message templates, and reference them from every object that can use a template. Templates can include strings, macros (for example date, the hostname, and so on), and template functions. For example, you can use templates to create standard message formats or filenames. For a list of macros available in syslog-ng Premium Edition, see the section called “Macros of syslog-ng PE. For the macros of the syslog-ng Agent for Windows application, see Administration Guide for syslog-ng Agent for Windows. Fields from the structured data (SD) part of messages using the new IETF-syslog standard can also be used as macros.

Declaration: 

template <template-name> {
    template("<template-expression>") <template-escape(yes)>;
};

Template objects have a single option called template-escape(), which is disabled by default (template-escape(no)). This behavior is useful when the messages are passed to an application that cannot handle escaped characters properly. Enabling template escaping (template-escape(yes)) causes syslog-ng to escape the ', ", and backslash characters from the messages.

NOTE:

In versions 2.1 and earlier, the template-escape() option was enabled by default.

Macros can be included by prefixing the macro name with a $ sign, just like in Bourne compatible shells. Although using braces around macro names is not mandatory, and the "$MSG" and "${MSG}" formats are equivalent, using the "${MSG}" format is recommended for clarity.

To use a literal $ character in a template, you have to escape it. In syslog-ng PE versions 4.0-4.2, use a backslash (\$). In version 5.0 and later, use $$.

NOTE:

To use a literal @ character in a template, use @@.

Default values for macros can also be specified by appending the :- characters and the default value of the macro. If a message does not contain the field referred to by the macro, or it is empty, the default value will be used when expanding the macro. For example, if a message does not contain a hostname, the following macro can specify a default hostname.

${HOST:-default_hostname}

NOTE:

For the macros of the syslog-ng Agent for Windows application, see Administration Guide for syslog-ng Agent for Windows.

By default, syslog-ng sends messages using the following template: ${ISODATE} ${HOST} ${MSGHDR}${MSG}\n. (The ${MSGHDR}${MSG} part is written together because the ${MSGHDR} macro includes a trailing whitespace.)

NOTE:

Earlier versions of syslog-ng used templates and scripts to send log messages into SQL databases. Starting from version 2.1, syslog-ng natively supports direct database access using the sql() destination. For details, see the section called “sql() destination options”.

Example 14.1. Using templates and macros

The following template (t_demo_filetemplate) adds the date of the message and the name of the host sending the message to the beginning of the message text. The template is then used in a file destination: messages sent to this destination (d_file) will use the message format defined in the template.

template t_demo_filetemplate {
             template("${ISODATE} ${HOST} ${MSG}\n"); template-escape(no); };
destination d_file {
             file("/var/log/messages" template(t_demo_filetemplate)); };

Templates can also be used inline, if they are used only at a single location. The following destination is equivalent with the previous example:

destination d_file {
                file ("/var/log/messages"
                        template("${ISODATE} ${HOST} ${MSG}\n") template-escape(no) );
                };

The following file destination uses macros to daily create separate logfiles for every client host.

destination d_file {
        file("/var/log/${YEAR}.${MONTH}.${DAY}/${HOST}.log");
};

NOTE:

Macros can be used to format messages, and also in the name of destination files or database tables. However, they cannot be used in sources as wildcards, for example, to read messages from files or directories that include a date in their name.

Date-related macros

The macros related to the date of the message (for example: ${ISODATE}, ${HOUR}, and so on) have two further variants each:

  • S_ prefix, for example, ${S_DATE}: The ${S_DATE} macro represents the date found in the log message, that is, when the message was sent by the original application.

    Caution:

    To use the S_ macros, the keep-timestamp() option must be enabled (this is the default behavior of syslog-ng PE).

  • R_ prefix, for example, ${R_DATE}: ${R_DATE} is the date when syslog-ng PE has received the message.

The ${DATE} macro equals the ${S_DATE} macro.

The values of the date-related macros are calculated using the original timezone information of the message. To convert it to a different timezone, use the time-zone() option. You can set the time-zone() option as a global option, or per destination. For sources, it applies only if the original message does not contain timezone information. Converting the timezone changes the values of the following date-related macros (macros MSEC and USEC are not changed):

  • AMPM

  • DATE

  • DAY

  • FULLDATE

  • HOUR

  • HOUR12

  • ISODATE

  • MIN

  • MONTH

  • MONTH_ABBREV

  • MONTH_NAME

  • MONTH_WEEK

  • SEC

  • STAMP

  • TZ

  • TZOFFSET

  • UNIXTIME

  • WEEK

  • WEEK_DAY

  • WEEK_DAY_ABBREV

  • WEEK_DAY_NAME

  • YEAR

  • YEAR_DAY

Hard vs. soft macros

Hard macros contain data that is directly derived from the log message, for example, the ${MONTH} macro derives its value from the timestamp. Hard macros are read-only. Soft macros (sometimes also called name-value pairs) are either built-in macros automatically generated from the log message (for example, ${HOST}), or custom user-created macros generated by using the syslog-ng pattern database or a CSV-parser. In contrast to hard macros, soft macros are writable and can be modified within syslog-ng PE, for example, using rewrite rules.

Hard and soft macros are rather similar and often treated as equivalent. Macros are most commonly used in filters and templates, which does not modify the value of the macro, so both soft and hard macros can be used. However, it is not possible to change the values of hard macros in rewrite rules or via any other means.

The following macros in syslog-ng PE are hard macros and cannot be modified: BSDTAG, CONTEXT_ID, DATE, DAY, FACILITY_NUM, FACILITY, FULLDATE, HOUR, ISODATE, LEVEL_NUM, LEVEL, MIN, MONTH_ABBREV, MONTH_NAME, MONTH, MONTH_WEEK, PRIORITY, PRI, RCPTID, SDATA, SEC, SEQNUM, SOURCEIP, STAMP, TAG, TAGS, TZOFFSET, TZ, UNIXTIME, WEEK_DAY_ABBREV, WEEK_DAY_NAME, WEEK_DAY, WEEK, YEAR_DAY, YEAR.

The following macros can be modified:FULLHOST_FROM, FULLHOST, HOST_FROM, HOST, LEGACY_MSGHDR, MESSAGE, MSG,MSGID, MSGONLY, PID, PROGRAM, SOURCE. Custom values created using rewrite rules or parsers can be modified as well, just like stored matches of regular expressions ($0 ... $255).

Macros of syslog-ng PE

The following macros are available in syslog-ng PE.

Caution:

These macros are available when syslog-ng PE successfully parses the incoming message as a syslog message, or you use some other parsing method and map the parsed values to these macros.

If you are using the flags(no-parse) option, then syslog message parsing is completely disabled, and the entire incoming message is treated as the ${MESSAGE} part of a syslog message. In this case, syslog-ng PE generates a new syslog header (timestamp, host, and so on) automatically. Note that since flags(no-parse) disables message parsing, it interferes with other flags, for example, disables flags(no-multi-line).

AMPM

Description: Typically used together with the ${HOUR12} macro, ${AMPM} returns the period of the day: AM for hours before mid day and PM for hours after mid day. In reference to a 24-hour clock format, AM is between 00:00-12:00 and PM is between 12:00-24:00. 12AM is midnight. Available in syslog-ng PE 3.2 and later.

APP_NAME

Description: An alias for the APPLICATION_NAME macro.

APPLICATION_NAME

Description:

  • At event container: Name of the application the message came from

  • At file as the name of creator (default value): syslog-ng-agent

BSDDATE, R_BSDDATE, S_BSDDATE

Description: Date of the message in BSD timestamp format (month/day/hour/minute/second, each expressed in two digits). This is the original syslog time stamp without year information, for example Jun 13 15:58:00. If possible, it is recommended to use ISODATE for timestamping.

BSDTAG

Description: Facility/priority information in the format used by the FreeBSD syslogd: a priority number followed by a letter that indicates the facility. The priority number can range from 0 to 7. The facility letter can range from A to Y, where A corresponds to facility number zero (LOG_KERN), B corresponds to facility 1 (LOG_USER), and so on.

Custom macros

Description: CSV parsers and pattern databases can also define macros from the content of the messages, for example, a pattern database rule can extract the username from a login message and create a macro that references the username. For details on using custom macros created with CSV parsers and pattern databases, see Chapter 15, Parsing and segmenting structured messages and the section called “Using parser results in filters and templates”, respectively.

DATE, R_DATE, S_DATE

Description: Date of the message using the BSD-syslog style timestamp format (month/day/hour/minute/second, each expressed in two digits). This is the original syslog time stamp without year information, for example: Jun 13 15:58:00.

DAY, R_DAY, S_DAY

Description: The day the message was sent.

FACILITY

Description: The name of the facility (for example, kern) that sent the message.

FACILITY_NUM

Description: The numerical code of the facility (for example, 0) that sent the message.

FILE_FACILITY

Description: The facility that sent the message.

FILE_LEVEL

Description: Importance level of the message represented as a number: 6 - Success, 5 - Informational, 4- Warning, or 3 - Error).

FILE_MESSAGE

Description: The content of the message.

FILE_MSG

Description: The content of the message. This is an alias of the FILE_MESSAGE macro.

FILE_NAME

Description: Name of the log file (including its path) from where the syslog-ng PE received the message.

FULLDATE, R_FULLDATE, S_FULLDATE

Description: A nonstandard format for the date of the message using the same format as ${DATE}, but including the year as well, for example: 2006 Jun 13 15:58:00.

FULLHOST

Description: The name of the source host where the message originates from.

  • If the message traverses several hosts and the chain-hostnames() option is on, the first host in the chain is used.

  • If the keep-hostname() option is disabled (keep-hostname(no)), the value of the $FULLHOST macro will be the DNS hostname of the host that sent the message to syslog-ng PE (that is, the DNS hostname of the last hop). In this case the $FULLHOST and $FULLHOST_FROM macros will have the same value.

  • If the keep-hostname() option is enabled (keep-hostname(yes)), the value of the $FULLHOST macro will be the hostname retrieved from the log message. That way the name of the original sender host can be used, even if there are log relays between the sender and the server.

    NOTE:

    The use-dns(), use-fqdn(), normalize-hostnames(), and dns-cache() options will have no effect if the keep-hostname() option is enabled (keep-hostname(yes)) and the message contains a hostname.

For details on using name resolution in syslog-ng PE, see the section called “Using name resolution in syslog-ng”.

FULLHOST_FROM

Description: The FQDN of the host that sent the message to syslog-ng as resolved by syslog-ng using DNS. If the message traverses several hosts, this is the last host in the chain.

The syslog-ng PE application uses the following procedure to determine the value of the $FULLHOST_FROM macro:

  1. The syslog-ng PE application takes the IP address of the host sending the message.

  2. If the use-dns() option is enabled, syslog-ng PE attempts to resolve the IP address to a hostname. If it succeeds, the returned hostname will be the value of the $FULLHOST_FROM macro. This value will be the FQDN of the host if the use-fqdn() option is enabled, but only the hostname if use-fqdn() is disabled.

  3. If the use-dns() option is disabled, or the address resolution fails, the ${FULLHOST_FROM} macro will return the IP address of the sender host.

For details on using name resolution in syslog-ng PE, see the section called “Using name resolution in syslog-ng”.

HOUR, R_HOUR, S_HOUR

Description: The hour of day the message was sent.

HOUR12, R_HOUR12, S_HOUR12

Description: The hour of day the message was sent in 12-hour clock format. See also the ${AMPM} macro. 12AM is midnight. Available in syslog-ng PE 3.2 and later.

HOST

Description: The name of the source host where the message originates from.

  • If the message traverses several hosts and the chain-hostnames() option is on, the first host in the chain is used.

  • If the keep-hostname() option is disabled (keep-hostname(no)), the value of the $HOST macro will be the DNS hostname of the host that sent the message to syslog-ng PE (that is, the DNS hostname of the last hop). In this case the $HOST and $HOST_FROM macros will have the same value.

  • If the keep-hostname() option is enabled (keep-hostname(yes)), the value of the $HOST macro will be the hostname retrieved from the log message. That way the name of the original sender host can be used, even if there are log relays between the sender and the server.

    NOTE:

    The use-dns(), use-fqdn(), normalize-hostnames(), and dns-cache() options will have no effect if the keep-hostname() option is enabled (keep-hostname(yes)) and the message contains a hostname.

For details on using name resolution in syslog-ng PE, see the section called “Using name resolution in syslog-ng”.

HOST_FROM

Description: The FQDN of the host that sent the message to syslog-ng as resolved by syslog-ng using DNS. If the message traverses several hosts, this is the last host in the chain.

The syslog-ng PE application uses the following procedure to determine the value of the $HOST_FROM macro:

  1. The syslog-ng PE application takes the IP address of the host sending the message.

  2. If the use-dns() option is enabled, syslog-ng PE attempts to resolve the IP address to a hostname. If it succeeds, the returned hostname will be the value of the $HOST_FROM macro. This value will be the FQDN of the host if the use-fqdn() option is enabled, but only the hostname if use-fqdn() is disabled.

  3. If the use-dns() option is disabled, or the address resolution fails, the ${HOST_FROM} macro will return the IP address of the sender host.

For details on using name resolution in syslog-ng PE, see the section called “Using name resolution in syslog-ng”.

ISODATE, R_ISODATE, S_ISODATE

Description: Date of the message in the ISO 8601 compatible standard timestamp format (yyyy-mm-ddThh:mm:ss+-ZONE), for example: 2006-06-13T15:58:00.123+01:00. If possible, it is recommended to use ${ISODATE} for timestamping. Note that syslog-ng can produce fractions of a second (for example milliseconds) in the timestamp by using the frac-digits() global or per-destination option.

LEVEL_NUM

Description: The priority (also called severity) of the message, represented as a numeric value, for example, 3. For the textual representation of this value, use the ${LEVEL} macro. See the section called “PRIORITY or LEVEL” for details.

MIN, R_MIN, S_MIN

Description: The minute the message was sent.

MONTH, R_MONTH, S_MONTH

Description: The month the message was sent as a decimal value, prefixed with a zero if smaller than 10.

MONTH_ABBREV, R_MONTH_ABBREV, S_MONTH_ABBREV

Description: The English abbreviation of the month name (3 letters).

MONTH_NAME, R_MONTH_NAME, S_MONTH_NAME

Description: The English name of the month name.

MONTH_WEEK, R_MONTH_WEEK, S_MONTH_WEEK

Description: The number of the week in the given month (0-5). The week with numerical value 1 is the first week containing a Monday. The days of month before the first Monday are considered week 0. For example, if a 31-day month begins on a Sunday, then the 1st of the month is week 0, and the end of the month (the 30th and 31st) is week 5.

MONTHNAME, R_MONTHNAME, S_MONTHNAME

Description: The English name of the month the message was sent, abbreviated to three characters (for example Jan, Feb, and so on).

MSEC, R_MSEC, S_MSEC

Description: The millisecond the message was sent.

Available in syslog-ng PE version 4 F2 and later.

MSG or MESSAGE

Description: Text contents of the log message without the program name and pid. Note that this has changed in syslog-ng version 3.0: in earlier versions this macro included the program name and the pid. In syslog-ng 3.0, the ${MSG} macro became equivalent with the ${MSGONLY} macro. The program name and the pid together are available in the ${MSGHDR} macro.

If you are using the flags(no-parse) option, then syslog message parsing is completely disabled, and the entire incoming message is treated as the ${MESSAGE} part of a syslog message. In this case, syslog-ng PE generates a new syslog header (timestamp, host, and so on) automatically. Note that since flags(no-parse) disables message parsing, it interferes with other flags, for example, disables flags(no-multi-line).

MSGHDR

Description: The name and the PID of the program that sent the log message in PROGRAM[PID]: format. Includes a trailing whitespace. Note that the macro returns an empty value if both the PROGRAM and PID fields of the message are empty.

MSGID

Description: A string specifying the type of the message in IETF-syslog (RFC5424-formatted) messages. For example, a firewall might use the ${MSGID} "TCPIN" for incoming TCP traffic and the ${MSGID} "TCPOUT" for outgoing TCP traffic. By default, syslog-ng PE does not specify this value, but uses a dash (-) character instead. If an incoming message includes the ${MSGID} value, it is retained and relayed without modification.

MSGONLY

Description: Message contents without the program name or pid.

OSUPTIME

Description: The time elapsed since the computer running syslog-ng PE has booted. The value of this macro is an integer containing the time in 1/100th of the second. Note that currently syslog-ng PE can access this data only on Linux platforms. On other platforms the macro contains the time elapsed since syslog-ng PE was started.

Note that syslog-ng PE evaluates the macro every time it is processed, so even if you use the same macro for the same message, its value can be different. For example, if you use it in a filter and in a destination filename, their values will be different even for the same message.

Available in syslog-ng PE version 6.0.5 and later.

PID

Description: The PID of the program sending the message.

PRI

Description: The priority and facility encoded as a 2 or 3 digit decimal number as it is present in syslog messages.

PRIORITY or LEVEL

Description: The priority (also called severity) of the message, for example, error. For the textual representation of this value, use the ${LEVEL} macro. See the section called “PRIORITY or LEVEL” for details.

PROCESS_ID

Description: PID of the application the message came from.

PROGRAM

Description: The name of the program sending the message. Note that the content of the ${PROGRAM} variable may not be completely trusted as it is provided by the client program that constructed the message.

RCPTID

Description: This is disabled by default due to performance issues. To enable it, add the next option to the global options: use-uniqid(yes);. For details, see also the section called “use-uniqid()”. A unique ID for messages generated at reception time on the receiving host. It facilitates defining relationships between messages that are potentially distributed to different files on the same host, or different hosts.

Example 14.2. Using ${RCPTID} macro

Using the following template statement in the configuration: template demo_template { template("${DATE} ${HOST} ${PROG}: ${MSG} ID:${RCPTID}\n"); }; the outgoing message will be the following: <133>Feb 25 14:09:07 webserver syslogd: restart. ID:1.


SDATA, .SDATA.SDID.SDNAME, STRUCTURED_DATA

Description: The syslog-ng application automatically parses the STRUCTURED-DATA part of IETF-syslog messages, which can be referenced in macros. The ${SDATA} macro references the entire STRUCTURED-DATA part of the message, while structured data elements can be referenced using the ${.SDATA.SDID.SDNAME} macro.Available only in syslog-ng Premium Edition 4.0 and later.

NOTE:

When using STRUCTURED-DATA macros, consider the following:

  • When referencing an element of the structured data, the macro must begin with the dot (.) character. For example, ${.SDATA.timeQuality.isSynced}.

  • The SDID and SDNAME parts of the macro names are case sensitive: ${.SDATA.timeQuality.isSynced} is not the same as ${.SDATA.TIMEQUALITY.ISSYNCED}.

Example 14.3. Using SDATA macros

For example, if a log message contains the following structured data: [exampleSDID@0 iut="3" eventSource="Application" eventID="1011"][examplePriority@0 class="high"] you can use macros like: ${.SDATA.exampleSDID@0.eventSource} — this would return the Application string in this case.


SEC, R_SEC, S_SEC

Description: The second the message was sent.

SEQNUM

Description: The ${SEQNUM} macro contains a sequence number for the log message. The value of the macro depends on the scenario, and can be one of the following:

  • If syslog-ng PE receives a message via the IETF-syslog protocol that includes a sequence ID, this ID is automatically available in the ${SEQNUM} macro.

  • If the message is a Cisco IOS log message using the extended timestamp format, then syslog-ng PE stores the sequence number from the message in this macro. If you forward this message the IETF-syslog protocol, syslog-ng PE includes the sequence number received from the Cisco device in the ${.SDATA.meta.sequenceId} part of the message.

    NOTE:

    To enable sequence numbering of log messages on Cisco devices, use the following command on the device (available in IOS 10.0 and later): service sequence-numbers. For details, see the manual of your Cisco device.

  • For locally generated messages (that is, for messages that are received from a local source, and not from the network), syslog-ng PE calculates a sequence number when sending the message to a destination (it is not calculated for relayed messages).

    • The sequence number is not global, but per-destination. Essentially, it counts the number of messages sent to the destination.

    • This sequence number increases by one for every message sent to the destination, and is not lost even when syslog-ng PE is reloaded or restarted.

    • This sequence number is added to every message that uses the IETF-syslog protocol (${.SDATA.meta.sequenceId}), and can be added to BSD-syslog messages using the ${SEQNUM} macro.

NOTE:

If you need a sequence number for every log message that syslog-ng PE receives, use the RCPTID macro.

SOURCE

Description: The identifier of the source statement in the syslog-ng PE configuration file that received the message. For example, if syslog-ng PE received the log message from the source s_local { internal(); }; source statement, the value of the ${SOURCE} macro is s_local. This macro is mainly useful for debugging and troubleshooting purposes.

SOURCEIP

Description: IP address of the host that sent the message to syslog-ng. (That is, the IP address of the host in the ${FULLHOST_FROM} macro.) Please note that when a message traverses several relays, this macro contains the IP of the last relay.

STAMP, R_STAMP, S_STAMP

Description: A timestamp formatted according to the ts-format() global or per-destination option.

SYSUPTIME

Description: The time elapsed since the syslog-ng PE instance was started (that is, the uptime of the syslog-ng PE process). The value of this macro is an integer containing the time in 1/100th of the second. For the uptime of the host running syslog-ng PE see the section called “OSUPTIME”.

Note that syslog-ng PE evaluates the macro every time it is processed, so even if you use the same macro for the same message, its value can be different. For example, if you use it in a filter and in a destination filename, their values will be different even for the same message.

Available in syslog-ng PE version 4 F1 and later.

TAG

Description: The priority and facility encoded as a 2 digit hexadecimal number.

TAGS

Description: A comma-separated list of the tags assigned to the message. Available only in syslog-ng Premium Edition 3.2 and later.

NOTE:

Note that the tags are not part of the log message and are not automatically transferred from a client to the server. For example, if a client uses a pattern database to tag the messages, the tags are not transferred to the server. A way of transferring the tags is to explicitly add them to the log messages using a template and the ${TAGS} macro, or to add them to the structured metadata part of messages when using the IETF-syslog message format.

When sent as structured metadata, it is possible to reference to the list of tags on the central server, and for example, to add them to a database column.

TZ, R_TZ, S_TZ

Description: An alias of the ${TZOFFSET} macro.

TZOFFSET, R_TZOFFSET, S_TZOFFSET

Description: The time-zone as hour offset from GMT, for example: -07:00. In syslog-ng 1.6.x this used to be -0700 but as ${ISODATE} requires the colon it was added to ${TZOFFSET} as well.

UNIXTIME, R_UNIXTIME, S_UNIXTIME

Description: Standard UNIX timestamp, represented as the number of seconds since 1970-01-01T00:00:00.

UNIQID

Description: A globally unique ID generated from the HOSTID and the RCPTID in the format of HOSTID@RCPTID. For details, see the section called “use-uniqid()” and the section called “RCPTID”.

Available in syslog-ng PE version 5 F2 and later.

USEC, R_USEC, S_USEC

Description: The microsecond the message was sent.

Available in syslog-ng PE version 4 F2 and later.

YEAR, R_YEAR, S_YEAR

Description: The year the message was sent.

WEEK, R_WEEK, S_WEEK

Description: The week number of the year, prefixed with a zero for the first nine week of the year. (The first Monday in the year marks the first week.)

WEEK_ABBREV, R_WEEK_ABBREV, S_WEEK_ABBREV

Description: The 3-letter English abbreviation of the name of the day the message was sent, for example Thu.

WEEK_DAY, R_WEEK_DAY, S_WEEK_DAY

Description: The day of the week as a numerical value (1-7).

WEEKDAY, R_WEEKDAY, S_WEEKDAY

Description: These macros are deprecated, use ${WEEK_ABBREV}, ${R_WEEK_ABBREV}, ${S_WEEK_ABBREV} instead. The 3-letter name of the day of week the message was sent, for example Thu.

WEEK_DAY_NAME, R_WEEK_DAY_NAME, S_WEEK_DAY_NAME

Description: The English name of the day.

Using template functions

A template function is a transformation: it modifies the way macros or name-value pairs are expanded. Template functions can be used in template definitions, or when macros are used in the configuration of syslog-ng PE. Template functions use the following syntax:

$(function-name parameter1 parameter2 parameter3 ...)

For example, the $(echo) template function simply returns the value of the macro it receives as a parameter, thus $(echo ${HOST}) is equivalent to ${HOST}.

The parameters of template functions are separated by a whitespace character. If you want to use a longer string or multiple macros as a single parameter, enclose the parameter in double-quotes or apostrophes. For example:

$(echo "${HOST} ${PROGRAM} ${PID}")

Template functions can be nested into each other, so the parameter of a template function can be another template function, like:

$(echo $(echo ${HOST}))

For details on using template functions, see the descriptions of the individual template functions in the section called “Template functions of syslog-ng PE.

Template functions of syslog-ng PE

The following template functions are available in syslog-ng PE.

echo

Syntax:

$(echo argument)

Description: Returns the value of its argument. Using $(echo ${HOST}) is equivalent to ${HOST}.

format-cef-extension

syslog-ng PE version 5 F6 includes a new template function (format-cef-extension) to format name-value pairs as ArcSight Common Event Format extensions. Note that the template function only formats the selected name-value pairs, it does not provide any mapping. There is no special support for creating the prefix part of a Common Event Format (CEF) message. Note that the order of the elements is random. For details on the CEF extension escaping rules format, see the ArcSight Common Event Format.

You can use the value-pairs that syslog-ng PE stores about the log message as CEF fields. Using value-pairs, you can:

  • select which value-pairs to use as CEF fields,

  • add custom value-pairs as CEF fields,

  • rename value-pairs, and so on.

For details, see the section called “Structuring macros, metadata, and other value-pairs”. Note that the syntax of format-* template functions is different from the syntax of value-pairs(): these template functions use a syntax similar to command lines.

Using the format-cef-extension template function, has the following prerequisites:

  • Load the the cef module in your configuration:

    @module cef
  • Set the on-error global option to drop-property, otherwise if the name of a name-value pair includes an invalid character, syslog-ng PE drops the entire message. (Key name in CEF extensions can contain only the A-Z, a-z and 0-9 characters.)

    options {
       on-error("drop-property");
    };
  • The log messages must be encoded in UTF-8. Use the encoding() option or the validate-utf8 flag in the message source.

Example 14.4. Using the format-cef-extension template function

The following example selects every available information about the log message, except for the date-related macros (R_* and S_*), selects the .SDATA.meta.sequenceId macro, and defines a new value-pair called MSGHDR that contains the program name and PID of the application that sent the log message (since you will use the template-function in a template, you must escape the double-quotes).

$(format-cef-extension --scope syslog,all_macros,selected_macros \
  --exclude R_* --exclude S_* --key .SDATA.meta.sequenceId \
  --pair MSGHDR=\"$PROGRAM[$PID]: \")

The following example selects every value-pair that has a name beginning with .cef., but removes the .cef. prefix from the key names.

template("$(format-cef-extension --subkeys .cef.)\n")

The following example shows how to use this template function to store log messages in CEF format:

destination d_cef_extension {
  file("/var/log/messages.cef" template("${ISODATE} ${HOST} $(format-cef-extension --scope selected_macros --scope nv_pairs)\n"));
};

format-json

Syntax:

$(format-json parameters)

Description: The format-json template function receives value-pairs as parameters and converts them into JavaScript Object Notation (JSON) format. Including the template function in a message template allows you to store selected information about a log message (that is, its content, macros, or other metadata) in JSON format. Note that the input log message does not have to be in JSON format to use format-json, you can reformat any incoming message as JSON.

You can use the value-pairs that syslog-ng PE stores about the log message as JSON fields. Using value-pairs, you can:

  • select which value-pairs to use as JSON fields,

  • add custom value-pairs as JSON fields,

  • rename value-pairs, and so on.

For details, see the section called “Structuring macros, metadata, and other value-pairs”. Note that the syntax of format-json is different from the syntax of value-pairs(): format-json uses a syntax similar to command lines.

NOTE:

By default, syslog-ng PE handles every message field as a string. For details on how to send selected fields as other types of data (for example, handle the PID as a number), see the section called “Specifying data types in value-pairs”.

Example 14.5. Using the format-json template function

The following example selects every available information about the log message, except for the date-related macros (R_* and S_*), selects the .SDATA.meta.sequenceId macro, and defines a new value-pair called MSGHDR that contains the program name and PID of the application that sent the log message (since you will use the template-function in a template, you must escape the double-quotes).

$(format-json --scope syslog,all_macros,selected_macros \
  --exclude R_* --exclude S_* --key .SDATA.meta.sequenceId \
  --pair MSGHDR=\"$PROGRAM[$PID]: \")

The following example shows how to use this template function to store log messages in JSON format:

destination d_json {
  file("/var/log/messages.json" template("$(format_json --scope selected_macros --scope nv_pairs)\n"));
};

NOTE:

In case of syslog-ng macros starting with a dot (for example ".SDATA.meta.sequenceID") an empty key name is added at the top level of the JSON structure. You can work around this by adding --shift 1 as a parameter to the template function. For example in case of ".SDATA.meta.sequenceID", an empty key name is added at the top level of the JSON structure:

{"":
    {"SDATA" :
        {"meta" :
            {"sequenceID": "123"}
        }
    }
}
format-welf

This template function converts value-pairs into the WebTrends Enhanced Log file Format (WELF). The WELF format is a comma-separated list of name=value elements. Note that the order of the elements is random. If the value contains whitespace, it is enclosed in double-quotes, for example, name="value". For details on the WELF format, see https://www3.trustwave.com/support/kb/article.aspx?id=10899.

To select which value-pairs to convert, use the command-line syntax of the value-pairs() option. For details on selecting value-pairs, see the section called “value-pairs()”.

Example 14.6. Using the format-welf() template function

The following example selects every available information about the log message, except for the date-related macros (R_* and S_*), selects the .SDATA.meta.sequenceId macro, and defines a new value-pair called MSGHDR that contains the program name and PID of the application that sent the log message (since you will use the template-function in a template, you must escape the double-quotes).

$(format-welf --scope syslog,all_macros,selected_macros \
  --exclude R_* --exclude S_* --key .SDATA.meta.sequenceId \
  --pair MSGHDR=\"$PROGRAM[$PID]: \")

The following example shows how to use this template function to store log messages in WELF format:

destination d_welf {
  file("/var/log/messages.welf" template("$(format-welf --scope selected_macros --scope nv_pairs)\n"));
};

grep

Syntax:

$(grep condition value-to-select)

Description: The grep template function is useful when using a pattern database to correlate related log messages. The grep template function can be used to filter the messages of the same context when the index of the particular message is not known.

Example 14.7. Using the grep template function

The following example selects the message of the context that has a username name-value pair with the root value, and returns the value of the auth_method name-value pair.

$(grep ("${username}" == "root") ${auth_method})

It is possible to specify multiple name-value pairs as parameters, separated with commas. If multiple messages match the condition of grep, these will be returned also separated by commas. This can be used for example to collect the e-mail recipients from postfix messages.

hash

Syntax:

$(<method> [opts] $arg1 $arg2 $arg3...)

Options:

--length N, -l N

Truncate the hash to the first N characters.

Description: Calculates a hash of the string or macro received as argument using the specified hashing method. If you specify multiple arguments, effectively you receive the hash of the first argument salted with the subsequent arguments.

<method> can be one of md5, md4, sha1, sha256, sha512 and "hash", which is equivalent to sha256. Macros are expected as arguments, and they are concatenated without the use of additional characters.

This template function can be used for anonymizing sensitive parts of the log message (for example username) that were parsed out using PatternDB before storing or forwarding the message. This way, the ability of correlating messages along this value is retained.

Also, using this template, quasi-unique IDs can be generated for data, using the --length option. This way, IDs will be shorter than a regular hash, but there is a very small possibility of them not being as unique as a non-truncated hash.

Example 14.8. Using the $(hash) template function

The following example calculates the SHA1 hash of the hostname of the message:

$(sha1 $HOST)

The following example calculates the SHA256 hash of the hostname, using the salted string to salt the result:

$(sha1 $HOST salted)

To use shorter hashes, set the --length:

$(sha1 --length 6 $HOST)

To replace the hostname with its hash, use a rewrite rule:

rewrite r_rewrite_hostname{set("$(sha1 $HOST)", value("HOST"));};

Example 14.9. Anonymizing IP addresses

The following example replaces every IPv4 address in the MESSAGE part with its SHA-1 hash:

rewrite pseudonymize_ip_addresses_in_message {
  subst (
    "((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])[.]){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))",
    "$(sha1 $0)",
    value("MSG"),
    flags(global)
  );
};

if

Syntax:

$(if (<condition>) <true template> <false template>)

Description: Returns the value of the <true template> parameter if the <condition> is true. If the <condition> is false, the value of <false template> is returned.

Example 14.10. Using pattern databases and the if template function

The following example returns violation if the username name-value pair of a message processed with pattern database is root, and system otherwise.

$(if ('${username}' == 'root') 'violation' 'system')

This can be used to set the class of a message in pattern database rules based on the condition.

<value name="username">$(if ("${username}" == "root") "violation" "system")</value>

Since template functions can be embedded into each other, it is possible to use another template function as the template of the first one. For example, the following expression returns root if the username is root, admin if the username is joe, and normal user otherwise.

<value name="username">    $(if ("${username}" == "root")
        "root"
        $(if ("${username}" == "joe") "admin" "normal user")
        "normal user")</value>

indent-multi-line

Syntax:

$(indent-multi-line parameter)

Description: This template function makes it possible to write multi-line log messages into a file. The first line is written like a regular message, subsequent lines are indented with a tab, in compliance with RFC822.

Example 14.11. Using the indent-multi-line template function

The following example writes multi-line messages into a text file.

destination d_file {
        file ("/var/log/messages"
                template("${ISODATE} ${HOST} $(indent-multi-line ${MESSAGE})\n") );
};

ipv4-to-int

Syntax:

$(ipv4-to-int parameter)

Description: Converts the specified IPv4 address to its numeric representation. The numerical value of an IPv4 address is calculated by treating the IP address as a 4-byte hexadecimal value. For example, the 192.168.1.1 address equals to: 192=C0, 168=A8, 1=01, 1=01, or C0A80101, which is 3232235777 in decimal representation.

NOTE:

This template function is available only if the convertfuncs module has been loaded. By default, syslog-ng PE loads every available module.

By default, syslog-ng PE loads every available module. For details, see the section called “Loading modules”

Numeric operations

Syntax:

$(<operator> <first_operand> <second_operand>)

Description:

This template function performs simple numerical operations (like addition or multiplication) on integer numbers or macros containing integer numbers (for example, ${LEVEL_NUM} or ${YEAR}), and returns the result of the operation. The values used in the operation are not modified, that is, macros retain their original values. If one of the operands is not an integer, syslog-ng PE will not execute the template function, and return the NaN (Not-a-Number) value.

Available only in syslog-ng PE4 F1 and later.

Operator Operation:
+ Addition: returns the value of <first_operand>+<second_operand>
- Subtraction: returns the value of <first_operand>-<second_operand>
* Multiplication: returns the value of <first_operand>*<second_operand>
/ Division: returns the value of <first_operand>/<second_operand>
% Modulus (remainder): returns the remainder of <first_operand>/<second_operand>

Caution:

The output of this template function is always an integer. If the return value would be a floating-point number, the floating part is simply omitted, for example, 5.2 becomes 5, 5.8 becomes 5, -5.8 becomes -5.

It is also possible to nest numerical operations.

Example 14.12. Using numerical template functions

The following template function returns the facility of the log message multiplied by 8:

$(* ${FACILITY_NUM} 8)

Template functions can be nested into each other: the following template function returns the facility of the log message multiplied by 8, then adds this value to the severity of the message (the result is actually the priority of the message):

$(+ ${LEVEL_NUM} $(* ${FACILITY_NUM} 8))

strip

Syntax:

$(strip "<macro>")

Description: Deletes whitespaces from the beginning and the end of a macro. You can specify multiple macros separated with whitespace in a single template function, for example:

$(strip "${MESSAGE}" "${PROGRAM}")

Available in syslog-ng PE version 6.0.6 and later.

Modifying messages

The syslog-ng application can rewrite parts of the messages using rewrite rules. Rewrite rules are global objects similar to parsers and filters and can be used in log paths. The syslog-ng application has two methods to rewrite parts of the log messages: substituting (setting) a part of the message to a fix value, and a general search-and-replace mode.

Substitution completely replaces a specific part of the message that is referenced using a built-in or user-defined macro.

General rewriting searches for a string in the entire message (or only a part of the message specified by a macro) and replaces it with another string. Optionally, this replacement string can be a template that contains macros.

Rewriting messages is often used in conjunction with message parsing Chapter 15, Parsing and segmenting structured messages.

Rewrite rules are similar to filters: they must be defined in the syslog-ng configuration file and used in the log statement. You can also define the rewrite rule inline in the log path.

NOTE:

The order of filters, rewriting rules, and parsers in the log statement is important, as they are processed sequentially.

Replacing message parts

To replace a part of the log message, you have to:

  • define a string or regular expression to find the text to replace

  • define a string to replace the original text (macros can be used as well)

  • select the field of the message that the rewrite rule should process

Substitution rules can operate on any soft macros, for example MESSAGE, PROGRAM, or any user-defined macros created using parsers. Hard macros cannot be modified. For details on the hard and soft macros, see the section called “Hard vs. soft macros”). You can also rewrite the structured-data fields of messages complying to the RFC5424 (IETF-syslog) message format. Substitution rules use the following syntax:

Declaration: 

rewrite <name_of_the_rule> {
    subst("<string or regular expression to find>",
        "<replacement string>", value(<field name>), flags() );
};

The type() and flags() options are optional. The type() specifies the type of regular expression to use, while the flags() are the flags of the regular expressions. For details on regular expressions, see the section called “Regular expressions”.

A single substitution rule can include multiple substitutions that are applied sequentially to the message. Note that rewriting rules must be included in the log statement to have any effect.

TIP:

For case-insensitive searches, add the flags(ignore-case) option. To replace every occurrence of the string, add flags(global) option.

Example 14.13. Using substitution rules

The following example replaces the IP in the text of the message with the string IP-Address.

rewrite r_rewrite_subst{subst("IP", "IP-Address", value("MESSAGE"));};

To replace every occurrence, use:

rewrite r_rewrite_subst{
    subst("IP", "IP-Address", value("MESSAGE"), flags("global"));
};

Multiple substitution rules are applied sequentially. The following rules replace the first occurrence of the string IP with the string IP-Addresses.

rewrite r_rewrite_subst{
    subst("IP", "IP-Address", value("MESSAGE"));
    subst("Address", "Addresses", value("MESSAGE"));
};

Example 14.14. Anonymizing IP addresses

The following example replaces every IPv4 address in the MESSAGE part with its SHA-1 hash:

rewrite pseudonymize_ip_addresses_in_message {
  subst (
    "((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])[.]){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))",
    "$(sha1 $0)",
    value("MSG"),
    flags(global)
  );
};

Setting message fields to specific values

To set a field of the message to a specific value, you have to:

  • define the string to include in the message, and

  • select the field where it should be included.

You can set the value of available macros, for example HOST, MESSAGE, PROGRAM, or any user-defined macros created using parsers (for details, see Chapter 15, Parsing and segmenting structured messages and Chapter 16, Processing message content with a pattern database). Hard macros cannot be modified. For details on the hard and soft macros, see the section called “Hard vs. soft macros”). Note that the rewrite operation completely replaces any previous value of that field. Use the following syntax:

Declaration: 

rewrite <name_of_the_rule> {
    set("<string to include>", value(<field name>));
};

Example 14.15. Setting message fields to a particular value

The following example sets the HOST field of the message to myhost.

rewrite r_rewrite_set{set("myhost", value("HOST"));};

The following example appends the "suffix" string to the MESSAGE field:

rewrite r_rewrite_set{set("$MESSAGE suffix", value("MESSAGE"));};

For details on rewriting SDATA fields, see the section called “Creating custom SDATA fields”.


Creating custom SDATA fields

If you use RFC5424-formatted (IETF-syslog) messages, you can also create custom fields in the SDATA part of the message (For details on the SDATA message part, see the section called “The STRUCTURED-DATA message part”). According to RFC5424, the name of the field (its SD-ID) must not contain the @ character for reserved SD-IDs. Custom SDATA fields must be in the following format: .SDATA.group-name@<private enterprise number>.field-name, for example, .SDATA.mySDATA-field-group@18372.4.mySDATA-field. (18372.4 is the private enterprise number of One Identity LLC, the developer of syslog-ng PE.)

Example 14.16. Rewriting custom SDATA fields

The following example sets the sequence ID field of the RFC5424-formatted (IETF-syslog) messages to a fixed value. This field is a predefined SDATA field with a reserved SD-ID, therefore its name does not contain the @ character.

rewrite r_sd {
    set("55555" value(".SDATA.meta.sequenceId"));
};

The next example creates a new SDATA field-group and field called custom and sourceip, respectively:

rewrite r_rewrite_set {
    set("${SOURCEIP}" value(".SDATA.custom@18372.4.sourceip"));
};

If you use the ${.SDATA.custom@18372.4.sourceip} macro in a template or SQL table, its value will be that of the SOURCEIP macro (as seen on the machine where the SDATA field was created) for every message that was processed with this rewrite rule, and empty for every other message.

You can verify whether or not the format is correct by looking at the actual network traffic. The SDATA field-group will be called custom@18372.4, and sourceip will become a field within that group. If you decide to set up several fields, they will be listed in consecutive order within the field-group's SDATA block.


NOTE:

When using STRUCTURED-DATA macros, consider the following:

  • When referencing an element of the structured data, the macro must begin with the dot (.) character. For example, ${.SDATA.timeQuality.isSynced}.

  • The SDID and SDNAME parts of the macro names are case sensitive: ${.SDATA.timeQuality.isSynced} is not the same as ${.SDATA.TIMEQUALITY.ISSYNCED}.

Setting multiple message fields to specific values

The groupset() rewrite rule allows you to modify the value of multiple message fields at once, for example, to change the value of sensitive fields extracted using patterndb, or received in a JSON format.

  • The first parameter is the new value of the modified fields. This can be a simple string, a macro, or a template (which can include template functions as well).

  • The second parameter (values()) specifies the fields to modify. You can explicitly list the macros or fields (a space-separated list with the values enclosed in double-quotes), or use wildcards and glob expressions to select multiple fields.

  • Note that groupset() does not create new fields, it only modifies existing fields.

  • You can refer to the old value of the field using the $_ macro. This is resolved to the value of the current field, and is available only in groupset() rules.

Declaration: 

rewrite <name_of_the_rule> {
    groupset("<new-value-of-the-fields>", values("<field-name-or-glob>" ["<another-field-name-or-glob>"]));
};

Example 14.17. Using groupset rewrite rules

The following examples show how to change the values of multiple fields at the same time.

  • Change the value of the HOST field to myhost.

    groupset ("myhost" values("HOST"))
  • Change the value of the HOST and FULLHOST fields to myhost.

    groupset ("myhost" values("HOST" "FULLHOST"))
  • Change the value of the HOSTFULLHOST and fields to lowercase.

    groupset ("$(lowercase "$_")" values("HOST" "FULLHOST"))
  • Change the value of each field and macro that begins with .USER to nobody.

    groupset ("nobody" values(".USER.*"))
  • Change the value of each field and macro that begins with .USER to its SHA-1 hash (truncated to 6 characters).

    groupset ("$(sha1 --length 6 $_)" values(".USER.*"))

Conditional rewrites

Starting with 4 F1, it is possible to apply a rewrite rule to a message only if certain conditions are met. The condition() option effectively embeds a filter expression into the rewrite rule: the message is modified only if the message passes the filter. If the condition is not met, the message is passed to the next element of the log path (that is, the element following the rewrite rule in the log statement, for example, the destination). Any filter expression normally used in filters can be used as a rewrite condition. Existing filter statements can be referenced using the filter() function within the condition. For details on filters, see the section called “Filters”.

TIP:

Using conditions in rewrite rules can simplify your syslog-ng PE configuration file, as you do not need to create separate log paths to modify certain messages.

Procedure 14.1. How conditional rewriting works

Purpose: 

The following procedure summarizes how conditional rewrite rules (rewrite rules that have the condition() parameter set) work. The following configuration snippet is used to illustrate the procedure:

rewrite r_rewrite_set{set("myhost", value("HOST") condition(program("myapplication")));};
log {
    source(s1);
    rewrite(r_rewrite_set);
    destination(d1);};

Steps: 

  1. The log path receives a message from the source (s1).

  2. The rewrite rule (r_rewrite_set) evaluates the condition. If the message matches the condition (the PROGRAM field of the message is "myapplication"), syslog-ng PE rewrites the log message (sets the value of the HOST field to "myhost"), otherwise it is not modified.

  3. The next element of the log path processes the message (d1).

Example 14.18. Using conditional rewriting

The following example sets the HOST field of the message to myhost only if the message was sent by the myapplication program.

rewrite r_rewrite_set{set("myhost", value("HOST") condition(program("myapplication")));};

The following example is identical to the previous one, except that the condition references an existing filter template.

filter f_rewritefilter {program("myapplication");};
rewrite r_rewrite_set{set("myhost", value("HOST") condition(filter(f_rewritefilter)));};

Regular expressions

Filters and substitution rewrite rules can use regular expressions. In regular expressions, the characters ()[].*?+^$|\ are used as special symbols. Depending on how you want to use these characters and which quotation mark you use, these characters must be used differently, as summarized below.

  • Strings between single quotes ('string') are treated literally and are not interpreted at all, you do not have to escape special characters. For example the output of '\x41' is \x41 (characters as follows: backslash, x(letter), 4(number), 1(number)). This makes writing and reading regular expressions much more simple: it is recommended to use single quotes when writing regular expressions.

  • When enclosing strings between double-quotes ("string"), the string is interpreted and you have to escape special characters, that is, to precede them with a backslash (\) character if they are meant literally. For example the output of the "\x41" is simply the letter a. Therefore special characters like \(backslash) or "(quotation mark) must be escaped (\\ and \"). The following expressions are interpreted: \a, \n, \r, \t, \v. For example, the \$40 expression matches the $40 string. Backslashes have to be escaped as well if they are meant literally, for example, the \\d expression matches the \d string.

    TIP:

    If you use single quotes, you do not need to escape the backslash, for example match("\\.") is equivalent to match('\.').

  • Enclosing alphanumeric strings between double-quotes ("string") is not necessary, you can just omit the double-quotes. For example when writing filters, match("sometext") and match(sometext) will both match for the sometext string.

    NOTE:

    Only strings containing alphanumerical characters can be used without quotes or double quotes. If the string contains whitespace or any special characters (()[].*?+^$|\ or ;:#), you must use quotes or double quotes.

    When using the ;:# characters, you must use quotes or double quotes, but escaping them is not required.

By default, all regular expressions are case sensitive. To disable the case sensitivity of the expression, add the flags(ignore-case) option to the regular expression.

filter demo_regexp_insensitive { host("system" flags(ignore-case)); };

The regular expressions can use up to 255 regexp matches (${1} ... ${255}), but only from the last filter and only if the flags("store-matches") flag was set for the filter. For case-insensitive searches, use the flags("ignore-case") option.

Types and options of regular expressions

By default, syslog-ng uses POSIX-style regular expressions. To use other expression types, add the type() option after the regular expression.

The syslog-ng PE application supports the following expression types:

posix

Description: Use POSIX regular expressions. If the type() parameter is not specified, syslog-ng uses POSIX regular expressions by default.

Posix regular expressions have the following flag options:

global: Usable only in rewrite rules: match for every occurrence of the expression, not only the first one.

ignore-case: Disable case-sensitivity.

store-matches: Store the matches of the regular expression into the $0, ... $255 variables. The $0 stores the entire match, $1 is the first group of the match (parentheses), and so on. Matches from the last filter expression can be referenced in regular expressions.

utf8: Use UTF-8 matching.

Example 14.19. Using Posix regular expressions

filter f_message { message("keyword" flags("utf8" "ignore-case") ); };

pcre

Description: Use Perl Compatible Regular Expressions (PCRE). Starting with syslog-ng PE version 3.1, PCRE expressions are supported on every platform.

PCRE regular expressions have the following flag options:

global: Usable only in rewrite rules: match for every occurrence of the expression, not only the first one.

ignore-case: Disable case-sensitivity.

store-matches: Store the matches of the regular expression into the $0, ... $255 variables. The $0 stores the entire match, $1 is the first group of the match (parentheses), and so on. Named matches (also called named subpatterns), for example (?<name>...), are stored as well. Matches from the last filter expression can be referenced in regular expressions.

unicode: Use Unicode support for UTF-8 matches: UTF-8 character sequences are handled as single characters.

utf8: An alias for the unicode flag.

Example 14.20. Using PCRE regular expressions

rewrite r_rewrite_subst
        {subst("a*", "?", value("MESSAGE") type("pcre") flags("utf8" "global"));  };

string

Description: Match the strings literally, without regular expression support. By default, only identical strings are matched. For partial matches, use the flags("prefix") or the flags("substring") flags.

glob

Description: Match the strings against a pattern containing '*' and '?' wildcards, without regular expression and character range support. The advantage of glob patterns to regular expressions is that globs can be processed much faster.

*

matches an arbitrary string, including an empty string

?

matches an arbitrary character

NOTE:
  • The wildcards can match the / character.

  • You cannot use the * and ? literally in the pattern.

Optimizing regular expressions

The host(), match(), and program() filter functions and some other syslog-ng objects accept regular expressions as parameters. But evaluating general regular expressions puts a high load on the CPU, which can cause problems when the message traffic is very high. Often the regular expression can be replaced with simple filter functions and logical operators. Using simple filters and logical operators, the same effect can be achieved at a much lower CPU load.

Example 14.21. Optimizing regular expressions in filters

Suppose you need a filter that matches the following error message logged by the xntpd NTP daemon:

xntpd[1567]: time error -1159.777379 is too large (set clock manually);

The following filter uses regular expressions and matches every instance and variant of this message.

filter f_demo_regexp {
    program("demo_program") and
    match("time error .* is too large .* set clock manually"); };

Segmenting the match() part of this filter into separate match() functions greatly improves the performance of the filter.

filter f_demo_optimized_regexp {
    program("demo_program") and
    match("time error") and
    match("is too large") and
    match("set clock manually"); };

Chapter 15. Parsing and segmenting structured messages

The filters and default macros of syslog-ng work well on the headers and metainformation of the log messages, but are rather limited when processing the content of the messages. Parsers can segment the content of the messages into name-value pairs, and these names can be used as user-defined macros. Subsequent filtering or other type of processing of the message can use these custom macros to refer to parts of the message. Parsers are global objects most often used together with filters and rewrite rules.

syslog-ng PE provides the following possibilities to parse the messages, or parts of the messages:

Parsing messages with comma-separated and similar values

The syslog-ng PE application can separate parts of log messages (that is, the contents of the ${MSG} macro) at delimiter characters or strings to named fields (columns). One way to achieve this is to use a csv (comma-separated-values) parser (for other methods and possibilities, see the other sections of Chapter 15, Parsing and segmenting structured messages. The parsed fields act as user-defined macros that can be referenced in message templates, file- and tablenames, and so on.

Parsers are similar to filters: they must be defined in the syslog-ng PE configuration file and used in the log statement. You can also define the parser inline in the log path.

NOTE:

The order of filters, rewriting rules, and parsers in the log statement is important, as they are processed sequentially.

To create a csv-parser(), you have to define the columns of the message, the separator characters or strings (also called delimiters, for example, semicolon or tabulator), and optionally the characters that are used to escape the delimiter characters (quote-pairs()).

Declaration: 

parser <parser_name> {
    csv-parser(
    columns(column1, column2, ...)
    delimiters(chars("<delimiter_characters>"), strings("<delimiter_string1>"))
    );
};

Column names work like macros.

Names starting with a dot (for example, .example) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details).

Example 15.1. Segmenting hostnames separated with a dash

The following example separates hostnames like example-1 and example-2 into two parts.

parser p_hostname_segmentation {
    csv-parser(columns("HOSTNAME.NAME", "HOSTNAME.ID")
    delimiters("-")
    flags(escape-none)
    template("${HOST}"));
};
destination d_file { file("/var/log/messages-${HOSTNAME.NAME:-examplehost}"); };
log { source(s_local); parser(p_hostname_segmentation); destination(d_file);};

Example 15.2. Parsing Apache log files

The following parser processes the log of Apache web servers and separates them into different fields. Apache log messages can be formatted like:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"

Here is a sample message:

192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i4 86-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.balabit

To parse such logs, the delimiter character is set to a single whitespace (delimiters(" ")). Whitespaces between quotes and brackets are ignored (quote-pairs('""[]')).

parser p_apache {
    csv-parser(columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
        "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
        "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT",
        "APACHE.PROCESS_TIME", "APACHE.SERVER_NAME")
         flags(escape-double-char,strip-whitespace)
         delimiters(" ")
         quote-pairs('""[]')
         );
};

The results can be used for example to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the nouser name is assigned.

log { source(s_local);
    parser(p_apache); destination(d_file);};
};
destination d_file { file("/var/log/messages-${APACHE.USER_NAME:-nouser}"); };

Example 15.3. Segmenting a part of a message

Multiple parsers can be used to split a part of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields.

parser p_apache_timestamp {
    csv-parser(columns("APACHE.TIMESTAMP.DAY", "APACHE.TIMESTAMP.MONTH", "APACHE.TIMESTAMP.YEAR", "APACHE.TIMESTAMP.HOUR", "APACHE.TIMESTAMP.MIN", "APACHE.TIMESTAMP.MIN", "APACHE.TIMESTAMP.ZONE")
    delimiters("/: ")
    flags(escape-none)
    template("${APACHE.TIMESTAMP}"));
    };
log { source(s_local); parser(p_apache); parser(p_apache_timestamp); destination(d_file);
};

Further examples: 

Options of CSV parsers

The syslog-ng PE application can separate parts of log messages (that is, the contents of the ${MSG} macro) at delimiter characters or strings to named fields (columns). One way to achieve this is to use a csv (comma-separated-values) parser (for other methods and possibilities, see the other sections of Chapter 15, Parsing and segmenting structured messages. The parsed fields act as user-defined macros that can be referenced in message templates, file- and tablenames, and so on.

Parsers are similar to filters: they must be defined in the syslog-ng PE configuration file and used in the log statement. You can also define the parser inline in the log path.

NOTE:

The order of filters, rewriting rules, and parsers in the log statement is important, as they are processed sequentially.

To create a csv-parser(), you have to define the columns of the message, the separator characters or strings (also called delimiters, for example, semicolon or tabulator), and optionally the characters that are used to escape the delimiter characters (quote-pairs()).

Declaration: 

parser <parser_name> {
    csv-parser(
    columns(column1, column2, ...)
    delimiters(chars("<delimiter_characters>"), strings("<delimiter_string1>"))
    );
};

Column names work like macros.

Names starting with a dot (for example, .example) are reserved for use by syslog-ng PE. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see the section called “Hard vs. soft macros” for details).

columns()
Synopsis: columns("PARSER.COLUMN1", "PARSER.COLUMN2", ...)

Description: Specifies the name of the columns to separate messages to. These names will be automatically available as macros. The values of these macros do not include the delimiters.

delimiters()
Synopsis:

delimiters(chars("<delimiter_characters>")) or delimiters("<delimiter_characters>")

delimiters(strings("<delimiter_string1>", "<delimiter_string2>", ...)")

delimiters(chars("<delimiter_characters>"), strings("<delimiter_string1>"))

Description: The delimiter is the character or string that separates the columns in the message. If you specify multiple characters using the delimiters(chars("<delimiter_characters>")) option, every character will be treated as a delimiter. To separate the columns at the tabulator (tab character), specify \t. For example, to separate the text at every hyphen (-) and colon (:) character, use delimiters(chars("-:")), Note that the delimiters will not be included in the column values.

If you have to use a string as a delimiter, list your string delimiters in the delimiters(strings("<delimiter_string1>", "<delimiter_string2>", ...)") format.

If you use more than one delimiter, note the following points:

  • syslog-ng PE will split the message at the nearest possible delimiter. The order of the delimiters in the configuration file does not matter.

  • You can use both string delimiters and character delimiters in a parser.

  • The string delimiters can include characters that are also used as character delimiters.

  • If a string delimiter and a character delimiter both match at the same position of the message, syslog-ng PE uses the string delimiter.

flags()
Synopsis: drop-invalid, escape-none, escape-backslash, escape-double-char, greedy, strip-whitespace

Description: Specifies various options for parsing the message. The following flags are available:

  • drop-invalid: When the drop-invalid option is set, the parser does not process messages that do not match the parser. For example, a message does not match the parser if it has less columns than specified in the parser, or it has more columns but the greedy flag is not enabled. Using the drop-invalid option practically turns the parser into a special filter, that matches messages that have the predefined number of columns (using the specified delimiters).

    TIP:

    Messages dropped as invalid can be processed by a fallback log path. For details on the fallback option, see the section called “Log path flags”.

  • escape-backslash: The parsed message uses the backslash (\) character to escape quote characters.

  • escape-double-char: The parsed message repeats the quote character when the quote character is used literally. For example, to escape a comma (,), the message contains two commas (,,).

  • escape-none: The parsed message does not use any escaping for using the quote character literally.

  • greedy: The greedy option assigns the remainder of the message to the last column, regardless of the delimiter characters set. You can use this option to process messages where the number of columns varies.

    Example 15.4. Adding the end of the message to the last column

    If the greedy option is enabled, the syslog-ng application adds the not-yet-parsed part of the message to the last column, ignoring any delimiter characters that may appear in this part of the message.

    For example, you receive the following comma-separated message: example 1, example2, example3, and you segment it with the following parser:

    csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(","));

    The COLUMN1, COLUMN2, and COLUMN3 variables will contain the strings example1, example2, and example3, respectively. If the message looks like example 1, example2, example3, some more information, then any text appearing after the third comma (that is, some more information) is not parsed, and possibly lost if you use only the variables to reconstruct the message (for example, to send it to different columns of an SQL table).

    Using the greedy flag will assign the remainder of the message to the last column, so that the COLUMN1, COLUMN2, and COLUMN3 variables will contain the strings example1, example2, and example3, some more information.

    csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(",") flags(greedy));

  • strip-whitespace: The strip-whitespace flag removes leading and trailing whitespaces from all columns.

quote-pairs()
Synopsis: quote-pairs('<quote_pairs>')

Description: List quote-pairs between single quotes. Delimiter characters or strings enclosed between quote characters are ignored. Note that the beginning and ending quote character does not have to be identical, for example [} can also be a quote-pair. For an example of using quote-pairs() to parse Apache log files, see Example 15.2, “Parsing Apache log files”.

template()
Synopsis: template("${<macroname>}")

Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, this is empty and the parser processes the entire message (${MESSAGE}). For examples, see Example 15.1, “Segmenting hostnames separated with a dash” and Example 15.3, “Segmenting a part of a message”.

Related Documents