Extensible Markup Language (XML) is a text-based open standard designed for both human-readable and machine-readable data interchange. Like JSON, it is used primarily to transmit data between a server and a web application. It is described in the W3C Recommendation Extensible Markup Language (XML).

The XML parser processes input in XML format and adds the parsed data to the message object as name-value pairs.

To create an XML parser, define a parser that contains the xml() option. By default, the parser processes the ${MESSAGE} part of the log message. To process other parts of a log message with the XML parser, use the template() option. You can also define the parser inline in the log path.
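As a minimal sketch of the template() option, the following parser processes a name-value pair instead of ${MESSAGE}. The name .app.payload is hypothetical and assumes that an earlier parser in the log path has stored an XML document in it; the parser name and the .payload prefix are illustrative as well.

# Hypothetical example: parse the XML stored in ${.app.payload}
# (assumed to be set by an earlier parser) instead of ${MESSAGE}.
parser p_xml_payload {
    xml(
        template("${.app.payload}")
        prefix(".payload")
    );
};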

Declaration
parser xml_name {
    xml(
        template()
        prefix()
        drop-invalid()
        exclude-tags()
        strip-whitespaces()
    );
};
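For illustration, the following sketch fills in some of the options of the declaration. The values are assumptions chosen for the example, not defaults: it sets the default ".xml" prefix explicitly, drops messages that are not valid XML, skips subtrees whose tag matches a hypothetical debug* glob, and strips whitespace from text nodes.

# Illustrative values only; the parser name and the glob are hypothetical.
parser p_xml_options {
    xml(
        # The default prefix, set explicitly
        prefix(".xml")
        # Drop messages that are not valid XML
        drop-invalid(yes)
        # Omit subtrees whose tag matches this glob
        exclude-tags("debug*")
        # Strip whitespace from text nodes
        strip-whitespaces(yes)
    );
};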
Example: Using an XML parser

In the following example, the source is an XML-encoded log message. The destination is a file that uses the format-json template function. The log statement connects the source, the destination, and the parser.

source s_local {
    file("/tmp/aaa");
};

destination d_local {
    file("/tmp/bbb" template("$(format-json .xml.*)\n"));
};

parser xml_parser {
    xml();
};

log {
    source(s_local);
    parser(xml_parser);
    destination(d_local);
};

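The following example defines the parser inline in the log path and uses the prefix() option to store the parsed values under ".SDATA" instead of the default ".xml" prefix.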

log {
    source(s_file);
    parser { xml(prefix(".SDATA")); };
    destination(d_file);
};

By default, the XML parser inserts the ".xml" prefix before the names of the extracted name-value pairs. Because format-json replaces a leading dot in key names with an underscore, the ".xml" prefix becomes "_xml" in the output. Attribute names get an underscore (_) prefix. For example, from the following XML input:

<tags attr='attrval'>part1<tag1>Tag1 Leaf</tag1>part2<tag2>Tag2 Leaf</tag2>part3</tags>

The following output is generated:

{"_xml":{"tags":{"tag2":"Tag2 Leaf","tag1":"Tag1 Leaf","_attr":"attrval","tags":"part1part2part3"}}}
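The parsed values can also be referenced individually in templates. Going by the output above, the attribute should be available as ${.xml.tags._attr} and the leaf elements as ${.xml.tags.tag1} and ${.xml.tags.tag2}. The following destination is a sketch based on that naming; the destination name and the file path are placeholders.

# Write selected values from the example XML to a file
# (assumes the default ".xml" prefix).
destination d_selected {
    file("/tmp/selected"
        template("attr=${.xml.tags._attr} tag1=${.xml.tags.tag1} tag2=${.xml.tags.tag2}\n")
    );
};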

When text is separated by tags on different levels, or by tags on the same level, the parser uses its list-handling functionality (enabled by default) to handle lists in the XML.

The list-handling functionality of the XML parser stores the elements of vector-like structures as separate, comma-separated entries. Consider the following structure:

<vector>
    <entry>value1</entry>
    <entry>value 2</entry>
    <entry>Doe,John</entry>
    <entry>value3</entry>
    ...
    <entry>valueN</entry>
</vector>

After parsing, the entries are separated by commas. If an entry contains a space or a comma (for example, value 2 or Doe,John in the previous example), the entry is quoted:

vector.entry = value1,"value 2","Doe,John",value3...valueN

Note that if you disable the list-handling functionality, the XML parser cannot address each element of a vector-like structure individually. Consider the following structure:

<vector>
    <entry>value1</entry>
    <entry>value2</entry>
    ...
    <entry>valueN</entry>
</vector>

After parsing, the entries are not addressed individually. Instead, the text of the entries is concatenated:

vector.entry = "value1value2...valueN"

For more information about the list-handling functionality, see Limitations of the XML parsers.

Whitespace is kept exactly as it appears in the XML input: significant whitespace is not collapsed. For example, from the following input:

<133>Feb 25 14:09:07 webserver syslogd: <b>|Test\n\n   Test2|</b>\n

The following output is generated:

[2017-09-04T13:20:27.417266] Setting value; msg='0x7f2fd8002df0', name='.xml.b', value='|Test\x0a\x0a   Test2|'

However, you can strip whitespace from the XML text nodes using the strip-whitespaces() option.
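A minimal sketch of enabling the option, assuming its yes/no form; the parser name is illustrative:

parser p_xml_stripped {
    # Strip whitespace from text nodes before adding them to the message
    xml(strip-whitespaces(yes));
};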

Configuration hints

Define a source that correctly detects the end of the message; otherwise, the XML parser considers the input invalid, which results in a parser error.

To ensure that the end of the XML document is accurately detected, use any of the following options:

  • Ensure that the XML is a single-line message.

  • In the case of multiline XML documents:

    • If the opening and closing tags are fixed and known, you can use multi-line-mode(prefix-suffix). Using regular expressions, specify a prefix and suffix matching the opening and closing tags. For details on using multi-line-mode(prefix-suffix), see the multi-line-prefix() and multi-line-suffix() options.

    • In the case of TCP, you can encapsulate the document in syslog-protocol format, send it, and receive it with a syslog() source. Make sure that the message conforms to the octet-counting method described in RFC 6587.

      For example:

      59 <133>Feb 25 14:09:07 webserver syslogd: <book>\nText\n</book>

      Counting each newline as one character, the message is 59 characters long, so 59 (followed by a space) is prepended to the original message.

    • You can use a datagram-based source. With datagram-based sources, the protocol signals the end of the message automatically. Make sure that the complete XML document is sent in a single message.

    • Unless the opening and closing tags are fixed and known, stream-based sources are currently not supported.
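As a sketch of two of these approaches, the first source below uses multi-line-mode(prefix-suffix) and assumes that every document starts with <book> and ends with </book>, as in the octet-counting example. The second is a datagram-based alternative. Source names, the file path, and port 5514 are placeholders; flags(no-parse) keeps the whole document in ${MESSAGE}, as described in the note at the end of this section.

source s_xml_books {
    file("/var/log/app/books.xml"
        multi-line-mode(prefix-suffix)
        # Regular expressions matching the opening and closing tags
        multi-line-prefix("<book>")
        multi-line-suffix("</book>")
        # Keep the whole document in ${MESSAGE} for the XML parser
        flags(no-parse)
    );
};

# Datagram-based alternative: each UDP datagram carries one complete document.
source s_xml_udp {
    network(transport("udp") port(5514) flags(no-parse));
};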

If you experience issues, start syslog-ng with debug logging enabled. The debug log contains an entry for each incoming message that shows the complete message to be parsed. This entry should contain the entire XML document.

NOTE: If your log messages are entirely in XML format, make sure to disable message parsing on the source side by including the flags("no-parse") option in your source statement. This puts the entire log message into the $MESSAGE macro, which is the field that the XML parser processes by default.
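Putting the note into practice, a minimal pipeline might look like the following sketch. The source and destination names, the file paths, and the choice of a file() source with one XML document per line are illustrative assumptions.

source s_xml_only {
    # Assumes each line of the file is one complete XML document
    file("/var/log/app/events.xml" flags(no-parse));
};

destination d_xml_json {
    file("/var/log/app/events.json" template("$(format-json .xml.*)\n"));
};

log {
    source(s_xml_only);
    parser { xml(); };
    destination(d_xml_json);
};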