Extensible Markup Language (XML) is a text-based open standard designed for both human-readable and machine-readable data interchange. Like JSON, it is used primarily to transmit data between a server and web application. It is described in W3C Recommendation: Extensible Markup Language (XML).
The XML parser processes input in XML format, and adds the parsed data to the message object.
To create an XML parser, define an xml_parser that has the xml() option. By default, the parser will process the ${MESSAGE} part of the log message. To process other parts of a log message using the XML parser, use the template() option. You can also define the parser inline in the log path.
Declaration
parser xml_name { xml(template() prefix() drop-invalid() exclude-tags() strip-whitespaces() ); };
Example: Using an XML parser
In the following example, the source is an XML-encoded log message. The destination is a file that uses the format-json template. The log line connects the source, the destination and the parser.
source s_local { file("/tmp/aaa"); }; destination d_local { file("/tmp/bbb" template("$(format-json .xml.*)\n")); }; parser xml_parser { xml(); }; log { source(s_local); parser(xml_parser); destination(d_local); };
You can also define the parser inline in the log path.
log { source(s_file); parser { xml(prefix(".SDATA")); }; destination(d_file); };
The XML parser inserts an ".xml" prefix by default before the extracted name-value pairs. Since format-json replaces a dot with an underscore at the beginning of keys, the ".xml" prefix becomes "_xml". Attributes get an _ prefix. For example, from the XML input:
<tags attr='attrval'>part1<tag1>Tag1 Leaf</tag1>part2<tag2>Tag2 Leaf</tag2>part3</tags>
The following output is generated:
{"_xml":{"tags":{"tag2":"Tag2 Leaf","tag1":"Tag1 Leaf","_attr":"attrval","tags":"part1part2part3"}}}
When the text is separated by tags on different levels or tags on the same level, the parser uses the list-handling functionality (enabled by default) to handle lists in the XML.
The list-handling functionality of the XML parser separates vector-like structures by a comma as separate entries. Using the following structure as an example:
<vector> <entry>value1</entry> <entry>value 2</entry> <entry>Doe,John</entry> <entry>value3</entry> ... <entry>valueN</entry> </vector>
After parsing, the entries are separated by a comma. If an entry has a space or is separated by a comma, for example, value 2 or Doe,John in the previous example, quoting is applied to the entry:
vector.entry = value1,"value 2","Doe,John",value3...valueN
Note that if you disable the list-handling functionality, the XML parser cannot address each element of a vector-like structure individually. Using the following structure as an example:
<vector> <entry>value1</entry> <entry>value2</entry> ... <entry>valueN</entry> </vector>
After parsing, the entries are not addressed individually. Instead, the text of the entries are concatenated:
vector.entry = "value1value2...valueN"
For more information about the list-handling functionality, see Limitations of the XML parsers.
Whitespaces are kept as they are in the XML input. No collapsing happens on significant whitespaces. For example, from this input XML:
<133>Feb 25 14:09:07 webserver syslogd: <b>|Test\n\n Test2|</b>\n
The following output is generated:
[2017-09-04T13:20:27.417266] Setting value; msg='0x7f2fd8002df0', name='.xml.b', value='|Test\x0a\x0a Test2|'
However, note that users can choose to strip whitespaces using the strip-whitespaces() option.
Configuration hints
Define a source that correctly detects the end of the message, otherwise the XML parser will consider the input invalid, resulting in a parser error.
To ensure that the end of the XML document is accurately detected, use any of the following options:
-
Ensure that the XML is a single-line message.
-
In the case of multiline XML documents:
-
If the opening and closing tags are fixed and known, you can use multi-line-mode(prefix-suffix). Using regular expressions, specify a prefix and suffix matching the opening and closing tags. For details on using multi-line-mode(prefix-suffix), see the multi-line-prefix() and multi-line-suffix() options.
-
In the case of TCP, you can encapsulate and send the document in syslog-protocol format, and use a syslog() source. Make sure that the message conforms to the octet counting method described in RFC6587.
For example:
59 <133>Feb 25 14:09:07 webserver syslogd: <book>\nText\n</book>
Considering the new lines as one character, 59 is appended to the original message.
-
You can use a datagram-based source. In the case of datagram-based sources, the protocol signals the end of the message automatically. Ensure that the complete XML document is written in one message.
-
Unless the opening and closing tags are fixed and known, stream-based sources are currently not supported.
-
In case you experience issues, start syslog-ng with debug logs enabled. There will be a debug log about the incoming log entry, which shows the complete message to be parsed. The entry should contain the entire XML document.
NOTE: If your log messages are entirely in .xml format, make sure to disable any message parsing on the source side by including the flags("no-parse") option in your source statement. This will put the entire log message in the $MESSAGE macro, which is the field that the XML parser parses by default.