General
Type: Top level item
Description: Determines which HTTP Content-Types are indexed. An HTTP message is indexed only if its Content-Type matches an entry in Whitelist and does not match any entry in Blacklist.
For example:
"General": { "Whitelist": ["text/.*", ".*json.*", "multipart/.*", "application/x-www-form-urlencoded"], "Blacklist": ["text/css", "application/javascript", "text/xslt", ".*xml.*"] },
General (Whitelist)
Type: list
Description: The list of HTTP Content-Types to index. Every entry of the list is treated as a regular expression.
For example:
"Whitelist": ["text/.*", ".*json.*", "multipart/.*", "application/x-www-form-urlencoded"],
General (Blacklist)
Type: list
Description: The list of HTTP Content-Types that are not indexed. Every entry of the list is treated as a regular expression.
For example:
"Blacklist": ["text/css", "application/javascript", "text/xslt", ".*xml.*"]
Form
Type: Top level item
Description: Determines which fields are indexed in HTTP POST messages.
For example:
"Form": { "Blacklist": ["password", "pass"] },
NOTE: To index HTTP POST messages, include the "application/x-www-form-urlencoded" Content-Type in the General > Whitelist list. The indexer decodes URL encoding (percent-encoding) and creates key=value pairs from the form fields and their values. In the values, the indexer replaces whitespace with the underscore (_) character. To avoid indexing sensitive information (for example, passwords from login forms), use the Form > Blacklist option.
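The form processing described in this note can be approximated with the Python standard library. This is only a sketch under assumptions: the function name index_form is hypothetical, each whitespace character is replaced individually, and Blacklist entries are assumed to match the whole field name.

```python
import re
from urllib.parse import parse_qsl

# Value copied from the example configuration above.
FORM_BLACKLIST = ["password", "pass"]


def index_form(body, blacklist=FORM_BLACKLIST):
    """Turn a URL-encoded form body into a list of key=value strings."""
    pairs = []
    # parse_qsl splits the body into fields and decodes percent-encoding and '+'.
    for key, value in parse_qsl(body, keep_blank_values=True):
        # Assumption: a Blacklist entry must match the entire field name.
        if any(re.fullmatch(pattern, key) for pattern in blacklist):
            continue
        # Assumption: every whitespace character becomes one underscore.
        value = re.sub(r"\s", "_", value)
        pairs.append(f"{key}={value}")
    return pairs


print(index_form("user=John+Smith&password=secret&city=New%20York"))
# prints ['user=John_Smith', 'city=New_York']
```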
Form (Blacklist)
Type: list
Description: The list of fields that are not indexed in HTTP POST messages (for example, when submitting forms, such as login forms). Every entry of the list is treated as a regular expression.
For example:
"Blacklist": ["password", "pass"]
Html
Type: Top level item
Description: Include this section in the configuration to process text/html messages. HTML tags are stripped from the text, and only their content is indexed (for example, <html><title>Title</title></html> becomes Title).
For example:
"Html": { "Attributes": ["href", "name", "value", "title", "id", "src"], "StrippedTags": ["script", "object", "style", "noscript", "embed", "video", "audio", "canvas", "svg"] }
Html (Attributes)
Type: list
Description: The list of HTML attributes that are extracted as key=value pairs and indexed. In the values, the indexer replaces whitespace with the underscore (_) character and decodes URL encoding.
For example:
"Attributes": ["href", "name", "value", "title", "id", "src"],
Note that the content attribute of the meta name="description", meta name="keywords", meta name="author", and meta name="application-name" elements is always indexed.
For example, if an audit trail contains the following HTML:
<head>
  <meta name="description" content="Web page description">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="OI SA">
  <meta charset="UTF-8">
</head>
Then the index will contain the following text:
description=Web_page_description keywords=HTML,CSS,XML,JavaScript author=OI_SA
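The attribute and meta handling described above can be sketched with Python's html.parser module. This is an illustration under assumptions, not the indexer's implementation. The names AttributeIndexer and index_attributes are hypothetical, and the sketch assumes that for the four special meta elements only the name=content pair is emitted, which is what the example output above shows.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Value copied from the example configuration above.
ATTRIBUTES = ["href", "name", "value", "title", "id", "src"]
# meta names whose content attribute is always indexed.
META_NAMES = {"description", "keywords", "author", "application-name"}


def _normalize(value):
    # Decode URL encoding, then replace each whitespace character with '_'.
    return re.sub(r"\s", "_", unquote(value))


class AttributeIndexer(HTMLParser):
    def __init__(self, attributes=ATTRIBUTES):
        super().__init__()
        self.attributes = set(attributes)
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") in META_NAMES:
            # Assumption: emit only <name>=<content> for these meta elements.
            content = attr_map.get("content") or ""
            self.pairs.append(f"{attr_map['name']}={_normalize(content)}")
            return
        for key, value in attrs:
            if key in self.attributes and value is not None:
                self.pairs.append(f"{key}={_normalize(value)}")


def index_attributes(html, attributes=ATTRIBUTES):
    parser = AttributeIndexer(attributes)
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = (
    '<head> <meta name="description" content="Web page description"> '
    '<meta name="keywords" content="HTML,CSS,XML,JavaScript"> '
    '<meta name="author" content="OI SA"> <meta charset="UTF-8"> </head>'
)
print(" ".join(index_attributes(sample)))
# prints description=Web_page_description keywords=HTML,CSS,XML,JavaScript author=OI_SA
```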
Html (StrippedTags)
Type: list
Description: The list of HTML tags that are not indexed.
For example:
"StrippedTags": ["script", "object", "style", "noscript", "embed", "video", "audio", "canvas", "svg"]