This white paper enables syslog-ng Store Box (SSB) end-users, integrators, and sales personnel to make predictions about the performance of the SSB appliance based on various environmental and configuration parameters.
The measured data refers to physical SSB appliances.
In this simple case, we measured only the sustained throughput of receiving logs in SSB. This means that no other operations (for example, searches) were running during the test. The source of messages was always TCP, and we tested both with and without encryption enabled on the source. The target logspace used an indexed, compressed log store in which all fields were indexed using the default delimiters. We used the default memory limit of 1024 MB. No pattern database was loaded.
On physical hardware, SSB performance is usually limited by available processing power, that is, the performance is CPU bound.
During RAID synchronization, performance can drop significantly because of the heavy load on the disk subsystem. As a result, a new SSB installation may give misleading performance numbers. Always wait for the RAID synchronization to finish before testing performance.
If you use DNS for the log source, but the DNS server is unreachable, the performance of SSB will greatly decrease.
Sending logs from one log source into multiple logspaces degrades performance. If possible, send logs from one (or more) log sources into a single logspace, and do not duplicate the logs. If you need to filter or aggregate log messages in different ways, consider using the filtered logspace and multiple logspace features. (For details, see "Creating filtered logspaces" and "Creating multiple logspaces" in the Administration Guide.)
Parsing syslog headers adds an extra 18% overhead. You can improve the raw performance of SSB by selecting the Do not parse option in the log source. However, in this case you cannot search and filter the host, program, pid, and other fields of these messages.
If you disable flow control on a log source, SSB does not throttle back clients, which appears to increase performance. However, it may lead to message loss.
The following factors have no effect, or only a limited effect, on the performance of SSB:
Number of plain TCP connections to the log sources of SSB, up to around 5000 connections.
Number of SSL/TLS TCP connections to the log sources of SSB, up to around 1000 connections.
Enabling debug logging in SSB has no effect: debug logs are related to tracing web access and related operations.
The Trusted, Use DNS, and Use FQDN settings have a limited effect on performance, provided that DNS is correctly set up. Internally, syslog-ng maintains a DNS cache. (For details on these settings, see the Administration Guide.)
Depending on its exact configuration and the mix of log formats received, the largest SSB appliance can collect and index up to 100,000 messages per second (100k EPS) for sustained periods.
This section describes how SSB indexes messages and stores data. It helps you understand and interpret the search performance of SSB.
Level 1: File system directories organized per year and day of month (YYYY/MM-DD).
Level 2: Log messages are stored in a single file per day. There is also a file that lists the index files (Level 3) related to distinct time intervals inside the day.
Level 3: An index file that holds an ordered list of the tokens processed when SSB received the logs. For each token, there is a list of unique identifiers that point to the messages that contained the token.
Every SSB search must include the time interval to search in. This information is used to find the days, and thus the Level 2 and Level 3 files, that need to be searched. Note that SSB stores and searches log messages based on the time SSB received them (the so-called processed time stamp), and not based on the time stamp included in the log message. The reason is that SSB collects logs in real time, and the time stamp in the log messages may not be reliable or complete.
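As an illustration of how a search interval maps to day directories, the following minimal sketch lists the YYYY/MM-DD directory names that intersect a given interval. The function name day_directories is hypothetical and is not part of SSB; the sketch only mirrors the directory layout described above and assumes the dates are already expressed as processed (received) dates.

```python
from datetime import date, timedelta


def day_directories(start: date, end: date) -> list[str]:
    """Return the YYYY/MM-DD directory name for every day in [start, end].

    Hypothetical helper: it only illustrates the per-day layout, it does not
    touch the file system.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    day_count = (end - start).days + 1
    return [
        (start + timedelta(days=offset)).strftime("%Y/%m-%d")
        for offset in range(day_count)
    ]


if __name__ == "__main__":
    # A four-day interval that crosses a year boundary.
    print(day_directories(date(2023, 12, 30), date(2024, 1, 2)))
    # ['2023/12-30', '2023/12-31', '2024/01-01', '2024/01-02']
```

The number of returned directories is the number of days whose Level 2 and Level 3 files a search would have to visit, which is why the length of the search interval directly affects response time.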
Tokens are the words separated by the delimiters set for the logspace (for details, see "Configuring the indexer service" in the Administration Guide).
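To make the notion of a token concrete, here is a minimal sketch that splits a message on a set of delimiter characters. The delimiter set in ASSUMED_DELIMITERS and the tokenize function are illustrative assumptions only; the actual delimiters are whatever is configured for the logspace.

```python
# Assumption: an example delimiter set. The real set is configured per logspace.
ASSUMED_DELIMITERS = " :&~?![]=,;()'\""


def tokenize(message: str, delimiters: str = ASSUMED_DELIMITERS) -> list[str]:
    """Split a message into tokens at any delimiter character.

    Consecutive delimiters do not produce empty tokens.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in message:
        if ch in delimiters:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(tokenize("sshd[1234]: Failed password for user=root"))
    # ['sshd', '1234', 'Failed', 'password', 'for', 'user', 'root']
```

With this delimiter set, user=root yields the two tokens user and root, so a search for root would match the message while a search for user=root as a single token would not.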
On Level 3, SSB looks up the tokens that match the basic expressions in the search query. Since the tokens are stored in alphabetic order, this lookup is very fast for exact searches. If the token contains wildcards (* or ?), then potential matches are checked individually.
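The difference between exact and wildcard lookups in a sorted token list can be sketched as follows. This is not SSB's implementation: the lookup function, the in-memory list, and the use of Python's code-point ordering in place of "alphabetic order" are all assumptions made for illustration. An exact token is found by binary search; a pattern with a known leading part narrows the candidates to one contiguous range; a pattern that starts with a wildcard forces every token to be checked.

```python
import re
from bisect import bisect_left


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (any run of characters) and ? (one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def lookup(sorted_tokens: list[str], pattern: str) -> tuple[list[str], int]:
    """Return (matching tokens, number of tokens checked individually)."""
    wildcard_positions = [i for i, ch in enumerate(pattern) if ch in "*?"]
    if not wildcard_positions:
        # Exact search: one binary search, no individual checks.
        i = bisect_left(sorted_tokens, pattern)
        found = i < len(sorted_tokens) and sorted_tokens[i] == pattern
        return ([pattern] if found else []), 0

    prefix = pattern[: wildcard_positions[0]]
    lo, hi = 0, len(sorted_tokens)
    if prefix:
        # Tokens sharing a prefix are contiguous in a sorted list.
        lo = bisect_left(sorted_tokens, prefix)
        hi = lo
        while hi < len(sorted_tokens) and sorted_tokens[hi].startswith(prefix):
            hi += 1
    regex = wildcard_to_regex(pattern)
    candidates = sorted_tokens[lo:hi]
    return [t for t in candidates if regex.fullmatch(t)], len(candidates)


if __name__ == "__main__":
    tokens = sorted(["apple", "apply", "login", "maple", "system", "systemd", "username"])
    print(lookup(tokens, "login"))    # (['login'], 0)
    print(lookup(tokens, "appl?"))    # (['apple', 'apply'], 2)
    print(lookup(tokens, "*pple"))    # (['apple'], 7)
```

The second value in each result shows why a leading wildcard is the expensive case: with no known leading letters, all seven tokens have to be tested against the pattern.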
At this point, SSB has the list of message identifiers it needs to calculate AND, OR, NOT expressions and finalize search results per day. Getting the final result simply means repeating the procedure for all the days that are requested in the search interval.
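The per-day combination step can be illustrated with plain set operations on message identifiers. The data layout below (a dictionary mapping each token to a set of IDs, plus the set of all IDs for the day) and the tuple-based query tree are assumptions chosen for the sketch, not SSB's on-disk format.

```python
def evaluate(expr: tuple, index: dict, all_ids: set) -> set:
    """Evaluate a query tree against one day's token index.

    expr is one of: ("token", name), ("not", sub), ("and", sub, ...),
    ("or", sub, ...). Returns the set of matching message identifiers.
    """
    op = expr[0]
    if op == "token":
        return set(index.get(expr[1], ()))
    if op == "not":
        # NOT is the complement within the day's identifiers.
        return all_ids - evaluate(expr[1], index, all_ids)
    operands = [evaluate(sub, index, all_ids) for sub in expr[1:]]
    if op == "and":
        return set.intersection(*operands)
    if op == "or":
        return set.union(*operands)
    raise ValueError(f"unknown operator: {op!r}")


def search(expr: tuple, days: dict) -> dict:
    """Repeat the evaluation for every day in the search interval."""
    return {
        day: sorted(evaluate(expr, index, all_ids))
        for day, (index, all_ids) in days.items()
    }


if __name__ == "__main__":
    # Hypothetical two-day data set: (token index, all message IDs of the day).
    days = {
        "2024/01-01": ({"user": {1, 2, 3}, "login": {2, 3}, "fail": {3}, "close": {1}}, {1, 2, 3, 4}),
        "2024/01-02": ({"user": {5}, "login": {5, 6}, "fail": {5}}, {5, 6}),
    }
    query = ("and", ("token", "user"), ("not", ("token", "close")))
    print(search(query, days))
    # {'2024/01-01': [2, 3], '2024/01-02': [5]}
```

Only identifiers are handled at this stage; no message text is read, which matches the description that messages are loaded only when they are actually requested.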
In this section, we describe the performance of the SSB search algorithm, measured from starting a search to returning the first 100 results. When a search is executed, SSB calculates the unique identifiers of every search result, without loading the individual messages. The actual messages are loaded temporarily only when requested on the user interface or through the RPC API.
This means that it can easily happen that calculating the results takes under a second, but fetching all the resulting messages takes minutes, because it takes time to read the messages from the disk and return them. This also means that the size of the messages has no impact on the memory usage of search.
We conducted our tests using a real-life logspace containing 200 million log entries (about 9.1 GB compressed). We executed the searches directly on SSB to avoid network and caching effects. The test hardware was an SSB T4 appliance, but response times are very similar on SSB T10 appliances. On SSB T1, response times are higher by a factor of 2.5 on average.
The most basic search expression is the empty (or *) search, which searches for any message stored in a logspace. This search is immediate with little memory usage, since it only involves looking up the unique identifier ranges intersecting with the time range of the search itself.
Example1: username Example2: restart
The simplest search expression is a specific token, like login. Tokens are the words separated by the delimiters set for the logspace (for details, see "Configuring the indexer service" in the Administration Guide).
For 200 million logs, searching for a token takes between 1 and 5 seconds, and the memory used, in bytes, is roughly equal to the number of results.
Example1: user?ame Example2: system* Example3: *tool
You can specify part of a token, or add the * and ? characters after and/or in front of the token. For example, *pple or appl?.
Search times depend on how many letters of the token are known, especially at the front of the token. The worst case is when the search expression starts with a wildcard, for example, *pple, which takes between 30 and 60 seconds to search for in 200 million messages. Searching for appl* takes around 9 seconds. The absolute worst case is *?, where no letter is known, which takes 80 seconds.
Memory consumption is the sum of the number of eventual results and, at most, the size of the largest index file involved in the search. The size of the index file depends on the Memory limit setting of the logspace: the higher the limit, the larger the index files. (For details, see "Configuring the indexer service" in the Administration Guide.)
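The memory rule for wildcard searches can be written down as a simple upper-bound calculation. The sketch below assumes, as stated for token searches, roughly one byte of memory per result; the function name and the index file sizes in the example are hypothetical and are not measured values.

```python
def wildcard_memory_upper_bound(result_count: int, index_file_sizes: list[int]) -> int:
    """Rough upper bound, in bytes, on the memory used by a wildcard search.

    Assumption: about one byte per result, plus at most the size of the
    largest index file involved in the search.
    """
    largest_index = max(index_file_sizes, default=0)
    return result_count + largest_index


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Hypothetical search: 2 million results, index files of 64 MiB and 128 MiB.
    print(wildcard_memory_upper_bound(2_000_000, [64 * MIB, 128 * MIB]))
    # 136217728
```

In this example the largest index file dominates the estimate, which is consistent with the note that a higher Memory limit setting (and therefore larger index files) raises the memory needed by wildcard searches.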
Example1: NOT user Example2: NOT window*
You can exclude tokens from a search using the NOT keyword, as in NOT apple, NOT *pple, and so on. Such a search takes slightly longer than searching for the same expression without NOT. The memory used is roughly the same.
Search expressions can be combined with the AND and OR keywords to create complex search expressions. The negation of a complex search expression with the NOT keyword remains a complex search expression. Note that SSB automatically optimizes certain search expressions before evaluating them: for example, expression AND * becomes expression, and NOT NOT expression becomes expression.
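The two rewrites mentioned above can be sketched as a small recursive simplifier over a query tree. The tuple-based tree, the ANY marker for the * search, and the simplify function are assumptions made for this illustration; the sketch implements only the two named rewrites and makes no claim about which other optimizations SSB performs.

```python
ANY = ("any",)  # Stands for the empty (or *) search that matches every message.


def simplify(expr: tuple) -> tuple:
    """Apply two rewrites bottom-up:

    expression AND *      ->  expression
    NOT NOT expression    ->  expression

    expr is one of: ANY, ("token", name), ("not", sub),
    ("and", left, right), ("or", left, right).
    """
    op = expr[0]
    if op in ("token", "any"):
        return expr
    if op == "not":
        inner = simplify(expr[1])
        if inner[0] == "not":
            return inner[1]  # Double negation cancels out.
        return ("not", inner)
    left, right = simplify(expr[1]), simplify(expr[2])
    if op == "and":
        if left == ANY:
            return right
        if right == ANY:
            return left
    return (op, left, right)


if __name__ == "__main__":
    error = ("token", "error")
    print(simplify(("and", error, ANY)))      # ('token', 'error')
    print(simplify(("not", ("not", error))))  # ('token', 'error')
```

Simplifying before evaluation avoids work that cannot change the result, for example intersecting a token's identifier list with the identifiers of every message in the interval.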
Example1: username OR pass* Example2: server1 OR server32 OR server50 OR server3
Response time can be calculated by adding up the response times of the searches included in the OR expression. The actual OR operation is extremely efficient, so there is little additional overhead.
Example1: user AND login AND fail Example2: user AND NOT close*
The maximum response time can be calculated by adding up the response times of the searches included in the AND expression. The actual AND operation is extremely efficient, so there is little additional overhead.
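The additive rule for OR and AND expressions can be turned into a rough estimator. In the sketch below, the caller supplies a (low, high) time range in seconds for each included search; the function name, the hardware_factor parameter, and the assumption that pass* costs about as much as the measured appl* search are illustrative only. For AND expressions, the result should be read as an upper bound.

```python
def combined_estimate(term_ranges, hardware_factor=1.0):
    """Add up per-search time ranges (in seconds) for an OR or AND expression.

    term_ranges: iterable of (low, high) tuples, one per included search.
    hardware_factor: multiplier for slower hardware, for example 2.5 for the
    average slowdown reported for SSB T1 relative to the T4 test appliance.
    """
    term_ranges = list(term_ranges)
    low = sum(lo for lo, _ in term_ranges) * hardware_factor
    high = sum(hi for _, hi in term_ranges) * hardware_factor
    return low, high


if __name__ == "__main__":
    # username OR pass* on 200 million messages:
    #   username -> exact token, 1 to 5 seconds (measured range for tokens)
    #   pass*    -> assumed to behave like the measured appl* search, about 9 seconds
    terms = [(1, 5), (9, 9)]
    print(combined_estimate(terms))                       # (10.0, 14.0)
    print(combined_estimate(terms, hardware_factor=2.5))  # (25.0, 35.0)
```

Because the OR and AND operations themselves add little overhead, the estimate is dominated by the slowest included search, typically one with a leading wildcard.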
The same response times and memory consumption are expected as for regular searches.