Report Engine Overview
Today's networks generate an enormous quantity of traffic, much of which is of no general interest and not worth reporting. System updates, keep-alive pings, and heartbeat checks are examples of traffic that may make up the majority of actual requests yet be of little or no interest to network administrators or Report Reviewers.
The Security Appliance project processes log lines with the goal of discarding the noise so that the signal can emerge. The signal of interest is the activity generated by the human users on a network.
When examining a log line to determine which Section it should be assigned to, the report engine answers the following questions (among others):
- Is this a web page request, loading directly in a browser?
- Is this a search query for stock images, research, or e-commerce?
- Is this a request for visual media such as graphics, PDFs, or other media?
- Were all requests to this domain always blocked?
If a log line does not match any of the questions above, it is discarded. The remaining log lines are grouped into Sections that can be used in the Logline Viewer as well as in Report Layouts in Usage reports. The sections below briefly describe how the report engine performs this classification.
Mimetypes and Extensions
The report engine considers the request mimetype to be of prime significance. When the web server does not forward a correct mimetype, the URL is parsed to extract any file extension that may be present.
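The fallback from mimetype to file extension can be sketched as follows. This is a minimal illustration, not the engine's actual code: the function name `effective_mimetype`, the set of values treated as "not a correct mimetype", and the use of Python's standard `mimetypes` table are all assumptions made for the example.

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit

# Hypothetical: reported values treated as missing or uninformative.
GENERIC_TYPES = {"", "-", "application/octet-stream"}

def effective_mimetype(reported, url):
    """Return the reported mimetype, or a guess based on the URL's file extension."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    cleaned = (reported or "").split(";", 1)[0].strip().lower()
    if cleaned not in GENERIC_TYPES:
        return cleaned
    # Fall back to the extension of the last path segment, ignoring the query string.
    filename = posixpath.basename(urlsplit(url).path)
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed  # None when no extension is recognized
```

Under these assumptions, a request for `https://example.com/docs/report.pdf?x=1` that arrives with `application/octet-stream` would be treated as a PDF, while a URL with no recognizable extension yields `None`.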
The query parameters of the URL are parsed to determine whether the request contains a user-generated search term.
The HTTP methods of most interest are GET, POST, and CONNECT. Methods used primarily for API-based communication are not saved in the report database.
Requests with a 200 series (Success) status code are naturally of greatest interest. Requests with 300 series (Redirection), 400 series (Client Error), or 500 series (Server Error) status codes are not saved in the report database.
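As a rough sketch of the search-term check, the query string can be decoded and scanned for parameters that commonly carry a search phrase. The parameter names in `SEARCH_PARAMS` and the function name `extract_search_term` are hypothetical; the names the report engine actually looks for are not listed here.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of query parameters that usually hold a search phrase.
SEARCH_PARAMS = ("q", "query", "search", "keyword")

def extract_search_term(url):
    """Return the first non-empty search term found in the URL, or None."""
    # parse_qs decodes percent-escapes and '+' and drops blank values by default.
    params = parse_qs(urlsplit(url).query)
    for name in SEARCH_PARAMS:
        values = params.get(name)
        if values and values[0].strip():
            return values[0].strip()
    return None
```

For example, `extract_search_term("https://images.example.com/search?q=mountain+sunset&page=2")` returns `"mountain sunset"`, and a URL without any of the listed parameters returns `None`.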
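The method and status code rules can be combined into a single check, sketched below. It assumes that only GET, POST, and CONNECT requests are kept, which is a simplification of "methods of most interest"; the names `REPORTED_METHODS` and `is_reportable` are hypothetical.

```python
# Assumption: only these methods are saved; all others are treated as API traffic.
REPORTED_METHODS = {"GET", "POST", "CONNECT"}

def is_reportable(method, status):
    """True when the method is of interest and the status is in the 200 series."""
    return method.upper() in REPORTED_METHODS and 200 <= status <= 299
```

With this sketch, a GET answered with 200 passes, while a GET answered with 302 or an OPTIONS request answered with 200 is dropped.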
Response Body Size
When the response body is small, the request is considered to be of little reporting value and is not saved into the report database. Small response bodies typically indicate heartbeat traffic, keep-alive pings, and similar background requests.
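A size filter of this kind might look like the sketch below. The actual cutoff is not stated here, so `MIN_BODY_BYTES` is a placeholder value. Treating a missing or non-numeric size field as "small" is also an assumption.

```python
# Hypothetical threshold; the real cutoff is not documented in this overview.
MIN_BODY_BYTES = 512

def has_reporting_value(body_size_field):
    """True when the logged response body size meets the minimum threshold."""
    try:
        size = int(body_size_field)
    except (TypeError, ValueError):
        # Assumption: an unknown size (for example "-") is treated as too small.
        return False
    return size >= MIN_BODY_BYTES
```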
Domain / URL Patterns
Finally, the URL is parsed into its component pieces and further checked for unwanted patterns. Requests that match these patterns are not saved into the report database. Network administrators can use Logline Filters to extend this feature.
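The pattern check can be illustrated with a short sketch that splits the URL into its components and tests the host and path separately. The domain suffixes and the path pattern shown are invented examples, and the names `UNWANTED_DOMAIN_SUFFIXES`, `UNWANTED_PATH_PATTERNS`, and `matches_unwanted_pattern` are hypothetical; they do not reflect the engine's built-in patterns or the Logline Filters syntax.

```python
import re
from urllib.parse import urlsplit

# Hypothetical examples of unwanted domains and path patterns.
UNWANTED_DOMAIN_SUFFIXES = ("telemetry.example.com", "updates.example.net")
UNWANTED_PATH_PATTERNS = [
    re.compile(r"/(heartbeat|ping|keepalive)(/|$)", re.IGNORECASE),
]

def matches_unwanted_pattern(url):
    """True when the URL's host or path matches one of the unwanted patterns."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Match the domain itself or any subdomain of it, but not look-alike hosts.
    for suffix in UNWANTED_DOMAIN_SUFFIXES:
        if host == suffix or host.endswith("." + suffix):
            return True
    return any(pattern.search(parts.path) for pattern in UNWANTED_PATH_PATTERNS)
```

Matching on a leading dot before the suffix keeps a host such as `nottelemetry.example.com` from being caught by the `telemetry.example.com` entry.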