How does the import filter work?

Hi!

I imported a logfile into Piwik. I know that AWStats and Piwik generate different numbers, but I still don’t understand how Piwik filters the data it imports from the logfile. Here is an example:


Logs import summary
-------------------

    243847 requests imported successfully
    659002 requests were downloads
    1817102 requests ignored:
        43592 HTTP errors
        992469 HTTP redirects
        142 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        765275 requests done by bots, search engines...
        15624 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

[ul]
[li]In AWStats a lot less requests by bots are identified, how does Piwik identify those requests and how are they filtered?[/li]
[li]Downloads with status code 206 (partial download) seem to be filtered out completely in Piwik, so even if a user downloads a complete file in multiple parts, this won’t show up as a complete download?[/li]
[li]I don’t understand the ignored HTTP requests? Which logfile entries are ignored here?[/li]
[/ul]

Help is very much appreciated.

Thanks and regards,
hulotte

Hi there,

In AWStats a lot less requests by bots are identified, how does Piwik identify those requests and how are they filtered? 

See: https://github.com/piwik/piwik-log-analytics/blob/master/import_logs.py#L71-L100
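For readers who do not want to open the script: the linked lines appear to hold a list of user-agent fragments that mark a request as coming from a bot. Below is a minimal sketch of that kind of check, assuming a case-insensitive substring match. The names `BOT_MARKERS` and `looks_like_bot` and the shortened list are hypothetical, so treat the linked source as the reference for the real list and logic.

```python
# Hypothetical, shortened list of user-agent fragments (an assumption for
# illustration; the real list is in the linked import_logs.py lines).
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")


def looks_like_bot(user_agent: str) -> bool:
    """Return True if any marker occurs in the lowercased user agent."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


print(looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(looks_like_bot("Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"))
```

A broad substring match like this would flag more requests than a tool that only matches a fixed set of known crawler names, which may be one reason the bot counts differ from AWStats.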

Downloads with status code 206 (partial download) seem to be filtered out completely in Piwik, so even if a user downloads a complete file in multiple parts, this won’t show up as a complete download?

That’s possible. Please create a feature request at: https://github.com/piwik/piwik-log-analytics/issues/

I don’t understand the ignored HTTP requests? Which logfile entries are ignored here?

By default, HTTP redirects and HTTP errors are ignored. You can pass parameters to the tool to include them.
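To my knowledge the relevant options are `--enable-http-errors` and `--enable-http-redirects`; the script's `--help` output is the place to confirm this. As a rough sketch of what "redirect" and "error" mean here, assuming redirects are 3xx responses and errors are 4xx/5xx responses: the function `classify_status` is hypothetical, and the script may treat individual codes such as 206 or 304 differently.

```python
def classify_status(status: int) -> str:
    """Classify an HTTP status code into the buckets used in the summary.

    Assumption for illustration: 3xx counts as a redirect and 4xx/5xx as an
    error. The importer's real rules may make exceptions for single codes.
    """
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 600:
        return "error"
    return "ok"


for code in (200, 301, 404, 500):
    print(code, classify_status(code))
```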

Just one follow-up question:

There are a total of 2.060.94 requests in the logfile.
243.847 were imported,
1.817.102 ignored.

But there is a total of 659.002 downloads. How is this number calculated?

Downloads are log lines that were direct file requests (e.g. images or other files) and not web pages or web resources (js, css).

But this does not explain the number of downloads, or is there a misunderstanding (there was a typo in my previous comment)?

    2.060.949 requests
  - 1.817.102 ignored requests
  = 243.847 imported requests

But how can 243.847 imported requests result in 659.002 downloads?
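The arithmetic behind this question can be checked with a short sketch. All numbers are taken from the import summary and the totals quoted in this thread; the variable names are only for illustration. It shows that the ignored categories add up exactly to the ignored total, and that the download counter is larger than the imported counter, so it cannot simply be a subset of the imported requests. It does not show how the script itself increments the counters.

```python
total = 2_060_949      # requests in the logfile
imported = 243_847     # "requests imported successfully"
downloads = 659_002    # "requests were downloads"

# Breakdown of "requests ignored" from the import summary.
ignored = {
    "HTTP errors": 43_592,
    "HTTP redirects": 992_469,
    "invalid log lines": 142,
    "did not match any known site": 0,
    "did not match any --hostname": 0,
    "bots, search engines": 765_275,
    "static resources": 15_624,
    "downloads not matching --download-extensions": 0,
}

ignored_total = sum(ignored.values())
print(ignored_total)            # 1817102
print(total - ignored_total)    # 243847
print(downloads <= imported)    # False
```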