How does the import filter work?


#1

Hi!

I imported a logfile into Piwik. Although I know that AWStats and Piwik report different numbers, I still don’t understand how Piwik filters the data imported from the logfile. Here is an example:


Logs import summary
-------------------

    243847 requests imported successfully
    659002 requests were downloads
    1817102 requests ignored:
        43592 HTTP errors
        992469 HTTP redirects
        142 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        765275 requests done by bots, search engines...
        15624 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions
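The numbers in this summary can be cross-checked. The small Python sketch below parses the summary text (the parser is a hypothetical helper, not part of the importer). It confirms that the eight ignore reasons add up to the 1817102 ignored requests, and that imported plus ignored gives 2060949 log lines. The downloads counter is not part of either sum.

```python
import re

# The summary printed by the log importer, copied from the post above.
SUMMARY = """\
    243847 requests imported successfully
    659002 requests were downloads
    1817102 requests ignored:
        43592 HTTP errors
        992469 HTTP redirects
        142 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        765275 requests done by bots, search engines...
        15624 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions
"""

def parse_summary(text):
    """Return (top-level counters, ignored-reason counters), keyed by label."""
    top, ignored = {}, {}
    for line in text.splitlines():
        m = re.match(r"^( *)(\d+) (.+)$", line)
        if not m:
            continue
        indent, count, label = m.groups()
        # Lines indented deeper than the top level form the ignored-reason breakdown.
        target = ignored if len(indent) > 4 else top
        target[label] = int(count)
    return top, ignored

top, ignored = parse_summary(SUMMARY)
imported = top["requests imported successfully"]
ignored_total = top["requests ignored:"]

# The per-reason breakdown adds up exactly to the "ignored" total...
assert sum(ignored.values()) == ignored_total
# ...and imported + ignored is the number of lines read from the log.
print(imported + ignored_total)  # 2060949
# The downloads counter appears in neither sum.
print(top["requests were downloads"])  # 659002
```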

  • In AWStats, far fewer requests are identified as bots. How does Piwik identify those requests, and how are they filtered?
  • Downloads with status code 206 (Partial Content) seem to be filtered out completely in Piwik. So if a user downloads a complete file in multiple parts, it won’t show up as a complete download?
  • I don’t understand the ignored HTTP requests. Which logfile entries are ignored here?

Help is very much appreciated.

Thanks and regards,
hulotte


(Matthieu Aubry) #2

Hi there,

> In AWStats a lot less requests by bots are identified, how does Piwik identify those requests and how are they filtered?

See import_logs.py (master branch) in the matomo-org/piwik-log-analytics repository on GitHub.
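As a rough illustration of user-agent based bot filtering in general (an assumption about the approach, not a copy of import_logs.py): a request is treated as a bot if its lowercased User-Agent contains any substring from a list. The `BOT_SUBSTRINGS` list below is hypothetical and much shorter than a real one. Broad substrings such as "bot" or "crawl" could be one reason a tool flags more requests than AWStats does.

```python
# Hypothetical, deliberately short list of substrings; a real importer uses its own list.
BOT_SUBSTRINGS = ("bot", "crawl", "spider", "slurp", "curl", "wget")

def is_bot(user_agent):
    """Return True if the User-Agent looks like a bot (case-insensitive substring match)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_SUBSTRINGS)

def filter_bots(hits):
    """Split hits into (kept, skipped_as_bots); each hit is a dict with a 'user_agent' key."""
    kept, skipped = [], []
    for hit in hits:
        (skipped if is_bot(hit.get("user_agent")) else kept).append(hit)
    return kept, skipped

hits = [
    {"path": "/index.html", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"path": "/index.html", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"path": "/file.pdf", "user_agent": "curl/7.68.0"},
]
kept, skipped = filter_bots(hits)
print(len(kept), len(skipped))  # 1 2
```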

> Downloads with status code 206 (partial download) seem to be filtered out completely in Piwik, so even if a user downloads a complete file in multiple parts, this won’t show up as a complete download?

That’s possible. Please create a feature request in the issue tracker of the matomo-org/piwik-log-analytics repository on GitHub.

> I don’t understand the ignored HTTP requests? Which logfile entries are ignored here?

By default, HTTP redirects and HTTP errors are ignored. You can pass options to the tool to include them (--enable-http-redirects and --enable-http-errors).
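A minimal sketch of that default behaviour. It assumes 4xx/5xx responses count as HTTP errors and 3xx responses other than 304 (Not Modified) count as redirects. The two boolean parameters are hypothetical stand-ins for the importer's opt-in options, not its actual code.

```python
def classify_status(status, enable_http_errors=False, enable_http_redirects=False):
    """Return 'import', or the reason a hit with this HTTP status would be skipped.

    Assumed rules (sketch): 4xx/5xx are errors and 3xx except 304 are redirects;
    both are skipped unless the matching flag is set.
    """
    code = str(status)
    if code[0] in ("4", "5") and not enable_http_errors:
        return "http_error"
    if code[0] == "3" and code != "304" and not enable_http_redirects:
        return "http_redirect"
    return "import"

for status in (200, 301, 302, 304, 404, 500):
    print(status, classify_status(status))
```

With both flags left at their defaults, only the 200 and 304 lines would be imported in this sketch.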


#3

Just one follow-up question:

There are a total of 2.060.94 requests in the logfile.
243.847 were imported,
1.817.102 ignored.

But there are 659.002 downloads in total. How is this number calculated?


(Matthieu Aubry) #4

Downloads are log lines that were direct file requests (e.g. images or other files), not requests for web pages or web resources (JS, CSS).
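To illustrate the distinction, a request can be classified by the file extension in its URL path. The extension sets below are hypothetical examples chosen for illustration, not the importer's actual lists. Which bucket a given type (such as images) falls into depends on the lists a tool is configured with.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical extension sets, for illustration only.
STATIC_EXTENSIONS = {"css", "js", "ico", "ttf", "woff"}
DOWNLOAD_EXTENSIONS = {"pdf", "zip", "exe", "mp3", "mp4"}

def classify_request(url):
    """Classify a requested URL as 'static', 'download' or 'page' by its extension."""
    path = urlsplit(url).path  # drop the query string and fragment
    ext = PurePosixPath(path).suffix.lower().lstrip(".")
    if ext in STATIC_EXTENSIONS:
        return "static"
    if ext in DOWNLOAD_EXTENSIONS:
        return "download"
    return "page"  # e.g. /, /about, /index.html, /index.php

for url in ("/", "/assets/site.css?v=3", "/files/report.pdf"):
    print(url, classify_request(url))
```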


#5

But this does not explain the number of downloads. Or is there a misunderstanding? (There was a typo in my previous comment.)

      2.060.949 requests
    - 1.817.102 ignored requests
    =   243.847 imported requests

But how can 243.847 imported requests result in 659.002 downloads?
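One hypothesis that would allow the downloads figure to exceed the imported figure: the downloads counter is incremented as soon as a line is recognised as a download, before later checks (HTTP status, bots, and so on) discard that line. The "downloads" number would then include lines that are never imported. The sketch below models only that assumption, using hypothetical check functions. It is not taken from import_logs.py, and whether the real tool orders its checks this way would need to be confirmed in its source.

```python
from collections import Counter

def check_download(hit, stats):
    # Assumed behaviour: count the download as soon as the extension matches.
    if hit["ext"] in {"pdf", "zip"}:
        stats["downloads"] += 1
    return True

def check_http_status(hit, stats):
    # Skip errors and redirects (anything >= 300 except 304).
    if hit["status"] >= 300 and hit["status"] != 304:
        stats["ignored_http"] += 1
        return False
    return True

def check_user_agent(hit, stats):
    if "bot" in hit["ua"].lower():
        stats["ignored_bots"] += 1
        return False
    return True

# Checks run in this order; all() stops at the first check that returns False.
CHECKS = (check_download, check_http_status, check_user_agent)

def process(hits):
    stats = Counter()
    for hit in hits:
        if all(check(hit, stats) for check in CHECKS):
            stats["imported"] += 1
    return stats

hits = [
    {"ext": "pdf", "status": 200, "ua": "Mozilla/5.0"},    # download, imported
    {"ext": "pdf", "status": 200, "ua": "Googlebot/2.1"},  # download, then skipped as bot
    {"ext": "zip", "status": 404, "ua": "Mozilla/5.0"},    # download, then skipped as error
    {"ext": "html", "status": 301, "ua": "Mozilla/5.0"},   # skipped as redirect
]
stats = process(hits)
print(stats["imported"], stats["downloads"])  # 1 3
```

In this toy run, 4 lines split into 1 imported and 3 ignored, yet 3 lines were counted as downloads. The same pattern would appear in a real summary only if the importer really counts downloads before its other filters.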