Logfile import

Hi,

I am importing extracts of an Apache logfile containing lines with pdf downloads only:

python /srv/www/htdocs/piwik/misc/log-analytics/import_logs.py --url=https://myPiwkURL myLogFileExtract --idsite=4 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots

The summary printed by import_logs.py always reports more download requests than successfully imported requests. Does anyone have an idea how this can happen? The number of successfully imported requests equals the number of lines in myLogFileExtract.

Thank you for your help!

mucctecc

Can you please post an example test file here with 5-10 lines, together with the example output showing the bug? Thanks.

I already tried my luck on the German-speaking Piwik forum, so I will repeat my example here:

python /srv/www/htdocs/piwik/misc/log-analytics/import_logs.py --url=https://myPiwikPage.de test.log --idsite=17 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots

“test.log” contains only one line (I set the ip address to zero):

00.000.00.000 - - [21/Nov/2012:00:07:09 +0100] "GET /1579/1/paper_189.pdf HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
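For cross-checking the numbers in the summary, here is a minimal sketch that counts the lines and the PDF requests in a log extract independently of import_logs.py. It assumes the Apache "combined" log format with plain ASCII double quotes; the names COMBINED and summarize are hypothetical helpers for illustration, and the ".pdf" suffix check is my own assumption, not necessarily how import_logs.py decides what a download is.

```python
import re

# Apache "combined" log format (assumption: plain ASCII double quotes in the file).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<length>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def summarize(lines):
    """Count non-empty lines, parsable lines, and requests whose path ends in .pdf."""
    counts = {"lines": 0, "parsed": 0, "pdf": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts["lines"] += 1
        match = COMBINED.match(line)
        if match is None:
            continue
        counts["parsed"] += 1
        # Ignore any query string before checking the file extension.
        path = match.group("path").split("?", 1)[0]
        if path.lower().endswith(".pdf"):
            counts["pdf"] += 1
    return counts


# The single line from test.log, written with ASCII quotes.
sample = (
    '00.000.00.000 - - [21/Nov/2012:00:07:09 +0100] '
    '"GET /1579/1/paper_189.pdf HTTP/1.1" 304 - "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(summarize([sample]))
```

For a whole file, something like `summarize(open("test.log"))` should work. If the "pdf" count here matches the number of lines while the import summary reports more downloads, that would suggest the extra count comes from the importer's own bookkeeping rather than from the log file.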

And this is my result:

===============================================================================
Logs import summary

1 requests imported successfully
2 requests were downloads
0 requests ignored:
0 invalid log lines
0 requests done by bots, search engines, …
0 HTTP errors
0 HTTP redirects
0 requests to static resources (css, js, …)
0 requests did not match any known site
0 requests did not match any requested hostname

Website import summary

1 requests imported to 1 sites
1 sites already existed
0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 1 seconds
Requests imported per second: 0.99 requests per second

===============================================================================

Now, looking in Piwik at the website with idsite=17, no download at all is listed or recognized; idsite=17 is only an internal TYPO3 website with fully controlled requests. I also tried

/usr/local/bin/php /srv/www/htdocs/piwik/misc/cron/archive.php --url=https://myPiwikPage.de > /srv/www/logs/piwik-archive.log

which makes no difference.

Thank you for your help!

mucctecc

I reported this in the relevant ticket, thanks: "Log analytics list of improvements" (Issue #3163, matomo-org/matomo on GitHub)