Log import ignores date


#1

Cheers everyone, forum first-timer here …

I’m importing server access logs into Piwik and have hit a wall. Everything I import gets recorded with the date and time at which the import runs. It seems the dates in the access logs are simply being ignored.

I’ve done this import stuff before and as far as I can remember it went well then.

I’m calling the importer like this:


python misc/log-analytics/import_logs.py --show-progress -dddddddd --url=http://mypiwik.example --idsite=<SITE_ID_HERE> --token-auth=<AUTH_TOKEN_HERE> /path/to/logfile.log

That yields the following output:


2015-07-16 00:16:03,629: [DEBUG] Accepted hostnames: all
2015-07-16 00:16:03,630: [DEBUG] Piwik URL is: http://mypiwik.example
2015-07-16 00:16:03,630: [DEBUG] Authentication token token_auth is: <AUTH_TOKEN_HERE>
2015-07-16 00:16:03,630: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2015-07-16 00:16:03,991: [DEBUG] Launched recorder
Parsing log /path/to/logfile.log...
2015-07-16 00:16:03,991: [DEBUG] Detecting the log format
2015-07-16 00:16:03,991: [DEBUG] Check format icecast2
2015-07-16 00:16:03,992: [DEBUG] Format icecast2 matches
2015-07-16 00:16:03,992: [DEBUG] Format match contains 9 groups
2015-07-16 00:16:03,992: [DEBUG] Check format w3c_extended
2015-07-16 00:16:03,993: [DEBUG] Format w3c_extended does not match
2015-07-16 00:16:03,993: [DEBUG] Check format iis
2015-07-16 00:16:03,993: [DEBUG] Format iis does not match
2015-07-16 00:16:03,993: [DEBUG] Check format common
2015-07-16 00:16:03,994: [DEBUG] Format common matches
2015-07-16 00:16:03,994: [DEBUG] Format match contains 6 groups
2015-07-16 00:16:03,994: [DEBUG] Check format common_vhost
2015-07-16 00:16:03,994: [DEBUG] Format common_vhost does not match
2015-07-16 00:16:03,994: [DEBUG] Check format nginx_json
2015-07-16 00:16:03,995: [DEBUG] Format nginx_json does not match
2015-07-16 00:16:03,995: [DEBUG] Check format s3
2015-07-16 00:16:03,995: [DEBUG] Format s3 does not match
2015-07-16 00:16:03,995: [DEBUG] Check format ncsa_extended
2015-07-16 00:16:03,996: [DEBUG] Format ncsa_extended matches
2015-07-16 00:16:03,996: [DEBUG] Format match contains 8 groups
2015-07-16 00:16:03,996: [DEBUG] Check format common_complete
2015-07-16 00:16:03,996: [DEBUG] Format common_complete does not match
2015-07-16 00:16:03,996: [DEBUG] Check format amazon_cloudfront
2015-07-16 00:16:03,996: [DEBUG] Format amazon_cloudfront does not match
2015-07-16 00:16:03,997: [DEBUG] Format icecast2 is the best match

Logs import summary
-------------------

    6 requests imported successfully
    0 requests were downloads
    3 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        0 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        3 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    6 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 15.45 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Piwik, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='http://mypiwik.example'

It doesn’t matter if I run the suggested archiving command or not; the dates remain wrong (also double-verified by directly querying the database for those records).

A stripped down version of a tested log file looks like this (this very file generated the output above):


195.154.188.41 - - [15/Apr/2015:00:19:05 +0200] "GET /portfolio-view/novomania-2011-shanghai HTTP/1.0" 200 20322 "http://atelierschiefer.de/portfolio-view/novomania-2011-shanghai" "Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0" atelierschiefer.de
80.131.0.172 - - [15/Apr/2015:00:20:57 +0200] "GET /kontakt HTTP/1.1" 200 14691 "http://atelierschiefer.de/" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
131.0.172 - - [15/Apr/2015:00:21:10 +0200] "GET /aktuell HTTP/1.1" 200 18704 "http://atelierschiefer.de/kontakt" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
80.131.0.172 - - [15/Apr/2015:00:21:35 +0200] "GET /atelier HTTP/1.1" 200 18909 "http://atelierschiefer.de/aktuell" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
80.131.0.172 - - [15/Apr/2015:00:22:33 +0200] "GET /leistungen HTTP/1.1" 200 14449 "http://atelierschiefer.de/atelier" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
104.167.106.73 - - [15/Apr/2015:04:54:17 +0200] "GET /portfolio-view/sports-up HTTP/1.1" 200 20895 "-" "Java/1.4.1_04" atelierschiefer.de
207.46.13.44 - - [15/Apr/2015:05:20:12 +0200] "GET /home/ HTTP/1.1" 200 406 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" www.atelierschiefer.de
210.65.193.73 - - [15/Apr/2015:05:39:53 +0200] "GET / HTTP/1.1" 200 18426 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)" atelierschiefer.de
207.46.13.44 - - [15/Apr/2015:07:24:51 +0200] "GET /beratung/cage HTTP/1.1" 200 431 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" atelierschiefer.de
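One detail that may be worth a look: every line above ends with an extra hostname field, and the debug output reports icecast2 (9 groups) as the best match over ncsa_extended (8 groups). The following is a minimal sketch of how a "most groups wins" detection could pick a format with one extra trailing field. The regular expressions and the names `NCSA_EXTENDED`, `WITH_TRAILING_FIELD` and `best_match` are simplified assumptions for illustration, not the actual code from import_logs.py, and whether this is related to the date problem is not established here.

```python
import re

# Simplified, hypothetical patterns for illustration only;
# these are NOT copied from import_logs.py.
NCSA_EXTENDED = (
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>\S+) \S+" (?P<status>\d+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)
# Same pattern plus one free-form trailing field (here: the hostname).
WITH_TRAILING_FIELD = NCSA_EXTENDED + r'\s+(?P<extra>\S+)'

FORMATS = {
    "ncsa_extended": NCSA_EXTENDED,
    "with_trailing_field": WITH_TRAILING_FIELD,
}


def best_match(line):
    """Return (format_name, group_count) of the matching format with most groups."""
    best_name, best_groups = None, 0
    for name, pattern in FORMATS.items():
        match = re.match(pattern, line)
        if match and len(match.groups()) > best_groups:
            best_name, best_groups = name, len(match.groups())
    return best_name, best_groups


line = (
    '80.131.0.172 - - [15/Apr/2015:00:20:57 +0200] "GET /kontakt HTTP/1.1" '
    '200 14691 "http://atelierschiefer.de/" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" '
    'atelierschiefer.de'
)
print(best_match(line))
```

With the trailing hostname present, the longer pattern wins in this sketch. The importer has a `--log-format-name` option to force a specific format instead of auto-detecting it; whether forcing ncsa_extended changes the recorded dates here is untested.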

I poked around a bit in import_logs.py and it seems that the dates are at least correct up to the point where the whole lot is handed over to piwik.php. From there on things are out of my league to debug.
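For reference, this is roughly how I would expect a timestamp from the sample above to be interpreted. It is only a standalone sketch, assuming Python 3 (for `%z` support in `strptime`) and an English/C locale (for `%b` to recognise "Apr"); the helper name `parse_access_log_time` is made up and this is not the parsing code from import_logs.py.

```python
from datetime import datetime, timezone


def parse_access_log_time(raw):
    """Parse a timestamp such as '15/Apr/2015:00:19:05 +0200' into an aware datetime."""
    # %b is locale-dependent; this assumes English month abbreviations.
    return datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")


local = parse_access_log_time("15/Apr/2015:00:19:05 +0200")
utc = local.astimezone(timezone.utc)

print(local.isoformat())  # 2015-04-15T00:19:05+02:00
print(utc.isoformat())    # 2015-04-14T22:19:05+00:00
```

So the first hit in the sample should end up on 14 or 15 April 2015 depending on the time zone used for display, not on the day of the import.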

Also, all the logs together come to around 150k lines and range from April to today, yet right now they only produce 280 visitors on the Dashboard. I’m aware that there’s a lot of garbage in those logs and that it’s not a heavily visited site, but that still seems very low. I’m not sure whether this will change once the date issue is resolved.

Piwik is at version 2.14.0, PHP is 5.4.17.

If I can provide more information that might be helpful, please let me know.

Best and thanks in advance for any help
Matthias


(Matthieu Aubry) #2

Hi Matthias,

Can you reproduce the issue with a small log file, e.g. 5 lines or so? If you can reproduce it, please create a bug report at: Issues · matomo-org/piwik-log-analytics · GitHub


#3

Hello,

Are there any updates or workarounds? I am having the same issue.

New install on CentOS 7, Piwik 2.14.2.

Thank you,
Jack


(Matthieu Aubry) #4

Hi there,

Please create an issue in Issues · matomo-org/piwik-log-analytics · GitHub with steps to reproduce and a small log file.