Error when importing Apache logs --> Invalid date


#1

Hi guys,

I’ve spent quite a bit of time trying to import Apache logs with the import_logs script. The problem is that my logs are not recorded in a standard format. After experimenting with some regexes, I found one that parses my files. Here is an example line from one of my logs:


2012-05-04 23:58:36 EDT 28944 - "GET /index.php?option=com_tools&task=diskusage&no_html=1&msgs=0 HTTP/1.1" 200 150 129.74.35.243 "https://adapt.nd.edu/tools/ccvi2/session/1307" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16" TLSv1 0 89012 1063 gci0bkpqku8dk7ea1a4bs1k6d1 - session - - - - -

I am now able to run the import_logs script with the following command:


python ./import_logs.py --url=http://127.0.1.1/piwik ../../../logs/hub-access.log-20120504 --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --log-format-regex='(?P<date>.*? [\d+]*:[\d+]*:[\d+]*) (?P<timezone>\S+) (?P<pid>\d+) - "(?P<request>\S+) (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) (?P<ip>[(\d\.)]+) "(?P<referer>.*?)" "(?P<user_agent>.*?)" (?P<protocol>.*?) (\d+) (\d+) (.*?) (?P<session>.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?)(.*?)$' --debug --debug

However, this results in 0 requests imported successfully and a series of errors similar to this:


[DEBUG] Invalid line detected (invalid date): 2012-05-04 23:58:36 EDT 28944 - "GET /index.php?option=com_tools&task=diskusage&no_html=1&msgs=0 HTTP/1.1" 200 150 129.74.35.243 "https://adapt.nd.edu/tools/ccvi2/session/1307" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16" TLSv1 0 89012 1063 gci0bkpqku8dk7ea1a4bs1k6d1 - session - - - - -

Would there happen to be an easy fix for this issue? Does Piwik expect the date to be in a particular format? I’m guessing I might need to pre-format my logs before running them through this script, but I would really like to avoid that because the logs are extremely large.
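
One way to narrow this down is to check what the date group of the regex captures and whether that string parses under Apache's default timestamp layout (for example 04/May/2012:23:58:36). The sketch below is not taken from import_logs.py. It assumes the script parses the captured date with that default layout, and the names APACHE_FORMAT, CUSTOM_FORMAT and try_parse are made up for illustration.

```python
import datetime
import re

# Start of the example log line from this post.
line = '2012-05-04 23:58:36 EDT 28944 - "GET /index.php HTTP/1.1" 200 150'

# Leading part of the regex passed to --log-format-regex.
prefix = re.compile(r'(?P<date>.*? [\d+]*:[\d+]*:[\d+]*) (?P<timezone>\S+) ')

# Assumption: the import script expects Apache's default timestamp layout.
APACHE_FORMAT = "%d/%b/%Y:%H:%M:%S"
# Layout actually used by the custom log line.
CUSTOM_FORMAT = "%Y-%m-%d %H:%M:%S"


def try_parse(date_string, fmt):
    """Return a datetime if date_string matches fmt, otherwise None."""
    try:
        return datetime.datetime.strptime(date_string, fmt)
    except ValueError:
        return None


match = prefix.match(line)
date_string = match.group("date")

print("captured date:    ", date_string)
print("captured timezone:", match.group("timezone"))
print("Apache layout:    ", try_parse(date_string, APACHE_FORMAT))
print("custom layout:    ", try_parse(date_string, CUSTOM_FORMAT))
```

If the Apache layout yields None while the custom layout yields a datetime, the regex itself is fine and the mismatch lies in the date layout. Note that the timezone group captures a name (EDT) rather than a numeric offset such as -0400, which may need handling as well.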

Thanks!


(Matthieu Aubry) #2

Sorry, I don’t have a ready-made answer, but if you’re a developer you can fix the script and submit a pull request: http://piwik.org/participate/contributing-with-git/
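
For anyone attempting such a fix, here is a minimal sketch of a more tolerant date parser. It is not code from import_logs.py: parse_log_date, DATE_FORMATS and TZ_OFFSET_HOURS are hypothetical names, and the zone table only covers a few abbreviations as an example. It tries several layouts in turn and accepts either a zone name or a numeric offset, so the large log files would not need to be rewritten first.

```python
import datetime

# Candidate layouts, tried in order. The first is Apache's default
# timestamp layout; the second matches the log line in this thread.
DATE_FORMATS = ("%d/%b/%Y:%H:%M:%S", "%Y-%m-%d %H:%M:%S")

# Hypothetical lookup for named zones; extend it for the zones your logs use.
TZ_OFFSET_HOURS = {"UTC": 0, "GMT": 0, "EST": -5, "EDT": -4}


def parse_log_date(date_string, timezone):
    """Return the timestamp as a naive UTC datetime, or raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            local = datetime.datetime.strptime(date_string, fmt)
            break
        except ValueError:
            continue
    else:
        # No layout matched.
        raise ValueError("unrecognized date: %r" % date_string)

    if timezone in TZ_OFFSET_HOURS:
        offset = datetime.timedelta(hours=TZ_OFFSET_HOURS[timezone])
    else:
        # Numeric offsets such as "-0400" or "+0530".
        sign = -1 if timezone.startswith("-") else 1
        digits = timezone.lstrip("+-")
        offset = sign * datetime.timedelta(
            hours=int(digits[:2]), minutes=int(digits[2:4])
        )

    # Local time minus its UTC offset gives UTC.
    return local - offset


print(parse_log_date("2012-05-04 23:58:36", "EDT"))
print(parse_log_date("04/May/2012:23:58:36", "-0400"))
```

Both calls describe the same instant, so they should print the same UTC timestamp. A fixed abbreviation table is the simplest option but ignores ambiguous zone names, so a real patch might prefer to require numeric offsets.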