"Invalid date" when importing Apache logs (solved)


#1

I have a series of logs recorded in the following format:


2012-05-04 07:02:17 EDT 12033 - "GET /members/137/favorites?category=resources&limit=30&limitstart=0 HTTP/1.1" 500 20 66.249.71.245 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" TLSv1 0 84141 - d61uhjuvbjko8aegt8co2i0oh3 - - - - - 137 -

I am trying to use the import_logs script to import these into my Piwik setup. I found a regex to parse these logs and ran the script with the following command:


sudo python import_logs.py --url=http://127.0.1.1/piwik ../../../logs/hub-access.log-20120504 --idsite=1 --recorders=2 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-regex='(?P<date>.*? [\d+]*:[\d+]*:[\d+]*) (?P<timezone>\S+) (?P<pid>\d+) - "(?P<request>\S+) (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) (?P<ip>[(\d\.)]+) "(?P<referer>.*?)" "(?P<user_agent>.*?)" (?P<protocol>.*?) (\d+) (\d+) (.*?) (?P<session>.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?)(.*?)$' --debug --debug

I keep seeing errors like:


2013-12-06 12:13:16,345: [DEBUG] Invalid line detected (invalid date): 2012-05-04 07:02:17 EDT 12033 - "GET /members/137/favorites?category=resources&limit=30&limitstart=0 HTTP/1.1" 500 20 66.249.71.245 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" TLSv1 0 84141 - d61uhjuvbjko8aegt8co2i0oh3 - - - - - 137 -

and nothing gets imported. Does this script expect the dates to be in a particular format? Is there anything else I can do to bypass this error without reformatting my log files (they are very large and very numerous)?
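
One quick way to probe the date-format question is to try parsing the captured date outside the importer. The sketch below assumes the script expects the Apache-style layout `%d/%b/%Y:%H:%M:%S`; that format string is my assumption, not something confirmed from the script itself, and `parses` is a hypothetical helper. Month names via `%b` also depend on the locale being English or C.

```python
from datetime import datetime

# ASSUMPTION: the importer parses dates in the Apache access-log layout.
# This format string is a guess for illustration, not taken from import_logs.py.
ASSUMED_FORMAT = "%d/%b/%Y:%H:%M:%S"


def parses(date_string, fmt=ASSUMED_FORMAT):
    """Return True if date_string can be parsed with fmt, else False."""
    try:
        datetime.strptime(date_string, fmt)
        return True
    except ValueError:
        return False


# Date as captured by the (?P<date>...) group from the log line above.
print(parses("2012-05-04 07:02:17"))
# Apache-style date for comparison.
print(parses("03/May/2012:23:59:09"))
```

If the first call reports a failure and the second succeeds, the regex is matching but the captured date is in a layout the parser does not accept.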


#2

For anyone else with similar issues, I was able to circumvent mine by re-formatting my log files to look like:


03/May/2012:23:59:09 -0400 6341 - "GET /events/8/08/19/week HTTP/1.1" 301 205 66.249.71.175 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" - 0 281 - - - - - - - - -

The following command imported the logs in that format:


sudo python import_logs.py --url=http://127.0.1.1/piwik ../../../formatted_logs/hub-access.log-20120504 --idsite=1 --recorders=2 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-regex='(?P<date>.*?:[\d+]*:[\d+]*:[\d+]*) (?P<timezone>\S+) (?P<pid>\d+) - "(?P<request>\S+) (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) (?P<ip>[(\d\.)]+) "(?P<referer>.*?)" "(?P<user_agent>.*?)" (?P<protocol>.*?) (\d+) (\d+) (.*?) (?P<session>.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?)(.*?)$' --debug --debug
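
For anyone who needs to do the same reformatting, here is a minimal sketch of how the date and timezone at the start of each line could be rewritten. It is not the exact script I used. The names `reformat_line`, `reformat_file`, and `TZ_OFFSETS` are made up for illustration, and the timezone table is an assumption that only covers EDT and EST, so extend it for whatever abbreviations appear in your logs. It reads one line at a time, so large files should not be a problem.

```python
import re
from datetime import datetime

# ASSUMPTION: only these timezone abbreviations occur in the logs.
TZ_OFFSETS = {"EDT": "-0400", "EST": "-0500"}

# Matches a leading "YYYY-MM-DD HH:MM:SS TZ " prefix.
LINE_START = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]+) ")


def reformat_line(line):
    """Rewrite the leading date/timezone into Apache style.

    Lines that do not match, or that use an unknown timezone
    abbreviation, are returned unchanged.
    """
    m = LINE_START.match(line)
    if m is None or m.group(2) not in TZ_OFFSETS:
        return line
    stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    # %b yields English month names under the C/English locale.
    prefix = stamp.strftime("%d/%b/%Y:%H:%M:%S") + " " + TZ_OFFSETS[m.group(2)] + " "
    return prefix + line[m.end():]


def reformat_file(src_path, dst_path):
    """Stream src_path to dst_path, converting one line at a time."""
    # surrogateescape lets odd bytes in user agents survive the round trip.
    with open(src_path, encoding="utf-8", errors="surrogateescape") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for line in src:
            dst.write(reformat_line(line))
```

Usage would be something like `reformat_file("hub-access.log-20120504", "formatted_logs/hub-access.log-20120504")`, with the paths adjusted to your own layout.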