I have a series of logs that are recorded with the following format:
2012-05-04 07:02:17 EDT 12033 - "GET /members/137/favorites?category=resources&limit=30&limitstart=0 HTTP/1.1" 500 20 66.249.71.245 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" TLSv1 0 84141 - d61uhjuvbjko8aegt8co2i0oh3 - - - - - 137 -
I am trying to use the import_logs.py script to import these into my Piwik setup. I found a regex that parses these logs and ran the script with the following command:
sudo python import_logs.py --url=http://127.0.1.1/piwik ../../../logs/hub-access.log-20120504 --idsite=1 --recorders=2 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-regex='(?P<date>.*? [\d+]*:[\d+]*:[\d+]*) (?P<timezone>\S+) (?P<pid>\d+) - "(?P<request>\S+) (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) (?P<ip>[(\d\.)]+) "(?P<referer>.*?)" "(?P<user_agent>.*?)" (?P<protocol>.*?) (\d+) (\d+) (.*?) (?P<session>.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?)(.*?)$' --debug --debug
I keep seeing errors like:
2013-12-06 12:13:16,345: [DEBUG] Invalid line detected (invalid date): 2012-05-04 07:02:17 EDT 12033 - "GET /members/137/favorites?category=resources&limit=30&limitstart=0 HTTP/1.1" 500 20 66.249.71.245 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" TLSv1 0 84141 - d61uhjuvbjko8aegt8co2i0oh3 - - - - - 137 -
and nothing gets imported. Does this script expect dates to be in a particular format? Is there anything else I could do to get past this error without reformatting my log files (they are very large and very numerous)?
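To rule out the regex itself, here is a minimal standalone check, run outside Piwik, of what the `date` group captures from the sample line. The regex and the log line are copied from above. The names `parses_with`, `ISO_STYLE_FORMAT` and `APACHE_STYLE_FORMAT` are my own, and the Apache-style format string is only a guess at what the script might expect, not something I have confirmed from its source.

```python
import re
from datetime import datetime

# Sample line copied from the log excerpt above.
SAMPLE_LINE = (
    '2012-05-04 07:02:17 EDT 12033 - '
    '"GET /members/137/favorites?category=resources&limit=30&limitstart=0 HTTP/1.1" '
    '500 20 66.249.71.245 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    'TLSv1 0 84141 - d61uhjuvbjko8aegt8co2i0oh3 - - - - - 137 -'
)

# Same pattern as passed to --log-format-regex, split only for readability.
LOG_REGEX = re.compile(
    r'(?P<date>.*? [\d+]*:[\d+]*:[\d+]*) (?P<timezone>\S+) (?P<pid>\d+) - '
    r'"(?P<request>\S+) (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) '
    r'(?P<ip>[(\d\.)]+) "(?P<referer>.*?)" "(?P<user_agent>.*?)" '
    r'(?P<protocol>.*?) (\d+) (\d+) (.*?) (?P<session>.*?) '
    r'(.*?) (.*?) (.*?) (.*?) (.*?) (.*?) (.*?)(.*?)$'
)

# The layout my logs actually use.
ISO_STYLE_FORMAT = '%Y-%m-%d %H:%M:%S'
# ASSUMPTION: an Apache-style layout, used here only as a guess at what
# the importer might expect.
APACHE_STYLE_FORMAT = '%d/%b/%Y:%H:%M:%S'


def parses_with(date_string, fmt):
    """Return True if date_string can be parsed with the strptime format fmt."""
    try:
        datetime.strptime(date_string, fmt)
        return True
    except ValueError:
        return False


match = LOG_REGEX.match(SAMPLE_LINE)
captured_date = match.group('date') if match else None

print('regex matched:', match is not None)
print('date group:', repr(captured_date))
if captured_date is not None:
    print('parses as ISO style:', parses_with(captured_date, ISO_STYLE_FORMAT))
    print('parses as Apache style:', parses_with(captured_date, APACHE_STYLE_FORMAT))
```

If the regex matches and the captured date only parses with the ISO-style format, that would suggest the "invalid date" error comes from the date layout rather than from the regex.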