I posted this on Stack Overflow, but hopefully this is a better place to get an answer.
I’m new to Piwik and am trying to import a batch of logs. I need help with the --log-format-regex option. A sample line from the log is:
"1.1.1.1" 2.2.2.2 - myuser [09/Dec/2012:04:03:29 -0500] "GET /signon.html HTTP/1.1" 304 "http://www.example.com/example" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"
The command I’m running is here:
python /var/www/piwik/misc/log-analytics/import_logs.py --url=http://ec2-1-1-1-1.compute-1.amazonaws.com/piwik/ /disk2/httpd_prod-psweb1/access_log.2012-11-09 --enable-static --idsite=1 --dry-run --log-format-regex='\\\\"(?P<ip>\\\\S+)\\\\" \\\\S+ \\\\S+ \\\\S+ \\\\[(?P<date>.*?) (?P<timezone>.*?)\\\\] \\\\"\\\\S+ (?P<path>.*?) \\\\S+\\\\" (?P<status>\\\\S+) (?P<length>\\\\S+) \\\\"(?P<referrer>.*?)\\\\" \\\\"(?P<user_agent>.*?)\\\\"'
Every run reports all requests as ignored because of invalid log lines. For example:
Logs import summary
    0 requests imported successfully
    0 requests were downloads
    236252 requests ignored:
        236252 invalid log lines
        0 requests done by bots, search engines, ...
        0 HTTP errors
        0 HTTP redirects
        0 requests to static resources (css, js, ...)
        0 requests did not match any known site
        0 requests did not match any requested hostname
How can I fix the --log-format-regex pattern so that these lines are parsed?
TIA,
dan
Edit: Environment is:
Piwik 1.9.2
Python 2.7.3
PHP 5.3.10
on Ubuntu 12.04
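
One way to narrow this down outside of Piwik is to run the pattern against the sample line directly with Python's re module. The sketch below is only a debugging aid built on two assumptions: that the quadruple backslashes in the command are an escaping artifact and single backslashes were intended, and that the sample line is representative of the whole file. The names SAMPLE, WITH_LENGTH and WITHOUT_LENGTH are made up for this test. Under those assumptions the pattern does not match, and the likely reason is that it expects a length field after the status while the sample line goes straight from 304 to the referrer.

```python
import re

# Sample line copied from the post.
SAMPLE = (
    '"1.1.1.1" 2.2.2.2 - myuser [09/Dec/2012:04:03:29 -0500] '
    '"GET /signon.html HTTP/1.1" 304 '
    '"http://www.example.com/example" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"'
)

# Pattern from the command, ASSUMING single backslashes were intended.
WITH_LENGTH = (
    r'"(?P<ip>\S+)" \S+ \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)

# Hypothetical variant: the same pattern with the length group removed,
# because the sample line has no field between the status and the referrer.
WITHOUT_LENGTH = WITH_LENGTH.replace(r' (?P<length>\S+)', '')

for name, pattern in (('with length', WITH_LENGTH),
                      ('without length', WITHOUT_LENGTH)):
    match = re.match(pattern, SAMPLE)
    if match is None:
        print('%s: no match' % name)
    else:
        print('%s: matched %r' % (name, match.groupdict()))
```

This only shows which pattern matches the sample line. Whether import_logs.py in Piwik 1.9.2 accepts a pattern without a length group is not something the sketch checks, so the --dry-run option is still the real test.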