I’m not a software developer nor have I had much experience with a bug tracking system, so I figured I would format the data as best I could in a forum post so at least someone could see the problem/solution.
I have Piwik 1.11.1 setup exclusively with Apache logs for ~6 websites on a dedicated Linux server running Debian 6, so there’s no Javascript tracking to worry about. I have every website setup to automatically rotate the logs, import the data into Piwik, have Piwik archive and then archive the log files in a tar ball, but when I came in today I noticed there was nothing in Piwik for the weekend. I looked through my scripts log file and found that Piwik was labeling nearly every line as “invalid” during the import but just with a few websites, not all of them. After researching all day, I found that it seems to be a bug in the import_logs.py script for 1.11.1, but not for the script in 1.10.1 or in the master branch off of PiWiks git.
Here’s the beginning of a log file that causes this issue in 1.11.1:
127.0.0.1 - - [24/Mar/2013:00:05:04 -0400] "GET / HTTP/1.0" 200 2873 "-" "Wget/1.12 (linux-gnu)"
127.0.0.1 - - [24/Mar/2013:00:10:03 -0400] "GET / HTTP/1.0" 200 2873 "-" "Wget/1.12 (linux-gnu)"
127.0.0.1 - - [24/Mar/2013:00:10:03 -0400] "GET / HTTP/1.0" 200 2873 "-" "Wget/1.12 (linux-gnu)"
69.181.99.999 - - [24/Mar/2013:00:14:48 -0400] "GET /ca_bested/ HTTP/1.1" 200 664 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; ClickPotato 1.0.546.0; CP_IT_E8790372B276545230AA99)"
69.181.99.999 - - [24/Mar/2013:00:14:55 -0400] "GET /ca_bested/index.php?cacheGood=false&shortName=ca_bested HTTP/1.1" 302 703 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; ClickPotato 1.0.546.0; CP_IT_E8790372B276545230AA99)"
69.181.99.999 - - [24/Mar/2013:00:14:55 -0400] "GET /ca_bested/dashboard/Interface.html HTTP/1.1" 304 210 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; ClickPotato 1.0.546.0; CP_IT_E8790372B276545230AA99)"
69.181.99.999 - - [24/Mar/2013:00:14:56 -0400] "GET /ca_bested/dashboard/history/history.css HTTP/1.1" 304 210 "http://blah.blahblah.com/ca_bested/dashboard/Interface.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; ClickPotato 1.0.546.0; CP_IT_E8790372B276545230AA99)"
69.181.99.999 - - [24/Mar/2013:00:14:56 -0400] "GET /ca_bested/dashboard/AC_OETags.js HTTP/1.1" 304 211 "http://blah.blahblah.com/ca_bested/dashboard/Interface.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; ClickPotato 1.0.546.0; CP_IT_E8790372B276545230AA99)"
69.181.99.999 - - [24/Mar/2013:00:14:56 -0400] "GET /ca_bested/dashboard/history/history.js HTTP/1.1" 304 211 "http://blah.blahblah.com/ca_bested/dashboard/Interface.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; ClickPotato 1.0.546.0; CP_IT_E8790372B276545230AA99)"
And this is what the output of import_logs.py looks like
Parsing log /home/mattw/stupidLogs/www_blah.blahblah.com:_apache_access.log...
6717 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
12044 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 1 records/sec (avg), 5 records/sec (current)
15245 lines parsed, 5 lines recorded, 1 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
15245 lines parsed, 5 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Purging Piwik archives for dates: 2013-03-24 2013-03-25
To re-process these reports with your new update data, execute the piwik/misc/cron/archive.php script, or see: http://piwik.org/setup-auto-archiving/ for more info.
Logs import summary
-------------------
578 requests imported successfully
0 requests were downloads
14667 requests ignored:
14667 invalid log lines
0 requests done by bots, search engines, ...
0 HTTP errors
0 HTTP redirects
0 requests to static resources (css, js, ...)
0 requests did not match any known site
0 requests did not match any requested hostname
Website import summary
----------------------
578 requests imported to 1 sites
1 sites already existed
0 sites were created:
0 distinct hostnames did not match any existing site:
Performance summary
-------------------
Total time: 33 seconds
Requests imported per second: 17.4 requests per second
The first three lines of the log file is what the script is getting caught up on. Upon removing all three of those lines (Not just one), the script handles the log just fine with the following output:
Parsing log /home/mattw/stupidLogs/www_blah.blahblah.com:_apache_access.log...
3741 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 10 records/sec (avg), 64 records/sec (current)
4634 lines parsed, 64 lines recorded, 9 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 7 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 7 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 6 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 5 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 5 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 4 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 4 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 64 lines recorded, 4 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 218 lines recorded, 13 records/sec (avg), 154 records/sec (current)
4634 lines parsed, 229 lines recorded, 13 records/sec (avg), 11 records/sec (current)
4634 lines parsed, 229 lines recorded, 12 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 229 lines recorded, 12 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 21 records/sec (avg), 199 records/sec (current)
4634 lines parsed, 428 lines recorded, 20 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 19 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 18 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 17 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 17 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 16 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 15 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 15 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 428 lines recorded, 14 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 562 lines recorded, 18 records/sec (avg), 134 records/sec (current)
4634 lines parsed, 562 lines recorded, 18 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 562 lines recorded, 17 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 562 lines recorded, 17 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 853 lines recorded, 25 records/sec (avg), 291 records/sec (current)
4634 lines parsed, 867 lines recorded, 24 records/sec (avg), 14 records/sec (current)
4634 lines parsed, 867 lines recorded, 24 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 867 lines recorded, 23 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 867 lines recorded, 22 records/sec (avg), 0 records/sec (current)
4634 lines parsed, 867 lines recorded, 22 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 31 records/sec (avg), 383 records/sec (current)
5631 lines parsed, 1250 lines recorded, 30 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 29 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 29 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 28 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 27 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 27 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1250 lines recorded, 26 records/sec (avg), 0 records/sec (current)
5631 lines parsed, 1330 lines recorded, 27 records/sec (avg), 80 records/sec (current)
5631 lines parsed, 1330 lines recorded, 27 records/sec (avg), 0 records/sec (current)
....
Once the import_logs.py script is replaced with the 1.10.1[/url] version or the [url=https://raw.github.com/piwik/piwik/master/misc/log-analytics/import_logs.py]master version from GitHub, it handles the log file fine without having to edit it, so the bug seems to be fixed as of now but I thought others might have the same frustrations as me.