For websites with low traffic, bots can be a large chunk of all visits. I noticed that after I created an AdWords campaign, I consistently got 3 more visitors a day. They turned out to be a Google bot.
I have two suggestions:
[ol]
[li] By default filter out all visits from known bots by user agent. The list doesn’t have to be complete, but just filtering Google, Bing, Yahoo, etc. would make a big difference. Or:
[/li][li] Tag visits from bots and give users the option to show/hide these visits in reports. I’d say the default should be to hide bot visits.
[/li][/ol]
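To make the two suggestions concrete, here is a minimal sketch in Python. It is not Piwik's code: the marker list, the `is_bot` field and the function names are all made up for illustration, and a real list would need more entries.

```python
# Hypothetical, incomplete list of lowercase substrings found in the
# user agents of well-known crawlers. Not Piwik's actual list.
KNOWN_BOT_MARKERS = ("googlebot", "adsbot-google", "bingbot", "yahoo! slurp")


def is_known_bot(user_agent):
    """Return True if the user agent contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


def tag_visits(visits):
    """Suggestion 2: keep every visit, but tag it with an is_bot flag."""
    return [dict(v, is_bot=is_known_bot(v.get("user_agent", ""))) for v in visits]


def report_rows(visits, show_bots=False):
    """Return the visits a report should show; bot visits are hidden by default."""
    return [v for v in tag_visits(visits) if show_bots or not v["is_bot"]]
```

Suggestion 1 is the same check applied at tracking time, so that bot visits are never stored. Tagging keeps the data, so the user can still flip `show_bots` later.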
One may argue this is not new functionality, since users can already filter the bots themselves. However, I think that for most users, figuring out how to filter visits from bots is difficult and a lot of work. Moreover, having every user maintain such a list is like having everyone who uses e-mail maintain their own spam filter.
By the way, 3 visits/day may not look like a lot, but it led me to believe that a significant number of visitors were still coming to my site on Win XP with an old IE version. Even on sites with 10x the traffic of my site, some stats can be completely distorted by bots.
I just installed Piwik 2.0.3 and the issue persists. I haven’t checked the complete log, but visits from Google to pages that are part of an AdWords campaign stand out as still being counted.
Worse, after I added 66.249.*.* to the block list under Settings > Websites > Global list of Excluded IPs and forced Piwik to reprocess the reports by dropping the archive tables (as described here), the visits still show up.
Thanks for that! I have one last request: could you look in the “access.log” file on your server for these IP addresses and let me know the full list of user agents?
In the code I checked, we exclude all user agents containing googlebot, AdsBot-Google, etc., but maybe Google is now using a new user agent?
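For anyone who can read their server log, that lookup could be sketched roughly like this. It assumes the common Apache/nginx "combined" log format, where the line starts with the client IP and the user agent is the last quoted field. The function name and the prefix argument are illustrative only.

```python
import re
from collections import Counter

# Assumes "combined" log format: client IP first, user agent as the
# last double-quoted field on the line.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def user_agents_for_prefix(lines, ip_prefix):
    """Count the user agents of all requests whose IP starts with ip_prefix."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("ip").startswith(ip_prefix):
            counts[match.group("ua")] += 1
    return counts
```

Calling it with an open log file and a prefix such as `"66.249."` (note the trailing dot, so that `66.2490...` would not match) returns a `Counter` whose `most_common()` gives the user agents sorted by request count.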
I’m sorry, I can’t access the file. I’m on shared hosting and found a directory path matching your description (the path is inside a .vs folder), but I don’t have rights to download the file.
We had an unusually high number of Internet Explorer 8 users, so I had to dig into the issue. I’m seeing exactly the same behavior and can provide you with some details from the webserver log files.
I added 66.249.*.* to the installation-wide list of blocked IP addresses. But I think it would be better to have a general solution in place, so that not everyone has to dig into this.
Edit: There also seems to be another IP range, 64.233.172.*, accessing the site, as mentioned above…
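As a rough illustration of how a wildcard range like 64.233.172.* can be checked against a visitor IP, here is a small Python sketch using the standard `ipaddress` module. This is not how Piwik implements its exclusion list. The function names are made up, and it only handles IPv4 patterns with trailing wildcards.

```python
import ipaddress


def wildcard_to_network(pattern):
    """Convert an IPv4 pattern with trailing '*' octets to a network.

    For example '64.233.172.*' becomes the network 64.233.172.0/24.
    """
    parts = pattern.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated octets: %r" % pattern)
    fixed = []
    for part in parts:
        if part == "*":
            break
        fixed.append(part)
    # Everything after the first wildcard must be a wildcard too.
    if any(part != "*" for part in parts[len(fixed):]):
        raise ValueError("wildcards must be trailing: %r" % pattern)
    address = ".".join(fixed + ["0"] * (4 - len(fixed)))
    return ipaddress.IPv4Network("%s/%d" % (address, 8 * len(fixed)))


def is_excluded(ip, patterns):
    """Return True if ip falls inside any of the wildcard patterns."""
    addr = ipaddress.ip_address(ip)
    return any(addr in wildcard_to_network(p) for p in patterns)
```

With this, `is_excluded("64.233.172.17", ["64.233.172.*"])` is true, while an address outside the range is not matched. As this thread shows, though, a general user-agent-based solution is still preferable to everyone maintaining their own IP list.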