Is it normal that Piwik counts Google's queries as visits?


#1

For one of my websites I noticed that a surprisingly large number of visits come from an IP address that, according to Whois.net, belongs to Google. I already suspected that it was Google, because that IP address hits the pages that have AdWords campaigns three times per day. As I don’t get many thousands of visitors on this site, this skews my stats completely.

There must be a better solution to blocking such non-human visits than just blocking that IP, right?


#2

In Settings-Website you have global exclusion lists, by IP or by user agent. So if you don't want to exclude Google's bots by IP (66.249.. would work fine), you can do it by user agent. The "Google crawlers" page in Search Console Help lists the user agents of Google's bots. But I really don't see why you wouldn't exclude visits by IP: many obscure bots don't send a user agent, or send a fake one, so you would have to exclude them by IP anyway.
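To illustrate what excluding an IP range means in practice, here is a minimal Python sketch. It is not Piwik's own code. It assumes that "66.249.." above stands for the whole 66.249.0.0/16 block, and the names `EXCLUDED_NETWORKS` and `is_excluded_ip` are made up for this example.

```python
import ipaddress

# Assumption: "66.249.." in the post means the full 66.249.0.0/16 block.
# EXCLUDED_NETWORKS and is_excluded_ip are hypothetical names, not Piwik settings.
EXCLUDED_NETWORKS = [ipaddress.ip_network("66.249.0.0/16")]


def is_excluded_ip(ip: str) -> bool:
    """Return True if the address falls inside any excluded network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in EXCLUDED_NETWORKS)
```

A visit from any address inside that block would be dropped, while every other visitor is still counted. The weakness is the one mentioned in this thread: it only holds as long as the bot stays within that range.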


#3

Thanks for clarifying that, Glisse.

Sure, excluding by IP works fine, but I’d expected something more user-friendly. This solution relies on all users figuring out that a bot with a certain IP hits their sites. I don’t think I’m the exception in finding out only after a couple of months. Also, it relies on bots staying within that IP range, which I don’t think we can be certain of. For that reason, I’m now filtering by both IP address and user agent.

It seems the visits that were already registered still show up in my stats. Any idea if the filters can be applied to existing data too?


#4

From what I know (but I might be wrong), archived data can't easily be recalculated, so only your future visits will be excluded, not past ones. There might be a hard way: deleting certain database tables and forcing Piwik to rebuild the archived data, which would MAYBE take into account the new rule you set. But I don't think it's worth the risk of losing ALL your data. Maybe somebody else can help you with this one.

On my site, which is pretty old (8 months), I had only two types of bots showing up: Google and some SEO bots from France. I excluded both by IP range and have not seen any bot in my stats in the past several months. But of course, user agents are also a good way to get rid of official bots like Yahoo, Yandex, Bing, Amazon, etc. without worrying about them changing IPs in the future.

Best of luck


#5

Alright, thanks. Sounds like too much effort to me :)


(mahdi1234) #6

Piwik ships with a predefined set of bots to exclude. They can be found in:

  • oss.yml, in the “# Bot” regex section

  • for web server logs, import_logs.py, in the “EXCLUDED_USER_AGENTS =” section

If you see your bot there and Piwik still counts its visits, log a defect.

If you don’t see your bot there, log a defect to have it included.
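As a rough idea of how a user-agent exclusion list can be applied, here is a small Python sketch. It is not the actual code or list from import_logs.py. The fragments in `BOT_FRAGMENTS` and the function `looks_like_bot` are hypothetical, and case-insensitive substring matching is an assumption made for the example.

```python
# Hypothetical fragments for illustration only; not Piwik's real list.
BOT_FRAGMENTS = ("googlebot", "bingbot", "yandexbot")


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains any known bot fragment."""
    lowered = user_agent.lower()
    return any(fragment in lowered for fragment in BOT_FRAGMENTS)
```

Note that an empty or faked user agent passes a check like this unnoticed, which is why the IP-based exclusion discussed earlier in the thread is still useful alongside it.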

As for reprocessing the archives: first, you must not have enabled “Delete old visitor logs and reports” (http://piwik.org/faq/troubleshooting/#faq_42), because otherwise some older data may be missing and you risk losing historical reports.

To reprocess archives, see http://piwik.org/faq/how-to/faq_59/. Better take a backup of your DB before running this.

cheers


(Matthieu Aubry) #7

By default we try to avoid counting all well-known bots, but evidently Google is now using new IP addresses that are not yet banned.

@all: please send the exact IP addresses used by Google that are not yet banned by Piwik.


#8

I continued the discussion in a bug report.