Bot visits are getting counted

Problem

It seems that some bots have managed to disguise themselves as regular users and are being counted by Matomo: the number of daily visits has risen 15-fold, mainly from bot-looking visits.

Tasks

  1. How can we make sure bots are not getting counted?
    See comment #2 below
  2. How can we delete these false visits?
    See comment #3 below

Observations

Some observations that seem to confirm that these are bots:

The majority report using an 800x600 screen.

Some stats, which seem to confirm the bot theory:

  • 295 visits, 295 unique visitors
  • 0s average visit duration, -100%
  • 99% of visits have bounced (left the website after one page), +4.2%
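
The signals above (800x600 resolution, zero duration, single-page bounce) could be combined into a simple heuristic. This is only a sketch: the `Visit` class and its field names are hypothetical and do not mirror Matomo's database schema, and a real visitor could in principle match all three conditions.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical, simplified record of one visit (not Matomo's schema).
    resolution: str
    duration_seconds: int
    page_views: int


def looks_like_bot(visit: Visit) -> bool:
    """Flag a visit that matches all three signals seen in this thread."""
    return (
        visit.resolution == "800x600"
        and visit.duration_seconds == 0
        and visit.page_views <= 1  # a bounce: at most one page viewed
    )


visits = [
    Visit("800x600", 0, 1),
    Visit("1920x1080", 42, 3),
]
suspects = [v for v in visits if looks_like_bot(v)]
print(f"{len(suspects)} of {len(visits)} visits look like bots")
```

Requiring all three signals at once keeps the check conservative; relaxing it to the resolution alone would match the GDPR-tool search described further down.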

Among the last 400 visits, these ~160 IPs share a handful of parent ranges:

154.94.32.109
154.94.32.177
154.94.32.227
154.94.36.107
154.94.36.219
154.94.43.132
154.94.43.151
154.94.47.192
154.94.51.26
154.94.56.133
154.94.63.184
154.94.63.238
172.121.116.204
172.121.117.198
172.121.148.175
172.121.149.143
172.121.172.69
172.121.173.116
172.121.173.134
172.121.173.232
172.121.173.39
172.121.173.64
172.121.184.169
172.121.184.210
172.121.188.151
172.121.188.36
172.121.188.54
172.121.28.139
172.121.28.151
172.121.28.204
172.121.28.204
172.121.28.88
172.121.6.147
172.121.6.96
172.121.72.27
172.121.74.132
172.121.75.183
172.121.75.227
172.121.76.185
172.121.79.189
172.121.81.219
172.121.88.124
172.121.90.130
172.121.90.186
172.121.92.143
172.121.97.106
172.121.97.144
172.252.104.155
172.252.109.190
172.252.109.246
172.252.114.104
172.252.114.211
172.252.114.218
172.252.115.237
172.252.116.57
172.252.122.155
172.252.122.214
172.252.124.5
172.252.124.50
172.252.132.151
172.252.137.229
172.252.153.139
172.252.153.246
172.252.180.153
172.252.182.118
172.252.182.187
172.252.182.207
172.252.182.69
172.252.184.141
172.252.186.125
172.252.186.215
172.252.193.75
172.252.197.198
172.252.197.213
172.252.197.50
172.252.199.212
172.252.199.231
172.252.199.81
172.252.205.82
172.252.206.7
172.252.209.162
172.252.209.85
172.252.216.219
172.252.216.63
172.252.223.17
172.252.223.57
172.252.223.71
172.252.228.81
172.252.237.186
172.252.237.201
172.252.237.37
172.252.238.150
172.252.28.224
172.252.28.99
172.252.40.130
172.252.41.241
172.252.41.69
172.252.42.218
172.252.44.170
172.252.45.105
172.252.45.72
172.252.45.78
172.252.47.122
172.252.54.132
172.252.54.148
172.252.54.2
172.252.55.191
45.206.112.3
45.206.113.118
45.206.114.113
45.206.115.228
45.206.115.63
45.206.116.212
45.206.118.253
45.206.118.92
45.206.120.123
45.206.120.74
45.206.121.167
45.206.122.76
45.206.124.115
45.206.124.154
45.206.124.75
45.206.125.4
45.206.125.59
45.206.126.59
45.206.127.60
45.206.127.95
45.206.80.181
45.206.80.194
45.206.80.82
45.206.81.209
45.206.83.170
45.206.83.202
45.206.84.162
45.206.86.164
45.206.87.25
45.206.87.5
45.206.88.58
45.206.89.206
45.206.90.149
45.206.91.195
45.206.91.39
45.206.92.31
45.206.93.234
45.206.95.210
45.207.166.199
45.207.176.106
45.207.178.102
45.207.179.211
45.207.180.111
45.207.181.225
45.207.183.20
45.207.184.237
45.207.185.232
45.207.187.120
45.207.187.203
45.207.187.252
45.207.189.50
45.207.31.10
45.207.31.184
45.207.31.47
45.207.45.150
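
To make the shared parent ranges in the list above easier to see, the addresses can be grouped by network prefix. A minimal sketch using Python's standard `ipaddress` module; the /16 prefix length is an assumption chosen for illustration, and the real allocations behind these addresses may be larger or smaller.

```python
from collections import Counter
import ipaddress


def count_by_parent_range(ips, prefix_len=16):
    """Count how many addresses fall into each parent network."""
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


# A few addresses taken from the list above.
sample = [
    "154.94.32.109",
    "154.94.63.238",
    "172.121.116.204",
    "172.252.104.155",
    "45.206.112.3",
    "45.207.31.47",
]

for network, n in count_by_parent_range(sample).most_common():
    print(network, n)
```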

I found https://plugins.matomo.org/BotTracker, but it seems to rely on bots honestly identifying themselves via the user agent. Sadly, many don’t, and try to pass as regular visitors, so a visit from such a disguised bot is counted as a regular visit.

Instead, I added the Tracking Spam Prevention plugin, hoping it can block disguised bots from being counted, and enabled these filters:

x Block tracking requests from the cloud
Blocks tracking requests originating from cloud providers like AWS, Azure, Digital Ocean, Google Cloud and Oracle by fetching a list of their IP ranges. It should be safe to turn on if you are only tracking using the JavaScript tracker, as their tracking requests do not originate from clouds, unless they use a VPN that routes data through cloud providers. The setting applies to all your sites.

x Block headless browsers
These are browsers without a user interface, mostly used for automation. It should be safe to turn this on if you only have regular websites or apps. It can block additional bots and spam requests that otherwise would not be detected.

x Block tracking requests from server-side libraries
Use this if only using JavaScript Tracker, as other traffic will be attacks or spam anyway. It blocks tracking requests from cURL, HTTP, Guzzle, and Postman.
Note: Do not use it if you track data using a server-side SDK like the Matomo PHP tracking SDK, Java SDK, Python SDK, Android or iOS SDK, or other server-side programming languages.

To delete the erroneously registered bot visits, I followed “How to invalidate the past historical reports so they can be re-processed from the logs” and used the GDPR tool to delete bot visits by searching for Resolution: 800x600. Adding “OS: Windows” or “Browser: Chrome” was not needed; 800x600 alone seems to be the signal.
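
Before deleting anything, it may help to preview which visits an 800x600 filter would match. The sketch below only builds a request URL for Matomo's HTTP Reporting API, assuming the `Live.getLastVisitsDetails` method and the `resolution` segment; the base URL, site ID and token are placeholders, and depending on the Matomo version and token settings the token may need to be sent in a POST body rather than in the URL.

```python
from urllib.parse import urlencode


def build_preview_url(base_url, site_id, token, resolution="800x600",
                      date="today", limit=100):
    """Build a Reporting API URL listing visits with a given resolution."""
    params = {
        "module": "API",
        "method": "Live.getLastVisitsDetails",
        "idSite": site_id,
        "period": "day",
        "date": date,
        "format": "JSON",
        # Segment syntax: dimension, operator (== means "equals"), value.
        "segment": f"resolution=={resolution}",
        "filter_limit": limit,
        # Placeholder token; consider sending it via POST instead.
        "token_auth": token,
    }
    return f"{base_url.rstrip('/')}/index.php?{urlencode(params)}"


# Placeholder values, not a real installation.
url = build_preview_url("https://matomo.example.org/", 1, "YOUR_TOKEN")
print(url)
```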

I then used the InvalidateReports plugin (Option 1) to invalidate the data, but the visits were still showing. I could have waited for the hourly cron job (see https://forum.matomo.org/t/problem-setting-up-cron-job-for-archiving/53403/7 ) to rebuild the data, but I triggered it manually instead, and the bot visits are now gone.

tl;dr

  1. Delete bot visits with GDPR tool (Resolution: 800x600)
  2. Invalidate data with Invalidate Reports plugin
  3. Rebuild data manually, or wait for Cron
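
Steps 2 and 3 can probably also be done from the command line instead of the plugin UI. The sketch below only assembles and prints the commands, assuming Matomo's `core:invalidate-report-data` and `core:archive` console commands; the site ID, date range, URL and console path are placeholders, and the exact options should be checked against `./console help` for the installed version.

```python
import shlex


def rebuild_commands(site_id, start, end, matomo_url, console="./console"):
    """Return the two console invocations as argument lists."""
    invalidate = [
        console,
        "core:invalidate-report-data",
        f"--dates={start},{end}",  # assumed to be read as a date range
        f"--sites={site_id}",
    ]
    archive = [
        console,
        "core:archive",
        f"--url={matomo_url}",
        f"--force-idsites={site_id}",
    ]
    return [invalidate, archive]


# Placeholder values; run the printed commands from the Matomo directory.
for cmd in rebuild_commands(1, "2024-01-01", "2024-01-31",
                            "https://matomo.example.org/"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands themselves would be run on the server hosting Matomo.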

Thanks for sharing your process. This indeed seems to be the best way to exclude the bots.

You’re welcome. Sadly, the bots are still getting through, even with the Tracking Spam Prevention plugin enabled …

It seems like Matomo used to be able to filter out and ignore bots, but bots have gotten so clever that they circumvent the old methods in Matomo, and even the Tracking Spam Prevention plugin.

So currently, it’s a game of Whac-A-Mole, deleting thousands of fake bot visits manually daily …

I hope Matomo can take a look at this, and maybe consider adding new methods to prevent bots from getting counted. See also “Tracking Spam Prevention gets bypassed” and “Feature request: Exclude by metric”.