Add a referrer exclude list

Hi,

In the “Global websites settings” of the Administration console, there are several lists of parameters that cause a visit not to be recorded. As of today (I can’t tell you which Piwik version I’m using, because the version number is nowhere to be seen unless I go through the API), these parameters are: IP addresses, query URL parameters, and user agents.

I’m asking for the referrer to be taken into account as well. The reason is that my Dashboard entries are being polluted by the so-called “services” of SEO companies that have started sending their crawlers in all directions across the Web. One such company is Semalt. Besides the fact that their purpose is damn nebulous to me, I have explicitly asked, using a form they provide, for my website to be removed from their database… to no avail. I get tens, if not hundreds, of entries in the dashboard, to the point that 60% to 80% of entries come from their 100dollars-seo.com website, a website you can’t reach in a Web browser but that turns up as a referrer in analytics entries. It looks like a spam technique to me.

I am fed up with this, all the more so since I can’t just exclude IP addresses: their requests apparently come from all over the world, and there’s no single IP address I can pick out and exclude.

Thanks.

I use a .htaccess file to exclude a number of these spammers. That way, the spammers don’t get past the front door.

The latest version of Piwik (version 2.13.1) has implemented suppression of spam referrers, but how the referrer suppression list gets updated is not well documented, and there is no user interface for updating it. Moreover, just because spam referrers are excluded from the Piwik statistics doesn’t mean your site won’t see this unwanted traffic. I’m not sure whether spam referrers are also kept out of Piwik’s database.

You should be able to determine what Piwik version you’re running. When logged in with Superuser rights, the Admin pages have a “Check for Updates” button near the top right of each Admin page. When you hover over this button, you see what version you’re running. If you don’t have such a button, it’s definitely time to upgrade.

Sorry to jump in on your post, but how do you use the “referrer suppression list”? I found the file on GitHub, but I don’t know how or where to put this list of domains to block! I’m getting fed up with them wasting my resources and skewing the analytics.

TIA

Andy

@andy:
I agree – this feature is very poorly documented.

From what I can gather, the latest versions of Piwik download the file on GitHub automatically. Exactly when and how often this happens is a mystery to me. Since the download is automatic, I’d guess the file is stored somewhere in the /tmp directory or in the database.

The file is on GitHub so that everyone can contribute updates to it.

Quite frankly, I would prefer that the superuser have full control over whether, how often, and when this file gets downloaded. Once it’s downloaded, the superuser should have some way of modifying the file locally without involving GitHub. The reasoning here is that one administrator’s referrer spam may not be another’s. Ah well, dream on!

As I’ve stated in other posts (search the forum for “referrer spam”; how do you find anything on GitHub??), I prefer to use a .htaccess file to turn away referrer spam before Piwik ever sees it.
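If you take the .htaccess route but still want the benefit of the community list, the rules can be generated from a local copy of spammers.txt. This is a hypothetical Python helper, not part of Piwik: `parse_list`, `build_rules` and their `extra`/`allow` parameters are illustrative names for the kind of local additions and removals wished for above. It assumes each entry is a plain host name, so only the dots need escaping, and it emits the same RewriteCond/RewriteRule pattern used elsewhere in this thread.

```python
def parse_list(text):
    """Return the entries of a one-domain-per-line list, lowercased.

    Blank lines are skipped; skipping '#' lines is an extra assumption.
    """
    domains = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            domains.append(line.lower())
    return domains


def build_rules(domains, extra=(), allow=()):
    """Build .htaccess rules that answer 403 Forbidden to listed referrers.

    extra / allow are hypothetical local additions and removals.
    """
    allowed = {d.lower() for d in allow}
    wanted = sorted((set(domains) | {d.lower() for d in extra}) - allowed)
    if not wanted:
        return ""  # nothing to block, so emit no rules at all
    lines = ["## STOP REFERRER SPAM", "RewriteEngine On"]
    for i, domain in enumerate(wanted):
        # Assumes plain host names: only the dots need escaping.
        pattern = domain.replace(".", r"\.")
        # Every condition except the last is OR-ed with the next one.
        flags = "[NC]" if i == len(wanted) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "100dollars-seo.com\nsemalt.com\n"
    print(build_rules(parse_list(sample), extra=["best-seo-offer.com"]))
```

The printed block can be pasted into the .htaccess file in your site's webroot. Like the hand-written rules, the patterns are unanchored, so `semalt\.com` would also match a referrer such as `notsemalt.com`.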

@canajun2eh, yeah, tell me about it. The only reason I was looking for it was that I’m seeing lots of “100 dollar seo” hits (first on the list here: referrer-spam-blacklist/spammers.txt at master · matomo-org/referrer-spam-blacklist · GitHub) in my stats, so I thought maybe there was something I needed to do on my end :confused:

Yeah, the problem with the .htaccess approach, though, is that it doesn’t always work (at least, it didn’t for my Google Analytics). You put a “black hole” or a redirect to Google in place… yet they still show up.

Bloody annoying, and a total waste of time! Maybe someone needs to give the spammer domains a dose of a DDoS attack on their servers, to get the point across that we are pissed off with them? :wink: haha

So far I have not noticed Piwik filtering any referrer spam. I had to do it manually with the .htaccess file mentioned above, placed in the Piwik folder of the website being monitored by Piwik, including these lines:


## STOP REFERRER SPAM
# mod_rewrite must be switched on for the rules below to take effect
RewriteEngine On
RewriteCond %{HTTP_REFERER} 100dollars-seo\.com [NC,OR]
RewriteCond %{HTTP_REFERER} abcd4\.de [NC,OR]
RewriteCond %{HTTP_REFERER} adviceforum\.info [NC,OR]
RewriteCond %{HTTP_REFERER} best-seo-offer\.com [NC,OR]
RewriteCond %{HTTP_REFERER} best-seo-solution\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-your-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule .* - [F]

Still, the existing logs in Piwik are polluted by those spam referrers, so the charts are not correct. I’d guess more than 20% of my logs are spam. If anyone knows how to remove them from the database, please let me know.

Happy weekend,
Ravelli

Actually, the .htaccess file should be in the webroot directory of the site that Piwik is monitoring. If you’re monitoring multiple sites, each site’s webroot directory needs a copy of that .htaccess file.

You want the referrer spam to be rejected by your sites, and not just by Piwik.

Hi canajun2eh,
you are absolutely right; I do have it in the website’s root, as you explained. I described it wrongly above and will fix it.
Many thanks for the hint!
Ravelli

Agreed, the block needs to be on the PAGE (not Piwik), because as far as the Piwik request is concerned, %{HTTP_REFERER} is actually your own site (since your page is what requested the resource).

However, a single .htaccess file will work just fine, as long as it’s in the public_html/www folder.
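To illustrate why the rule belongs on the page rather than in Piwik's folder, here is a simplified Python model of the two requests involved. It is a sketch under assumptions, not how Apache works internally: `status_for` stands in for the `RewriteCond %{HTTP_REFERER} semalt\.com [NC]` plus `RewriteRule .* - [F]` pair, www.example.org is a placeholder for your own site, and the tracking call is reduced to a few query parameters (Piwik's tracking API carries the visitor's original referrer in its `urlref` parameter).

```python
import re
from urllib.parse import urlencode

# Same condition as the thread's rules: a case-insensitive ([NC]) regex
# search of the Referer header, then RewriteRule .* - [F] (403 Forbidden).
SPAM = re.compile(r"semalt\.com", re.IGNORECASE)


def status_for(headers):
    """Simplified model: 403 if the Referer header matches, otherwise 200."""
    referer = headers.get("Referer", "")  # a missing header is treated as empty
    return 403 if SPAM.search(referer) else 200


# Hypothetical requests; www.example.org stands in for your own site.
sample_requests = [
    # 1. The spam client asks for your page, claiming a spam referrer.
    ("/index.html", {"Referer": "http://semalt.com/"}),
    # 2. Your page's tracking code calls piwik.php. Its Referer header is your
    #    own page; the original referrer travels in the urlref parameter.
    ("/piwik/piwik.php?" + urlencode({"idsite": 1, "rec": 1, "urlref": "http://semalt.com/"}),
     {"Referer": "https://www.example.org/index.html"}),
]

for path, headers in sample_requests:
    print(status_for(headers), path)
```

The page request gets 403 and the piwik.php request gets 200: the spam domain only appears in the query string of the tracking call, never in its Referer header, so a Referer check placed in Piwik's folder never fires.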

Hi everyone,

What happened is that our last release came out in early May and ships with a very outdated referrer spam list. Starting with Piwik 2.14.0, Piwik will download the list from GitHub once a week, so from 2.14.0 on you will enjoy the updated list and we can all fight spammers together efficiently!

Sorry 2.14.0 took so long, but it’s a good release: Piwik 2.14.0 - Analytics Platform - Matomo (currently a release candidate).

Hi,

Thanks for the reply. Is there a way we can update the list manually until 2.14.0 is released as an update?

TIA

Andy

2.14.0 has been released, and the list will now be automatically updated every week. Check it out :slight_smile:

The list is released into the public domain, and anyone can use it within their applications to exclude referrer spammers. Many people have already contributed new spammers to the list. We invite you to use the list in your apps and websites and help us keep it up to date!
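Since the list is plain text with one domain per line, using it in your own application can be fairly simple. The Python sketch below is illustrative only: the function names are hypothetical, and the raw-file URL is an assumption derived from the matomo-org/referrer-spam-blacklist repository mentioned in this thread, so check it before relying on it. Unlike the unanchored RewriteCond patterns, it matches whole host names, so `sub.semalt.com` is caught but `notsemalt.com` is not.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Assumed raw-file URL for spammers.txt; verify it before use.
LIST_URL = "https://raw.githubusercontent.com/matomo-org/referrer-spam-blacklist/master/spammers.txt"


def load_spammers(text):
    """Parse a one-domain-per-line list into a set; blank lines are ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def fetch_spammers(url=LIST_URL):
    """Download and parse the list (network access required)."""
    with urlopen(url, timeout=30) as resp:
        return load_spammers(resp.read().decode("utf-8"))


def is_spam_referrer(referrer_url, spammers):
    """True if the referrer's host is a listed domain or a subdomain of one.

    Expects a full URL such as "http://semalt.com/"; .hostname is lowercased.
    """
    host = urlsplit(referrer_url).hostname or ""
    labels = host.split(".")
    # For "a.semalt.com" this checks "a.semalt.com", "semalt.com", "com".
    return any(".".join(labels[i:]) in spammers for i in range(len(labels)))


if __name__ == "__main__":
    spammers = load_spammers("semalt.com\n100dollars-seo.com\n")
    print(is_spam_referrer("http://100dollars-seo.com/", spammers))
```

In a real application you would call `fetch_spammers()` on a schedule (weekly, like Piwik 2.14.0) and keep the set in memory, rather than downloading it on every request.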