Exclude piwik from being indexed by search engines


(pwk) #1

Currently, the piwik pages (e.g. the login screen) get indexed by Google. To prevent this, you have to manually add a header to /plugins/login/templates/header.tpl:

Also, the robots.txt for piwik should contain the following lines (instead of being empty):

User-agent: *
Disallow: /

This might also become a security issue, as people can google for piwik installations and exploit possible security flaws to gain access to piwik.
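
The effect of the two robots.txt lines above, compared with an empty file, can be checked with Python's standard `urllib.robotparser`. This is only a minimal sketch: the helper name `is_allowed` and the URL `http://domain.tld/piwik/index.php` are made up for illustration and are not part of piwik.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed above: every crawler is asked to stay out of everything.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical URL of a piwik installation, used only for illustration.
PIWIK_URL = "http://domain.tld/piwik/index.php"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(BLOCK_ALL, "Googlebot", PIWIK_URL))  # blocked
    print(is_allowed("", "Googlebot", PIWIK_URL))         # empty file: allowed
```

An empty robots.txt places no restriction on crawlers, which is why the empty file changes nothing about indexing.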


(Matthieu Aubry) #2

This is not and won’t be a security issue, as anyone can see that somebody is using piwik by looking at the source code of each page and grepping for piwik.js (or Google Analytics, or any other service…).
We have no plans to add the robots.txt, as some users deliberately want their piwik to be indexed.


(Marc) #3

I would prefer

At least an option to activate or deactivate this meta tag.

But excluding the piwik directory with robots.txt would be stupid in that case, because then the robots would never read that they should neither index nor archive the piwik login.
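
The interaction described above can be sketched as a toy crawler. A well-behaved robot consults robots.txt first and only reads the page's meta tag if it is allowed to fetch the page. The sample page, its `<meta name="robots" content="noindex,noarchive">` tag, the URL and the function names are all assumptions for illustration, not piwik's actual markup or code.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.robotparser import RobotFileParser

# Assumed sample page; the real piwik login template is not reproduced here.
LOGIN_PAGE = (
    '<html><head><meta name="robots" content="noindex,noarchive"></head>'
    "<body>login</body></html>"
)
LOGIN_URL = "http://domain.tld/piwik/index.php"  # hypothetical URL
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def crawler_sees_noindex(robots_txt: str, url: str, page_html: str) -> Optional[bool]:
    """Return None if robots.txt keeps the crawler away from the page,
    otherwise whether the page carries a noindex directive."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch("*", url):
        return None  # page never fetched, so the meta tag is never read
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    return "noindex" in finder.directives
```

With the block-all rules the function returns `None`: the crawler never gets far enough to see the meta tag. With an empty robots.txt it returns `True`.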


(jr-ewing) #4

Don’t forget that the robots.txt must be in the domain root! If you have it in /piwik/, robots will not read it.
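
As a rough illustration of the point above: a crawler derives the robots.txt location from the host alone and ignores the directory of the page it wants to fetch. The helper name `robots_txt_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    Only the scheme and host are kept; the path is always /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A file at http://domain.tld/piwik/robots.txt is never requested.
    print(robots_txt_url("http://domain.tld/piwik/index.php"))
```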

best regards
Tom


(manafta) #5

Piwik includes an empty robots.txt file. Because it is empty, it doesn’t really serve a purpose. Therefore I would like to see it removed from new piwik releases.

Also during upgrades, piwik will overwrite an existing robots.txt file, erasing all the rules in it.


(Chrissle) #6

Maybe instead of shipping an empty file, it would be useful to actually fill it with content that is useful for search engines, such as:

User-agent: *
Disallow: /

(Or whatever may be best.)


(manafta) #7

that has already been rejected, see:
http://forum.piwik.org/index.php?showtopic=796


#8

Hmm, I just put into header.tpl

I prefer this way over excluding via robots.txt. When using robots.txt, Google will still list the piwik folder URL in its SERPs (http://domain.tld/piwik/), but with no content. Only the meta tag is safe if you want to make sure your piwik folder does not appear in Google SERPs. [cf. 1]

And I also want a way to insert the meta tag without editing the core. In my eyes this should be a basic feature, with content="noindex" as the default. But a plugin would also be great.

@matt: I think it’s a security issue because an evil crawler can use Google to find piwik folders and then check whether the sites use old piwik versions with security issues. This is much easier for a bot than hopping from one webpage to the next and checking the source code of each page for piwik.js.

[1] Duplicate Content bei Google vermeiden - Seokratie (German, sorry)


(Karlsson) #9

It seems that header.tpl does not exist in Piwik 2.0.

The new file for the noindex entry is:
/plugins/Login/templates/login.twig