MSNBot accessing module=CoreHome in URL

Hello,
I searched the forums but couldn’t find my exact problem; then again, I am not really sure what I should search for…

I noticed odd behaviour from two specific MSNBot IP addresses that try to directly access
mydomain/piwik/?module=CoreHome&action=index&date=[dateOfAccess]&period=day&idSite=1
where [dateOfAccess] is the date on which the bot accesses that URL.

At first I tried to exclude them through the global IP ignore list, but since these IPs don’t access my site in the usual way they still show up in the statistics. The only information I get there is the IP, that URL and the country the request comes from. Browser, plugins and resolution are unknown.
The IP addresses are:
199.30.16.0
199.30.20.0
I am using Piwik 1.11.1.
How can I stop this?

Can you use a robots.txt to have the entire piwik directory ignored?
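As a rough illustration of that idea, here is a small Python sketch that checks what a robots.txt rule for the piwik directory would tell a crawler. The robots.txt content, the http://mydomain host and the helper name is_allowed are assumptions made up for this example. It uses Python's standard urllib.robotparser, so it only shows what a well-behaved bot would conclude; it does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of /piwik/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /piwik/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# URL pattern from the first post; "mydomain" is a placeholder host.
piwik_url = "http://mydomain/piwik/?module=CoreHome&action=index&period=day&idSite=1"
print(is_allowed("msnbot", piwik_url))
print(is_allowed("msnbot", "http://mydomain/index.html"))
```

A rule like this is advisory only: a bot that ignores robots.txt will still request the URL.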

I could do that, but not all bots respect robots.txt. This particular MSNBot doesn’t index anything; it tries to access the Piwik statistics page for idSite=1 directly, something it shouldn’t do in the first place. That is also why it gets around Piwik’s global ignore list.

For now I deny access to those two IPs through .htaccess in my root, but since IPs can change that is not the best long-term solution…
I still hope there is, or will be, an option within Piwik to hide these accesses, as this is undesirable behaviour.

It’s a hard thing to figure out. If you think of the bots as you would a regular visitor, there is nothing preventing them from viewing that part of your site. You could password-protect the whole piwik directory, but then your tracking would be limited to reading the log files.

Seems to me like the only way to prevent this is to block the IP (like you’re doing). You could probably write a script at the server level that analyzes these types of requests, but I found something that may lead you in the right direction: Site5 KnowledgeBase » Bots: How to block bots that don’t Respect your robots.txt file
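A minimal sketch of what such a server-level script could look like, in Python. It assumes the web server writes access logs in the Apache "combined" format, which may not match your setup. The names LOG_LINE and suspicious_hits are invented for this example. It only counts requests whose path contains module=CoreHome, grouped by IP and user agent, so you can see who is hitting the Piwik UI directly; it does not block anything.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def suspicious_hits(lines, marker="module=CoreHome"):
    """Count requests whose path contains marker, keyed by (ip, user agent)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and marker in match.group("path"):
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits
```

You could feed it an open log file, e.g. suspicious_hits(open(path)), and then decide per IP or per user agent what to block at the server level. Lines that do not match the assumed format are skipped silently.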

Same here with Google: IP 66.249.81.58 is accessing ’ index.php?module=CoreHome&action=index&date=2013-05-21&period=day&idSite=1 ’