Wrong live.com referrer count

pboese · March 19, 2009, 9:25am

I recently installed the latest Piwik beta (0.2.32) and am really happy with it, except for one point. Most of our visitors come from Google search, so I was wondering why Piwik showed most of them as coming from live.com. I checked my access log and found that the live.com bot is not recognized as a crawler. Worse, the bloody bot sends a referrer whose search query matches the most important keyword of the page it is crawling, so Piwik displays lots of traffic as coming from live.com search.

Examples from my access log:

65.55.107.207 - - [19/Mar/2009:09:29:21 +0100] "GET /waadt/villars-tiercelin HTTP/1.0" 200 11576 "http://search.live.com/results.aspx?q=villars" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.107.194 - - [19/Mar/2009:09:29:24 +0100] "GET /waadt/ependes HTTP/1.0" 200 11488 "http://search.live.com/results.aspx?q=ependes" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.110.16 - - [19/Mar/2009:09:29:39 +0100] "GET /waadt/molondin HTTP/1.0" 200 11440 "http://search.live.com/results.aspx?q=molondin" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"

I will now try to find out how to identify the bloody bot from Redmond. I can try to implement the detection on my own, but maybe someone can give me a hint on where to start? Of course, I will make the patch available to the community as soon as it's working.
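As a starting point, here is a rough sketch of how such hits could be flagged in the access log itself. It is not Piwik code, and the heuristic is only an assumption drawn from the three sample lines above: the request comes from an address in 65.55.0.0/16 (a range guessed from those three IPs), uses HTTP/1.0, and carries a search.live.com referrer. The names `LOG_RE`, `SUSPECT_NET`, and `is_suspect_live_hit` are made up for this example.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Apache combined log format:
# ip identity user [time] "method path protocol" status size "referrer" "user agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumption: this range is guessed from the three sample IPs only,
# it is not an official list of crawler addresses.
SUSPECT_NET = ipaddress.ip_network("65.55.0.0/16")


def is_suspect_live_hit(line):
    """Return True if a log line looks like the fake live.com search referral."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return False  # not a combined-format line
    try:
        ip = ipaddress.ip_address(match.group("ip"))
    except ValueError:
        return False  # first field is a hostname or garbage, not an IP
    referrer_host = urlparse(match.group("referrer")).hostname
    return (
        ip in SUSPECT_NET
        and referrer_host == "search.live.com"
        and match.group("proto") == "HTTP/1.0"
    )
```

Running every line of the access log through `is_suspect_live_hit` and counting the matches would at least show how much of the reported live.com traffic fits this pattern. Real visitors arriving from live.com through a Microsoft address would be flagged too, so the result is an estimate, not proof.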

vipsoft · March 19, 2009, 12:27pm

Please refer to the earlier post: http://forum.piwik.org/?showtopic=311