Search Engine Robot Hits

It would be interesting to see which pages SE robots visit on the site, and when. It seems they aren’t registered by Piwik at the moment.

Yes, this would be great, because at my host AWStats and Webalizer won’t work! Is it even possible, given that robots don’t use JavaScript?

Search for “smart robots.txt”. Basically, you would use a PHP script to log a spider visit using the Piwik API.
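A rough sketch of the “smart robots.txt” idea, written in Python for brevity (the same logic would carry over to a PHP script): the robots.txt file is served by a script, so every fetch by a crawler can be recorded. The function name `serve_robots_txt`, the rules text, and the in-memory `visit_log` list are all hypothetical; the list simply stands in for whatever call would hand the visit over to Piwik.

```python
from datetime import datetime, timezone

# Placeholder rules; a real site would return its own robots.txt content.
ROBOTS_RULES = "User-agent: *\nDisallow:\n"


def serve_robots_txt(user_agent, remote_ip, visit_log, now=None):
    """Record who asked for robots.txt, then return the rules to send back."""
    now = now or datetime.now(timezone.utc)
    # visit_log is a plain list here (an assumption); in practice this is
    # where the visit would be forwarded to the analytics backend.
    visit_log.append({
        "time": now.isoformat(),
        "ip": remote_ip,
        "user_agent": user_agent,
    })
    return ROBOTS_RULES
```

This only catches crawlers that actually request robots.txt, so it shows which spiders came by, not which pages they fetched.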

Aren’t search engines tracked via the noscript tag?

Search engines are tracked by referrer, when a visitor clicks on a link from the search results to your site. Piwik does not track crawlers/spiders/robots, which generally don’t execute JavaScript or load images.

How hard would it be to add the ability to log visits from the API? Then you could add robot detection to your site and if it detected a robot, it could pass the information to the Piwik API. It wouldn’t need to be nearly as complex as the js code, just track a few $_SERVER fields in php.

I would even go so far as to say that you could track every visit using the API, augmented with the js. That way you log every page view, but still get the richness of the JavaScript checks. Very much a hybrid approach.

Possibly in a future version?
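To make the suggestion above concrete, here is a minimal sketch in Python (the same few lines would translate to PHP reading `$_SERVER` fields). It checks the user agent against a short list of crawler names and, for a match, builds the URL of a server-side tracking request. The marker list, the function names, and the query parameters (`idsite`, `rec`, `url`, `ua`) are assumptions modeled on a typical Piwik tracking request; check them against the Piwik version you run before relying on them.

```python
from urllib.parse import urlencode

# Hypothetical, deliberately short list of substrings found in crawler user agents.
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandex")


def looks_like_bot(user_agent):
    """Return True if the user agent contains one of the known crawler markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def build_tracking_url(piwik_base, site_id, page_url, user_agent):
    """Build (but do not send) a tracking request URL for one page view.

    The parameter names are assumptions, not a confirmed API contract.
    """
    params = {"idsite": site_id, "rec": 1, "url": page_url, "ua": user_agent}
    return piwik_base.rstrip("/") + "/piwik.php?" + urlencode(params)
```

A page handler would call `looks_like_bot()` on the incoming user agent and, if it returns true, request the URL from `build_tracking_url()` in the background. Since the user agent is easy to fake, this identifies self-declared crawlers only.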

This raises the question, “what is the value?” Other than as a curiosity, there is no metrics use case, because the only information you can reliably capture is the IP address, request URI, and timestamp. The user agent, in contrast, is often forged.

I agree that it’s not of high value when you’re looking at user stats, and if the data were collected, I don’t think it should be included in the user reports.

The value I can see is for monitoring peak usage and abuse. If someone is using Piwik, they’re probably not also mining their logs. If Piwik could overlay non-JavaScript “user” activity, that would give a better picture of overall site load (since spiders can add a great deal of load to a site), and you’d also be able to track bots that may be scraping the site, testing for vulnerabilities, adding spam, etc.

[quote=dmorin @ Jan 16 2009, 01:55 AM]How hard would it be to add the ability to log visits from the API? Then you could add robot detection to your site and if it detected a robot, it could pass the information to the Piwik API. It wouldn’t need to be nearly as complex as the js code, just track a few $_SERVER fields in php.

I would even go so far as to say that you could track every visit using the API, augmented with the js. That way you log every page view, but still get the richness of the JavaScript checks. Very much a hybrid approach.

Possibly in a future version?[/quote]

That would be a great feature to have; see http://dev.piwik.org/trac/ticket/134
Any further specification and code are very welcome!

I don’t think this is a good idea, because where would Piwik get the data from to build a smart robots.txt? And how do you get it working on CMS installations that serve multiple domain names from one source code directory, and so on?

Search for “smart robots.txt”. Basically, you would use a PHP script to log a spider visit using the Piwik API.

I know how it works. It’s only possible if you have more control over your server. For a hosted Piwik installation (where Piwik is on another domain, etc.) it’s a feature that cannot be used. You can put time into it, but there are other, more important features to develop, IMHO.

For me, in order to move to Piwik, I would like to have robot and spider info. Why is it important, you ask? It tells you how your SEO is doing. Is Googlebot visiting your site? How often, and which pages? Right now none of the other free “JavaScript tag” web analytics tools do this. I think it’s a good differentiator for Piwik against Google Analytics, and it would make a lot of people’s lives easier. I currently use two analytics packages. I love the design of Piwik, and it’s much nicer than Crawl Track, which I use for tracking robots. I can tell you that when Piwik gets there, I will move over to Piwik exclusively. I am not a PHP expert, but maybe I will play around with the API and see if I can work something up. In the end, don’t underestimate how many people this add-on would bring to this software.

I have to somewhat disagree with your premise here. The frequency of visits by the SE bots is really not that great a barometer of your SEO. Personally, I think the easiest way to test your SEO is to see how your pages are ranking for your desired keywords (by simply running searches manually in the search engines).

Respectfully,

-Lisa

Is there any update on adding this feature?

It would be great if this could be implemented in a future version (maybe with an option to switch off the tracking of spiders/crawlers to reduce server load).

As you all probably know this is one of the features Google Analytics is missing too…

[quote=Lisa H @ Jun 16 2009, 03:48 AM]I have to somewhat disagree with your premise here. The frequency of visits by the SE bots is really not that great a barometer of your SEO. Personally, I think the easiest way to test your SEO is to see how your pages are ranking for your desired keywords (by simply running searches manually in the search engines).

Respectfully,

-Lisa[/quote]
Have to agree with Lisa on this. In my experience, more bot visits usually indicate that I’m getting a lot of visitors and backlinks. It’s not reliable for gauging how well your site’s SEO is doing. There is a site that will check your rankings for you. I’ll post it if I can find it again, as it makes things a lot more organised if you have several keywords.

[quote=Beachcomber @ Jul 14 2009, 03:43 PM]Is there any update on adding this feature?

It would be great if this could be implemented in a future version (maybe with an option to switch off the tracking of spiders/crawlers to reduce server load).

As you all probably know this is one of the features Google Analytics is missing too…[/quote]

Agreed, I think this would be an awesome feature =D Similar to what the StatPress Reloaded plugin for WordPress does, except that with Piwik everything would be viewed in one interface!

It is my understanding that the frequency of visits by the search engine bots can be a good indication of how much “authority” they consider your site to have. Tracking SE bot visits to our sites would be valuable SEO information, and I support the suggestion that this feature be included in a future Piwik update.

Knowing how often robots visit, thanks to AWStats, was a lifesaver for me: I discovered that some bots were making 10000 requests per day on a calendar plugin, something that was completely invisible in analytics. I support this motion.
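For anyone who wants this kind of per-bot request count without AWStats, here is a small Python sketch that tallies requests per day and user agent from access log lines. It assumes the common “combined” log format used by Apache and nginx; the regular expression and the function name are illustrative and would need adjusting for a different log layout.

```python
import re
from collections import Counter

# Matches the "combined" access log format (an assumption about the server config).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def requests_per_agent_per_day(lines):
    """Count requests keyed by (day, user agent); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("day"), match.group("agent"))] += 1
    return counts
```

Calling `requests_per_agent_per_day(open("access.log"))` and then `.most_common(10)` on the result would surface a bot hammering a single plugin fairly quickly; the file name here is just an example.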

http://dev.piwik.org/trac/ticket/2391