Piwik Data Not Accurate

Hi,

I’ve noticed several problems with the data being collected by Piwik, specifically with the keyword lists recorded for Live and Google. I have listed these problems below:

  1. The search engine keyword list for Live.com appears to be wrong. Several top keywords on the list are definitely not keywords used to reach our site, such as “Shopping” and “Customer”. I’m not sure how these keywords are being captured by Piwik, or whether there is something wrong with Live.com itself.

I’ve since been informed that this may be the cause:
http://ekstreme.com/thingsofsorts/blogging…spammed-you-too

  2. The Google search engine keyword list is also not compiling properly. For example, we rank high for the term “asset label”, and this is the term we would expect to see at the top of the Piwik keyword list for Google. Instead, we see “asset” and “label” as two separate keywords, even though we don’t rank in the top 100 for either of them. The term “asset label” shows up near the bottom of the keyword list.

  3. Due to the problems described above, the first legitimate keyword on our overall keyword list (all search engines combined) only shows up at position 5. The top 4 keywords on the list are all false.

  4. The user configuration settings list is also being compiled incorrectly. The top-ranked configuration is reported as “Windows 2003 Server, Internet Explorer, 800 x 600”. This is clearly wrong and should actually fall near the bottom of the list. I’m not sure whether this is also caused by the problems described above.

Piwik is an impressive product; however, the inability to accurately measure and report real information basically defeats its purpose. Analysis can’t be performed on the compiled data because we can’t trust what is being reported.

Is anyone else experiencing similar problems?

Thanks,
Olimess

There’s not much Piwik can do about referral spam. We do have an open ticket to ignore visits from certain IP addresses (e.g., reserved local IPs and user-defined ones).

Unfortunately, these are all due to referral spam; Piwik does track two-word keywords properly.
I think we could block the spam from Live pretty easily. What we need is:

  • the rows from your piwik_log_visit table that you believe are due to referral spam

Also, you mention a problem with Google. Can you locate the rows in piwik_log_visit that seem to have a wrong referer_keyword column?
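If it helps with pulling those rows out, here is a minimal Python sketch that filters an export of piwik_log_visit by suspicious keyword. It assumes the table has been exported to CSV with a header row; only referer_keyword is taken from this thread, while the idvisit and referer_name columns, the sample data, and the find_suspect_rows helper are hypothetical and used for illustration only.

```python
import csv
import io

# Keywords reported as suspicious earlier in this thread.
SUSPECT_KEYWORDS = {"shopping", "customer"}


def find_suspect_rows(csv_text, suspects=SUSPECT_KEYWORDS):
    """Return exported rows whose referer_keyword is in the suspect set.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if (row.get("referer_keyword") or "").strip().lower() in suspects
    ]


# Hypothetical export; column names other than referer_keyword are assumptions.
SAMPLE = """idvisit,referer_name,referer_keyword
1,Live,Shopping
2,Google,asset label
3,Live,customer
"""

if __name__ == "__main__":
    for row in find_suspect_rows(SAMPLE):
        print(row["idvisit"], row["referer_name"], row["referer_keyword"])
```

The matching rows can then be attached to a reply or to the ticket as they are.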

I took a look at the piwik_log_visit table in the database, which helped to clear some things up for me.

Firstly, Google is actually reporting correctly. It turns out that we are receiving some traffic from Google India for keywords that we are not ranked for in Google US. I’m not sure why we are ranked high for these keywords in Google India since we do very little business with India. The only Google we really care about is Google US. Too bad we can’t isolate Google US results since the information shown becomes rather useless when combining Google country search results in this way.
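One possible workaround, sketched here outside of Piwik itself, is to split the Google keywords by the hostname of the referrer URL. The sketch assumes the full referrer URL is available for each visit and that Google passes the search terms in the q query parameter; the function names and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def google_host_and_keyword(referrer_url):
    """Return (google_host, keyword) for a Google referrer URL, else None."""
    parts = urlsplit(referrer_url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Rough check only: accepts google.com, google.co.in, and so on.
    if not host.startswith("google."):
        return None
    # Assumption: the search terms are carried in the "q" parameter.
    keyword = parse_qs(parts.query).get("q", [""])[0]
    return host, keyword


def keywords_by_google_host(referrer_urls):
    """Count keywords separately for each Google country domain."""
    counts = {}
    for url in referrer_urls:
        parsed = google_host_and_keyword(url)
        if parsed is None:
            continue
        host, keyword = parsed
        counts.setdefault(host, Counter())[keyword] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://www.google.com/search?q=asset+label",
        "http://www.google.co.in/search?q=asset",
        "http://search.live.com/results.aspx?q=shopping",
    ]
    for host, counter in keywords_by_google_host(sample).items():
        print(host, dict(counter))
```

Looking only at the google.com entry would then approximate a Google US keyword list, though google.com is also used by searchers outside the US.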

I have attached the list of spam referrals from Live. The funny thing is that the spam referrals sometimes use portions of keywords that appear on our site but that we definitely don’t rank in the top 100 for. Hopefully you find this information useful.

Olimess

See http://dev.piwik.org/trac/ticket/686 for more info.