Over the last couple of years, Google has been trying to make sure that webmasters are slaves to AdWords and that keyword data is provided only to advertisers, not webmasters.
This has shown up in our Piwik reports as “keyword not defined”, and it now accounts for over a third of all the results.
See data here:
I thought this post here was interesting:
“How to steal some ‘not provided’ data back from Google”
He essentially is suggesting this:
1/ Looks for ‘(not provided)’ search terms.
2/ Where it finds them, it looks to see which page the visitor landed on.
3/ It then changes your keywords report in Google Analytics to show those two pieces of information (the fact that Google suppressed the keyword, and the landing page), rather than just the utterly anonymous ‘(not provided)’.
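To make those three steps concrete, here is a rough sketch of the relabelling idea in Python. It is not the code from the linked post and it does not use any Google Analytics or Piwik API; the function name `relabel_keyword` and the `SUPPRESSED` set are made up for illustration, and I am assuming each visit gives us a keyword string and a landing page path.

```python
# Labels that mean "the search engine hid the keyword".
# Assumption: these are the only two spellings we need to catch.
SUPPRESSED = {"(not provided)", "keyword not defined"}


def relabel_keyword(keyword, landing_page):
    """Steps 1-3: if the keyword was suppressed, show the landing page too."""
    # Step 1: look for a suppressed search term.
    if keyword.strip().lower() in SUPPRESSED:
        # Steps 2 and 3: attach the landing page to the report label.
        return f"(not provided) -> {landing_page}"
    # Known keywords pass through unchanged.
    return keyword
```

So a visit with a hidden keyword that landed on `/pricing` would be reported with that page attached, while visits with a real keyword are left alone.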
This info can already be seen in the Visitor > Visitor Log Screen in Piwik (as long as you aren’t in Firefox, because currently it doesn’t work, but that is another post entirely!).
However, my point is that because two-thirds of the keywords are still recorded, Piwik essentially ALREADY has a probability database of what the keyword behind a click to THAT page from THAT search engine is likely to be, as long as someone has previously visited that page from that search engine with the keyword intact.
So, I propose a new feature.
Instead of showing just “keyword not defined”, it should show “Keyword not defined - But Most likely to be…” and then list the top three keywords that, historically in the database, have been the referrers for that page.
That would at least provide a more useful idea than simply “keyword not defined”, even if it is far from perfect.
For internal pages especially where there are only a small group of keywords driving the traffic, I also think this would end up being fairly accurate.
This would also make the keyword data more accurate for particular pages where a disproportionate share of visits is set to “keyword not defined”.
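To show roughly what I mean by a probability database, here is a minimal sketch in Python. It is only an illustration and not how Piwik stores or queries its data: I am assuming the visit log can be read as simple (search engine, landing page, keyword) tuples, and the names `build_keyword_history` and `describe_keyword` are hypothetical.

```python
from collections import Counter, defaultdict

# Labels that mean "the search engine hid the keyword" (assumed spellings).
SUPPRESSED = {"(not provided)", "keyword not defined"}


def build_keyword_history(visits):
    """Count the known keywords for each (search engine, landing page) pair.

    `visits` is an iterable of (engine, page, keyword) tuples.
    """
    history = defaultdict(Counter)
    for engine, page, keyword in visits:
        # Only visits whose keyword was actually recorded feed the history.
        if keyword.strip().lower() not in SUPPRESSED:
            history[(engine, page)][keyword] += 1
    return history


def describe_keyword(history, engine, page, keyword, top_n=3):
    """Return the keyword, or a 'most likely' guess if it was suppressed."""
    if keyword.strip().lower() not in SUPPRESSED:
        return keyword
    # Look up past keywords for this engine and page without adding new keys.
    counts = history.get((engine, page), Counter())
    likely = [kw for kw, _ in counts.most_common(top_n)]
    if not likely:
        # Nobody has reached this page from this engine with a known keyword.
        return "Keyword not defined"
    return "Keyword not defined - But most likely to be: " + ", ".join(likely)
```

The guess is only as good as the history behind it, so a page that has never been reached from that search engine with a visible keyword still falls back to plain “Keyword not defined”.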
I do think that as a community we need to start thinking up new, creative ways to combat this, because “keyword not defined” is projected to reach 100% by the beginning of 2016 (it was 4% in March 2012 and is currently 34%, just a year later). To my mind this is a massive threat to the very existence and usefulness of applications like Piwik.