Bounce rate bounced up dramatically with 1.1.1


#1

Hi,

On the same day I installed Piwik 1.1.1, the bounce rate for my biggest website (20k visitors per day) shot up from about 19% to 95%.
I do not believe that my visitors changed their behaviour this dramatically, especially since the website itself did not change; only the tracking software, Piwik, did.
I also believe that too many unique visitors are now reported. I suspect that repeated page impressions from the same person are now counted as coming from different people, which inflates both the bounce rate and the number of visitors.

Very much looking forward to a version that counts visitors and bounces correctly!


(baschan) #2

Same problem here. The live-visitor plugin also counts every page impression as a separate visit.


#3

Hi

I have installed 1.1.2b1, but the visit graph still shows twice the number of visits for the period when I had 1.1 installed (compared with 1.0 before).
It looks like the data in the database for this period is corrupted. How can I correct it?

Thanks for your help !


#4

Hi !

Does anyone have any idea how to solve this problem?

Thanks !


#5

I suspect we’re just screwed for those days.
Thanks, though, for pointing to Version 1.1.2b! Now my visitor number is not inflated anymore. Actually, it looks quite deflated today. It’s not a holiday in Germany, where most of my visitors come from, as far as I am aware…
The bounce rate is now realistic, though. Maybe there are genuinely fewer visitors today.


#6

Thanks rtpiwik.
Could a Piwik developer at least tell us what was wrong in the code, so that I can try to fix my data?
I assumed that grouping visits that happened within 30 minutes of each other in the table “piwik_log_visit” would be enough, but it looks like that is not sufficient in my case.
Piwik developers, please don’t forsake your users! I installed 1.1 because it was advertised as an important security release, and now my statistics are messed up.


#7

Of course, I’d also be interested in such a proper fix. But if that should prove impossible, I guess one could also use some other data source, such as the impressions from OpenX on the same website during the same timeframe, and adjust the numbers based on that? So, for a very basic and dirty solution, calculate by how many percent the visitor number is inflated and then randomly delete records such that the visitor number is as it should be? If one deletes records randomly enough, other properties such as the geographic distribution should remain more or less intact…
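A minimal sketch of that quick-and-dirty idea, assuming you already have a trustworthy visit count from another source for the same period. The function name and arguments are purely illustrative, and nothing here touches the Piwik database; it only picks which visit ids would be dropped.

```python
import random

def pick_visits_to_delete(visit_ids, reported_visits, expected_visits, seed=None):
    """Return a random subset of visit ids to drop so that the remaining
    share matches expected_visits / reported_visits.

    reported_visits: what Piwik shows for the period (inflated).
    expected_visits: the figure from the other data source (an assumption).
    """
    if expected_visits >= reported_visits:
        return []  # nothing is inflated, so nothing to delete
    keep_ratio = expected_visits / reported_visits
    n_delete = round(len(visit_ids) * (1 - keep_ratio))
    rng = random.Random(seed)
    # Uniform sampling without replacement, so other properties of the
    # data should be thinned out roughly evenly.
    return rng.sample(list(visit_ids), n_delete)
```

Note that this only corrects the totals. Unlike a real merge, it throws away page views that did happen, so figures such as actions per visit would stay wrong.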


(vipsoft) #8

Unfortunately, it’s non-trivial to fix any affected data using only SQL. (Otherwise, we would have included a conversion script.)


#9

Thanks for the feedback. You have to understand the frustration of users who install a security update provided as a stable release and end up with corrupted data. Is it impossible or “just” hard? In the latter case, I can work on it if you provide a description of the problem and/or point me to the relevant bugs in the 1.1 code. Then any user who suffered from this bug could benefit from the result.


(vipsoft) #10

Take a look at Matt’s comment here: “Edge case: each page is a new visit” (Issue #1916, matomo-org/matomo on GitHub)

(I’ll try to answer questions in his absence.)


#11

Thanks vipsoft for the help !

I have fixed the data in piwik_log_visit and piwik_log_action, as illustrated by the queries below.
I have also dropped all the piwik_archive_* tables, but unfortunately my graphs are still showing twice as many visits as expected.
Any idea? Is there a cache somewhere, or are my queries not representative of what Piwik does internally to prepare the graph data?

BEFORE:


mysql> SELECT count(distinct visitor_idcookie) as nb_uniq_visitors,count(*) as nb_visits,sum(visit_total_actions) as nb_actions,max(visit_total_actions) as max_actions,sum(visit_total_time) as sum_visit_length,sum(case visit_total_actions when 1 then 1 else 0 end) as bounce_count,sum(case visit_goal_converted when 1 then 1 else 0 end) as nb_visits_converted FROM piwik_log_visit  where visit_last_action_time >= "2011-01-15" and visit_last_action_time < "2011-01-16";
+------------------+-----------+------------+-------------+------------------+--------------+---------------------+
| nb_uniq_visitors | nb_visits | nb_actions | max_actions | sum_visit_length | bounce_count | nb_visits_converted |
+------------------+-----------+------------+-------------+------------------+--------------+---------------------+
|              233 |      1277 |       1973 |          59 |            48702 |         1173 |                   0 |
+------------------+-----------+------------+-------------+------------------+--------------+---------------------+
1 row in set (0.08 sec)


AFTER:


mysql> SELECT count(distinct visitor_idcookie) as nb_uniq_visitors,count(*) as nb_visits,sum(visit_total_actions) as nb_actions,max(visit_total_actions) as max_actions,sum(visit_total_time) as sum_visit_length,sum(case visit_total_actions when 1 then 1 else 0 end) as bounce_count,sum(case visit_goal_converted when 1 then 1 else 0 end) as nb_visits_converted FROM piwik_log_visit  where visit_last_action_time >= "2011-01-15" and visit_last_action_time < "2011-01-16";
+------------------+-----------+------------+-------------+------------------+--------------+---------------------+
| nb_uniq_visitors | nb_visits | nb_actions | max_actions | sum_visit_length | bounce_count | nb_visits_converted |
+------------------+-----------+------------+-------------+------------------+--------------+---------------------+
|              233 |       655 |       1973 |          59 |            48702 |          552 |                   0 |
+------------------+-----------+------------+-------------+------------------+--------------+---------------------+
1 row in set (0.07 sec)
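
To make the two result rows easier to compare, here is a small sketch that derives the bounce share and the actions per visit from the columns above. The function name is only illustrative, and the ratios are computed the same way the queries count them (a bounce being a visit with exactly one action).

```python
def summarize(nb_visits, nb_actions, bounce_count):
    """Derive simple ratios from one result row of the query above."""
    return {
        "bounce_rate_pct": round(100 * bounce_count / nb_visits, 1),
        "actions_per_visit": round(nb_actions / nb_visits, 2),
    }

# Values copied from the BEFORE and AFTER result rows.
before = summarize(nb_visits=1277, nb_actions=1973, bounce_count=1173)
after = summarize(nb_visits=655, nb_actions=1973, bounce_count=552)

print("before:", before)
print("after: ", after)
```

The bounce share goes from roughly 92% to roughly 84%, while the total number of actions is unchanged, which is what one would expect from a merge that only regroups existing page views.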


(vipsoft) #12

From what I can see, there should have been a marked difference in your graphs. Piwik doesn’t cache any of the graph data. You could try clearing your browser cache.


#13

Sorry, I was looking in the wrong place. My code was in fact working.

I have a pretty good version of it now. It’s a Perl script that you will find here :
http://ocroquette.fr/a/piwik/fix-piwik.txt

It looks for visits from the same visitor that took place within 30 minutes of each other and merges them. I tried to keep the data as consistent as possible, fixing the visit properties and also the action table.
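
For readers who do not want to read the Perl, the core idea can be sketched in a few lines of Python. This is not the script itself, only an illustration under assumptions: visits arrive as plain tuples, and the 30-minute gap is measured from the last action of the kept visit to the first action of the next one.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def plan_merges(visits):
    """visits: iterable of (idvisit, visitor_id, first_action, last_action).

    Returns a list of (kept_idvisit, absorbed_idvisit) pairs. Nothing is
    written anywhere; this only decides which visits belong together.
    """
    merges = []
    current = {}  # visitor_id -> [kept_idvisit, last_action_time]
    # Sort by visitor, then by time, so each visitor is scanned in order.
    for idvisit, visitor, first, last in sorted(visits, key=lambda v: (v[1], v[2])):
        kept = current.get(visitor)
        if kept is not None and first - kept[1] < SESSION_GAP:
            merges.append((kept[0], idvisit))
            kept[1] = max(kept[1], last)  # the merged visit now ends later
        else:
            current[visitor] = [idvisit, last]
    return merges
```

After planning, the absorbed visit's actions would have to be re-pointed to the kept visit and the kept visit's totals recomputed, which is the part that needs the most care.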

Something interesting: it also finds a few problems for the period BEFORE I installed version 1.1, but on a much smaller scale. So the bug (or a similar one) was there before. I will check with the latest beta version.

Below is an example run on my data. If you want to run it on yours, make sure you test on a clone first and back up your original data twice!

Comments are very much welcome.


$ perl fix-piwik.pl 
Fetching data piwik_log_visit...
Found 41900 visit entries
Found 13188 merges to perform :
  2010-11-27	1
  2010-11-28	2
  2010-11-29	4
  2010-11-30	7
  2010-12-01	5
  2010-12-02	1
  2010-12-03	4
  2010-12-04	2
  2010-12-05	3
  2010-12-06	4
  2010-12-08	2
  2010-12-09	3
  2010-12-10	3
  2010-12-12	16
  2010-12-13	3
  2010-12-14	37
  2010-12-15	9
  2010-12-16	21
  2010-12-17	52
  2010-12-18	10
  2010-12-19	5
  2010-12-20	1
  2010-12-21	22
  2010-12-22	22
  2010-12-23	4
  2010-12-29	2
  2011-01-02	1
  2011-01-03	5
  2011-01-04	66
  2011-01-05	1334
  2011-01-06	1414
  2011-01-07	907
  2011-01-08	779
  2011-01-09	1105
  2011-01-10	1399
  2011-01-11	1393
  2011-01-12	1539
  2011-01-13	835
  2011-01-14	1320
  2011-01-15	843
  2011-01-16	2
  2011-01-19	1
Now exiting. Use --fixit to fix the problems found.


#14

this is really cool, thanks!

Would it be possible to limit the date range on which it operates to Jan. 5 until Jan. 9 or so (maybe configurable)? I don’t like the idea of fixing data that predate the bug (even if some variant of the problem really has been there). If anything goes wrong with your script despite testing, I would not be too sad if it would only affect a date range that is screwed up anyways…

Also, your observation that the problem has been there before could explain why I feel that I have fewer visitors with 1.1.2b than I had with 1.0; this might be an early version of the bug and no real behaviour change on the part of my visitors.


#15

You got it, I have uploaded a new version with the options --from and --to (format: “YYYY-MM-DD” )
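
As an illustration only (the real script is Perl, and this is not its code), validating such options could look like the sketch below. The option names --from, --to and --fixit come from this thread; the function names and the `dest` names are made up.

```python
import argparse
from datetime import datetime

def parse_day(text):
    """Accept only the YYYY-MM-DD format; anything else raises ValueError,
    which argparse turns into a usage error."""
    return datetime.strptime(text, "%Y-%m-%d").date()

def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative option parsing")
    # "from" is a Python keyword, so store the value under another name.
    parser.add_argument("--from", dest="date_from", type=parse_day)
    parser.add_argument("--to", dest="date_to", type=parse_day)
    parser.add_argument("--fixit", action="store_true",
                        help="apply the changes instead of only reporting them")
    return parser

args = build_parser().parse_args(["--from", "2011-01-05", "--to", "2011-01-14"])
print(args.date_from, args.date_to, args.fixit)
```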


#16

Thanks a lot!

Unfortunately, I either have too many visitor records (8 million records in the table) or too weak a server: the select statement takes too long, and the script gets killed despite a generous resource allocation. I tried adding an index and a restriction based on the dates, but still did not get anywhere…


#17

The implementation is very naive (I don’t have much data, as you have seen). I will try to optimize it a bit and report back here.

In the meantime, could a Piwik developer give some feedback? Are there any issues with this approach? vipsoft, do you have an opinion on this script? Thanks!


#18

I have updated the script. It will now fetch only the relevant data from the database (based on the given dates).
If the data for your whole time window is still too big, you can call the script day by day.
Furthermore, it will now display the progress for the queries.
Let me know if it works for you.
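
If you go day by day, a small helper can generate the command lines for you. This is only a sketch: it assumes that passing the same date to --from and --to selects that single day, which you should check on a test copy first, and the helper name is made up.

```python
from datetime import date, timedelta

def daily_commands(start, end, script="fix-piwik.pl", fixit=False):
    """Build one command per day from start to end (both included)."""
    commands = []
    day = start
    while day <= end:
        cmd = ["perl", script, "--from", day.isoformat(), "--to", day.isoformat()]
        if fixit:
            cmd.append("--fixit")  # without it, the script only reports
        commands.append(cmd)
        day += timedelta(days=1)
    return commands

# Print the commands instead of running them, so they can be reviewed first.
for cmd in daily_commands(date(2011, 1, 5), date(2011, 1, 7)):
    print(" ".join(cmd))
```

One thing to keep in mind: a visit that was split across midnight falls into two different windows, so it may not get merged when running one day at a time.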


#19

perfect, thanks!

I am attaching the “before” and “after” images of the German subsection of my site. It now looks a hell of a lot more believable!

EDIT: I ran your script only on the range from Jan 5 to Jan 14 (which I believe is the time when I had the buggy version 1.1.1 installed)


(vipsoft) #20

Nice. IIRC there was a race condition in earlier versions where requests on the same second might return different visitor cookies. Perhaps those are the instances your script caught.

I’m on the road now, so at the earliest, I can take a look at your script tonight when I get back.