API pageUrl request returns duplicate data


#1

Hello,

I’m having trouble using this API, because any newly created URL seems to have visits recorded on days when it did not exist yet.

Here is the request I’m using
http://piwik.mywebsite.com/.fr/index.php?method=Actions.getPageUrl&pageUrl=http://my_url&date=last10&period=day&module=API&idSite=1&token_auth=e9e3bddcfa2455c5111e39a70024deed&format=xml

which returns the following XML


<results>
<result date="2012-07-07"/>
<result date="2012-07-08"/>
<result date="2012-07-09"/>
<result date="2012-07-10"/>
<result date="2012-07-11"/>
<result date="2012-07-12"/>
<result date="2012-07-13">
<row>
<label>/a5c2f0a40a</label>
<nb_visits>4</nb_visits>
<nb_uniq_visitors>4</nb_uniq_visitors>
<nb_hits>4</nb_hits>
<sum_time_spent>1706</sum_time_spent>
<entry_nb_uniq_visitors>2</entry_nb_uniq_visitors>
<entry_nb_visits>2</entry_nb_visits>
<entry_nb_actions>33</entry_nb_actions>
<entry_sum_visit_length>12669</entry_sum_visit_length>
<entry_bounce_count>1</entry_bounce_count>
<exit_nb_uniq_visitors>2</exit_nb_uniq_visitors>
<exit_nb_visits>2</exit_nb_visits>
<avg_time_on_page>427</avg_time_on_page>
<bounce_rate>50%</bounce_rate>
<exit_rate>50%</exit_rate>
<url>http://mywebsite.com/a5c2f0a40a</url>
</row>
</result>
<result date="2012-07-14"/>
<result date="2012-07-15"/>
<result date="2012-07-16"/>
<result date="2012-07-17"/>
<result date="2012-07-18"/>
<result date="2012-07-19"/>
<result date="2012-07-20"/>
<result date="2012-07-21"/>
<result date="2012-07-22"/>
<result date="2012-07-23"/>
<result date="2012-07-24">
<row>
<label>/a5c2f0a40a</label>
<nb_visits>4</nb_visits>
<nb_uniq_visitors>4</nb_uniq_visitors>
<nb_hits>4</nb_hits>
<sum_time_spent>1706</sum_time_spent>
<entry_nb_uniq_visitors>2</entry_nb_uniq_visitors>
<entry_nb_visits>2</entry_nb_visits>
<entry_nb_actions>33</entry_nb_actions>
<entry_sum_visit_length>12669</entry_sum_visit_length>
<entry_bounce_count>1</entry_bounce_count>
<exit_nb_uniq_visitors>2</exit_nb_uniq_visitors>
<exit_nb_visits>2</exit_nb_visits>
<avg_time_on_page>427</avg_time_on_page>
<bounce_rate>50%</bounce_rate>
<exit_rate>50%</exit_rate>
<url>http://mywebsite.com/a5c2f0a40a</url>
</row>
</result>
<result date="2012-07-25"/>
<result date="2012-07-26">
<row>
<label>/a5c2f0a40a</label>
<nb_visits>4</nb_visits>
<nb_uniq_visitors>4</nb_uniq_visitors>
<nb_hits>4</nb_hits>
<sum_time_spent>1706</sum_time_spent>
<entry_nb_uniq_visitors>2</entry_nb_uniq_visitors>
<entry_nb_visits>2</entry_nb_visits>
<entry_nb_actions>33</entry_nb_actions>
<entry_sum_visit_length>12669</entry_sum_visit_length>
<entry_bounce_count>1</entry_bounce_count>
<exit_nb_uniq_visitors>2</exit_nb_uniq_visitors>
<exit_nb_visits>2</exit_nb_visits>
<avg_time_on_page>427</avg_time_on_page>
<bounce_rate>50%</bounce_rate>
<exit_rate>50%</exit_rate>
<url>http://mywebsite.com/a5c2f0a40a</url>
</row>
</result>
</results>

yet the URL was only created today (and it may actually have received about 5 visits).

Indeed, I noticed that these days all show exactly the same visits. If I go back to last200, some other days show the same result too.
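
To spot such repeats without reading the XML by eye, one option is to fingerprint each day's rows and group the dates that share a fingerprint. This is only a minimal sketch: the function name `find_duplicate_days` is hypothetical, and the sample is a trimmed version of the response above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_days(xml_text):
    """Group dates whose <row> contents are identical.

    Returns a list of date lists; each inner list holds two or more
    dates that carry exactly the same metrics.
    """
    root = ET.fromstring(xml_text.strip())
    by_fingerprint = defaultdict(list)
    for result in root.findall("result"):
        rows = result.findall("row")
        if not rows:
            continue  # empty day, nothing to compare
        # Fingerprint: every (tag, text) pair of every row, in document order.
        fingerprint = tuple(
            tuple((child.tag, (child.text or "").strip()) for child in row)
            for row in rows
        )
        by_fingerprint[fingerprint].append(result.get("date"))
    return [dates for dates in by_fingerprint.values() if len(dates) > 1]


# Trimmed sample: two days with identical rows, two empty days.
sample = """
<results>
  <result date="2012-07-12"/>
  <result date="2012-07-13"><row><label>/a5c2f0a40a</label><nb_visits>4</nb_visits></row></result>
  <result date="2012-07-24"><row><label>/a5c2f0a40a</label><nb_visits>4</nb_visits></row></result>
  <result date="2012-07-25"/>
</results>
"""
print(find_duplicate_days(sample))
```

Identical rows on different days are not proof of a bug on their own, but identical time-based metrics such as `sum_time_spent` make a coincidence unlikely.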

Does anyone have any idea?

It could help if I could query the MySQL database directly, though I’m not used to writing MySQL queries and I don’t know the table structure. Could anyone tell me what SQL query I should run?

Regards,
Alexis.


(Matthieu Aubry) #2

What is the problem exactly? That the 2 days have the same data? Is it always like this?


#3

The problem is that http://my_url was created today and cannot have received visits on the previous days (in fact it is a longer URL, partly random, such as http://my_url/x/5sdfsf5).


(Matthieu Aubry) #4

So fake data is inserted for past dates? That is rather strange, and it is the first time I have heard of such a problem.

How do you track data in Piwik, with the JavaScript code?


#5

I’m not sure whether fake data is inserted or whether the API outputs wrong results.

piwik.js is used everywhere on the site except on the URLs I’m querying through the API. For those, my colleague used some Java code he found on this forum, which registers the visit via the tracking API.
I reviewed it and it does not seem to do anything wrong. Here is the request it makes:


http://piwik.website.fr/piwik.php?idsite=3&rec=1&apiv=1&url=http%3A%2F%site.com%2F34950ed8bd&urlref=&rand=0.7849625403024532&_id=bd211f466565c351&_ref=&_refts=Fri Jul 27 09:12:05 CEST 2012&res=0x0null&action_name=%2Fa%2F34950ed8bd

Does it seem wrong?
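
For comparison, here is a minimal sketch of how a similar tracking request could be built with every parameter value percent-encoded. It reuses only parameter names that appear in the request above; the function name, the host and the page URL are hypothetical, and the optional parameters (`urlref`, `_ref`, `_refts`, `res`) are left out for brevity. If they were sent, their values would need the same encoding.

```python
import random
from urllib.parse import urlencode


def build_tracking_url(piwik_base, id_site, page_url, action_name, visitor_id):
    """Return a piwik.php tracking URL with every value percent-encoded."""
    params = {
        "idsite": id_site,
        "rec": 1,
        "apiv": 1,
        "url": page_url,
        "action_name": action_name,
        "_id": visitor_id,
        "rand": random.random(),  # cache buster, as in the request above
    }
    # urlencode() escapes ':', '/', spaces and other reserved characters.
    return piwik_base.rstrip("/") + "/piwik.php?" + urlencode(params)


print(build_tracking_url(
    "http://piwik.example.org",        # hypothetical Piwik host
    3,
    "http://site.example/34950ed8bd",  # hypothetical tracked page
    "/a/34950ed8bd",
    "bd211f466565c351",
))
```

Building the query string through a library call rather than by string concatenation avoids hand-encoding mistakes in the `url` and `action_name` values.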

Also, could you suggest a SQL query I could run to get the visits straight from the database (if it’s easy for you)?

Alexis.


#6

Big step in my understanding of the problem.

First of all, my Piwik web GUI fails when I display a chart (with something like ‘Oops problem met during the request please try again’ written in orange on the page), so I have to rely on the sum shown for each URL.

I noticed that this sum and mine differ, and mine is computed by looping over the day-by-day array. So I compared period=range and period=day starting from today, and the duplicate data appears again.

Here is what happens:

From day 25 to 27
Still good because 19+15+5 = 39

RANGE

?method=Actions.getPageUrl&pageUrl=http://mywebsite.fr/a/598dc26b&date=2012-07-25,2012-07-27&period=range&module=API&idSite=1&token_auth=e9e3bddcfa2455c5111e39a70024deed&format=json


[{"label":"\/598dc26b","nb_visits":39,"nb_hits":80,"sum_time_spent":23809,"entry_nb_visits":19,"entry_nb_actions":75,"entry_sum_visit_length":30617,"entry_bounce_count":13,"exit_nb_visits":18,"sum_daily_nb_uniq_visitors":39,"sum_daily_entry_nb_uniq_visitors":19,"sum_daily_exit_nb_uniq_visitors":18,"avg_time_on_page":610,"bounce_rate":"68%","exit_rate":"46%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}]

DAY

?method=Actions.getPageUrl&pageUrl=http://mywebsite.fr/a/598dc26b&date=2012-07-25,2012-07-27&period=day&module=API&idSite=1&token_auth=e9e3bddcfa2455c5111e39a70024deed&format=json


{"2012-07-25":[{"label":"\/598dc26b","nb_visits":19,"nb_uniq_visitors":19,"nb_hits":43,"sum_time_spent":12991,"entry_nb_uniq_visitors":9,"entry_nb_visits":9,"entry_nb_actions":"21","entry_sum_visit_length":"6194","entry_bounce_count":"6","exit_nb_uniq_visitors":8,"exit_nb_visits":8,"avg_time_on_page":684,"bounce_rate":"67%","exit_rate":"42%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}],
"2012-07-26":[{"label":"\/598dc26b","nb_visits":15,"nb_uniq_visitors":15,"nb_hits":30,"sum_time_spent":7069,"entry_nb_uniq_visitors":7,"entry_nb_visits":7,"entry_nb_actions":"22","entry_sum_visit_length":"4925","entry_bounce_count":"5","exit_nb_uniq_visitors":8,"exit_nb_visits":8,"avg_time_on_page":471,"bounce_rate":"71%","exit_rate":"53%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}],
"2012-07-27":[{"label":"\/598dc26b","nb_visits":5,"nb_uniq_visitors":5,"nb_hits":7,"sum_time_spent":3749,"entry_nb_uniq_visitors":3,"entry_nb_visits":3,"entry_nb_actions":"32","entry_sum_visit_length":"19498","entry_bounce_count":"2","exit_nb_uniq_visitors":2,"exit_nb_visits":2,"avg_time_on_page":750,"bounce_rate":"67%","exit_rate":"40%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}]}

From day 24 to 27
Wrong because 12+19+15+12 = 58, not 51.

RANGE

?method=Actions.getPageUrl&pageUrl=http://mywebsite.fr/a/598dc26b&date=2012-07-24,2012-07-27&period=range&module=API&idSite=1&token_auth=e9e3bddcfa2455c5111e39a70024deed&format=json


[{"label":"\/598dc26b","nb_visits":51,"nb_hits":121,"sum_time_spent":34660,"entry_nb_visits":25,"entry_nb_actions":91,"entry_sum_visit_length":35728,"entry_bounce_count":16,"exit_nb_visits":24,"sum_daily_nb_uniq_visitors":51,"sum_daily_entry_nb_uniq_visitors":25,"sum_daily_exit_nb_uniq_visitors":24,"avg_time_on_page":680,"bounce_rate":"64%","exit_rate":"47%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}]

DAY

http://piwik.mywebsite.fr/index.php?method=Actions.getPageUrl&pageUrl=http://mywebsite.fr/a/598dc26b&date=2012-07-24,2012-07-27&period=day&module=API&idSite=1&token_auth=e9e3bddcfa2455c5111e39a70024deed&format=json


{"2012-07-24":[{"label":"\/598dc26b","nb_visits":12,"nb_uniq_visitors":12,"nb_hits":41,"sum_time_spent":10851,"entry_nb_uniq_visitors":6,"entry_nb_visits":6,"entry_nb_actions":"16","entry_sum_visit_length":"5111","entry_bounce_count":"3","exit_nb_uniq_visitors":6,"exit_nb_visits":6,"avg_time_on_page":904,"bounce_rate":"50%","exit_rate":"50%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}],
"2012-07-25":[{"label":"\/598dc26b","nb_visits":19,"nb_uniq_visitors":19,"nb_hits":43,"sum_time_spent":12991,"entry_nb_uniq_visitors":9,"entry_nb_visits":9,"entry_nb_actions":"21","entry_sum_visit_length":"6194","entry_bounce_count":"6","exit_nb_uniq_visitors":8,"exit_nb_visits":8,"avg_time_on_page":684,"bounce_rate":"67%","exit_rate":"42%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}],
"2012-07-26":[{"label":"\/598dc26b","nb_visits":15,"nb_uniq_visitors":15,"nb_hits":30,"sum_time_spent":7069,"entry_nb_uniq_visitors":7,"entry_nb_visits":7,"entry_nb_actions":"22","entry_sum_visit_length":"4925","entry_bounce_count":"5","exit_nb_uniq_visitors":8,"exit_nb_visits":8,"avg_time_on_page":471,"bounce_rate":"71%","exit_rate":"53%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}],
"2012-07-27":[{"label":"\/598dc26b","nb_visits":12,"nb_uniq_visitors":12,"nb_hits":41,"sum_time_spent":10851,"entry_nb_uniq_visitors":6,"entry_nb_visits":6,"entry_nb_actions":"16","entry_sum_visit_length":"5111","entry_bounce_count":"3","exit_nb_uniq_visitors":6,"exit_nb_visits":6,"avg_time_on_page":904,"bounce_rate":"50%","exit_rate":"50%","url":"http:\/\/mywebsite.fr\/a\/598dc26b"}]}

tl;dr: the problem is that the last day (27) for period=day is wrong in this last request. It is a duplicate of day 24.

NB: I have to use an explicit date range because lastX returns wrong data with period=range: last1 to last5 always return the result for last5, whereas for lastX with X greater than 5 the result is correct.
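
The comparison can be automated. The sketch below, using figures trimmed from the 24 to 27 responses above, sums one metric over the per-day response, compares it with the range response, and reports the days whose rows are identical. The function name `compare_day_and_range` is hypothetical.

```python
import json
from collections import defaultdict


def compare_day_and_range(day_response, range_response, metric="nb_visits"):
    """Return (daily_total, range_total, duplicate_date_groups)."""
    # Some metrics come back as strings in the JSON, hence int().
    daily_total = sum(
        int(row[metric]) for rows in day_response.values() for row in rows
    )
    range_total = sum(int(row[metric]) for row in range_response)

    # Days whose serialized rows match exactly are reported together.
    groups = defaultdict(list)
    for date, rows in sorted(day_response.items()):
        if rows:
            groups[json.dumps(rows, sort_keys=True)].append(date)
    duplicates = [dates for dates in groups.values() if len(dates) > 1]
    return daily_total, range_total, duplicates


day_response = {
    "2012-07-24": [{"label": "/598dc26b", "nb_visits": 12, "nb_hits": 41}],
    "2012-07-25": [{"label": "/598dc26b", "nb_visits": 19, "nb_hits": 43}],
    "2012-07-26": [{"label": "/598dc26b", "nb_visits": 15, "nb_hits": 30}],
    "2012-07-27": [{"label": "/598dc26b", "nb_visits": 12, "nb_hits": 41}],
}
range_response = [{"label": "/598dc26b", "nb_visits": 51, "nb_hits": 121}]

print(compare_day_and_range(day_response, range_response))
# (58, 51, [['2012-07-24', '2012-07-27']])
```

With the correct value of 5 visits for day 27, the daily total would be 12+19+15+5 = 51 and match the range result.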


#7

The incorrect lines have changed (they change from time to time), but this does not seem to be related to the auto-archiving because I disabled it and the problem is still here.

I enabled sql profiling and error/warning logging but nothing interesting came up.

Does anyone have any idea about that? This is really weird.


(Matthieu Aubry) #8

I have some (maybe) good news: I hope I have fixed your bug in trunk.

It is now released in 1.8.3 beta, which you can already use if you wish.


#9

I was about to reinstall anyway, as part of debugging. I will try this release. Thanks, Matt.


#10

Well, that did not work: I still get randomly duplicated days in the response to an Actions.getPageUrl API request over a range of dates.

And I also still get this:
http://img856.imageshack.us/img856/1832/capturedu20120801091834.png
(when I try to see the chart of any URL directly from the web GUI).

This makes me think my Piwik is broken in some way, but I can’t figure out how.


#11

Nevertheless, thanks to the implementation of bulk requests, I was able to get a correct result by bulking the same request once per day instead of requesting the whole range of dates.

I think we will switch to that for the moment.
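
A minimal sketch of that workaround: one Actions.getPageUrl sub-request per day, wrapped into a single bulk call. The use of `API.getBulkRequest` with `urls[N]` parameters is an assumption about how the bulk API is called and should be checked against the API reference of the installed version; the function name, host and token below are placeholders.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_bulk_url(api_base, page_url, id_site, start, end, token_auth, fmt="xml"):
    """Build one bulk URL holding a per-day Actions.getPageUrl request."""
    sub_requests = []
    day = start
    while day <= end:
        # Each sub-request is itself a query string for a single day.
        sub_requests.append(urlencode({
            "method": "Actions.getPageUrl",
            "pageUrl": page_url,
            "idSite": id_site,
            "period": "day",
            "date": day.isoformat(),
        }))
        day += timedelta(days=1)

    params = [
        ("module", "API"),
        ("method", "API.getBulkRequest"),  # assumed bulk method name
        ("format", fmt),
        ("token_auth", token_auth),
    ]
    # The sub-requests are encoded a second time as urls[0], urls[1], ...
    params += [("urls[%d]" % i, sub) for i, sub in enumerate(sub_requests)]
    return api_base + "?" + urlencode(params)


print(build_bulk_url(
    "http://piwik.example.org/index.php",  # hypothetical host
    "http://mywebsite.fr/a/598dc26b",
    1,
    date(2012, 7, 24),
    date(2012, 7, 27),
    "YOUR_TOKEN_AUTH",                     # placeholder token
))
```

The per-day results then come back in the order of the sub-requests, so they can be summed on the client side.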


#12

It does not support any format other than XML at the moment, does it?


#13

In fact it does, but the format must be passed as an HTTP parameter.


(Matthieu Aubry) #14

If you still have the problem with 1.8.3 beta, then I’d be very interested in a dump of your database (without the archive tables) and the steps to reproduce the issue. I really don’t understand this issue, thanks.


#15

I can’t believe I had not done this before, but I tried something simple: I tested another URL of my website (one tracked by piwik.js) and the problem was gone. It is probably the Java tracker I am using that causes this problem.

Still, it is very strange that it does that to the database.

Are you still interested in the dump?


(Matthieu Aubry) #16

I’m a bit overbooked, so if you don’t have the bug anymore, I’m fine with it!