Using the Tracking API for special URLs alongside JS page tracking duplicates visits in some browsers


#1

Hi,
I have a script that generates file download URLs, and I have added code to it that sends this information to Piwik so it is counted as a download.
I can track those downloads fine. The problem is a little weird.

The user's visit is counted normally by Piwik, through either the JS or the image part of the tracking code. But if the same user then clicks the download button that triggers the download script I mentioned, Piwik does not add the “download” to that user's actions in the stats. Instead it opens a completely new row, with the same IP and browser info, and adds the “download” there. That would not be a problem if the visit count did not change as well, but it does: the download is counted as a new visit from the same IP, not as a download action of the same visitor.

What complicates things is that in IE everything works as expected.

I will include the code I use in a moment.
Any help appreciated.

EDIT:
This is the php code I use in the download script:


require_once(LIBS.'stats/PiwikTracker.php');
PiwikTracker::$URL = STATS_URL;
if (!isset($stats)) { $stats = new PiwikTracker(PIWIK_SITE_ID, STATS_URL); }

$stats->setTokenAuth(PIWIK_TOKEN_AUTH);
$stats->setIp( get_ip() );

// Set the referrer and user agent BEFORE tracking: each do*() call sends its
// request right away, using only the values that were set before it.
if ($referer = referer())
{ $stats->setUrlReferrer($referer); }

if ($user_agent = user_agent())
{ $stats->setUserAgent($user_agent); }

$stats->doTrackAction(BASE_APP_URL.$filename, 'download');
$stats->doTrackGoal(PIWIK_SITE_DOWNLOAD_GOAL_ID, PIWIK_SITE_DOWNLOAD_GOAL_REVENUE); // $idGoal, $revenue

Notice in the screenshot: although I hid the IP, it is the same visit counted as two visits.

Another example with Google Chrome:

Now see how it all works just fine when the visit is from IE:

One thing I did notice: in the browsers I tested other than IE, the download shows as “Direct Entry”. For some reason this doesn't happen in IE. As I mentioned, this is a download “button”, not an href link to the file.
Does anybody have an idea how to keep the visit from being split in two?
Any help appreciated.


(Matthieu Aubry) #2

What I would do is debug the call with this technique: http://piwik.org/docs/tracking-api/reference/#toc-debugging-the-tracking-api-requests

Then you can look at the output message. Piwik will create a new visit, but you expected Piwik to put the download in the previous visit. The message should explain why a new visit was created.


#3

Thanks for the reply, Matt. I will check this as soon as possible and report back.
Thanks again.


#4

So,

I went to the link you gave me: http://piwik.org/docs/tracking-api/reference/
I set $GLOBALS['PIWIK_TRACKER_DEBUG'] = true; in piwik.php.
Visiting a page that has the JS tracking code at the end of the body makes only one HTTP request to piwik.php. In Firebug it looks like this:


Cache-Control	no-cache, private, no-transform, must-revalidate, proxy-revalidate, post-check=300, pre-check=300, max-age=300
Connection	Keep-Alive
Content-Encoding	gzip
Content-Type	text/html
Date	Thu, 05 Sep 2013 03:47:13 GMT
Keep-Alive	timeout=5, max=99
Pragma	no-cache
Server	Apache/2.2.22 (Ubuntu)
Transfer-Encoding	chunked
Vary	Accept-Encoding,User-Agent
Request Headers
Accept	image/png,image/*;q=0.8,*/*;q=0.5
Accept-Encoding	gzip, deflate
Accept-Language	en-us,en;q=0.5
Connection	keep-alive
DNT	1
Host	stats.dev.ubuntu
Referer	http://dev.ubuntu/software/tests/test
User-Agent	Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0

or


GET /piwik.php?action_name=testing&idsite=1&rec=1&r=806821&h=6&m=47&s=13&url=http%3A%2F%2Fdev.ubuntu%2Fsoftware%2Ftests%2Ftest&urlref=http%3A%2F%2Fdev.ubuntu%2Fsoftware%2Ftests&_id=8b528b7540f3a728&_idts=1378348605&_idvc=1&_idn=0&_refts=0&_viewts=1378348605&pdf=0&qt=1&realp=0&wma=1&dir=0&fla=1&java=0&gears=0&ag=0&cookie=1&res=1920x1080&gt_ms=810 HTTP/1.1
Host: stats.dev.ubuntu
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://dev.ubuntu/software/tests/test
Connection: keep-alive
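For reference, the JS request above already carries the visitor ID in _id (with _idts and _idvc alongside it), and the debug output below shows Piwik matching visitors by visitorId OR configId. A minimal PHP sketch for pulling those fields out of a logged piwik.php request line, using only parse_url() and parse_str() (tracking_params is a made-up name, and the request is a shortened copy of the one above):

```php
<?php
// Sketch: list the parameters of a logged piwik.php request.
// Only parse_url()/parse_str() are used; tracking_params is a made-up name.
function tracking_params(string $requestUrl): array
{
    $params = [];
    $query = parse_url($requestUrl, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params); // splits on '&' and URL-decodes the values
    }
    return $params;
}

// Shortened copy of the request line above.
$request = '/piwik.php?action_name=testing&idsite=1&rec=1'
    . '&url=http%3A%2F%2Fdev.ubuntu%2Fsoftware%2Ftests%2Ftest'
    . '&urlref=http%3A%2F%2Fdev.ubuntu%2Fsoftware%2Ftests'
    . '&_id=8b528b7540f3a728&_idts=1378348605&_idvc=1';

$p = tracking_params($request);
echo $p['_id'], "\n";    // 8b528b7540f3a728 (visitor ID)
echo $p['_idvc'], "\n";  // 1 (visit count)
echo $p['urlref'], "\n"; // http://dev.ubuntu/software/tests
```

The same approach works on any Tracking URL you log from the server side, which makes it easy to compare what the JS tracker sends with what the download script sends.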

In the HTML tab, or if I open the request URL directly in the browser, I get this:


Debug enabled - Input parameters: <br/>array ( 'action_name' => 'testing', 'idsite' => '1', 'rec' => '1', 'r' => '556146', 'h' => '7', 'm' => '7', 's' => '8', 'url' => 'http://dev.ubuntu/software/tests/test', 'urlref' => 'http://dev.ubuntu/software/tests', '_id' => '8b528b7540f3a728', '_idts' => '1378348605', '_idvc' => '1', '_idn' => '0', '_refts' => '0', '_viewts' => '1378348605', 'pdf' => '0', 'qt' => '1', 'realp' => '0', 'wma' => '1', 'dir' => '0', 'fla' => '1', 'java' => '0', 'gears' => '0', 'ag' => '0', 'cookie' => '1', 'res' => '1920x1080', 'gt_ms' => '646', )
Loading plugins: { Provider,Goals,UserCountry }
Current datetime: 2013-09-05 04:08:37
(this is not a Site Search request)
Excluding parameters "aaaaaa,bbbbbb" from URL
Action is a Page URL, Action name = testing, Action URL = http://dev.ubuntu/software/tests/test

Piwik_Cookie::__set_state(array(
   'name' => '_pk_uid',
   'expire' => 1441426117,
   'path' => '',
   'domain' => '',
   'secure' => false,
   'httponly' => false,
   'value' => 
  array (
  ),
   'keyStore' => false,
))

Matching visitors with: visitorId=8b528b7540f3a728 OR configId=653ff52fe8d942d1
The visitor is known (idvisitor = 1261a5ce0a1facc5, config_id = 653ff52fe8d942d1, idvisit = 111, last action = Thu, 05 Sep 2013 04:08:21 +0000, first action = Thu, 05 Sep 2013 02:35:41 +0000, visit_goal_buyer' = 0)
Visit is known (IP = 192.168.1.140)
Updating existing visit: array ( 'visit_exit_idaction_name' => 142, 'visit_exit_idaction_url' => 131, 'visit_last_action_time' => '2013-09-05 04:08:37', 'visit_total_time' => 5577, 'idvisitor' => '1261a5ce0a1facc5', 'visit_goal_buyer' => '0', )

array (
  'idvisit' => '111',
  'idsite' => 1,
  'idvisitor' => '<binary: 1261a5ce0a1facc5>',
  'server_time' => '2013-09-05 04:08:37',
  'idaction_url' => '131',
  'idaction_name' => 142,
  'idaction_url_ref' => '131',
  'idaction_name_ref' => '142',
  'time_spent_ref_action' => 16,
  'custom_float' => 646,
)

Piwik_Cookie::__set_state(array(
   'name' => '_pk_uid',
   'expire' => 1441426117,
   'path' => '',
   'domain' => '',
   'secure' => false,
   'httponly' => false,
   'value' => 
  array (
  ),
   'keyStore' => false,
))

-> Scheduled tasks not triggered.
Next run will be from: 2013-09-05 04:47:13 UTC
Nothing to notice => default behaviour
End of the page.

array (
)

Piwik_Timer::__set_state(array(
   'timerStart' => 1378354117.6629,
   'memoryStart' => 2539416,
))

For some reason the time is wrong, although I use a separate php.ini for Piwik and for each site on this server, and date.timezone is set. The time in the Piwik stats is fine, though.

I cropped a screenshot of it; see below. Ignore the last pink folder that looks like a visit on the lower row: that is from testing pageview tracking in the PHP application. The apparent time gap is only because I refreshed the page many times without clicking the download button for a long while:

Now the page you gave me says:
If the requests are triggered from your app or software directly, you can output or log the Tracking URL piwik.php?… and manually load it to view the logging messages.

As I showed in my first post, I send the download event to Piwik using that PHP code. I am not making any HTTP requests myself, nor have I set up an image to track the downloads.

What I actually do is the following:

  1. On the page with the download button, when the button is pressed I do:

header('Location: '.$download_url.'&site_button=1');
exit;

  2. In the download app, when the file behind $download_url is found to exist, I check whether the 'site_button' variable is set; if it is, then before serving the file with readfile() I run the code I showed you to send the download event to Piwik.
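Step 1 appends &site_button=1, which only produces a valid URL if $download_url already has a query string (in #7 the same URL gets ?ref= instead). A small, hypothetical helper that picks ? or & automatically, sketched in plain PHP (the function name and example URLs are made up):

```php
<?php
// Hypothetical helper: append key=value to a URL with '?' or '&' as needed,
// keeping any #fragment at the very end.
function append_query_param(string $url, string $key, string $value): string
{
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }
    $separator = strpos($url, '?') === false ? '?' : '&';
    return $url . $separator . rawurlencode($key) . '=' . rawurlencode($value) . $fragment;
}

echo append_query_param('http://dev.ubuntu/download.php', 'site_button', '1'), "\n";
// http://dev.ubuntu/download.php?site_button=1
echo append_query_param('http://dev.ubuntu/download.php?file=42', 'site_button', '1'), "\n";
// http://dev.ubuntu/download.php?file=42&site_button=1
```

In step 1 this would read header('Location: ' . append_query_param($download_url, 'site_button', '1')); followed by exit;.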

So I don't understand how to debug this.
I also don't understand why it is stored as a different visit when the IPs are the same and the time difference is only a few seconds. Maybe because there is no cookie in the second case?
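One way to debug a server-side call like this is to build the exact piwik.php URL it would send and open it in a browser while $GLOBALS['PIWIK_TRACKER_DEBUG'] = true is set, so the debug output explains why a new visit is created. As far as I know PiwikTracker can return such URLs itself (getUrlTrackAction()), but the sketch below builds one by hand so every parameter is visible. idsite, rec, url, download, _id and urlref are Tracking API parameters; the function name and its arguments are illustrative assumptions.

```php
<?php
// Sketch: build the piwik.php request for a server-side download, to paste
// into a browser with tracker debugging enabled. Function name is made up.
function build_download_tracking_url(
    string $piwikUrl,
    int $idSite,
    string $downloadUrl,
    ?string $visitorId = null,
    ?string $referrer = null
): string {
    $params = [
        'idsite'   => $idSite,
        'rec'      => 1,             // required for the request to be recorded
        'url'      => $downloadUrl,
        'download' => $downloadUrl,  // marks the action as a download
    ];
    if ($visitorId !== null) {
        $params['_id'] = $visitorId; // 16-hex-char visitor ID, as piwik.js sends it
    }
    if ($referrer !== null) {
        $params['urlref'] = $referrer;
    }
    $query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
    return rtrim($piwikUrl, '/') . '/piwik.php?' . $query;
}

echo build_download_tracking_url(
    'http://stats.dev.ubuntu', 1, 'http://dev.ubuntu/files/test.zip',
    '8b528b7540f3a728', 'http://dev.ubuntu/software/tests/test'
), "\n";
```

Keep in mind that when you load this URL yourself, Piwik sees your own browser's IP and user agent rather than the visitor's, which by itself can change how the visit is matched.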

Also, while testing something else I saw the same effect. If I disable JavaScript in the browser so that the image part of the code records the visit, exactly the same thing happens: it is split off as a separate visitor instead of continuing the same row of actions. If I turn JavaScript back on, the visits continue under the other visitor from the same IP. So even on the same computer and browser, with little delay, toggling the JavaScript setting makes me two different visitors. This doesn't bother me much, though, since nobody is going to do this on a live website.

Any ideas what to do about the download issue?
Thank you !

EDIT 1:
This is a LAN server, but it is set up like the online one, and the online website behaves exactly the same way with downloads.
By the way, a LAN IP shows a USA flag :smiley: heh

EDIT 2:
It might sound unbelievable, but I just had a visitor who switched JavaScript on and off and also made a download. His no-JavaScript visit was in the same row of actions as his downloads; he wasn't split into three visitors.

EDIT 3:
I tested without JavaScript: when I visit the page and then download, the visit does not get split in two.
I am trying to figure out which variable is missing that I could send to the download script.


#5

Anyway, I feel I've lost my day on this. It is important to understand, but it won't 'behave', and without any help it is almost impossible to solve in this case.

The setUrlReferrer() function doesn't seem to do anything: whether I set it or not, everything is shown as 'Direct Entry'.

On the other hand, I feel a little odd that, searching all over Google, I found only a couple of topics about this. That suggests it is not a problem for many people, so there must be something that “I” am doing wrong. Still, those topics do not end with any solution other than: look up the VisitorId and set it yourself.

I found no source with even basic information about where the VisitorId is stored or where to look for it. Say I write my own script to find it (fine, I love doing that): will it have to search the database, meaning another database request? That information is already in front of Piwik: a user action a couple of seconds later with the same IP, the same user agent, and other information that I can indeed pass to Piwik through the API functions. But no, Piwik says this is another user and ignores everything else. That is not so smart, you have to admit.

I passed the referrer from the actual page to the other script as a URL variable. I can pass as many variables as needed to the other script, and I did, but where do I get the VisitorId to send along and be done with this? It is one simple thing, and if it is missing you have to resort to magic tricks to find it.
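For what it's worth, piwik.js keeps the visitor ID in a first-party cookie, so a download script on the same domain (and cookie path) as the tracked pages receives it with the download request. The sketch below assumes the cookie layout I believe piwik.js used at the time: name _pk_id.<idsite>.<hash>, value starting with the 16-character hex visitor ID followed by dot-separated timestamps. PHP converts dots in incoming cookie names to underscores, hence the _pk_id_<idsite>_ prefix. PiwikTracker's own getVisitorId() appears to do a similar lookup internally; this standalone version (hypothetical name visitor_id_from_cookies) just makes it explicit.

```php
<?php
// Sketch, ASSUMING the piwik.js cookie layout of that era:
//   name:  _pk_id.<idsite>.<hash>  (PHP exposes it as _pk_id_<idsite>_<hash>)
//   value: <16 hex visitor id>.<timestamps and counters separated by dots>
function visitor_id_from_cookies(array $cookies, int $idSite): ?string
{
    $prefix = '_pk_id_' . $idSite . '_';
    foreach ($cookies as $name => $value) {
        if (strpos((string) $name, $prefix) !== 0 || !is_string($value)) {
            continue; // not the visitor cookie for this site
        }
        $visitorId = explode('.', $value)[0];
        if (preg_match('/^[0-9a-f]{16}$/i', $visitorId) === 1) {
            return strtolower($visitorId);
        }
    }
    return null; // no usable cookie: JS blocked, other domain, or cookies off
}

// Example with an array shaped like $_COOKIE (values made up):
$cookies = ['_pk_id_1_8f2a' => '8b528b7540f3a728.1378348605.1.1378348605.1378348605.'];
var_dump(visitor_id_from_cookies($cookies, 1));
```

In the download script this would be called with $_COOKIE; a null result means there is nothing reliable to pass on.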

Anyway, I have honestly spent hours on this, and being stuck while programming makes me very tired.
But seriously, when you get the chance, please drop a couple of lines sharing what you know about what I am doing and what I should do.
Thanks !


(Matthieu Aubry) #6

setUrlReferrer() function doesn’t do anything, at least to what it appears, either setting it or not everything is shown as ‘Direct Entry’.

See Troubleshooting - Analytics Platform - Matomo

I found no source that has even the basic information about where the VisitorId is stored, where to look for it.

See How to - Analytics Platform - Matomo


#7

Hi Matt,

If I do any of the following I still get the visit as Direct Entry:


$stats->setUrlReferrer('test');

$stats->setUrlReferrer('http://www.test.com'); 

if ($referer = referer()) { $stats->setUrlReferrer($referer); }

referer() function does this:


function referer()
{
	if (!empty($_SERVER['HTTP_REFERER']))
	{
		return clean_url($_SERVER['HTTP_REFERER']);
	}
}

The visit comes from the computer I am typing on and lands on the laptop on my left; both run Ubuntu with Apache and the other web services.
As we know, with SSL involved $_SERVER['HTTP_REFERER'] can come through empty (browsers do not send a Referer from an HTTPS page to an HTTP one), so I am testing on the LAN without SSL. The live site has no SSL so far.
Since I redesigned the buttons and the download system, it is impossible to download without being on the page where the button is.
This means the only case where I see the download stored in Piwik on its own is when someone uses a plugin to block the Piwik JS, because the server-side script that serves the download and sends the rest of the commands to Piwik cannot be blocked.

So to test a case that should show the download with a referrer (the page the button was on), I use a Firefox plugin to block Piwik and click the download button there. I still get Direct Entry. Please see what I do in the scripts below; it is simple, well-known stuff.

On the page that displays the download button, before the page renders, I do this:


if (isset($_POST[$button_name]))
{
	if (isset($_POST['download_token']))
	{
		$session = JFactory::getSession();

		if ($_POST['download_token'] == $session->get('download_token'))
		{
			$session->clear('download_token');
			header('Location: '.$download_url.'?ref='.$e->current_url_encoded());
			exit;
		}
		else { exit('Invalid Token'); }
	}
	else { exit('Invalid Token'); }
}

current_url_encoded() is a function that takes the current URL of that page and encodes it:


	function current_url_encoded()
	{
		$url = str_rot13($this->clean_url(JURI::current()));
		return str_replace('.', '_', rawurlencode($url));
	}

When $download_url is requested, the download script runs; once the file is found to exist (as I already said), before serving it I do:


..
					if (isset($_GET['ref']))
					{
						$stats->setUrlReferrer( url_decoded($_GET['ref']) );
....
..


function url_decoded($url)
{
	$url = str_replace('_', '.', rawurldecode($url));
	$url = str_rot13($url);
	return clean_url($url);
}

All pretty simple and obvious, but it still shows Direct Entry. Not too bad, it is just missing information, and not too important.
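One detail in the ref round-trip above: rawurlencode() leaves underscores untouched, so url_decoded() turns every underscore that was already in the page URL into a dot, and for such URLs setUrlReferrer() receives a different URL from the real one. The sketch below reproduces the two functions without clean_url() and compares them with base64url encoding, a swapped-in alternative that round-trips any string. All function names here are hypothetical.

```php
<?php
// The scheme from the post, inlined without clean_url() for comparison.
function ref_encode_rot13(string $url): string
{
    return str_replace('.', '_', rawurlencode(str_rot13($url)));
}

function ref_decode_rot13(string $encoded): string
{
    // '_' becomes '.' everywhere, including underscores that were in the URL
    return str_rot13(str_replace('_', '.', rawurldecode($encoded)));
}

// Alternative: base64url ('+/' mapped to '-_', padding dropped) is reversible.
function ref_encode_b64(string $url): string
{
    return rtrim(strtr(base64_encode($url), '+/', '-_'), '=');
}

function ref_decode_b64(string $encoded): ?string
{
    $b64 = strtr($encoded, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4); // restore padding
    $decoded = base64_decode($b64, true);                // strict: reject bad input
    return $decoded === false ? null : $decoded;
}

$page = 'http://dev.ubuntu/software/my_page';
echo ref_decode_rot13(ref_encode_rot13($page)), "\n"; // http://dev.ubuntu/software/my.page
echo ref_decode_b64(ref_encode_b64($page)), "\n";     // http://dev.ubuntu/software/my_page
```

base64url output only contains letters, digits, '-' and '_', so it can go into ?ref= without further escaping.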

Now, the other link you gave me describes how to retrieve the visitorId from the DB; thanks, that is good info. But yesterday I figured out something else; tell me whether the way I use it is right or wrong.


if ($visitor_id = $stats->getVisitorId()) { $stats->setVisitorId($visitor_id); }

This is the magic line that puts all visits and pageviews, as well as download actions, of the same user on one row.
Since you did not suggest setVisitorId() and instead showed me how to get the visitorId from the DB, is using this function the way I do right? Am I confusing Piwik this way?

Unfortunately, nothing I tried made the JavaScript on/off visits of the same user appear on one row as well.
If a user visits with JS on, switches it off and refreshes the page, a new row still appears. I set his visitorId using &_id= and the above trick, but it still splits. Not that bad, I care less about this; I am just mentioning it.

Overall I think I solved it, but I would like your confirmation.
Thanks !

EDIT:
One more question: is the visitorId supposed to change on every page load? In all my testing it changes on every page load.
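As far as I can tell from the PiwikTracker source of that era, getVisitorId() returns a forced ID if one was set, otherwise the ID from the _pk_id cookie, and otherwise a freshly generated random 16-character hex ID. If that is right, an ID that changes on every page load would suggest the cookie is not reaching the script, and the if in the magic line from #7 is always true, because the fallback is never empty. A sketch of that fallback logic (not PiwikTracker's actual code; resolve_visitor_id is a made-up name, and random_bytes() needs PHP 7+):

```php
<?php
// Sketch of the fallback described above (illustrative, not PiwikTracker's code).
// Returns [visitorId, cameFromCookie].
function resolve_visitor_id(?string $cookieVisitorId): array
{
    if ($cookieVisitorId !== null && preg_match('/^[0-9a-f]{16}$/', $cookieVisitorId) === 1) {
        return [$cookieVisitorId, true];      // stable across page loads
    }
    return [bin2hex(random_bytes(8)), false]; // new on every call: 8 bytes = 16 hex chars
}

[$first, $firstFromCookie] = resolve_visitor_id(null);
[$second, $secondFromCookie] = resolve_visitor_id(null);
[$known, $knownFromCookie] = resolve_visitor_id('8b528b7540f3a728');

echo $first, ' ', $second, "\n";
echo $known, "\n"; // 8b528b7540f3a728
```

If this matches PiwikTracker's behaviour, it may be safer to call setVisitorId() only when the ID actually came from the cookie, since forcing a random ID hands that request a brand-new visitor ID.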


#8

I believe I fixed it. An undefined PHP variable was causing a problem. Now switching back and forth between JavaScript and no JavaScript keeps tracking on the same row. Sometimes it does still split for real visitors, but maybe the user closed the tab and reopened it? Who knows.

The thing is… it now looks wonderful ! :slight_smile:


#9

Can you post the completed working version?