JavaScript statistics vs. import_logs.py statistics

douglas · January 26, 2015, 7:48pm

We are on 2.9.1.

I am comparing the JS results and the import_logs results and I am finding a notable discrepancy. Here are some numbers:

Looking at our main site:

Source             Visits  Unique visits  Avg. time  Actions per visit  Pageviews  Unique pageviews
www/import         17,865  17,855         200s       6.1                171,742    63,790
www/js             2,261   1,341          276s       1.9                4,281      2,932
admissions/import  820     820            219s       6.8                5,518      3,234
admissions/js      30      14             232s       2.3                70         47

A count of the unique IP addresses:
www 14,289
admissions 854

These two counts were done by:

cat access_log|grep -v gsa-crawler|awk '{print $1}'|sort|uniq|wc

As you can see, the number of unique IPs hitting each server is closer to the import count than to the JS count.

When I go to the main page on either server or a sub-page, I still see the JS code in there. I am pretty convinced that it is
being “hit” on each page.

The import command I am using in both cases is:

./import_logs.py --url=http://piwik.gpc.edu/piwik --idsite=${idsite} --recorders=8 --enable-http-errors --enable-http-redirects --enable-reverse-dns --recorder-max-payload-size=600 access_log

where access_log is the access log for the site, whether it is www, admissions or something else.

The archive command I use is:

./console core:archive --url=http://piwik.gpc.edu/piwik

Any idea why the counts would be so different between the import and the JS?

Thanks,
Douglas

mnapoli · January 30, 2015, 1:21am

That’s indeed a big difference.

A smaller difference could be explained by ad blockers and other extensions such as Ghostery that block Piwik JS.

It may also be that the log importer imports all HTTP requests, including AJAX calls or calls to pages that are not tracked by Piwik (e.g. admin section, or error pages, or …?).

Could that be a reason? Could you try to find specific pages that appear in the log but not in what Piwik’s JS has tracked?

douglas · February 3, 2015, 7:34pm

Thanks for the suggestions.

I do find some differences in the ‘Page URLs’ section. The main difference is the number of times each URL appears. There are some URLs
in the import report that are not in the JS report (and maybe the reverse). But I guess my main concern is that to get into any of these
pages, you have to ‘go by’ the JS tracking code. That being said, is there any way I could be wrong in saying that the JS count is wrong
given the following information:

14,289 unique ips found in the log file for that day (based off my UNIX command in the original post)
17,865 visits according to the import command
2,261 visits according to the JS code

Thanks,
Douglas

mnapoli · February 3, 2015, 9:33pm

Are those URLs Ajax requests or “real” web pages?

Could it be bots/crawlers? Those wouldn’t execute JS but would be logged by Apache.
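
One rough way to check the bot question against the log itself is sketched below. It assumes the access log is in the Apache combined format (user agent as the last quoted field), and the marker list is a made-up, non-exhaustive guess, so treat the result as an estimate only. The sample lines are invented for illustration.

```python
import re

# Hypothetical, non-exhaustive substrings that suggest a crawler user agent.
BOT_MARKERS = ("bot", "crawler", "spider")

# Assumption: combined log format, where the user agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def split_ips_by_agent(log_lines):
    """Return (unique bot-like IPs, unique other IPs) for the given log lines."""
    bot_ips, other_ips = set(), set()
    for line in log_lines:
        fields = line.split()
        match = LAST_QUOTED.search(line)
        if not fields or not match:
            continue  # skip blank or unparseable lines
        agent = match.group(1).lower()
        is_bot = any(marker in agent for marker in BOT_MARKERS)
        (bot_ips if is_bot else other_ips).add(fields[0])
    return len(bot_ips), len(other_ips)


# Invented sample lines, not taken from the real log.
sample = [
    '1.2.3.4 - - [26/Jan/2015:10:00:00 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.5 - - [26/Jan/2015:10:00:01 -0500] "GET /about HTTP/1.1" 200 734 "-" "gsa-crawler"',
    '1.2.3.6 - - [26/Jan/2015:10:00:02 -0500] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
]
bot_count, other_count = split_ips_by_agent(sample)
print(bot_count, other_count)
```

An IP that sends both bot-like and browser-like user agents is counted on both sides in this sketch.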

douglas · February 9, 2015, 7:39pm

Hi,

As to the second part, are they bots or web crawlers, I do not think so, at least not 12k+ of them (14k-2k). I
actually went through the logs and found a number of search engines coming in, but nothing that would account
for an extra 12k entries in the logs.

As to the first part, I am not really sure how to know if they are AJAX requests or not. We use Drupal right now
and I see a lot of css requests and repetitive requests like ‘get modules…’ and so on. That being said, I am still taking
all requests in the log file and looking at the IP address, regardless of what it accesses, and getting a unique list of
those and seeing how many there are.

So, if I have ips:

1.2.3.4
1.2.3.4
1.2.3.5
1.2.3.6
1.2.3.7
1.2.3.7

I would have 6 log lines and 4 unique ips: 1.2.3.4, 1.2.3.5, 1.2.3.6 and 1.2.3.7
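
The same counting can be sketched in Python, as a rough equivalent of the grep/awk/sort/uniq pipeline from the original post. The function name and the `exclude` parameter are made up for illustration. Unlike the shell pipeline, this sketch skips blank lines.

```python
def count_unique_ips(log_lines, exclude="gsa-crawler"):
    """Return (number of lines kept, number of unique first fields)."""
    kept = 0
    ips = set()
    for line in log_lines:
        if exclude in line:
            continue  # same idea as `grep -v gsa-crawler`
        fields = line.split()
        if not fields:
            continue  # blank line
        kept += 1
        ips.add(fields[0])  # same idea as `awk '{print $1}'`
    return kept, len(ips)


sample = ["1.2.3.4", "1.2.3.4", "1.2.3.5", "1.2.3.6", "1.2.3.7", "1.2.3.7"]
print(count_unique_ips(sample))  # (6, 4)
```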

I just don’t see what I am missing in order to have such a wide discrepancy…

Thanks!
Douglas

mnapoli · February 9, 2015, 10:35pm

I’m asking about Ajax requests because, for example, imagine you have a “Search” page that loads its results through Ajax. Each time a user does a search, the page is not refreshed and Piwik doesn’t track a new page view, but an Ajax request is made and logged on the server.

If the user does 10 searches without reloading the page, Piwik will see 1 page view and the logs will see 11 requests (1 page load plus 10 Ajax calls).
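
To see whether something like this is happening, a small sketch that counts log requests per path might help. It assumes Apache-style request lines (`"GET /path HTTP/1.1"`), and the `/search/results` endpoint and the sample lines are invented to mirror the example above. Paths with very high counts relative to the JS report would be candidates for Ajax or other untracked requests.

```python
import re
from collections import Counter

# Assumption: the request is logged as "METHOD /path HTTP/x.y".
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)')


def requests_per_path(log_lines):
    """Count log lines per request path, ignoring the query string."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # skip lines without a recognizable request
        path = match.group(1).split("?", 1)[0]
        counts[path] += 1
    return counts


# Invented sample: one page load followed by 10 Ajax searches from the same IP.
sample = ['1.2.3.4 - - [26/Jan/2015:10:00:00 -0500] "GET /search HTTP/1.1" 200 2048']
sample += [
    f'1.2.3.4 - - [26/Jan/2015:10:00:{i:02d} -0500] "GET /search/results?q=test{i} HTTP/1.1" 200 512'
    for i in range(1, 11)
]

counts = requests_per_path(sample)
print(sum(counts.values()), counts["/search"], counts["/search/results"])
```

Here the log side shows 11 requests while only the single `/search` load would trigger the JS tracker.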