Archive.php errors are a mystery


#1

We’re running piwik tracking 700+ sites with something like 200-300,000 hits/day, evaluating it alongside another analytics software to decide which we will use for the long run. We’re running archive.php from cron every hour, and for the most part it runs without error.

But 2 or 3 times a day archive.php will have an error like this:


Error: Got invalid response from API request: http://mydomain/index.php?module=API&method=VisitsSummary.getVisits&idSite=778&period=day&date=last2&format=php&token_auth=token&trigger=archivephp. Response was '<br /> <b>Fatal error</b>:  Exception thrown without a stack frame in <b>Unknown</b> on line <b>0</b><br />'
[2013-04-26 10:15:50] [17123131] 1 total errors during this script execution, please investigate and try and fix these errors

I have browser archiving turned off and reports processing set to 3600 in Settings. A few things I’ve noticed:

*archive.php sometimes has date=last2 and sometimes date=last52
*The error, when it happens, always happens for the first 1 or 2 sites, which in our case is 778 and 779.
*Looking at ‘ps’, I see sleeping archive.php processes from long ago which never finished/exited. For example:


root      2226  0.0  0.0   2516   884 ?        Ss   00:05   0:00 /bin/sh -c /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/ >> /var/log/piwik-archive.log
root      2230  0.0  0.3  32592 14688 ?        S    00:05   0:00 /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/
root     17524  0.0  0.0   2516   880 ?        Ss   Apr25   0:00 /bin/sh -c /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/ >> /var/log/piwik-archive.log
root     17525  0.0  0.3  32592 14688 ?        S    Apr25   0:00 /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/
root     19478  0.0  0.0   2516   884 ?        Ss   01:05   0:00 /bin/sh -c /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/ >> /var/log/piwik-archive.log
root     19483  0.0  0.3  32928 15044 ?        S    01:05   0:00 /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/
root     32245  0.0  0.0   2516   888 ?        Ss   Apr25   0:00 /bin/sh -c /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/ >> /var/log/piwik-archive.log
root     32246  0.0  0.3  32592 14780 ?        S    Apr25   0:00 /usr/bin/php /var/www/html/piwik/misc/cron/archive.php --url=http://mydomain/

Can anyone offer an idea about what may be happening here and why we get these intermittent errors? My first thought was that maybe the first site(s) sometimes takes a long time to archive. But the archive.php log shows that less than a second goes by:


[2013-04-26 10:09:03] [17123131] ---------------------------
[2013-04-26 10:09:03] [17123131] START[2013-04-26 10:09:03] [17123131] Starting Piwik reports archiving... 
[2013-04-26 10:09:03] [17123131] ERROR: Got invalid response from API request: http://mydomain/index.php?module=API&method=VisitsSummary.getVisits&idSite=778&period=day&date=last2&format=php&token_auth=token&trigger=archivephp. Response was '<br /> <b>Fatal error</b>:  Exception thrown without a stack frame in <b>Unknown</b> on line <b>0</b><br />'
[2013-04-26 10:09:03] [17123131] WARNING: Empty or invalid response '' for website id 778, Time elapsed: 0.123s, skipping
[2013-04-26 10:09:07] [17123131] Archived website id = 779, period = day, Time elapsed: 4.127s
[2013-04-26 10:09:10] [17123131] Archived website id = 779, period = week, 1860 visits, Time elapsed: 3.351s

So the mysteries are:

(1) Why the intermittent errors on the first site
(2) Why are some archive.php processes not exiting but sleeping forever
(3) Why is date sometimes last2 and sometimes last52
(4) Most importantly, are these errors and sleeping archive.php processes causing data to be lost for the affected sites?


(Matthieu Aubry) #2

To know what these errors are, check out your server error log file, it will contain the reason of the error. Piwik ideally should not issue errors so try to investigate.


#3

My server/apache logs don’t give any more information, and I have PHP verbose logging on. This is the reason I’m posting here. I’m hoping someone has some ideas about what could be happening. Are you familiar with the archive.php code, Matt? No idea what could be causing sleeping archive.php processes and errors for the first site id(s)?


(Matthieu Aubry) #4

Can you check that mod_security plugin is not enabled on your apache?

Appart from that not sure, but strange that the error is not logged in your apache/error.log, this could be misconfiguration.


#5

I just started tweaking PHP and MySQL until the archive script stopped giving those errors. So for anyone else with similar issues, here are my tweaks. We have 700+ sites, 300,000 hits/day, log table: 25 million, report tables: 7 million.

php.ini:
max_execution_time = 600
max_input_time = 120
memory_limit = 512M

my.cnf:
max_connections = 400


#6

Matt, do you have any idea why we are getting these sleeping archive.php processes? What would cause them to sleep forever? Is it when one takes longer than 1 hour and cron launches a new one? Would that cause the old one to sleep like this? Could this be avoided by archiving every 2 hours instead of 1 hour?


#7

Please see my last post here. We are still have issues with dozens of sleeping archive.php processes, and I believe it is causing a resource issue. At the top of every hour our Piwik server becomes unresponsive as the archive.php begins. This causing all of our sites using Piwik to load slowly.

In all of the Piwik documentation I only see mentioned using the archive.php script every hour. Would there be any issue with setting it to run every 2 hours, in order to give the script plenty of time to process all of the sites? Would I need to change the reports processing setting to something other than 3600? Or leave that and just modify the archive.php cron to every 2 hours?


(Matthieu Aubry) #8

No problem to run every 2 hours, it’s recommended if one hour is not enough. It’s common to process stats every 6 hours or 24 hours.


#9

Thanks! This helped a lot.