Over the weekend we upgraded from Piwik 2.8.3 to 2.10.0 (we had the upgrade tested and planned before 2.11.0 came out). I’m not sure whether it’s a coincidence, but this week we have started to see errors at certain times when the auto-archiving cron runs. The cron is scheduled to run once per hour. During most of the day the archiving process takes under 20 minutes. However, around 7 PM Eastern (see my related forum thread on scheduled tasks and server time) everything starts to fall apart. The cron takes a little over two hours to run, which seems to cause a cascade of issues that takes 5-6 hours to recover from. During these executions it is very common to see “invalid response from API request” messages due to “Lock wait timeout exceeded”. My assumption is that when the cron fires for the next hour (8 PM) while the previous run (7 PM) is still going, the two runs compete for locks on the archive tables in the database.
As a temporary measure, I have reduced the frequency of the cron so that it only runs every two hours from 7 PM to 1 AM. Given that the 7 PM job seems to take a little over two hours to run, I’m not sure this will be enough. As per my other forum thread, 7 PM is when Piwik runs its daily scheduled tasks (which I would think should run at midnight).
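Another option I am considering, instead of spacing the runs out, is to have each cron run skip itself when the previous one is still going. Below is a minimal sketch of that idea in Python, assuming a Linux host where `fcntl.flock` is available. The lock file path and the wrapper script itself are hypothetical, and the archive command is whatever the crontab already calls, passed in as arguments. I have not tested this against Piwik, and it only prevents overlap; it does nothing about the 7 PM run itself taking two hours.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file location; any path writable by the cron user works.
LOCK_PATH = "/tmp/piwik-archive.lock"


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd only if no other run currently holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because a previous run is still in progress.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.call(cmd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the archive command.
    result = run_exclusive(sys.argv[1:])
    if result is None:
        print("Previous archive run still in progress; skipping this run.")
        sys.exit(0)
    sys.exit(result)
```

The crontab entry would then call this wrapper with the existing archive command appended as arguments, so an 8 PM run that finds the 7 PM run still active exits right away instead of competing for the same tables.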
To give a little more information, we have 196 sites added to Piwik. About 60 of them are actively receiving traffic. Over the last three days we have been receiving about 91,000 visits and 690,000 pageviews per day.
I have created a spreadsheet showing how long the archiving process has taken over the last few days. Unfortunately I do not have any metrics from before the upgrade, but we never encountered any errors before this week.
https://docs.google.com/spreadsheets/d/1MmPbUmaueDrcml21jAmW6mEYS3E8hWYISZur2-FlnVY/edit?usp=sharing