We’ve been Piwik users for about 2 years and use it to track stats on many different aspects of our sites. We handle roughly 50,000 page views a day across 4,700 websites (some of these are deals/listings trackers). It is mostly used internally, but some of our customers also pull data periodically, so we wanted to speed it up a little based on the recent high-volume recommendations. The first step was to move over to the auto-archiving cronjob. We started this last week on 1.5, before upgrading, and ended up with some corrupt data. We assumed our 2 GB machine was not going to cut it, so we moved to an AWS EC2 medium instance with 3.75 GB. We then upgraded and ran the archive process again, and it took over 30 hours. We wrote this off as catching up and then ran it every hour that day with success. The average run took about 4-5 minutes, but we also noticed data was missing. We monitored CPU usage during this time, and it was typically around 0-5% (this is a dedicated instance).
The next morning we woke up to an extremely slow server: the archive process had taken over 10 hours and was using about 50-60% of memory, with spikes that were even higher. We ran mysqltuner, adjusted our settings, and everything was back in action. We even moved to 30-minute intervals, which worked great, but that night we hit the same issue, with the whole system getting clogged.
Our thought is that at night it starts archiving weekly and monthly numbers because a full day has been completed. Is this the case? If so, is there a chance that our cronjobs are just piling up on top of each other every 30 minutes? Our next step is to upgrade our RAM, but it seems odd that we have no issues for most of the day, when we have the most traffic. Thanks for any recommendations!
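To rule out the overlapping-cronjob theory ourselves, one thing we could try is wrapping the archive command in a lock so a new run is skipped while the previous one is still going. This is only a minimal sketch, not something Piwik provides: the script name, the lock file path, and the `run_exclusive` helper are all made up for illustration, and it assumes a Linux box where Python's `fcntl` module is available.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file location; any path writable by the cron user would do.
LOCK_PATH = "/tmp/piwik-archive.lock"


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd only if nobody else holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because another run still holds the lock.
    """
    lock_file = open(lock_path, "w")
    try:
        # Non-blocking exclusive lock: fails immediately if already held.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    try:
        return subprocess.call(cmd)
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    exit_code = run_exclusive(sys.argv[1:])
    if exit_code is None:
        print("previous archive run still active, skipping", file=sys.stderr)
        sys.exit(0)
    sys.exit(exit_code)
```

The cron entry would then call something like `python run_once.py <our existing archive command>` every 30 minutes instead of the archive command directly. If the "skipping" message shows up in the cron output at night, that would suggest the runs really are stacking up.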