Archiving Recommendations

We’ve been Piwik users for about 2 years and we use it to track stats on a lot of different aspects of our sites. We probably handle about 50,000 page views a day across 4,700 websites (some of these are deals/listings trackers). For the most part it is used internally, but some of our customers also grab data periodically, so we wanted to speed it up a little based on the recent high-volume recommendations.

The first step we took was to move over to the auto-archiving cronjob. We started this last week on 1.5 before upgrading and ended up with some corrupt data. Our assumption was that our 2 GB machine was not going to cut it, so we moved to an AWS EC2 medium instance with 3.75 GB. We then upgraded and ran the archive process again, and it took over 30 hours. We wrote this off as catching up and then ran it successfully every hour that day. Our average run was about 4-5 minutes, but we also noticed data was missing. We monitored CPU usage during this time and it was typically around 0-5% (this is a dedicated instance).

The next morning we woke up to an extremely slow server: the archive process had taken over 10 hours and used about 50-60% of the memory, with spikes that were even higher. We ran mysqltuner and adjusted our settings, and everything was back in action. We even moved to 30-minute intervals, which worked great during the day, but at night we had the same issue where the whole system gets clogged.

Our thought is that at night it’s starting to archive weekly and monthly numbers due to a full day being completed. Is this the case? And if this is the case, is there a chance that our cronjobs are just building up on top of each other every 30 minutes? Our next step is to upgrade our RAM, but it just seems odd that most of the day when we have the most traffic we have no issues. Thanks for any recommendations!

Doug

Our thought is that at night it’s starting to archive weekly and monthly numbers due to a full day being completed. Is this the case?
Yes. You can check the output of the ‘archive.php’ script, which should show you that it was archiving week and month periods.

And if this is the case, is there a chance that our cronjobs are just building up on top of each other every 30 minutes?

Piwik archiving for today’s reports is not incremental: running the archiving several times per day will not lower the memory requirement for weekly, monthly or yearly archives. Each run reads all logs for the full day to process a report for that day.
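
On the overlap question itself: if one run takes longer than the cron interval, the next run will start while the previous one is still working unless something prevents it. One common way to rule that out is to wrap the cron command in a lock. Below is a minimal sketch, assuming a Unix-like host with Python available. This is not part of Piwik; the wrapper, the function name `run_exclusive` and the lock path are hypothetical, and the command to run is whatever your cron entry currently calls.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file location; any path writable by the cron user works.
LOCK_PATH = "/tmp/piwik-archive.lock"


def run_exclusive(lock_path, command):
    """Run `command` only if nobody else currently holds `lock_path`.

    Returns the command's exit code, or None when another run still
    holds the lock (in which case the command is not started at all).
    """
    with open(lock_path, "a") as lock_file:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when the file is closed at the end of `with`.
        return subprocess.call(command)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    result = run_exclusive(LOCK_PATH, sys.argv[1:])
    if result is None:
        print("previous archive run still active, skipping", file=sys.stderr)
```

The cron entry would then call this wrapper with your existing archive command as its arguments. A skipped run shows up as a line on stderr, so you can see from the cron mail or log how often the nightly run outlasts the interval.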

Thanks for any recommendations!
Can you try 1.8.3 beta2? We fixed a few memory usage bugs, so it should hopefully use less memory: http://forum.piwik.org/read.php?2,91869

Thanks Matt! I switched my cronjob to run every 2 hours and it looks like it has been working well overnight for the last two days. I’ll try to set up a demo environment today or tomorrow to test out the new version and see if there are any speed improvements.

Doug