I think it might be a good idea to allow users to schedule different statistics archiving tasks at different times. This might be especially important for sites with a lot of traffic.
For instance:
- Monthly statistics every month on the 31st around 04:00 GMT
- Weekly statistics every Sunday around 03:30 GMT
- Daily statistics every day at 03:00 GMT
This would distribute resource usage over various days and times. (Of course the times/days/dates are just for illustration.)
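As a rough sketch, the example schedule above could be written as crontab lines like the ones this snippet prints. The script names and the `/path/to/scripts` directory are placeholders I made up, not anything that ships today. I used the 1st for the monthly job because a cron entry for the 31st would only fire in months that have 31 days.

```shell
#!/bin/sh
# Hypothetical location of the split scripts; adjust to your installation.
SCRIPT_DIR="${SCRIPT_DIR:-/path/to/scripts}"

# Print crontab lines in the usual field order:
#   minute hour day-of-month month day-of-week command
# Cron uses the server's local time, so these only match GMT
# if the server clock runs on GMT/UTC.
print_schedule() {
    cat <<EOF
0 3 * * * $SCRIPT_DIR/archive-daily.sh
30 3 * * 0 $SCRIPT_DIR/archive-weekly.sh
0 4 1 * * $SCRIPT_DIR/archive-monthly.sh
EOF
}

print_schedule
```

You would paste the printed lines into `crontab -e` after checking the paths and times.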
All that is really needed to make this happen is making 3 different archive.sh files:
(I’m not sure whether there are any more subdivisions which could be made.)
Let me know what you think.
; shought
Edit: also (it’s possible that this is already available), it might be a good idea to allow users to set a custom ‘cut-off’ date for the statistics. Small sites might want 3 years of statistics to be available, whereas big sites may prefer to export anything they deem useful every 3 months and then remove all the stats older than that in order to keep the database (relatively) small.
I know some scripting, so I thought I’d give it a shot myself, and it worked. (It was easy, luckily.)
All you need to do to modify the script is:
- Search for this line: ‘for period in day week month year; do’ (without the quotes).
- Remove year, month, week or day, or any combination of these (as you see fit).
- Add the ones you removed to another file with exactly the same contents apart from that line.
So if you remove month and year, the script will only archive the daily and weekly reports. You can run it every day or every hour, depending on your preference (you still define the cronjob yourself).
Then you would copy the whole file and make another file (archive-monthly.sh for instance) and replace ‘day week’ with ‘month year’. You could schedule this file to run every month or every week (depending on your preference).
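If you’d rather not edit the copies by hand, the steps above can be sketched with `sed`. This assumes the loop header in your archive.sh matches the line quoted above exactly. The helper name `split_archive_script` and the file name archive-daily.sh are just ones I picked for illustration.

```shell
#!/bin/sh
# Copy an archive script, narrowing its period loop to the given periods.
# Usage: split_archive_script SOURCE DEST "day week"
split_archive_script() {
    src=$1
    dest=$2
    periods=$3
    # Replace only the loop header quoted in the post;
    # every other line is copied unchanged.
    sed "s/for period in day week month year; do/for period in $periods; do/" \
        "$src" > "$dest"
    chmod +x "$dest"
}

# Only attempt the split when an archive.sh is present in the current directory.
if [ -f archive.sh ]; then
    split_archive_script archive.sh archive-daily.sh "day week"
    split_archive_script archive.sh archive-monthly.sh "month year"
fi
```

If the loop header in your copy differs from the quoted line, `sed` will leave the file unchanged, so it’s worth checking the result with `grep 'for period in'` before scheduling it.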
(Advanced: the ‘scheduled tasks’ only need to run with the daily archiving, so in principle you can remove them from weekly, monthly and yearly, but it won’t hurt anything if you leave them.)
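A different way to get the same effect, which is not how the split scripts do it, would be to keep one script and pass the periods in as arguments, running the scheduled tasks only when ‘day’ is among them. This is only a sketch: `archive_period` and `run_scheduled_tasks` are hypothetical stand-ins for whatever archive.sh really does at those points.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real archiving and scheduled-task steps.
archive_period() {
    echo "Archiving $1"
}
run_scheduled_tasks() {
    echo "Running scheduled tasks"
}

# Archive only the periods given as arguments, and run the scheduled
# tasks only if 'day' was one of them.
run_archive() {
    run_tasks=no
    for period in "$@"; do
        archive_period "$period"
        if [ "$period" = day ]; then
            run_tasks=yes
        fi
    done
    if [ "$run_tasks" = yes ]; then
        run_scheduled_tasks
    fi
}

run_archive "$@"
```

Saved under a name of your choice, this would be called with `day week` from one cronjob and with `month year` from another.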
You can find the contents of the files that have been split here:
(These scripts have an updated description so they will report ‘Archiving weekly’ or ‘Archiving daily’ instead of just ‘Archiving’, to allow you to differentiate between them when looking at the output.)
This would (1) limit the memory usage and (2) limit the impact on your site whilst archiving.
Are there any arguments against doing this?
I’ve been doing this for a couple of weeks now (7500 unique visitors, 36000 actions a day) and I haven’t noticed any memory issues.
These numbers obviously aren’t huge, so I’d love to see what happens when this is used on a larger site.