Optimize Auto-Archiving


#1

Hi guys,
we have a medium-to-high traffic setup:
[ul]
[li] 20 sites (growing fast)
[/li][li] > 300k pageviews/day
[/li][li] 50k visits/day
[/li][li] 3 pre-processed segments
[/li][/ul]

We import web server logs with the import_logs.py script and try to run auto-archiving every hour.
We have already applied some of the tips from the “optimize for speed” guide (server resources, PHP/MySQL tuning, Piwik-specific settings such as disabling yearly unique visitors, etc.), but the archive run still does not finish within 1 hour (I will post the details of the performance issue in a separate message).

We are looking for a way to optimize the (cron) archive phase. In particular, we have tried some of the core:archive parameters, i.e., --force-timeout-for-periods and --force-date-last-n, to reduce the overall archiving time and to avoid running useless tasks. Unfortunately, so far we have not been able to achieve that goal.

We have some specific questions:
[ol]
[li] How can we pre-process week/month/year periods at most once every 24 hours? Even with --force-timeout-for-periods=86400, because import_logs.py runs hourly from cron, the logs are invalidated by the script and the week/month/year statistics are recomputed every hour.
[/li][li] Does it make sense to run --force-date-last-n with n=1 every hour and with n=2 ONLY in the first cron run after midnight? The idea is to avoid yesterday's logs being processed every hour during the current day.
[/li][li] We can tolerate week/month/year stats not being updated every hour, so we would like to schedule archiving as follows:
[/li][list=a]
[li] archive period=day and last=1 every hour
[/li][li] archive period=week and last=1 once a day
[/li][li] archive period=month and last=1 once a week
[/li][li] archive period=year and last=1 once a month
[/li][li] in addition, to properly archive the statistics of the period that has just ended:
[/li][list=i]
[li] first run after midnight, e.g., at 00:05: archive period=day and last=2
[/li][li] first run after end-of-week, e.g., on Monday at 00:05: also archive period=week and last=2
[/li][li] first run after end-of-month, e.g., on the 1st day of the month at 00:05: also archive period=month and last=2
[/li][li] first run after end-of-year, e.g., on 1 January at 00:05: also archive period=year and last=2
[/li][/list]
[/list]
Is it possible to implement such a customized auto-archiving schedule?
[/ol]
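
To make the schedule from question 3 concrete, here is a minimal Python sketch of the decision logic we have in mind for an hourly cron wrapper. Several things are assumptions for illustration only: "once a day" means the first run after midnight, "once a week" means Monday, "once a month" means the 1st, the helper names archive_jobs and build_command are hypothetical, and the URL is a placeholder. Whether core:archive really restricts its work to one period and one "last n" window when called with --force-periods and --force-date-last-n like this is exactly what we are unsure about.

```python
from datetime import datetime


def archive_jobs(now):
    """Return the (period, last_n) pairs to archive for the cron run at `now`.

    Assumes the cron job fires once per hour, so `now.hour == 0` identifies
    the first run after midnight (e.g. 00:05).
    """
    first_run_of_day = now.hour == 0
    is_monday = now.weekday() == 0      # Monday == 0 in Python
    is_first_of_month = now.day == 1

    # Days: every hour; include yesterday only on the first run after midnight.
    jobs = [("day", 2 if first_run_of_day else 1)]
    if not first_run_of_day:
        return jobs

    # Weeks: once a day; include the previous week on Monday.
    jobs.append(("week", 2 if is_monday else 1))

    # Months: once a week (Monday, an assumption); previous month on the 1st.
    if is_first_of_month:
        jobs.append(("month", 2))
    elif is_monday:
        jobs.append(("month", 1))

    # Years: once a month (the 1st, an assumption); previous year on 1 January.
    if is_first_of_month:
        jobs.append(("year", 2 if now.month == 1 else 1))

    return jobs


def build_command(period, last_n, url="http://piwik.example.org/"):
    """Build one core:archive command line (the URL is a placeholder)."""
    return [
        "php", "console", "core:archive",
        "--url=" + url,
        "--force-periods=" + period,
        "--force-date-last-n=" + str(last_n),
    ]


if __name__ == "__main__":
    for period, last_n in archive_jobs(datetime.now()):
        print(" ".join(build_command(period, last_n)))
```

Run from an hourly cron entry, this would only print the commands; actually executing them (and making sure two runs never overlap) is left out on purpose.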

Thanks in advance for your feedback.
Regards
Ugo