Running archive.php every day: some questions


#1

Hi,

We are setting up Piwik on AWS (EC2 + RDS) to monitor 90 websites, for a total of 3,000,000 page views each day.
Archiving takes more than 4 or 5 hours.

I set up a cron job to run archive.php each night at 2:00. In the web interface, I left the default value of 3600 s for “Reports for today (or any other Date Range including today) will be processed at most every”.

=> Do I have to set it to 86400 (24 hours) for better performance?
If my understanding is correct, the answer should be no.

2/ Multiple archive.php
I read several threads and trac tickets (#4903) about running multiple instances of archive.php (while it is running, CPU usage never goes above 30%).
I saw that we can specify which sites to archive with the option --force-idsites.

=> Can we create multiple cron jobs with different idsites and launch them simultaneously ?

Thanks for your help,

Regards,

Fred


(Mariano Fernández) #2

Hi,
your question:
Do I have to set to 86400 (24 hours) for better performance ?

If you set it to 86400 (24 hours), archive.php processes all data where (visit time < current time - 24 h).

I don't think you will get better performance with that configuration.


(Matthieu Aubry) #3

Can we create multiple cron jobs with different idsites and launch them simultaneously ?

Yes, see the work done in: Add possibility to run multiple archiver in parallel · Issue #4903 · matomo-org/matomo · GitHub
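
As a rough illustration of the approach asked about above, here is a minimal Python sketch that splits the site IDs into groups and builds one archive.php command per group, each suitable for its own cron entry. The script path, the Piwik URL, and the helper names (`split_site_ids`, `build_commands`) are placeholders for illustration, and the `--url` option and the comma-separated form of `--force-idsites` are assumed here, so check them against your installation before relying on this.

```python
import shlex

# Hypothetical values: adjust both to your own installation.
ARCHIVE_SCRIPT = "/path/to/piwik/misc/cron/archive.php"
PIWIK_URL = "http://example.org/piwik/"


def split_site_ids(site_ids, workers):
    """Distribute site IDs round-robin over at most `workers` groups."""
    groups = [[] for _ in range(workers)]
    for index, site_id in enumerate(site_ids):
        groups[index % workers].append(site_id)
    # Drop empty groups when there are fewer sites than workers.
    return [group for group in groups if group]


def build_commands(site_ids, workers):
    """Build one archive.php command line per group of site IDs."""
    commands = []
    for group in split_site_ids(site_ids, workers):
        ids = ",".join(str(site_id) for site_id in group)
        commands.append(
            "php {} --url={} --force-idsites={}".format(
                shlex.quote(ARCHIVE_SCRIPT), shlex.quote(PIWIK_URL), ids
            )
        )
    return commands


if __name__ == "__main__":
    # Example: 90 sites (IDs 1..90) spread over 3 parallel cron jobs.
    for command in build_commands(range(1, 91), 3):
        print(command)
```

Each printed line could then go into a separate cron entry with the same start time. Round-robin keeps the groups equal in size, but not necessarily equal in load: if a few sites account for most of the traffic, it may be better to assign those by hand.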

If you need pro help, contact: http://piwik.org/consulting/