Is it possible to archive single sites?


#1

Hi,

I’m using Piwik to analyze 300+ sites at the moment (most of them low traffic, but about 5 with medium to high traffic).

I disabled the default archiving and use the cronjob instead.

So far so good… but because of the large number of websites and the database load during archiving, I only run the cronjob every 6 hours. The script takes about 20 minutes to archive all the sites.

For most of the sites that frequency is okay, but for some e-commerce sites it might be too long.

My question:
Is it possible to archive single sites without destroying any collected data, or does the archiving process only work if all sites are processed at once? Could it, for example, be done with a modified version of the cronjob script? I’m thinking of running the archive script for all sites every 6 hours and, on top of that, starting a modified version at shorter intervals that uses an “idsite” argument to archive just a single site.

Something like:

archive all 300+ sites every 6 hours

0 */6 * * * /path_to_piwik/misc/cron/archive.sh > /dev/null

archive site 123 more often because we need the latest data

20,30,40,50 * * * * /path_to_piwik/misc/cron/archive.sh 123 > /dev/null

Modifying the script to take a site ID as an argument (or, if none is given, use the API to get all sites, as it does at the moment) doesn’t seem to be the problem. But I’m not sure whether that would mess with the archiving process itself, because it probably has to process all sites at once or some data won’t be archived at all.
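To illustrate the idea, here is a minimal sketch of such a wrapper. This is not the real archive.sh internals: the base URL, the token_auth value, and the exact API module and parameters used to trigger archiving are assumptions and would need to be adapted to the actual installation.

```shell
#!/bin/sh
# Hypothetical sketch: trigger archiving for a single Piwik site.
# All values below are assumptions, not taken from the stock archive.sh.

PIWIK_URL="http://example.org/piwik"   # assumed Piwik base URL
TOKEN="your_token_auth"                # assumed token_auth value

# Print the API request URL that would trigger archiving
# for one idSite and one period.
archive_url() {
  echo "$PIWIK_URL/index.php?module=VisitsSummary&action=getVisits&idSite=$1&period=$2&date=last1&format=xml&token_auth=$TOKEN"
}

# With an idSite argument, handle just that site; without one,
# a real script would fall back to iterating over all sites.
if [ -n "$1" ]; then
  for period in day week month year; do
    # A real script would fetch this URL, e.g.:
    #   wget -q -O /dev/null "$(archive_url "$1" "$period")"
    archive_url "$1" "$period"
  done
fi
```

Called as `./archive_one.sh 123`, this would print (or, with wget enabled, request) one archiving URL per period for site 123, which is essentially what the per-site crontab entry above would rely on.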

I did not turn this into a feature request yet, because it might be easier than I imagine, or outright impossible, to archive single sites.

Thanks in advance for your replies,
Ruediger


(Matthieu Aubry) #2

Ruediger, you will be interested to hear that we have worked on a new archive.php script, replacing archive.sh, that is much more optimized. You need to use “trunk” and see this ticket for more info: New optimized archive.php script for faster and optimized archiving when hundreds/thousands of websites · Issue #2327 · matomo-org/matomo · GitHub
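For readers following along, a crontab entry for the new script might look like the sketch below. The path and the `--url` flag are assumptions based on the ticket; check the script’s own help output on trunk before relying on them.

```shell
# Hypothetical crontab entry for the new archive.php
# (path and --url flag assumed; verify against your trunk checkout):
5 * * * * php /path_to_piwik/misc/cron/archive.php --url=http://example.org/piwik/ > /dev/null
```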


#3

Thanks, Matt!

Increasing the performance of the whole process by remembering the last state is indeed a much better solution than just making single sites addressable.

Looking forward to 1.6 :wink:

Big Thanks to the devs for all your effort…

Best Regards,
Ruediger