Load Balanced Best Practices - Archiving

I am moving our Matomo system from a single web server/single DB setup to a multi-web-server/single DB environment in the Azure public cloud. I’ve read through the FAQs and forums, but I still don’t have a clear idea of how to handle the archiver process in this configuration.
Which setup is generally best, and what are the cons of each?

  1. Each web server clone (running Matomo) runs its own archiver task (or WebJob). Cons?
  2. Only one web server clone (running Matomo) runs the archiver task (or WebJob). Cons?
  3. A separate web server clone (not running Matomo, but with the Matomo files) runs the archiver task (or WebJob). Cons?
  4. Other setup?

Thanks for your help!

Ideally on a separate server, I would imagine, but I couldn’t get that setup working properly.

I have 3 load-balanced web servers and I share the archiving between them. Server 1 archives at 3pm, Server 2 archives at 4pm, Server 3 archives at 5pm, then it’s back to Server 1 at 6pm.

Seems to work for me.
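
For anyone wanting to try the rotation described above, a crontab sketch might look something like the following. The path, URL, log file and `www-data` user are placeholders rather than details from my setup, and the lines assume the `/etc/cron.d` format (with a user field) and a cron that supports range/step syntax. Each server gets only its own line.

```
# Server 1: hours 0, 3, 6, ... 15, 18, 21
0 0-23/3 * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=https://matomo.example.org/ >> /var/log/matomo-archive.log 2>&1

# Server 2: hours 1, 4, 7, ... 16, 19, 22
0 1-23/3 * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=https://matomo.example.org/ >> /var/log/matomo-archive.log 2>&1

# Server 3: hours 2, 5, 8, ... 17, 20, 23
0 2-23/3 * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=https://matomo.example.org/ >> /var/log/matomo-archive.log 2>&1
```

This only keeps the servers apart if their clocks and time zones agree and each run finishes within its hour. A run that takes longer would overlap with the next server’s run.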

Generally the archiver doesn’t benefit much from being scaled horizontally, because one of the primary limits on archiver speed is the database backend. Adding archiver instances increases the load on the database, potentially to the point where it slows down all of the archivers. Since you’ll also primarily be tracking a high volume of traffic for a limited number of sites, it’s recommended not to scale the archiver horizontally: with archivers on different hosts, several of them might try to process the same report at the same time, causing locks or hangups on the database itself. When the archiver processes are all started on a single host, the archiver can manage them via the CLI to make sure this doesn’t happen, so the processes can coordinate and work together through all of the reports.

If you do wish to scale archiving horizontally, you’d need to at least keep archiving for any given site ID limited to a single instance. This can be done in the core:archive crontab by specifying the --force-idsites option, but it means that whenever you add a site, the crontabs need to be updated to include it.
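
As a rough sketch of that per-site split, assuming two archiving hosts and four sites with IDs 1 to 4 (the IDs, path, URL, log file and `www-data` user are all made up for illustration), the crontab on each host could look like this:

```
# Host A (/etc/cron.d format): archives only site IDs 1 and 2
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --force-idsites=1,2 --url=https://matomo.example.org/ >> /var/log/matomo-archive.log 2>&1

# Host B (/etc/cron.d format): archives only site IDs 3 and 4
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --force-idsites=3,4 --url=https://matomo.example.org/ >> /var/log/matomo-archive.log 2>&1
```

A site added later, say ID 5, would not be archived by either host until it is added to one of the --force-idsites lists.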
