If you run archive.sh with a lot of empty sites, each request takes 200ms on average. When archiving 1000 empty sites, for the day/week/month/year periods, for N segments, that is already 800 * N seconds.
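For concreteness, that arithmetic can be sketched as follows (a Python sketch; the constants are the rough figures quoted above, not measured values):

```python
# Back-of-envelope cost of one full archiving pass, using the
# rough figures from this report: ~200 ms per request, 1000 sites.
SITES = 1000
PERIODS = 4           # day, week, month, year
MS_PER_REQUEST = 200  # average observed for an empty site

def total_seconds(n_segments: int) -> float:
    """Requests scale with sites x periods x segments."""
    return SITES * PERIODS * n_segments * MS_PER_REQUEST / 1000
```

With a single segment this already comes to 800 seconds, i.e. over 13 minutes spent mostly on sites with no data.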
The problem is that it then takes a long time to reach the websites that do have traffic, since most sites have none.
I am not sure what the best solution is, but some ideas are:
- Profile the code and make an archiving request for an empty site faster (most of the time is spent in PHP, not SQL, so there is probably room for optimization there).
- Remember the last time archive.sh ran to completion, then on the next run replace "last52" with "last2", for example.
- Run it multi-threaded, triggering archiving for multiple sites on each core (#2563).
- Run first the websites that have traffic (requires a modification in the SitesManager API, or a new API to return sites "in order of importance").
- Run archiving only for websites that received some data since the last archiving run.
- When there are segments to pre-process (see [Segments] in the config file for more info): only process the list of segments if there are some visits for the request without a segment (otherwise we know in advance there is no data for the segments).
- Archive first the sites that have been queried via the API recently (add a new "set flag" in the API Proxy to say "this site's data was requested").
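The multi-threading idea (#2563) could look roughly like the sketch below. This is only an illustration: `archive_site` is a hypothetical stand-in for the per-site archiving request that archive.sh currently issues sequentially, not an existing function.

```python
# Hedged sketch of the multi-threaded idea (#2563): trigger archiving
# for several sites concurrently instead of one at a time.
from concurrent.futures import ThreadPoolExecutor

def archive_site(site_id: int) -> int:
    # Placeholder: in reality this would issue the HTTP archiving
    # request for the given site; here it just echoes the id back.
    return site_id

def archive_all(site_ids, workers: int = 4):
    # Threads are a good fit because the work is dominated by
    # waiting on HTTP responses, not CPU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(archive_site, site_ids))
```

Since each request mostly waits on the network, even a handful of workers should cut the wall-clock time of a pass roughly by the worker count.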