Tracking 3M page views daily - core:archive running slower and slower

Hi!

Background:
I’m using Matomo to track visits on a rather large website with traffic of ~3,000,000 page views/day. We’ve set up Matomo to run in a Kubernetes cluster, with the web interface (GUI + matomo.php) served by 6 pods backed by Redis (2 VMs) and MariaDB (3 VMs). We’re using the QueuedTracking plugin and do all the processing work in the background in separate pods (using Kubernetes cron job scheduling).

During load tests we found that, in our environment, the optimal number of pods running “console queuedtracking:process” at the same time was 16 - so we’re going with that number and have also configured Matomo to cache the requests in 16 separate Redis queues. With this setup, we were able to process ~300 req/s at the most - which is well above our expected traffic.

It works great!

The problem:
However - now, after a few months, I’ve started to notice that the background job “console core:archive” (which is scheduled to run once an hour at the most) takes longer and longer to finish. Some time ago we reached the point where it took > 1 hour to finish - and last night it kept working for 9 hours before completing.

What can we do to make it run faster? Should we enable “Purge old visitors logs” in the privacy settings - will that make a difference? Is there any other way?

Thanks and regards,
Oliver

INFO [2021-05-26 21:35:28] 75  Start processing archives for site 16.
INFO [2021-05-26 21:35:28] 75    Will invalidate archived reports for today in site ID = 16's timezone (2021-05-26 00:00:00).
INFO [2021-05-26 21:35:28] 75    Will invalidate archived reports for yesterday in site ID = 16's timezone (2021-05-25 00:00:00).
INFO [2021-05-26 21:53:31] 75  Archived website id 16, period = day, date = 2021-05-26, segment = '', 872064 visits found. Time elapsed: 1082.298s
INFO [2021-05-26 22:13:16] 75  Archived website id 16, period = day, date = 2021-05-26, segment = 'referrerType==campaign;referrerName==**************************', 0 visits found. Time elapsed: 8.346s
INFO [2021-05-26 22:13:16] 75  Archived website id 16, period = day, date = 2021-05-26, segment = 'pageUrl=^https%3A%2F%2Fwww.**********.com%2F', 872213 visits found. Time elapsed: 1184.519s
INFO [2021-05-26 22:13:16] 75  Archived website id 16, period = day, date = 2021-05-26, segment = 'pageUrl=^https%3A%2F%2Fstage.**********.com%2F', 246 visits found. Time elapsed: 1184.519s
INFO [2021-05-26 22:14:54] 75  Archived website id 16, period = day, date = 2021-05-26, segment = 'pageUrl=^https%3A%2F%2Ftest.**********.com%2F', 19 visits found. Time elapsed: 98.058s
INFO [2021-05-26 22:34:35] 75  Archived website id 16, period = week, date = 2021-05-24, segment = '', 2811228 visits found. Time elapsed: 1180.267s
INFO [2021-05-26 22:56:05] 75  Archived website id 16, period = week, date = 2021-05-24, segment = 'referrerType==campaign;referrerName==**************************', 1 visits found. Time elapsed: 63.683s
INFO [2021-05-26 22:56:05] 75  Archived website id 16, period = week, date = 2021-05-24, segment = 'pageUrl=^https%3A%2F%2Fwww.**********.com%2F', 2796346 visits found. Time elapsed: 1289.994s
INFO [2021-05-26 22:56:05] 75  Archived website id 16, period = week, date = 2021-05-24, segment = 'pageUrl=^https%3A%2F%2Fstage.**********.com%2F', 746 visits found. Time elapsed: 1289.994s
INFO [2021-05-26 22:59:53] 75  Archived website id 16, period = week, date = 2021-05-24, segment = 'pageUrl=^https%3A%2F%2Ftest.**********.com%2F', 61 visits found. Time elapsed: 227.766s
INFO [2021-05-26 23:26:21] 75  Archived website id 16, period = month, date = 2021-05-01, segment = '', 25263506 visits found. Time elapsed: 1587.416s
INFO [2021-05-27 01:25:38] 75  Archived website id 16, period = month, date = 2021-05-01, segment = 'referrerType==campaign;referrerName==**************************', 24 visits found. Time elapsed: 531.461s
INFO [2021-05-27 01:25:38] 75  Archived website id 16, period = month, date = 2021-05-01, segment = 'pageUrl=^https%3A%2F%2Fwww.**********.com%2F', 25122531 visits found. Time elapsed: 7157.170s
INFO [2021-05-27 01:25:38] 75  Archived website id 16, period = month, date = 2021-05-01, segment = 'pageUrl=^https%3A%2F%2Fstage.**********.com%2F', 7232 visits found. Time elapsed: 7157.170s
INFO [2021-05-27 02:39:21] 75  Archived website id 16, period = month, date = 2021-05-01, segment = 'pageUrl=^https%3A%2F%2Ftest.**********.com%2F', 256 visits found. Time elapsed: 4422.927s
INFO [2021-05-27 03:04:17] 75  Archived website id 16, period = year, date = 2021-01-01, segment = '', 53455555 visits found. Time elapsed: 1495.757s
INFO [2021-05-27 05:04:52] 75  Archived website id 16, period = year, date = 2021-01-01, segment = 'referrerType==campaign;referrerName==**************************', 24 visits found. Time elapsed: 526.708s
INFO [2021-05-27 05:04:52] 75  Archived website id 16, period = year, date = 2021-01-01, segment = 'pageUrl=^https%3A%2F%2Fwww.**********.com%2F', 53221728 visits found. Time elapsed: 7234.892s
INFO [2021-05-27 05:04:52] 75  Archived website id 16, period = year, date = 2021-01-01, segment = 'pageUrl=^https%3A%2F%2Fstage.**********.com%2F', 12882 visits found. Time elapsed: 7234.892s
INFO [2021-05-27 06:22:02] 75  Archived website id 16, period = year, date = 2021-01-01, segment = 'pageUrl=^https%3A%2F%2Ftest.**********.com%2F', 864 visits found. Time elapsed: 4629.136s
INFO [2021-05-27 06:22:02] 75  Finished archiving for site 16, 20 API requests, Time elapsed: 31593.848s [5 / 8 done]
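
To see where the time goes, the “Archived website id …” lines above can be summed per segment. This is only a rough sketch: the regular expression is written against the line format shown in this log, and the function names are made up for illustration. Several lines share the same timestamp and the same elapsed time, so some requests appear to have run concurrently; the totals are therefore a ranking hint, not wall-clock time.

```python
import re
from collections import defaultdict

# Pattern matching the "Archived website id ..." lines as they appear in the log above.
LINE_RE = re.compile(
    r"Archived website id (?P<site>\d+), period = (?P<period>\w+), "
    r"date = (?P<date>[\d-]+), segment = '(?P<segment>.*)', "
    r"(?P<visits>\d+) visits found\. Time elapsed: (?P<elapsed>[\d.]+)s"
)


def parse_archive_log(lines):
    """Return one dict per 'Archived website id' line; other lines are skipped."""
    rows = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        rows.append({
            "site": int(match["site"]),
            "period": match["period"],
            "date": match["date"],
            "segment": match["segment"],
            "visits": int(match["visits"]),
            "elapsed": float(match["elapsed"]),
        })
    return rows


def elapsed_by_segment(rows):
    """Sum the reported elapsed seconds per segment, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["segment"] or "(no segment)"] += row["elapsed"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few lines copied from the log above (masked values left as they are).
SAMPLE = [
    "INFO [2021-05-26 21:35:28] 75  Start processing archives for site 16.",
    "INFO [2021-05-26 21:53:31] 75  Archived website id 16, period = day, date = 2021-05-26, segment = '', 872064 visits found. Time elapsed: 1082.298s",
    "INFO [2021-05-26 22:13:16] 75  Archived website id 16, period = day, date = 2021-05-26, segment = 'referrerType==campaign;referrerName==**************************', 0 visits found. Time elapsed: 8.346s",
    "INFO [2021-05-26 22:14:54] 75  Archived website id 16, period = day, date = 2021-05-26, segment = 'pageUrl=^https%3A%2F%2Ftest.**********.com%2F', 19 visits found. Time elapsed: 98.058s",
    "INFO [2021-05-26 22:34:35] 75  Archived website id 16, period = week, date = 2021-05-24, segment = '', 2811228 visits found. Time elapsed: 1180.267s",
]

if __name__ == "__main__":
    for segment, seconds in elapsed_by_segment(parse_archive_log(SAMPLE)):
        print(f"{seconds:10.1f}s  {segment}")
```

Feeding the whole log file to `parse_archive_log` instead of `SAMPLE` gives the full picture; in the log above the `pageUrl` segments for month and year are the entries with the largest elapsed times.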

There’s a lot that can be done.

  1. Use segments sparingly, and especially avoid segmenting by URL. These segments mostly don’t do what you think they do: a pageUrl segment is not a filter for that URL; it selects all users who visited that URL (amongst others) during their session.
    Every segment you add costs more than 100% of the work / complexity of the base archiving job, because all dimensions and metrics have to be calculated again for each segment.
  2. Split archiving jobs by responsibility: one job doing only the base archiving, another processing the segments, another running the scheduled tasks, etc. See https://matomo.org/docs/setup-auto-archiving/ for the parameters.
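
As a rough illustration of point 2, the sketch below builds one command line per responsibility, e.g. for separate Kubernetes cron jobs. Everything here is an assumption to check against `./console core:archive --help` on your Matomo version: the option names `--skip-all-segments`, `--force-idsegments` and `--disable-scheduled-tasks`, the `core:run-scheduled-tasks` command, and whether a run restricted to certain segment IDs also touches the base reports. The console path, the URL and the segment IDs are placeholders.

```python
import shlex

# Placeholders, not values from this thread.
CONSOLE = "/var/www/html/console"
MATOMO_URL = "https://matomo.example.com/"


def archive_jobs(segment_ids):
    """Build one argv list per job so each can be scheduled separately.

    The option names are assumptions; verify them with `core:archive --help`.
    """
    base = [
        CONSOLE,
        "core:archive",
        f"--url={MATOMO_URL}",
        "--disable-scheduled-tasks",  # assumed: keep scheduled tasks out of archiving runs
    ]
    ids = ",".join(str(segment_id) for segment_id in segment_ids)
    return {
        # Job 1: base reports only, no segments.
        "archive-base": base + ["--skip-all-segments"],
        # Job 2: only the listed stored segment IDs (assumed option).
        "archive-segments": base + [f"--force-idsegments={ids}"],
        # Job 3: scheduled tasks on their own schedule (assumed command name).
        "scheduled-tasks": [CONSOLE, "core:run-scheduled-tasks"],
    }


if __name__ == "__main__":
    for name, argv in archive_jobs([1, 2, 3]).items():
        print(f"{name}: {shlex.join(argv)}")
```

Keeping the base job free of segments means the hourly run only has to finish the base reports, while the slower segment job can run less often without blocking it.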