Overload and invalidation issues after creating CustomReports

Hello,

I would like to share an important difficulty which has just happened to me and which has kept me busy for 2 days.

I have a Matomo server, with 55 sites, a few segments (around sixty, only two of which concern all the sites), and a certain number of plugins, therefore CustomReports.
Until now CustomReports was little used.
Archiving is processed by a cron, every hour.

Friday November 24, I created 4 CustomReports (not complicated), valid on all sites.
The following cron launched a certain number of processes and calculations which saturated the MySQL and PHP. Over the weekend, these services had to be restarted several times. Every hour, the cron restarted and restarted all these colossal calculations (I only understood this later).
Monday, I tried to understand what was happening…
By researching, I understood that the CustomReports + segments combination may generate a large server load.
I first tried modifying the cron to make it skip the segments (–skip-all-segments), but that didn’t change anything, so I suspended the cron on Monday evening (November 27). If this has not changed anything it is because each cron launched at each previous hour had launched calculations in the background which were not completed.
Tuesday (November 28), I wanted to restart the archiving processing (core:archive) going step by step. So I started with a single site (of a small volume) by invalidating the reports (core:invalidate-report-data) over the period of the last 5 days; then by launching archiving (core:archive) for this site only over the period of the last 5 days, with the verbose option at --vvv.
The archive processing took hours.
The log files contained thousands of lines that looked quite similar. For these lines, I did not find a relevant explanation in the Matomo doc, nor on the internet.

Here are some of the strange lines I had by the thousands:

DEBUG [2023-11-28 08:32:00] 2651139  Found archive with intersecting period with others in concurrent batch, skipping until next batch: [idinvalidation = 22494007, idsite = 1, period = day(2023-11-23 - 2023-11-23), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 08:32:00] 2651139  Found archive with intersecting period with others in concurrent batch, skipping until next batch: [idinvalidation = 22485693, idsite = 1, period = day(2023-11-23 - 2023-11-23), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 08:42:40] 2652630  Found duplicate invalidated archive , ignoring: [idinvalidation = 23764570, idsite = 1, period = day(2023-11-23 - 2023-11-23), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 08:42:41] 2652630  Found duplicate invalidated archive , ignoring: [idinvalidation = 23744037, idsite = 1, period = day(2023-11-23 - 2023-11-23), name = done54c2107b87d963c1a0b72c1fed325364.CustomReports, segment = countryName!=France]
DEBUG [2023-11-28 09:24:52] 2657838  Skipping invalidated archive 10414831, archive date range (2023-11-23,2023-11-23) is not within --force-date-range: [idinvalidation = 10414831, idsite = 1, period = day(2023-11-23 - 2023-11-23), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 09:24:53] 2657838  Skipping invalidated archive 10400439, archive date range (2023-11-23,2023-11-23) is not within --force-date-range: [idinvalidation = 10400439, idsite = 1, period = day(2023-11-23 - 2023-11-23), name = done54c2107b87d963c1a0b72c1fed325364.CustomReports, segment = countryName!=France]
DEBUG [2023-11-28 09:36:07] 2657838  Skipping invalidated archive 23456177, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 23456177, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = done54c2107b87d963c1a0b72c1fed325364.CustomReports, segment = countryName!=France]
DEBUG [2023-11-28 09:36:08] 2657838  Could not find stored segment for done flag hash: done3e1c5dbf5d5b746122bbb3c860b15763
DEBUG [2023-11-28 09:36:08] 2657838  Found invalidation for segment that does not have auto archiving enabled, skipping: 23447926
DEBUG [2023-11-28 09:36:09] 2657838  Skipping invalidated archive 23427393, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 23427393, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 09:36:33] 2657838  Skipping invalidated archive 22960645, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 22960645, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = done54c2107b87d963c1a0b72c1fed325364.CustomReports, segment = countryName!=France]
DEBUG [2023-11-28 09:36:34] 2657838  Could not find stored segment for done flag hash: done5a6b717f70711c229ec9e929535144f4
DEBUG [2023-11-28 09:36:34] 2657838  Found invalidation for segment that does not have auto archiving enabled, skipping: 22938065
DEBUG [2023-11-28 09:36:35] 2657838  Skipping invalidated archive 22931861, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 22931861, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 09:36:35] 2657838  Skipping invalidated archive 22909281, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 22909281, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = done54c2107b87d963c1a0b72c1fed325364.CustomReports, segment = countryName!=France]
DEBUG [2023-11-28 09:36:36] 2657838  Could not find stored segment for done flag hash: done3e1c5dbf5d5b746122bbb3c860b15763
DEBUG [2023-11-28 09:36:36] 2657838  Found invalidation for segment that does not have auto archiving enabled, skipping: 22901030
DEBUG [2023-11-28 09:36:37] 2657838  Skipping invalidated archive 22880497, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 22880497, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = donee5a38b4afa7a61a09cee0443b5c75de6.CustomReports, segment = countryCode==be]
DEBUG [2023-11-28 09:36:38] 2657838  Could not find stored segment for done flag hash: donec37ef65bd54dede51753d10512e60ad2
DEBUG [2023-11-28 09:36:38] 2657838  Found invalidation for segment that does not have auto archiving enabled, skipping: 22874293
DEBUG [2023-11-28 09:36:38] 2657838  Skipping invalidated archive 22849603, archive date range (2023-11-21,2023-11-21) is not within --force-date-range: [idinvalidation = 22849603, idsite = 1, period = day(2023-11-21 - 2023-11-21), name = done54c2107b87d963c1a0b72c1fed325364.CustomReports, segment = countryName!=France]

I ended up stopping the process because it was taking way too long.
I started again several times, trying different things like --skip-all-segments, etc.
But each time the processing took hours, generated thousands of lines of logs, etc.

While wondering about the fact that the reports are indeed invalidated (via core:invalidate-report-data), I wanted to try the method which consists of deleting certain tables matomo_archive_*
So that led me to go look at the database!
There, I discovered a table containing more than 14,500,000 rows: the ‘matomo_archive_invalidations’ table!

I then realized that something was seriously wrong.
There were approximately 280,000 rows for each site. And all the lines had the ‘name’ column set to the same value ‘done.CustomReports’!

e.g.

COUNT(idsite) idsite
285605 4
327504 6
275912 30

So I looked for information on this ‘matomo_archive_invalidations’ table. This brought me to this page Archiving is very slow - possibly relating to UsersFlow plugin? which enlightened me a little but above all reassured me in my idea of clearing this table.
I started by deleting the lines with idsite = 1
then I restarted the invalidation of reports (core:invalidate-report-data) for this site only over the period of the last 5 days. After the invalidation, the ‘matomo_archive_invalidations’ table contained only 27 rows with idsite = 1
Then I launched archiving (core:archive) for this site only over the period of the last 5 days, with the verbose option at --vvv. It went very well!
So I did this again for a few sites and I’m now going to completely empty this table in order to put my cron back in place.

I think there was a problem during the first processing (core:archive) after creating the CustomReports. This resulted in these millions of rows being created in the ‘matomo_archive_invalidations’ table.
Probably repeating the cron (hourly) added to the problem.

I hope I don’t have any problems like this, and above all that this testimony can be useful to others.

Hi @CPWeb
Thanks for your experience return… :slight_smile:

In config file, maybe some parameters could also help in the invalidation process:

  • custom_reports_disabled_dimensions
  • process_new_segments_from
  • rearchive_reports_in_past_exclude_segments
  • rearchive_reports_in_past_last_n_months