Custom Reports, historical data invalidation and unlock of custom reports - 100 % server load?!

Hello,

As described in the thread Error "Too many connections" in log viewer / Server CPU on 100 % for days, I had (and still have) some trouble with data invalidation and data reprocessing.

The server had finished the reprocessing. To check this, I look at the matomo_archive_invalidations table; the same information is also visible in the system check:

Total Invalidation Count: 114680

I was very surprised that this number went from nearly zero to around 140,000. I didn’t invalidate the historical reports as I did before, but I created some new segments and also changed some existing custom reports (and unlocked the reports to do so).

Is this really that much work for Matomo/the server, just from creating about 5 new segments and 5 new custom reports? Our server needs 3 to 5 days to handle this.

Is this normal, or is there a problem/mistake on my side? Or do I misunderstand something in this process?

Thanks for any help!

Best regards,

Timo

Hi @iparker
What are the configurations for:

[Debug]
; if set to 1, the archiving process will always be triggered, even if the archive has already been computed
; this is useful when making changes to the archiving code so we can force the archiving process
always_archive_data_period = 0;
always_archive_data_day = 0;
; Force archiving Custom date range (without re-archiving sub-periods used to process this date range)
always_archive_data_range = 0;

/.../

[General]
; When archiving segments for the first time, this determines the oldest date that will be archived.
; This option can be used to avoid archiving (for instance) the lastN years for every new segment.
; Valid option values include: "beginning_of_time" (start date of archiving will not be changed)
;                              "segment_last_edit_time" (start date of archiving will be the earliest last edit date found,
;                                                        if none is found, the created date is used)
;                              "segment_creation_time" (start date of archiving will be the creation date of the segment)
;                              editLastN where N is an integer (eg "editLast10" to archive for 10 days before the segment last edit date)
;                              lastN where N is an integer (eg "last10" to archive for 10 days before the segment creation date)
process_new_segments_from = "beginning_of_time"

:question:

Hello Philippe,

thanks for your reply!

The debug options in the configuration all have the value 0.

The value for “process_new_segments_from” is “beginning_of_time”.

Currently the system is at its limit again. I created 6 new custom reports and a new segment, and this led to more than 200k “invalidation counts”, which the system is now handling very slowly…

I hope you can help me figure out how to improve this!

Best regards,

Timo

Hi @iparker
Since when have you been tracking data?
How many custom reports do you have in total?
To how many measurables is the new segment applied?

I don’t know. Where can I see this? For the site that we need the custom reports/segments for, we have been tracking since the beginning of 2022. But the Matomo instance is older. For another site we have been tracking data for more years, I think since 2019.

65 segments (56 pre-processed, 9 processed in real-time)
38 goals
35 custom reports

What do you mean by this? Do you mean how many conditions are in the new segment? I think just one (channel type).

Most segments have the following setting:

THIS SEGMENT IS VISIBLE TO: ALL USERS
AND PROCESSED FOR THIS WEBSITE ONLY AND
SEGMENTED REPORTS ARE PRE-PROCESSED (FASTER, REQUIRES CRON)

OK: AND PROCESSED FOR THIS WEBSITE ONLY

15 months of tracking means 2 years + 15 months + 63 weeks + (365 + 74) days, so 519 periods… :thinking: I think the answer is not there (I would need a factor of about 400 to reach 200k)…
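The period arithmetic above can be checked with a few lines of Python. This is only a sketch: the breakdown (2 years, 15 months, 63 weeks, 365 + 74 days) and the 200k figure are taken from this thread, the variable names are made up for illustration, and it is not meant to mirror how Matomo itself counts invalidations.

```python
# Period breakdown quoted in the thread for roughly 15 months of tracking.
periods = {"year": 2, "month": 15, "week": 63, "day": 365 + 74}

# One archive per period, so the total is simply the sum.
total_periods = sum(periods.values())

# Invalidation count reported by the original poster (approximate).
observed_invalidations = 200_000

# Multiplier that would be needed to get from the period count to that number.
factor_needed = observed_invalidations / total_periods

print(total_periods)         # 519
print(round(factor_needed))  # 385
```

So a single segment on a single site accounts for about 519 archives; something has to multiply that by a few hundred to reach the observed count.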

On my side, when I add a new segment, my Matomo doesn’t behave the same way as yours.

@innocraft, any idea?

It’s a good time to put in a plug for the config – to keep both archiving and invalidating within some limits, to avoid surprises:

(If you have a common.config.ini.php then these settings may be applied there):

[General]
; Requests with a &segment= parameter will not trigger archiving.
; Ensures that no unexpected data processing triggers from UI or API.
browser_archiving_disabled_enforce = 1

; All new Segments created in the future will be set to:
; “Pre-processed (faster, requires cron core:archive command)”
enable_create_realtime_segments = 0

; By default we process a new segment’s reports from the
; beginning of time (“beginning_of_time”).
; When you have a lot of historical data, we recommend to
; process new segment’s reports from the segment’s creation time.
process_new_segments_from = "segment_creation_time"

; When processing the number of unique visitors across large datasets
; some performance issues may be experienced. In this case we would
; recommend to disable the Unique visitors metrics processing.
enable_processing_unique_visitors_day = 0
enable_processing_unique_visitors_week = 0
enable_processing_unique_visitors_month = 0
enable_processing_unique_visitors_year = 0
enable_processing_unique_visitors_range = 0

; Settings below ensure high performance archiving
; for Roll-ups and other sites
time_before_today_archive_considered_outdated = 10800
time_before_week_archive_considered_outdated = 43200
time_before_month_archive_considered_outdated = 43200
time_before_year_archive_considered_outdated = 64800
time_before_range_archive_considered_outdated = 43200

Thanks for your reply!

It seems that all my settings differ from your recommendations:

browser_archiving_disabled_enforce 0
enable_create_realtime_segments 1
process_new_segments_from beginning_of_time
enable_processing_unique_visitors_day 1
enable_processing_unique_visitors_week 1
enable_processing_unique_visitors_month 1
enable_processing_unique_visitors_year 0 (oh ;-))
enable_processing_unique_visitors_range 0
time_before_today_archive_considered_outdated 900
time_before_week_archive_considered_outdated -1
time_before_month_archive_considered_outdated -1
time_before_year_archive_considered_outdated -1
time_before_range_archive_considered_outdated -1
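To see at a glance which of the values listed above really deviate from the recommended ones, a small comparison can help. The following Python sketch uses only the values quoted in this thread; the dictionaries and the helper name `diff_settings` are hypothetical, and it does not read a real Matomo config file.

```python
# Recommended values as posted earlier in this thread.
recommended = {
    "browser_archiving_disabled_enforce": "1",
    "enable_create_realtime_segments": "0",
    "process_new_segments_from": "segment_creation_time",
    "enable_processing_unique_visitors_day": "0",
    "enable_processing_unique_visitors_week": "0",
    "enable_processing_unique_visitors_month": "0",
    "enable_processing_unique_visitors_year": "0",
    "enable_processing_unique_visitors_range": "0",
    "time_before_today_archive_considered_outdated": "10800",
    "time_before_week_archive_considered_outdated": "43200",
    "time_before_month_archive_considered_outdated": "43200",
    "time_before_year_archive_considered_outdated": "64800",
    "time_before_range_archive_considered_outdated": "43200",
}

# Current values as reported by the original poster.
current = {
    "browser_archiving_disabled_enforce": "0",
    "enable_create_realtime_segments": "1",
    "process_new_segments_from": "beginning_of_time",
    "enable_processing_unique_visitors_day": "1",
    "enable_processing_unique_visitors_week": "1",
    "enable_processing_unique_visitors_month": "1",
    "enable_processing_unique_visitors_year": "0",
    "enable_processing_unique_visitors_range": "0",
    "time_before_today_archive_considered_outdated": "900",
    "time_before_week_archive_considered_outdated": "-1",
    "time_before_month_archive_considered_outdated": "-1",
    "time_before_year_archive_considered_outdated": "-1",
    "time_before_range_archive_considered_outdated": "-1",
}


def diff_settings(current, recommended):
    """Return {key: (current value, recommended value)} for every key that differs."""
    return {
        key: (current.get(key), value)
        for key, value in recommended.items()
        if current.get(key) != value
    }


differences = diff_settings(current, recommended)
for key, (mine, theirs) in differences.items():
    print(f"{key}: current={mine} recommended={theirs}")
```

With the values from this thread, 11 of the 13 settings differ; the two unique-visitor settings for year and range already match the recommendation.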

Next to this:
rearchive_reports_in_past_last_n_months 18

The only thing I want to mention is that I want the segment data since the beginning of 2022 and not since “segment_creation_time”. That’s why we set this to beginning_of_time.

Do you have any other ideas on how to speed up the currently running processes? There are still a lot of invalidations to handle:

Total Invalidation Count 155079
In Progress Invalidation Count 8
Scheduled Invalidation Count 155071
Earliest invalidation ts_started 2023-03-17 23:49:07
Latest invalidation ts_started 2023-03-18 06:26:28
Earliest invalidation ts_invalidated 2023-03-12 14:05:01
Latest invalidation ts_invalidated 2023-03-17 22:05:26
Number of segment invalidations 152229
Number of plugin invalidations 154839

Thanks for your help!

Hi @iparker
The number of invalidations has gone down a bit (by about 25%).
I think there is some multiplying factor somewhere that we missed.
Are you the only user of Matomo? Are you sure there are only 65 segments in total? (if other users defined some, then the real number can be higher! :wink: )

Hi Philippe,

thanks for your reply.

Now the system has handled all data:

Total Invalidation Count 0
In Progress Invalidation Count 0
Scheduled Invalidation Count 0
Earliest invalidation ts_started
Latest invalidation ts_started
Earliest invalidation ts_invalidated
Latest invalidation ts_invalidated
Number of segment invalidations 0
Number of plugin invalidations 0

But I’m still unsure whether this will happen again when I create a new segment or custom report, or change an existing one.

I think there is some multiplying factor somewhere that we missed.
Are you the only user of Matomo? Are you sure there are only 65 segments in total? (if other users defined some, then the real number can be higher! :wink: )

Me too. Yes there are just 65 segments (or now: 67). The system summary says:

System Summary
11 users
67 segments
38 goals
35 custom reports
0 tracking failures
6 websites
55 activated plugins
7 containers (in tag manager)
Matomo version: 4.13.3
MySQL version: 8.0.18
PHP version: 7.3.13

Best regards,

Timo

Hi @iparker
:thinking:
I see:

67 segments
6 websites

Are these segments “shared” across websites? That could be the explanation: 67 segments × 6 websites = 402 segment/website pairs, and 402 × 519 periods ≈ 208K invalidations to process…
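As a rough check of this hypothesis, here is a minimal Python sketch. It assumes the simplest possible model, one invalidation per segment, website and period; that model, the function name `estimate_invalidations` and the `shared` flag are assumptions for illustration, not Matomo's actual logic. The numbers (67 segments, 6 websites, 519 periods) come from this thread.

```python
def estimate_invalidations(segments: int, websites: int, periods: int, shared: bool) -> int:
    """Rough upper bound: one invalidation per (segment, website, period).

    If segments are not shared, each one is assumed to apply to a single website.
    """
    sites_per_segment = websites if shared else 1
    return segments * sites_per_segment * periods


# Segments shared across all 6 websites.
shared_estimate = estimate_invalidations(67, 6, 519, shared=True)

# Every segment limited to "this website only".
single_site_estimate = estimate_invalidations(67, 6, 519, shared=False)

print(shared_estimate)       # 208638
print(single_site_estimate)  # 34773
```

Under this model, only the shared case lands near the observed 200k; with segments restricted to one website each, another multiplier (for example plugin-specific invalidations) would be needed to explain the count.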

Also I see some plugins. Do some of them “create” things to be invalidated?

Hi Philippe,

thanks for your reply. As far as I can see, the segments are defined for “this website only”. So I think the segments are not the reason.

I don’t know. How can I check this? I think the custom report plugin also triggers the invalidation.

Best regards,

Timo

Hi @MisterGenest, do you know if custom reports “generate” some data to invalidate (in case of data invalidation)? This could answer the question from @iparker

Hi @heurteph-ei ,

Yes, if an invalidation happens, all the reports are processed the next time archiving runs. However, it is possible to limit archiving to a specific plugin, for example:

sudo -u apache -E bash -c "php /var/www/html/console core:archive --force-all-websites --force-date-last-n=1000 --matomo-domain=YOUR-DOMAIN --plugin=Funnels"


Hello,

I’m currently having exactly the same problem again. On top of that, we’re now testing the funnel plugin, which obviously also requires data recalculation when a funnel changes.

The situation is the same as last time: there are over 200k invalidations to be recalculated/created. The process has been dragging on for several days now, and I have had to restart the MySQL service twice because of the “Too many connections” problem.

Yes, I know that we have a lot of data (segments, reports, etc.) - but I’m still surprised that the recalculation takes so extremely long.

System Summary:

7 users
96 segments
44 goals
42 custom reports
1 tracking error
6 sites
56 activated plugins
7 containers (in tag manager)
Matomo version: 4.14.2
MySQL version: 8.0.18
PHP version: 7.3.13

Total number of invalidations: 28115
Invalidation count in process: 134
Planned Number of Invalidations: 27981
Earliest invalidation ts_started: 2023-08-17 06:17:49
Last invalidation ts_started: 2023-08-18 06:18:28
Earliest invalidation ts_invalidated: 2023-08-13 15:06:23
Last invalidation ts_invalidated: 2023-08-18 06:05:41
Number of segment invalidations: 27780
Number of plugin invalidations: 27761
List of plugins to be invalidated: Funnel, CustomReports

It would be great to get help for an improvement here. This behavior makes it very difficult for us to use Matomo effectively.

Best regards

Timo

Hi @iparker ,
Can you please get in touch with our support team at shop@matomo.org

Thanks for your reply. I just wrote to the support-team.