Custom Reports, historical data invalidation and unlock of custom reports - 100 % server load?!

Hello,

As described in the thread "Error "Too many connections" in log viewer / Server CPU on 100 % for days", I had (and still have) some trouble with data invalidation and data reprocessing.

The server has now finished the reprocessing. To check this I look at the matomo_archive_invalidations table; the same information is also visible in the system check:

Total Invalidation Count: 114680

I was very surprised that this number went from nearly zero to around 140,000. I did not invalidate the historical reports as before, but I created some new segments and also changed some existing custom reports (and unlocked the reports to do so).

Is this really so much work for Matomo/the server, just from creating about 5 new segments and 5 new custom reports? Our server needs 3 to 5 days to handle this.

Is this normal, or did I make a mistake? Do I misunderstand anything in this process?

Thanks for some help!

Best regards,

Timo

Hi @iparker
What are the configurations for:

[Debug]
; if set to 1, the archiving process will always be triggered, even if the archive has already been computed
; this is useful when making changes to the archiving code so we can force the archiving process
always_archive_data_period = 0;
always_archive_data_day = 0;
; Force archiving Custom date range (without re-archiving sub-periods used to process this date range)
always_archive_data_range = 0;

/.../

[General]
; When archiving segments for the first time, this determines the oldest date that will be archived.
; This option can be used to avoid archiving (for instance) the lastN years for every new segment.
; Valid option values include: "beginning_of_time" (start date of archiving will not be changed)
;                              "segment_last_edit_time" (start date of archiving will be the earliest last edit date found,
;                                                        if none is found, the created date is used)
;                              "segment_creation_time" (start date of archiving will be the creation date of the segment)
;                              editLastN where N is an integer (eg "editLast10" to archive for 10 days before the segment last edit date)
;                              lastN where N is an integer (eg "last10" to archive for 10 days before the segment creation date)
process_new_segments_from = "beginning_of_time"

:question:

Hello Philippe,

thanks for your reply!

The debug options in the configuration all have the value 0.

The value for “process_new_segments_from” is “beginning_of_time”.

Currently the system is at its limit again. I created 6 new custom reports and a new segment, and this led to more than 200k “invalidation counts”, which the system is now handling very slowly…

I hope you can tell me how to improve this!

Best regards,

Timo

Hi @iparker
Since when have you been tracking data?
How many custom reports do you have in total?
To how many measurables is the new segment applied?

I don’t know. Where can I see this? For the site that we need the custom reports/segments for, we have been tracking since the beginning of 2022. But the Matomo instance is older. For another site we have been tracking data for more years, I think since 2019.

65 segments (56 pre-processed, 9 processed in real-time)
38 goals
35 custom reports

What do you mean by this? Do you mean how many conditions are in the new segment? I think just one (channel type).

Most segments have the following setting:

THIS SEGMENT IS VISIBLE TO: ALL USERS
AND PROCESSED FOR THIS WEBSITE ONLY AND
SEGMENTED REPORTS ARE PRE-PROCESSED (FASTER, REQUIRES CRON)

OK: AND PROCESSED FOR THIS WEBSITE ONLY

15 months of tracking means 2 years + 15 months + 63 weeks + (365 + 74) days, so 519 periods… :thinking: I don’t think the answer is there (I would need a factor of about 400 to reach 200k)…
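To make the estimate above easy to re-check, here is a minimal Python sketch of the same arithmetic. The period counts and the 200k figure are taken from the posts in this thread; the variable names are only illustrative.

```python
# Archive periods touched by roughly 15 months of tracking,
# using the counts given in the post above.
periods = {
    "years": 2,
    "months": 15,
    "weeks": 63,
    "days": 365 + 74,
}
total_periods = sum(periods.values())

# "more than 200k" invalidations were reported earlier in the thread.
observed_invalidations = 200_000
factor = observed_invalidations / total_periods

print(total_periods)   # 519
print(round(factor))   # 385
```

So a single segment over these periods would explain only about 519 invalidations, and the observed count is roughly 385 times larger, which is where the "factor of about 400" comes from.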

On my side, when I add a new segment my Matomo doesn’t do the same as yours

@innocraft, any idea?

It’s a good time to put in a plug for the config – to keep both archiving and invalidating within some limits, to avoid surprises:

(If you have a common.config.ini.php then these settings may be applied there):

[General]
; Requests with a &segment= parameter will not trigger archiving.
; Ensures that no unexpected data processing triggers from UI or API.
browser_archiving_disabled_enforce = 1

; All new Segments created in the future will be set to:
; “Pre-processed (faster, requires cron core:archive command)”
enable_create_realtime_segments = 0

; By default we process a new segment’s reports from the
; beginning of time (“beginning_of_time”).
; When you have a lot of historical data, we recommend to
; process new segment’s reports from the segment’s creation time.
process_new_segments_from = "segment_creation_time"

; When processing the number of unique visitors across large datasets
; some performance issues may be experienced. In this case we would
; recommend to disable the Unique visitors metrics processing.
enable_processing_unique_visitors_day = 0
enable_processing_unique_visitors_week = 0
enable_processing_unique_visitors_month = 0
enable_processing_unique_visitors_year = 0
enable_processing_unique_visitors_range = 0

; Settings below ensure high performance archiving
; for Roll-ups and other sites
time_before_today_archive_considered_outdated = 10800
time_before_week_archive_considered_outdated = 43200
time_before_month_archive_considered_outdated = 43200
time_before_year_archive_considered_outdated = 64800
time_before_range_archive_considered_outdated = 43200

Thanks for your reply!

It seems that all my settings differ from your recommendations:

browser_archiving_disabled_enforce 0
enable_create_realtime_segments 1
process_new_segments_from beginning_of_time
enable_processing_unique_visitors_day 1
enable_processing_unique_visitors_week 1
enable_processing_unique_visitors_month 1
enable_processing_unique_visitors_year 0 (oh ;-))
enable_processing_unique_visitors_range 0
time_before_today_archive_considered_outdated 900
time_before_week_archive_considered_outdated -1
time_before_month_archive_considered_outdated -1
time_before_year_archive_considered_outdated -1
time_before_range_archive_considered_outdated -1

In addition:
rearchive_reports_in_past_last_n_months 18

The only thing I want to mention is that I want the segment data since the beginning of 2022 and not since “segment_creation_time”. That’s why we set this to “beginning_of_time”.

Do you have any other ideas on how to speed up the processing that is currently running? There are still a lot of invalidations to handle:
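One option that may sit between the two values is the "lastN" form described in the config comments quoted earlier in the thread ("last10" archives 10 days before the segment creation date). The sketch below computes an N that reaches back to 1 January 2022. The creation date used here is hypothetical, the helper name is made up for illustration, and the comments do not say whether the boundary day is inclusive, so a day or two of margin may be sensible.

```python
from datetime import date


def last_n_setting(first_day_wanted: date, segment_created: date) -> str:
    """Build a 'lastN' value spanning first_day_wanted up to the segment creation date."""
    n_days = (segment_created - first_day_wanted).days
    if n_days < 0:
        raise ValueError("segment was created before the first wanted day")
    return f"last{n_days}"


# Hypothetical segment creation date, chosen only for illustration.
setting = last_n_setting(date(2022, 1, 1), date(2023, 3, 12))
print(f'process_new_segments_from = "{setting}"')
```

With these example dates the script prints `process_new_segments_from = "last435"`. According to the config comments this setting only applies when a segment is archived for the first time; whether it also reduces the invalidations caused by editing custom reports is not established in this thread.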

Total Invalidation Count 155079
In Progress Invalidation Count 8
Scheduled Invalidation Count 155071
Earliest invalidation ts_started 2023-03-17 23:49:07
Latest invalidation ts_started 2023-03-18 06:26:28
Earliest invalidation ts_invalidated 2023-03-12 14:05:01
Latest invalidation ts_invalidated 2023-03-17 22:05:26
Number of segment invalidations 152229
Number of plugin invalidations 154839

Thanks for your help!
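For a rough idea of how long the remaining queue might take, the figures already mentioned in this thread can be combined as below. This is only a back-of-the-envelope sketch: it assumes the earlier batch of about 140,000 invalidations really took 3 to 5 days and that the current queue is processed at the same average rate, neither of which is guaranteed.

```python
# Figures taken from the posts in this thread.
scheduled = 155_071          # "Scheduled Invalidation Count"
previous_batch = 140_000     # earlier queue size ("around 140,000")
previous_days = (3, 5)       # "3 to 5 days" for that earlier batch

estimates = []
for days in previous_days:
    rate_per_day = previous_batch / days        # average invalidations per day
    estimates.append(scheduled / rate_per_day)  # days needed for the current queue

eta_low, eta_high = estimates
print(f"{eta_low:.1f} to {eta_high:.1f} days")  # 3.3 to 5.5 days
```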