Long-running archives for new segments and custom reports

We have an on-premises Matomo v5.0.2 instance. The application and the archiving process run on the same server, which has 8 vCPU and 32 GB of memory. The database runs on AWS Aurora with engine version 8.0.mysql_aurora.3.05.2. Reports are pre-processed with the core:archive cron, which previously ran once per hour but is now set to run only once per day.
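For reference, the cron entry looks roughly like this (the install path, URL and log location below are placeholders, not our exact values):

```
# daily core:archive run (previously scheduled hourly as "5 * * * *")
5 0 * * * /usr/bin/php /var/www/matomo/console core:archive --url=https://matomo.example.com > /var/log/matomo-archive.log 2>&1
```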

We track 43 sites in total. The site in question has ~100k visits per day and ~90 segments. When a new segment is created, we noticed that around 30k records are enqueued in the matomo_archive_invalidations table. This caused problems when core:archive ran hourly: the process did not finish in time, resulting in overlapping invocations and failed processing of other valid new visit data.
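To quantify the backlog, we ran a query along these lines against the invalidations table (the matomo_ table prefix, the column names, and reading status = 0 as "still queued" are assumptions based on our schema; the idsite value is a placeholder):

```
# count queued invalidation records per report for one site
mysql -h <aurora-endpoint> -u matomo -p matomo \
  -e "SELECT name, COUNT(*) AS pending
      FROM matomo_archive_invalidations
      WHERE idsite = 1 AND status = 0
      GROUP BY name
      ORDER BY pending DESC;"
```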

We have paused creating any new segments until this issue can be addressed. It currently takes 2-3 days to process the invalidation records for a single new segment, which is not manageable.

Is it expected behavior for this amount of data to be enqueued in the archive_invalidations table? If yes, is there a more efficient way to process the records in a timely manner? If not, what steps can we take to determine the root cause of this behavior and a solution?

Have a look at the auto-archiving guide in the Matomo docs, in particular this section:

Important Tips for Medium to High Traffic Websites

Let us know if there's anything from there that you re-applied or did not apply, and whether it makes a difference.

I think your issue is that, when a new segment is created, Matomo processes that segment against historical data and archives it.

By default, Matomo attempts to reprocess all historical data for the new segment. To avoid this, set enable_processing_new_segments_from = lastX (where X is a manageable number of days) in the [General] section of your config.ini.php file. This limits how much historical data is reprocessed for new segments.
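For example, something like this (a sketch; "last30" is just an illustrative value, pick whatever window suits you):

```
; config/config.ini.php
[General]
; for newly created segments, only archive the last 30 days instead of all historical data
enable_processing_new_segments_from = "last30"
```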

Try that and let us know.

There are a few other core:archive options beyond this, but I'm curious whether the above will do the trick.
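If you still need more throughput after that, some core:archive flags can help parallelise the work. Treat the values below as illustrative and check ./console core:archive --help on your 5.0.2 install to confirm what is available:

```
# illustrative invocation; path, URL and values are placeholders
# --concurrent-archivers caps how many archiver processes may run at once,
# --concurrent-requests-per-website controls parallel archiving requests per site
/usr/bin/php /var/www/matomo/console core:archive --url=https://matomo.example.com \
  --concurrent-archivers=4 --concurrent-requests-per-website=3
```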


Thank you for the response! I'll update the enable_processing_new_segments_from setting and report back on the results.