Long-running archives for new segments and custom reports

We have an on-premises Matomo v5.0.2 instance. The application and the archiving process run on the same server, which has 8 vCPU and 32 GB of memory. The database runs on AWS Aurora with engine version 8.0.mysql_aurora.3.05.2. Reports are pre-processed with the core:archive cron, which previously ran once per hour but is now set to run only once per day.
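For reference, the cron entry looks roughly like this (the install path, URL and log location below are placeholders, not our exact values):

```
# daily core:archive run (previously scheduled hourly as "5 * * * *")
5 0 * * * /usr/bin/php /var/www/matomo/console core:archive --url=https://matomo.example.com > /var/log/matomo-archive.log 2>&1
```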

We track 43 sites in total. The site in question has ~100k visits per day and ~90 segments. When a new segment is created, we noticed that around 30k records are enqueued in the matomo_archive_invalidations table. This caused problems when core:archive ran hourly: the process did not finish in time, resulting in overlapping invocations and failed processing of other valid new visit data.
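To quantify the backlog, we ran a query along these lines against the invalidations table (the matomo_ table prefix, the column names, and reading status = 0 as "still queued" are assumptions based on our schema; the idsite value is a placeholder):

```
# count queued invalidation records per report for one site
mysql -h <aurora-endpoint> -u matomo -p matomo \
  -e "SELECT name, COUNT(*) AS pending
      FROM matomo_archive_invalidations
      WHERE idsite = 1 AND status = 0
      GROUP BY name
      ORDER BY pending DESC;"
```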

We have paused creating any new segments until this issue can be addressed. It currently takes 2-3 days to process the invalidation records for a single new segment, which is not manageable.

Is it expected behavior for this amount of data to be enqueued in the archive_invalidations table? If yes, is there a more efficient way to process the records in a timely manner? If not, what steps can we take to determine the root cause of this behavior and a solution?

Have a look at the auto-archiving guide in the Matomo docs, in particular this section:

Important Tips for Medium to High Traffic Websites

Let us know if there's anything from there that you re-applied or did not apply, and whether it makes a difference.

I think your issue is that, when a new segment is created, Matomo processes that segment against historical data and archives it.

By default, Matomo attempts to reprocess all historical data for the new segment. To avoid this, set enable_processing_new_segments_from = lastX (where X is a manageable number of days) in the [General] section of your config.ini.php file. This limits how much historical data is reprocessed for new segments.
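For example, something like this (a sketch; "last30" is just an illustrative value, pick whatever window suits you):

```
; config/config.ini.php
[General]
; for newly created segments, only archive the last 30 days instead of all historical data
enable_processing_new_segments_from = "last30"
```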

Try that and let us know.

There are a few other core:archive options beyond this, but I'm curious whether the above will do the trick.
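If you still need more throughput after that, some core:archive flags can help parallelise the work. Treat the values below as illustrative and check ./console core:archive --help on your 5.0.2 install to confirm what is available:

```
# illustrative invocation; path, URL and values are placeholders
# --concurrent-archivers caps how many archiver processes may run at once,
# --concurrent-requests-per-website controls parallel archiving requests per site
/usr/bin/php /var/www/matomo/console core:archive --url=https://matomo.example.com \
  --concurrent-archivers=4 --concurrent-requests-per-website=3
```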


Thank you for the response! I'll update the enable_processing_new_segments_from setting and report back on the results.