Pre-processed segmented reports cause very high CPU usage and crash Matomo

We self-host Matomo on an Azure VM.
So far we have been unable to create a single segmented report that is not real time without CPU usage ramping up to 100% and Matomo freezing, then crashing shortly after.

We then have to restart the VM and delete the reports as quickly as possible before it starts the process over again and causes another crash.

I've never had that issue. Could you share some more information, like the VM specs you are using and the number of visits/actions you are tracking?

Morning. Our current VM size has 2 vCPUs, 4 GiB RAM, 8 GiB of temp storage and 1280 max IOPS. It runs Debian 11.
The current Matomo version is 4.14.2.
At the moment we track just over 100K a month. We have only had it up and running for a couple of months and have been slowly increasing the VM size to improve performance.

When the issue occurs, the logs show a lot of "connection refused" errors and the occasional "MySQL server has gone away" error.

We can consistently reproduce this by creating a pre-processed segment.

And what do you find in your MySQL logs? My guess is that the specs are OK, but maybe the configuration or connection limitations on the DB VM are causing the conflict when the pre-processing happens.
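If you want to check those limits, something like this could show the usual suspects (just a sketch, assuming the mysql client is available on the VM; these variables are the ones most often involved in "connection refused" / "MySQL server has gone away" errors):

# Limits most often involved in those two errors
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_allowed_packet';"
mysql -u root -p -e "SHOW VARIABLES LIKE 'wait_timeout';"

# How close you actually get to the connection limit
mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';"
mysql -u root -p -e "SHOW STATUS LIKE 'Max_used_connections';"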

Morning,
Where can I find the MySQL-specific logs?
I have logging to file enabled in Matomo and use the plugin to view logs via the back office.

Can I see the settings in question under the system check?

In general they are located here: /var/log/
But to be clear, this would not be related to Matomo but to your MySQL configuration.

The other MySQL settings could be in the config file, which is usually located here: /etc/mysql/
It could also be /etc/ or /var/lib/mysql/.
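If you are not sure which files are actually used, something along these lines should point you to both the error log and the config (a rough sketch, assuming a standard Debian package install; other setups may put them elsewhere):

# Ask MySQL where its error log is written
mysql -u root -p -e "SHOW VARIABLES LIKE 'log_error';"

# Common log location on Debian
sudo ls -lh /var/log/mysql/

# Check whether any config file already sets the connection limits
grep -RE "max_connections|max_allowed_packet" /etc/mysql/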

On your advice that the configuration or connection limits may need looking at, I followed this article here:

We can now successfully move segments over to pre-processed and Matomo doesn't crash. However, I have been doing them slowly, one at a time, waiting for the CPU usage to drop before starting the next one.

I have five moved over now but have noticed that the hourly CPU spike for archiving reports has got higher: it hits 55%, when it used to hit around 5% with all segments in real time. I'm slightly concerned that as more segments are added this spike will start hitting 100% and cause issues.

Unfortunately it crashed after adding the 7th segment. It ramped up to 100% CPU usage, stayed there for about 20 minutes, then became unresponsive and went down.

I’d say increase the values even further, but also make sure your server is powerful enough.
If you are running both Matomo and MySQL on the same server they obviously share the resources.
If you are in the category of "high traffic website with millions of views" I highly doubt your 2 cores and 4 GB of RAM are sufficient.
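A quick way to see how much headroom the box actually has while a pre-processed segment is being archived, for example (nothing Matomo-specific, just standard Linux tools):

# Cores and memory shared by Matomo and MySQL on this VM
nproc
free -h

# What is eating CPU/RAM during the archive run
top -o %CPU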

How do you call the archiver cron, and with what frequency? Make sure that you don't run multiple archivers at once if the server is not powerful enough. The best approach is to set up a cron job for that and to disable archiving reports from the browser.
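One way to make sure two archive runs never overlap is to wrap the cron command in flock, for example (a sketch; adjust the lock file and Matomo path to your own install):

# flock -n skips the run entirely if the previous archive job still holds the lock
5 * * * * flock -n /tmp/matomo-archive.lock php /path/to/matomo/console core:archive --url=https://matomo.example.com/

Browser-triggered archiving can then be switched off in the Matomo UI under Administration → System → General settings ("Archive reports when viewed from the browser" → No).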

My experience with Matomo archiving is that the interval of the cron job has to fit the tracking traffic of the website. The entire traffic of a website is rarely tracked, because bots and the like are excluded, and further parameters can exclude additional "viewers". On the other hand, additional settings (events, goals) can generate additional tracking traffic. So it depends on the amount of data to be archived.

The interval is OK if the archiving logs are always about the same size and the cron jobs do not overlap. Overlap can happen when archiving is started for the first time (large logs, overlapping cron jobs). In addition, browser-triggered archiving should/must be disabled. Only once the interval is OK should the CPU load be observed. If the CPU load for archiving is very high, then Matomo should run on its own server, or archiving alone should run on its own server. In very special cases archiving could even be split across several servers.

Retrieving data to display statistics is largely independent of archiving, as long as archiving does not cause too high a CPU load.

So:

  • First: Adjust cronjob interval.
  • Second: Monitor CPU load.

Example for the 5th minute of every hour: 14:05, 15:05, 16:05, …

5 * * * * php /usr/www/users/USERNAME/matomo.example.com/console core:archive --url=https://matomo.example.com/
* * * * *    
┬ ┬ ┬ ┬ ┬
│ │ │ │ └ day in week (0-7) (0 or 7 = Sunday)
│ │ │ └── month (1-12)
│ │ └──── day in month (1-31)
│ └────── hour (0-23)
└──────── minute (0-59)
/ → every 5 minutes: */5 * * * *
, → at the 0th, 15th, 30th and 45th minute of every hour: 0,15,30,45 * * * *
- → every minute from the 15th to the 30th minute of every hour: 15-30 * * * *
@hourly → at every full hour, like: 0 * * * *    
@daily → at every midnight, like: 0 0 * * *
@weekly → once in a week, like 0 0 * * 0
@monthly → once in a month, like  0 0 1 * *
@yearly → once in a year, like 0 0 1 1 *
@annually → same as @yearly
@reboot → after every server reboot

With log saving:
Create the directory: /usr/home/USERNAME/matomo-cronlogs/

5 * * * * php /usr/www/users/USERNAME/matomo.example.com/console core:archive --url=https://matomo.example.com/ > /usr/home/USERNAME/matomo-cronlogs/matomo-archive-$(date +"\%Y\%m\%d\%H\%M\%S").log

Quite a bit accumulates there. So check and empty the folder regularly.
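That cleanup can also be automated with a second cron entry, e.g. (a sketch; the 14-day retention is just an example):

# Once a day, delete archive logs older than 14 days
0 3 * * * find /usr/home/USERNAME/matomo-cronlogs/ -name "matomo-archive-*.log" -mtime +14 -delete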

Afternoon.
I don't know too much about the configuration, as we used the Bitnami VM package on Azure, but I assume MySQL is running on the same resources, since no separate Azure database has been created.

We don’t currently hit 1 million, but I estimate that we eventually will, so I would put us in the high-traffic category. What minimum specs should we have to accommodate that? As it stands we are currently using the B2s Azure VM size, which is general purpose.
The next size up with 4 vCPUs is B4ms, which costs five times as much.