Performance Queue Processing - Cron jobs

Hello Matomo team/@matthieu/@thomas_matomo/@Lukas/@TylerD & community,

We are currently facing issues with the scalability and performance of our cron jobs.

We have set up 16 tracking queues in Redis, and 16 cron jobs to process them.

The threshold number of requests at which processing starts is set to 10.

The 16 cron jobs are configured in crontab to run every 2 minutes.

So, for 5,000 requests (500 iterations * 10), processing takes only 16 seconds if just one queue is being processed. But with all 16 queues (which run in parallel), it takes around 4 minutes.
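For context, we watch the per-queue backlog with the QueuedTracking plugin's monitor command, run from the Matomo installation root (the working directory is specific to our setup):

```sh
# Shows the number of requests currently waiting in each queue
./console queuedtracking:monitor
```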

We have also enabled PHP OPcache for the cron jobs' command line (opcache.enable_cli=1), but even that has not improved performance.
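For reference, this is roughly our OPcache setup for the CLI (a sketch; the sizing values are illustrative, not tuned recommendations):

```ini
; php.ini (CLI) - enable OPcache for command-line PHP, i.e. the cron jobs
opcache.enable=1
opcache.enable_cli=1
; illustrative sizing values only
opcache.memory_consumption=128
opcache.max_accelerated_files=10000
```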

Is there anything else we should be doing to address this?

If it turns out that Redis doesn't scale, the only option we see is to set up twemproxy.

Thanks in advance for your help.

Regds,
Sivakumar

If I understand correctly, it is a lot faster to run only one queue compared to 16 queues?

Hard to say why this is. Does your server have, for example, 16 cores/CPUs? If not, I would likely lower the number of queues. I would likely also increase the number of requests processed at once from 10 to 100 or so. In general, make sure you have a fast server.
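For example, to check how many cores are actually available on the box (plain Linux tooling, nothing Matomo-specific):

```sh
nproc                               # number of CPU cores available to the process
grep -c ^processor /proc/cpuinfo    # per-thread count, for comparison
```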

Also make sure to register each cron job with a different queue ID: one cron job with --queue-id=0, another with --queue-id=1, and so on, up to one with --queue-id=15. This should generally improve speed as well.
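A sketch of what such a crontab could look like, assuming the QueuedTracking plugin's queuedtracking:process console command and /var/www/matomo as the install path (adjust the user and paths to your setup):

```
# /etc/cron.d/matomo-queues: one worker per queue, every 2 minutes
*/2 * * * * www-data /usr/bin/php /var/www/matomo/console queuedtracking:process --queue-id=0
*/2 * * * * www-data /usr/bin/php /var/www/matomo/console queuedtracking:process --queue-id=1
# ...and so on, one entry per queue, up to:
*/2 * * * * www-data /usr/bin/php /var/www/matomo/console queuedtracking:process --queue-id=15
```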

In general, it shouldn't slow down this drastically with 16 queues. I can only think that your server is too small for 16 queues.


Thanks very much, Thomas. We were able to get 16 queues running with 100 requests per batch. The issue was that all the visits were treated as the same visit because of the performance tool that is generating the requests: it does not set the new_visit flag to 1 at the start of each request. As a result, all the requests were hitting the same matomo_log_visit table and causing blocking on the DB server.
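For anyone else load-testing this way: the fix on our side was to have the tool mark each iteration as a new visit. A sketch of such a tracking request (the hostname, idsite and _id values are placeholders for illustration):

```sh
# new_visit=1 forces Matomo to start a new visit for this request;
# _id passes a distinct 16-hex-char visitor ID per simulated user
curl "https://matomo.example.com/matomo.php?idsite=1&rec=1&url=https%3A%2F%2Fexample.com%2F&new_visit=1&_id=0123456789abcdef"
```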

The servers are properly sized. On the Redis side, however, we configured list compression with a depth of 100, which suits the tracking queues; that smoothed out memory usage on the Redis server.
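Concretely, this is the kind of redis.conf change we made (a sketch of our "list depth 100" setting; tune the value to your queue sizes):

```
# redis.conf: leave the first/last 100 quicklist nodes of each list
# uncompressed and compress everything in between
list-compress-depth 100
```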

Is this something you would be able to include in the documentation? It would help others as well.

Thanks again for all your help and swift response.

Regds,
Sivakumar

because of the performance tool that is generating the requests

So each request had the same idvisitor, or was each request actually different? If all requests are the same, or come from the same IP and idvisitor etc., then it might put all the requests into the same queue.
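To illustrate the idea with a simplified sketch (this is not the plugin's actual code, just the general hash-based sharding principle):

```php
<?php
// Simplified illustration: the target queue is derived from a hash of the
// visitor ID, so identical visitor IDs always land in the same queue and
// extra queues don't help unless the IDs actually differ.
function pickQueueId(string $visitorId, int $numQueues): int
{
    return hexdec(substr(md5($visitorId), 0, 8)) % $numQueues;
}

echo pickQueueId('0123456789abcdef', 16) . "\n"; // same input => same queue every time
```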

As a result, all the requests were hitting the same matomo_log_visit table and causing blocking on the DB server.

This sounds like the DB might not be fast enough, or maybe the MyISAM engine is being used? It shouldn't really be a problem. We often run it here with 16 workers, the DB is still fine with that, and there isn't much blocking happening.
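You can quickly check which engine the visit log table uses (assuming the database schema is named matomo; adjust to yours):

```sql
-- Show the storage engine of the visit log table
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'matomo'
  AND TABLE_NAME = 'matomo_log_visit';
```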


Hi Thomas. All those requests have different visitor IDs. However, Matomo treats all of them as the same "visit" because they come from the same IP. That was causing the issue: everything collapses into a single entry in the matomo_log_visit table, and that one record becomes a point of contention.

We are indeed using InnoDB. However, once a new visit is created for each iteration from the perf load tool, Matomo started creating new entries in matomo_log_visit, which reduced the contention.

Thanks again for all the help.