Just recently we noticed that the network link of our Redis instance can become utilized with more than 100 MB/s of outgoing data versus only ~2 MB/s of incoming data. The Redis instance becomes a bottleneck in the system.
If we move from the shared Redis cache to a file-based cache backend, the link saturation immediately drops significantly.
Is it normal that Matomo needs to cache this much data, and are there any suggestions for improving this setup?
What are the drawbacks of using file caches on every load-balanced tracking request server, against the recommendations of the FAQ?
Would it help to chain in the file cache (array -> file -> redis-server) in the chained setup?
Yes, that's possible. You can chain in the file cache; however, the problem is that you then need to synchronize the deletion of the file caches across servers. With several servers, the file cache would be deleted on one server but not on the others. I'd recommend getting in touch with www.innocraft.com, where they possibly have some features around that.
Matomo itself deletes these caches regularly. There are many different caches with different TTLs: some have a TTL of 5 minutes, some 1-4 hours, etc. They also get deleted based on certain actions in the UI, and if the files are then not deleted across all servers, some random problems will very likely occur. Yes, tracking also uses this cache (especially tracking; it is important for fast tracking), as do the reporting parts.
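For reference, a chained setup like the one described above would go into config/config.ini.php roughly as follows. This is only a sketch based on Matomo's documented [Cache], [ChainedCache], and [RedisCache] settings; the host, port, and database values are placeholders to replace with your own:

```ini
[Cache]
backend = chained

[ChainedCache]
; order matters: fastest backend first, each miss falls through to the next
backends[] = array
backends[] = file
backends[] = redis

[RedisCache]
host = "127.0.0.1" ; placeholder, point at your shared Redis instance
port = 6379
timeout = 0.0
database = 14
```

With this chain, most reads are served from the in-process array cache or the local file cache, and Redis is only hit on a local miss. That is exactly why it reduces outgoing Redis traffic, and also why stale files on individual servers become a problem.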
Our m5.large Redis instance at AWS is transferring, at peak, 500 GB/hour (10 GB/minute, ~170 MB/second). At the same time, the number of read-only operations (get, hget, scard, lrange, etc.) is around 9,000 ops/sec.
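If you want to cross-check the CloudWatch network metrics against Redis' own counters, the INFO stats section exposes the matching throughput figures (the host below is a placeholder):

```bash
# Compare Redis' own view of traffic and command rate with CloudWatch
redis-cli -h your-redis-host INFO stats \
  | grep -E 'instantaneous_(ops_per_sec|input_kbps|output_kbps)'
```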
We run Matomo in an HA, multi-AZ setup where it is split into the following components/containers, which scale independently:
tracker api - 1-32 instances; receives all tracker API (matomo.js and matomo.php) requests and puts them into Redis (QueuedTracking plugin)
web ui - handles all non-tracker-API requests
process containers - 1 to 16 instances, based on the current load (QueuedTracking plugin)
archive container - 1 instance, running the core:archive command every 1800 s (see the sketch below)
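The archive container boils down to a simple loop around the console command (a sketch; the install path and instance URL are placeholders):

```bash
#!/bin/bash
# Entrypoint of the archive container: re-run archiving every 1800 s.
while true; do
  php /var/www/matomo/console core:archive --url=https://matomo.example.com/
  sleep 1800
done
```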
Currently this is a killer cost: cross-AZ traffic is $40-50 a day, and we currently have only one site on which we are testing Matomo.
The quick fix for us is to run everything in one AZ to limit the cross-AZ transfer cost, but we would like to understand the mechanics of the Redis cache better, since this won't scale for us long term.
@fkaufmann did you find a solution that was sufficient for your workload and requirements?
@thomas_matomo is there anything around the Redis cache that can be improved?
We have been able to pinpoint this as an issue with QueuedTracking of some sort.
When we activate the plugin, we see an uneven distribution of network bytes in vs. out on our Redis instance: 310 MB/min of traffic is going out of Redis, while only 10 MB/min is going in. During our peaks the values are 7.82 GB/min out vs. 250 MB/min in. That is a 31x difference between out and in traffic, which doesn't make sense.
We will continue to debug this issue and report back if we identify anything. Our initial research points to a high number of GET operations on items managed by the RedisCache backend in Matomo core, issued just after a tracking request is LPUSHed to Redis.
If we change the Cache => backend type in Matomo core to "file", we don't see this traffic pattern at all (as expected). But it is unclear to us why there need to be so many GET operations against Redis just after an LPUSH.
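One way to quantify the GET-to-LPUSH ratio is a short MONITOR capture (a sketch; MONITOR has significant overhead, so only run it briefly, and the host is a placeholder):

```bash
# Capture ~10 seconds of commands hitting Redis, then count them by type.
redis-cli -h your-redis-host MONITOR > monitor.log &
MONPID=$!
sleep 10
kill "$MONPID"

# The 4th field of each MONITOR line is the command name, e.g. "GET" or "LPUSH"
awk '{print $4}' monitor.log | sort | uniq -c | sort -rn
```

The same log also shows which cache keys are fetched after each LPUSH, which helps identify what the RedisCache reads per tracking request.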
Our config for RedisCache is as follows:

```ini
backend = chained
```
Tore, we had a similar challenge, and we managed to get higher throughput with the help of 16 worker queues, each processing 100 requests. Refer to this thread for more details; a sketch of such a worker setup follows below.
It also depends on where you run the cron jobs (queue workers) that process the queues. If you run them on the same machine as the DB server, they will impact the MySQL processing speed; instead, it is recommended to have a separate server dedicated to processing the cron jobs.
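As mentioned above, a dedicated processing host would start one worker per queue, roughly like this (a sketch; it assumes QueuedTracking is configured with 16 queues, and the install path is a placeholder):

```bash
#!/bin/bash
# Start one QueuedTracking worker per queue (queue ids 0-15) and wait for all.
for QUEUE_ID in $(seq 0 15); do
  php /var/www/matomo/console queuedtracking:process --queue-id="$QUEUE_ID" &
done
wait
```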
Processing of the queue is not the problem. Our issue is that Matomo requests 31x more data from Redis when inserting a tracking request into Redis: a tracking request of e.g. 10 KB turns into 10 KB written to Redis and 310 KB read from Redis, which IMO doesn't make sense for a single tracking request written to the queue.