Matomo + MariaDB Galera Cluster Cross Colo (East/West Coast), Op Timeout/DeadLock

Now that it is finally running under load, my East/West Coast (USA) MariaDB Galera Cluster regularly logs:

[Warning] WSREP: Failed to report last committed 66266a7c-f62f-11ec-86bf-bfd29572294b:33450999, -60 (Operation timed out)

(Which, from MariaDB/Galera … is OK: this is just a warning, the operation is supposed to be retried, and it must then succeed, since no error is logged.)

The config is:

  • Baremetal/“internal” cloud (Xen XCP-NG), own colos.
    ** Cogent Transit, 10 Gbps, over IPsec VPN via Juniper SRX
    ** ping over IPsec ~ 70 ms
  • MariaDB (10.6.10) with Galera Cluster 26.4.12
    ** five nodes: two WEST, three EAST
    ** HAProxy distributes load, but per connection, so 90%+ of DB connections go to node1.
  • FreeBSD 13-stable (as a jail) on bare metal
  • Web servers (6x, three each coast)
    ** FreeBSD 13-stable
    ** running under Xen (XCP-NG)
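
As a rough sanity check on the ~70 ms figure, here is a back-of-the-envelope sketch. It assumes that each commit has to wait about one cross-colo round trip for Galera's replication step, and it ignores parallel connections and any batching, so treat it as an order-of-magnitude estimate only. The function name is made up for illustration.

```python
def max_sequential_commits_per_sec(rtt_ms: float) -> float:
    """Rough upper bound on back-to-back commits from ONE connection,
    assuming every commit waits about one round trip (an assumption)."""
    return 1000.0 / rtt_ms


rtt_ms = 70.0  # ping over IPsec between the colos, from the list above
print(f"{max_sequential_commits_per_sec(rtt_ms):.1f} commits/s per connection")
```

If that assumption holds, a single connection doing sequential writes tops out at roughly 14 commits per second, and any transaction that holds locks while waiting on the WAN keeps them for at least that long.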

At some point a process blocks, holds a lock, or runs long, and then ultimately I get:

[Warning] Aborted connection 0 to db: 'unconnected' user: 'unauthenticated' host: 'connecting host' (Too many connections)

And the entire cluster locks up.

The workaround for now, which has been running for about 7-8 days without issue, was to redirect all traffic to the three EAST Coast web servers, so that all inserts happen on the EAST Coast DB nodes. This is obviously not a real solution.

I suspect this is a lock or block in …
console core:archive

As I write this, I realize I had core:archive running in both colos, and thus against different DB nodes, every 5 minutes. It is possible that some of these jobs ran longer than 5 minutes. core:archive may not have locking, so something needs to ensure that only one core:archive process is running at a time.
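
A minimal sketch of what I mean by "only one at a time", assuming a cron wrapper around the archive job. `run_exclusive`, the lock path and the Matomo path in the comment are all hypothetical names. Note that a file lock like this only prevents overlapping runs on a single host; to avoid two colos archiving at once, the cron job would also have to run from one colo only (or use some shared lock).

```python
import fcntl
import subprocess


def run_exclusive(cmd, lock_path):
    """Run cmd only if nobody else holds lock_path.

    Returns the command's exit code, or None if another run still
    holds the lock and this one was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail immediately instead of queueing up.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed at the end of the with-block.
        return subprocess.run(cmd).returncode


# Hypothetical cron usage (paths and URL are placeholders, not my real setup):
# run_exclusive(
#     ["php", "/path/to/matomo/console", "core:archive", "--url=https://matomo.example.org/"],
#     "/var/run/matomo-archive.lock",
# )
```

With this, a run that is still going after 5 minutes simply causes the next cron invocation to be skipped rather than stacking a second archiver on top of it.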

I’m also looking at possibly moving to plugin-QueuedTracking, though I would love this to be RabbitMQ-based, with active pub/sub instead of polling/batch.

Is anyone else using Matomo + MariaDB Galera Cluster across medium- to high-latency connections? Any tips or hints, or would you be willing to share your config/setup?