Matomo seems to need more hardware than the documented requirements and is still crashing multiple times daily

Hello community, we are experiencing a lot of unavailability, multiple times a day.
We have about ~20 million pageviews/month.

According to the website recommendations for tracking 100 million page views per month or less, at minimum three servers are recommended:
3 x App servers (or only 2x), with each: 16 CPUs, 16+ GB RAM, 100GB SSD disk.
1 x Database server, at least 16 CPUs, 32 GB RAM, 1 TB SSD disk.
Optionally 2 x DB servers: the second one replicated and configured as a reader/slave.

We have 3 virtual app servers serving PHP:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
Memory : 64GB RAM
HDD: cat /sys/block/sda/queue/rotational returns 1 (rotational disk, not SSD)

And 1 MariaDB database server (physical host):
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8

[mysql@host apps]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         60G        363M         14M        2.3G        2.0G
Swap:           15G        6.4G        9.6G
HDD: 280 GB

The mysqld process hits 90% memory very often:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8641 mysql 20 0 70.8g 56.8g 7768 S 500.0 90.6 54641:47 mysqld

In the PHP logs we are getting:
[16-Apr-2020 10:41:53 UTC] PHP Fatal error: Uncaught PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction in /apps/matomo/matomo/libs/Zend/Db/Statement/Pdo.php:228

In the MariaDB logs:
2020-04-08 12:30:23 3654387 [Warning] Aborted connection 3654387 to db: 'piwik' user: 'matomo' host: 'host.domain.com' (Got an error reading communication packets)

We currently cannot add any more memory to the database server, and we are still well under the recommendations, so are we doing something wrong in the configuration?

Our my.cnf:
innodb_flush_method=O_DIRECT
innodb_file_per_table=ON
innodb_flush_log_at_trx_commit = 2
innodb_log_file_size = 6G
innodb_log_group_home_dir = /TEMPDB
innodb_buffer_pool_size=51G
innodb_buffer_pool_instances=51
innodb_print_all_deadlocks=1
query_cache_type = 0
query_cache_limit = 5M
query_cache_size = 0
join_buffer_size = 1024K
tmp_table_size=32M
max_heap_table_size=32M
performance_schema=ON
max_allowed_packet=268435456
max_connections = 400
wait_timeout = 28800
interactive_timeout = 14400
table_open_cache = 1000

Our PHP-FPM pool configuration:
pm = ondemand
pm.process_idle_timeout = 20s;
pm.max_children = 50
pm.start_servers = 16
pm.min_spare_servers = 8
pm.max_spare_servers = 16
pm.max_requests = 50
pm.process_idle_timeout = 5
pm.status_path = /status
php_admin_flag[log_errors] = on
php_admin_value[memory_limit] = 512M
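As a rough worst-case sizing check for this pool (assuming every child actually reaches the memory limit, which in practice it rarely does): pm.max_children × memory_limit = 50 × 512 MB ≈ 25.6 GB per app server, which is comfortably within the 64 GB of RAM on each of the three PHP hosts, so the memory pressure appears to be on the database host rather than on the app servers.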

Perl tuning script recommendations:
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Control warning line(s) into /var/log/mariadb/mariadb.err file
Control error line(s) into /var/log/mariadb/mariadb.err file
Reduce your overall MySQL memory footprint for system stability
Dedicate this server to your database for highest performance.
Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
We will suggest raising the ‘join_buffer_size’ until JOINs not using indexes are found.
See MySQL :: MySQL Internals Manual :: 10.5 How MySQL Uses the Join Buffer Cache
(specially the conclusions at the bottom of the page).
Increase table_open_cache gradually to avoid file descriptor limits
Read this before increasing table_open_cache over 64: https://bit.ly/2Fulv7r
Read this before increasing for MariaDB https://mariadb.com/kb/en/library/optimizing-table_open_cache/
This is MyISAM only table_cache scalability problem, InnoDB not affected.
See more details here: MySQL Bugs: #49177: MyISAM share list scalability problems
This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
Beware that open_files_limit (2463) variable
should be greater than table_open_cache (1000)
Consider installing Sys schema from GitHub - mysql/mysql-sys: The MySQL sys schema for MySQL
Consider installing Sys schema from GitHub - FromDual/mariadb-sys: The MariaDB sys schema for MariaDB
Variables to adjust:
*** MySQL’s maximum memory usage is dangerously high ***

  • Add RAM before increasing MySQL buffer variables *
    join_buffer_size (> 1.0M, or always use indexes with JOINs)
    table_open_cache (> 1000)
    innodb_buffer_pool_size (>= 101.2G) if possible.
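For context, a rough upper bound on the database's memory use (along the lines of what such tuning scripts compute, using our configured values and assuming defaults for the per-connection buffers that are not set explicitly) is: innodb_buffer_pool_size (51 GB) + other global buffers + max_connections (400) × per-connection buffers (join_buffer_size 1 MB, sort/read buffers, and up to tmp_table_size / max_heap_table_size 32 MB for in-memory temporary tables). With 400 connections allowed, the per-connection part alone can theoretically add more than 10 GB, which pushes the worst case past the 62 GB of physical RAM and lines up with the swap usage and the "dangerously high" warning above.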

versions:
Matomo version: 3.13.4
MySQL version: 10.4.10-MariaDB-log
PHP version: 7.2.10

Can you provide me with a hint?
Very much appreciated.

Hello,

What seems to be missing is enabling "OPcache" for PHP. Did you do that by any chance?

The general recommendations for performance tuning are available here: How to configure Matomo for speed - User Guide - Analytics Platform - Matomo

We have used that to scale an on-premises Matomo installation. We track close to 90 million page views per month.

Hope this helps.

I have already applied those configurations and it is still showing as unavailable very often.
We have 3 PHP hosts behind a load balancer. How is your architecture set up, Siva?
Can you post your configurations, please?

Regarding the PHP cache:

  • PHP 7 comes with this PHP cache included by default. How do I know if I am caching?

Hello tmcmm,

This is what we have used to check the OPcache status: GitHub - rlerdorf/opcache-status (a one-page opcache status page), and it helped effectively handle the PHP problems.

I would recommend running an in-house performance test with and without this enabled. You make this change in the php.ini file, and comparing the two should help assess its effectiveness.
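For reference, enabling it is usually just a few lines in php.ini; the values below are generic starting points for illustration (not Matomo-specific recommendations), and the exact extension line may differ per distribution/build:

zend_extension=opcache.so
opcache.enable=1
opcache.memory_consumption=256        ; MB of shared memory for cached bytecode
opcache.interned_strings_buffer=16    ; MB reserved for interned strings
opcache.max_accelerated_files=10000   ; maximum number of cached scripts
opcache.validate_timestamps=1         ; re-check scripts for changes
opcache.revalidate_freq=60            ; seconds between timestamp checks

You can then confirm it is active from a web request via phpinfo() or opcache_get_status(), or with the opcache-status page linked above.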

Regds,
Sivakumar

After implementing those recommendations we are still getting downtime due to unavailability.
The PHP error log shows lock timeouts on the MariaDB database:

[23-Apr-2020 09:19:16 UTC] PHP Fatal error: Uncaught PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction in /apps/matomo/matomo/libs/Zend/Db/Statement/Pdo.php:228
Stack trace:
#0 /apps/matomo/matomo/libs/Zend/Db/Statement/Pdo.php(228): PDOStatement->execute(Array)

Do you have any other recommendations on the SQL side?
Thank you.

This error is not directly connected to the hardware/software configuration, but rather to the heavy INSERT/UPDATE/SELECT workload on the database, which can lead to deadlocks.

Here’s the discussion concerning the issue: Deadlock found when trying to get lock · Issue #15545 · matomo-org/matomo · GitHub
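If it helps with diagnosis, MariaDB can show the most recent deadlock and any transactions currently waiting on row locks. A minimal sketch using the standard InnoDB information_schema views (nothing Matomo-specific assumed):

-- Includes a "LATEST DETECTED DEADLOCK" section
SHOW ENGINE INNODB STATUS\G

-- Transactions currently blocked and the transactions blocking them
SELECT r.trx_mysql_thread_id AS waiting_thread, r.trx_query AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;

-- How long a transaction waits on a row lock before failing with error 1205
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

Since innodb_print_all_deadlocks=1 is already set in the my.cnf above, any detected deadlocks should also be written to the MariaDB error log.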