This procedure is the result of much consternation. I’m posting it here in the hope that it helps others. If your Matomo installation is more than a few dozen GB, this may provide some insight into issues particular to very large installations.
Caveat: This is our internal procedure and has some nuance related to our particular setup, so don’t follow these steps blindly.
Performing an analytics update can be quite complicated and must be approached with care to avoid data loss. The general steps are:
- Notify Customers
- Choose a schedule for performing the update. Allow for multiple days without access to analytics.
- Warn users
- In any customer-facing notifications be sure to allow plenty of padding.
- Pre-Preparation (system still available)
- Create the folder /var/www/html/upgrade, download the latest Matomo release (or your target version) into it, and extract it there: unzip matomo.zip
- Back up the existing configuration in /var/www/html/config/config.ini.php
- Ensure that the time window for the upgrade is smaller than the Web server’s log retention period. E.g. if you have 10 days of log file retention, and the upgrade takes 11 days, you will lose one day of data.
- Ensure MySQL’s innodb_buffer_pool_size is slightly lower than the total amount of memory available to the system.
- Run console core:purge-old-archive-data all
- Ensure total disk capacity is at least twice what is currently in use. E.g. 100GB used requires at least 200GB of total disk space for the upgrade.
- Ensure a swap file is available on the server, at least twice the available memory (16GB in our case).
- Ensure unprocessed log data has been properly purged: php /var/www/html/console core:delete-logs-data --dates=2021-01-01,2021-12-31 --limit 1000. This step may take multiple days depending on the date range and the amount of data.
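The disk-space and log-retention checks above come down to simple arithmetic, which can be sketched in Python as below. This is only an illustration: the function names are hypothetical, the factor of 2 follows the 100GB/200GB example, and the path "/" is an assumption that should be replaced by the volume holding your MySQL data.

```python
import shutil


def disk_headroom_ok(total_bytes: int, used_bytes: int, factor: float = 2.0) -> bool:
    """Return True if total capacity is at least `factor` times the space in use."""
    return total_bytes >= factor * used_bytes


def days_of_data_lost(retention_days: int, upgrade_days: int) -> int:
    """Days of tracking data that would age out of the web server logs."""
    return max(0, upgrade_days - retention_days)


if __name__ == "__main__":
    # Assumption: "/" is the volume holding the database; adjust as needed.
    usage = shutil.disk_usage("/")
    print("disk headroom ok:", disk_headroom_ok(usage.total, usage.used))
    # Figures from the example above: 10 days of retention, 11-day upgrade.
    print("days of data lost:", days_of_data_lost(10, 11))
```

If either check fails, it is probably cheaper to add disk or extend log retention before starting than to discover the problem halfway through a multi-day update.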
- Preparation 1 - System Unavailable
- Enable maintenance mode in config/config.ini.php (maintenance_mode=1). This disables the UI, but tracking continues.
- (Not applicable to the general public)
- Disable the setting “Process during tracking request” in the Tracking Queue settings in the Matomo Admin, logged in as root. This will stack up requests in the Tracking Queue.
- Optimize all tables to reduce space consumption via mysqlcheck -o analytics -uroot. This should free up disk space, but it will hold long-running table locks, potentially for multiple days.
- Enable the setting “Process during tracking request” in the Tracking Queue settings in the Matomo Admin, logged in as root. Run /var/www/html/console queuedtracking:monitor and wait until all pending tracking requests have been processed.
- Disable the Tracking Queue plugin. !!At this stage the apache access logs are responsible for recording tracking data.!!
- With the above preparations in place, perform the software update:
1. sudo su - apache-user
2. cd /var/www/html/upgrade
3. cp -R matomo/* …/
4. cd /var/www/html
5. Run the “date” command and note the time in UTC as “Start Time”
6. nohup console core:update --yes
- Allow the update to run, and restart it if it core-dumps
- Enable the Tracking Queue plugin.
- Enable the setting “Process during tracking request” in the Tracking Queue settings
- Run the “date” command and note the time in UTC as “End Time”
- Obtain the apache log files for the date/time range marked by “Start Time” and “End Time”. This may involve parsing multiple log files. In our recent update, it involved 1 full log file archive (e.g. access_logs.1.gz) and 2 partials (access_logs from the start of the file to the “End Time”, and access_logs.2.gz from the “Start Time” to the end of the file). Pipe these through grep and redirect the output into a single file that is chronologically ordered. E.g. cat saturday.log sunday.log > gapfile.log
- Import the gap file with python3 misc/log-analytics/import_logs.py --replay-tracking --enable-http-errors --url=https://analytics.XXXXXXXXXXXX.com:443 tmp/logs/archive/gapfile.log
1. Note: you must manually revert the code change in import_logs.py described in “Log file importer Error ‘utf-8’ codec can’t decode byte 0x80 in position 10: invalid start byte”
- Invalidate the date range using full days, e.g. ./console core:invalidate-report-data --dates=2021-09-13,2021-09-15 (optionally limit this to a test site ID with --sites=3329)
- Re-process the archives with ./console core:archive --force-date-range=2021-09-13,2021-09-15 --url=https://analytics.XXXXXXXXXXXX.com (optionally limit this to a test site ID with --force-idsites=3329)
- You may need to check the contents of piwik_archive_invalidations and possibly truncate it. If Matomo has automatically flagged a large number of sites for re-archiving across the last 6 months, the archive task will run extremely slowly and may never complete.
- Do not trust commands like “database:optimize-archive-tables”. Whenever a comparable native MySQL command is available, use the native one.
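For the gap-file step, the grep-and-concatenate work can also be scripted. The following is a minimal Python sketch, not what we actually ran: it assumes the standard Apache timestamp format ([13/Sep/2021:04:05:06 +0000]), and the function names and file paths are hypothetical. Lines are read and written with surrogateescape so that stray non-UTF-8 bytes pass through unchanged.

```python
import gzip
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

# Matches the bracketed timestamp of a standard Apache access log line.
TS_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the timezone-aware timestamp of a log line, or None if absent."""
    match = TS_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def lines_in_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield only the lines whose timestamp falls within [start, end]."""
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and start <= ts <= end:
            yield line


def open_log(path: str):
    """Open a plain or gzip-compressed log file in text mode."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="surrogateescape")


def build_gap_file(paths: Iterable[str], start: datetime, end: datetime, out_path: str) -> int:
    """Write in-window lines to out_path; pass `paths` oldest file first."""
    written = 0
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for path in paths:
            with open_log(path) as handle:
                for line in lines_in_window(handle, start, end):
                    out.write(line)
                    written += 1
    return written
```

A call might look like build_gap_file(["access_logs.2.gz", "access_logs.1.gz", "access_logs"], start, end, "gapfile.log"), with start and end being the UTC “Start Time” and “End Time” as timezone-aware datetime objects. Because each rotated file is already in chronological order, listing the files oldest first keeps the output ordered without a separate sort.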