We have 5 million visitors across our app and website, and we are using Matomo On-Premise. How do we make sure we can retain data for the long term (1-2 years) without running into errors because of it?
What are the best practices for data retention in general?
I can’t speak to the absolute best practice, but here is one way to handle it.
First, in case you haven’t seen it, there is a setting under Privacy → Anonymize data → "Regularly delete old raw data" where you can set the maximum number of days Matomo keeps raw log data. Processed reports are stored separately and are not removed by this setting.
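Before turning that setting on, it can help to preview how many raw visits a given window would remove. The sketch below is an illustration, not Matomo’s own logic: it assumes the default `matomo_` table prefix (older installs may use `piwik_`), treats `visit_last_action_time` in `log_visit` as the age column, and uses the `%s` placeholder style of MySQL drivers such as PyMySQL. Matomo’s built-in purge may draw the boundary slightly differently. `retention_cutoff` and `count_expiring_visits_sql` are hypothetical helper names.

```python
from datetime import date, timedelta

# Assumption: default Matomo table prefix; adjust if your install uses another one.
TABLE_PREFIX = "matomo_"


def retention_cutoff(today: date, keep_days: int) -> date:
    """Earliest date still kept under a 'delete raw data older than keep_days' policy."""
    if keep_days < 1:
        raise ValueError("keep_days must be positive")
    return today - timedelta(days=keep_days)


def count_expiring_visits_sql(cutoff: date, prefix: str = TABLE_PREFIX) -> tuple[str, tuple]:
    """Parameterized query counting raw visits whose last action predates the cutoff."""
    sql = (
        f"SELECT COUNT(*) FROM {prefix}log_visit "
        "WHERE visit_last_action_time < %s"
    )
    return sql, (cutoff.isoformat(),)
```

With any DB-API cursor connected to the Matomo database, `cursor.execute(sql, params)` followed by `cursor.fetchone()` returns the count. For example, a 730-day window evaluated on 2024-06-30 gives a cutoff of 2022-07-01.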
If you don’t need to access data going back very far, you could simply keep regular backups of the database and restore them as needed.
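One way to keep those backups from growing without bound is a rotation policy. This is a sketch under assumptions: dumps are taken with `mysqldump` (credentials coming from an option file such as `~/.my.cnf`), and the policy of keeping every daily backup for 30 days plus the first backup of each month for 24 months is purely illustrative. `mysqldump_command` and `backups_to_delete` are hypothetical helper names.

```python
from datetime import date, timedelta


def mysqldump_command(database: str, outfile: str) -> list[str]:
    """Build (but do not run) a mysqldump invocation for one database."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--quick",  # stream rows instead of buffering whole tables in memory
        f"--result-file={outfile}",
        database,
    ]


def backups_to_delete(
    backup_dates: list[date], today: date, daily_days: int = 30, monthly_months: int = 24
) -> list[date]:
    """Dates of backups that fall outside the illustrative rotation policy."""
    daily_cutoff = today - timedelta(days=daily_days)
    keep = {d for d in backup_dates if d >= daily_cutoff}
    # Remember the earliest backup taken in each calendar month.
    first_of_month: dict[tuple[int, int], date] = {}
    for d in sorted(backup_dates):
        first_of_month.setdefault((d.year, d.month), d)
    for (year, month), d in first_of_month.items():
        age_months = (today.year - year) * 12 + (today.month - month)
        if age_months < monthly_months:
            keep.add(d)
    return sorted(set(backup_dates) - keep)
```

The command list can be passed to `subprocess.run(cmd, check=True)`, and the dates returned by `backups_to_delete` are the dumps a cleanup job would remove.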
If you do need to access older data and performance becomes an issue, you could write your own export job that copies the data into a separate, larger database before Matomo purges it from the primary database. With that larger database you can either stand up a separate Matomo instance used only for analysis, or run your own query/dashboard tools against the data. I’ve had a better experience using ClickHouse as the database for that analysis copy (not as Matomo’s primary database) than using MariaDB.
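A minimal sketch of such an export job, assuming you copy rows in time windows: each run exports visits whose `visit_last_action_time` falls between the previous run’s cutoff and the new one, paging through them by `idvisit` so no single query is huge. To keep the example runnable, in-memory `sqlite3` databases stand in for the primary MariaDB/MySQL database and the archive database, and only three `log_visit` columns are copied; a real job would copy every column as well as related tables such as `log_link_visit_action`. `export_window` and `SCHEMA` are hypothetical names.

```python
import sqlite3

# Stand-in schema: a tiny subset of Matomo's log_visit columns (assumption for the demo).
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS log_visit ("
    "idvisit INTEGER PRIMARY KEY, idsite INTEGER, visit_last_action_time TEXT)"
)


def export_window(src, dst, start_iso, end_iso, batch_size=1000):
    """Copy visits with start_iso <= visit_last_action_time < end_iso, in idvisit order.

    Returns the number of rows read from the source.
    """
    last_id, total = 0, 0
    while True:
        # Keyset pagination: each batch resumes after the last idvisit seen.
        rows = src.execute(
            "SELECT idvisit, idsite, visit_last_action_time FROM log_visit "
            "WHERE visit_last_action_time >= ? AND visit_last_action_time < ? "
            "AND idvisit > ? ORDER BY idvisit LIMIT ?",
            (start_iso, end_iso, last_id, batch_size),
        ).fetchall()
        if not rows:
            return total
        # INSERT OR IGNORE keeps re-runs idempotent if a window is exported twice.
        dst.executemany("INSERT OR IGNORE INTO log_visit VALUES (?, ?, ?)", rows)
        dst.commit()
        last_id = rows[-1][0]
        total += len(rows)
```

Windowing by time rather than keeping a single `idvisit` high-water mark is deliberate: a visit that was too recent to export on one run would otherwise be skipped on the next run once the watermark had moved past it. The caller stores each run’s `end_iso` as the next run’s `start_iso`, and the job needs to finish before Matomo’s raw-data deletion reaches the same window.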
Database growth and bloat seem to be a big issue if you try to retain all of the raw data. In the future I plan to archive data from older years (from the log_visit and log_link_visit_action tables), but I’m not sure how best to handle purging old log_action records for actions that are no longer needed.
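For the log_action question, one possible approach is an anti-join: list every `idaction` that no remaining row references. This is a sketch under loud assumptions: `REFERENCES` is a hypothetical and deliberately incomplete list (other `log_link_visit_action` columns, as well as `log_visit` and `log_conversion`, also point at `log_action` in real schemas), and `sqlite3` stands in for the real database. Deleting an action that some unlisted column still uses would leave dangling references, so check the full schema of your Matomo version before acting on the result. `unused_action_ids` is a hypothetical name.

```python
import sqlite3

# Hypothetical, incomplete list of (table, column) pairs that point at log_action.idaction.
REFERENCES = [
    ("log_link_visit_action", "idaction_url"),
    ("log_link_visit_action", "idaction_name"),
]


def unused_action_ids(conn, references=REFERENCES):
    """Return idaction values that none of the listed columns reference."""
    # UNION (not UNION ALL) de-duplicates the referenced ids.
    used = " UNION ".join(
        f"SELECT {col} AS idaction FROM {table} WHERE {col} IS NOT NULL"
        for table, col in references
    )
    sql = (
        "SELECT a.idaction FROM log_action a "
        f"LEFT JOIN ({used}) u ON u.idaction = a.idaction "
        "WHERE u.idaction IS NULL ORDER BY a.idaction"
    )
    return [row[0] for row in conn.execute(sql)]
```

On a large MySQL/MariaDB database the same query may be slow; materializing the referenced ids into an indexed temporary table first is one common workaround.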
Hopefully this helps give you an idea of some options.