I'd like to apologize in advance in case this is a duplicate question; I couldn't find any answers to my questions via search.
My problem is that the archive_blob tables for some months are getting absolutely huge, although there were no major changes to the number of visitors tracked in those months. I've attached a screenshot of the database size.
I have tried ./console core:purge-old-archive-data all, without any improvement.
I also tried the following SQL query: SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2020_02 GROUP BY idsite, date1, date2, period, name HAVING count > 1;
It returned many duplicate entries. Am I right that this query should return nothing at all?
If you are not using the latest version of Matomo, please upgrade.
If you're using the latest version, it may be normal, and the size should not keep increasing. Overall, the DB size should only grow slowly. There will always be a few tables that have a lot more data (the previous month, the current month, and the January table of the current year, ending with 2020_01). So nothing to worry about. If you find the size keeps growing by a few GB each month, let us know…
I am running Version 3.13.4 and the integrated system check gives me “green”.
The server log shows that something changed a week ago, exactly at the beginning of April.
Can you confirm that SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2020_02 GROUP BY idsite, date1, date2, period, name HAVING count > 1; shouldn't return any results? If so, that might be a lead I could follow.
Thank you very much in advance,
best regards,
Manuel
Edit: every time I run the archive command, the database even grows in size. Somehow there seems to be a problem with the deletion of rows.
@alexander: Thanks for your contribution! My setup was also upgraded from a Piwik instance; mine started in 2013.
The January blob has gained another gigabyte and is now up to 8.1 GB.
In previous years it was always below 0.5 GB.
I am not sure if I am even on the right path, but the mentioned SQL statement SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2019_10 GROUP BY idsite, date1, date2, period, name HAVING count > 1; returns next to no results for tables before October 2019. All later months return well over 100,000 entries. The January blob has 500,000 duplicate entries, and some rows exist as many as 1,080 (!) times.
Am I right that I should be able to delete all duplicate entries without losing any data?
Edit: It looks like the duplicate entries really are duplicates, as the values match as well. Since I couldn't find any relation table referencing them, they are probably purgeable. I just can't bring myself to delete the rows yet, as I am tracking more than 500 clients. But once the server runs full, I will try a manual delete. Fingers crossed!
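For anyone considering the manual route, here is a rough sketch of what such a cleanup could look like. This is untested and based on my assumption that newer duplicates get a higher idarchive value in the standard archive schema; the table name 2020_01 is just an example. Back up the table first and try it on a copy.

```sql
-- Untested sketch: delete older duplicates, keeping only the row with the
-- highest idarchive for each (idsite, date1, date2, period, name) group.
-- Assumes idarchive grows with newer archives; verify on a backup first.
DELETE a
FROM piwik_archive_blob_2020_01 a
JOIN piwik_archive_blob_2020_01 b
  ON  a.idsite    = b.idsite
  AND a.date1     = b.date1
  AND a.date2     = b.date2
  AND a.period    = b.period
  AND a.name      = b.name
  AND a.idarchive < b.idarchive;
```

Running the duplicate-count query from above afterwards should then return no rows for that table.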
Hi, before you begin deleting things, please wait for a bit if possible. I am looking into this with @diosmosis as I can see something similar on my Matomo instance.
Thank you very much for your help and your contribution to the community!
I just swapped the file and executed ./console core:purge-old-archive-data all again.
The database size decreased somewhat but is far from normal. The row count even increased for some tables. The number of duplicate entries rose even further, and some entries are now duplicated as many as 1,152 times.
Is there any other command I can run to execute the purging more effectively?
Edit: after crawling through the code a little, I noticed that the changed code isn't even executed in my case. In the piwik_option table the following option is empty. Is there a way to force-run the code for all dates/pages?
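As a diagnostic (my own assumption, not an official procedure), it might help to list the purge-related entries in the option table and see which ones are set at all; the LIKE pattern is just a guess at the naming convention:

```sql
-- Untested sketch: list options whose names look purge-related,
-- to check which ones are empty or missing.
SELECT option_name, option_value
FROM piwik_option
WHERE option_name LIKE '%urge%';
```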
Hi @manuzoli1, we've updated the pull request; it should clear up your archive tables (@Lukas did a quick test on his instance and it appears to work). Can you check if it works for you? I would recommend backing up your databases and trying it on a single table first.
Thanks also from me! The patch seems to work; I cloned my production VM for testing. The database size was close to 20 GB, now 6.2 GB:
January alone was 6.5 GB before purging.
Is it safe to replace ArchivePurge.php and Model.php in the production system? Or is it better to wait for 3.13.5?
The purging also takes some time:
Purging invalidated archives for 2020_04…Done. [Time elapsed: 512.815s]
Purging invalidated archives for 2020_03…Done. [Time elapsed: 314.431s]
Purging invalidated archives for 2020_01…Done. [Time elapsed: 707.861s]
Our database is also very large now and has caused our Matomo server to stop working. We had to upgrade the database space, but it's now filling up again at an abnormal speed.
Can you recommend using these ArchivePurge.php and Model.php files on production systems?
The change above has been merged and is available in the latest beta (beta2), or, if you wait a few weeks, in the next Matomo release. But as always: make a backup!
Thank you for the bug report. We found that this bug was likely already fixed in our latest beta version.
-> Could you please upgrade to the latest beta version of Matomo to benefit from many fixes and security improvements? You can find out how to easily upgrade to the beta here: https://matomo.org/faq/how-to-update/faq_159/
Please try using the beta and if you still experience the issue let us know.