Very huge archive_blobs in database

Hello fellow matomo fans,

In advance I want to apologize in case this is a duplicate question. I couldn’t find any answers to my questions via search.

I’ve got the problem, that the archive_blobs for some months are getting absolutely huge. There where no major changes to the amount of visitors tracked in these months. I attach a screenshot of the database size.
Screenshot 2020-04-07 at 10.23.31

I have tried ./console core:purge-old-archive-data all without any improvement result.

I also tried the following SQL query:
SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2020_02 GROUP BY idsite, date1, date2, period, name HAVING count > 1;
And got many duplicate entries. Am I right that this query should return zilch?

thanks in advance,
a very desperate db-admin

1 Like

Hello,

If you are not using the latest version of Matomo, please upgrade.

if you’re using the latest version, it may be normal, and the size should not keen increasing. overall the DB size should only slowly grow. There will always be a few tables that have a lot more data (the previous month, current month, and january table of the current year, ending with 2020_01 ). So nothing to worry about it. If you find the size keeps growing a few Gb each month let us know…

Thank you very much for your reply!

I am running Version 3.13.4 and the integrated system check gives me “green”.

The server log shows that something changed a week ago.
It was exactly at the beginning of April.
Screenshot 2020-04-08 at 12.17.00

Can you confirm that SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2020_02 GROUP BY idsite, date1, date2, period, name HAVING count > 1; shouldn’t return any results? That would maybe be a path I could follow.

thank you very much in advance,
best regards,
manuel

edit: everytime i rund the the archive command the database even grows in size. somehow there seems to be a problem with the deletion of lines.

Hi,

same problem with database size here.
I have updated Matomo from 3.13.2 to 3.13.4 on Mar 24 2020 and the database has doubled in size in 2 weeks.

Screenshot of my monitoring graph for the mariadb matomo database:

This setup runs since 2015 (installed as piwik) and the normal database growing rate is 1 gb/year, not 0.5 gb/day.

I had a similar increase in size last year with Matomo 3.12.0, but this was fixed when upgrading to 3.13.0, shrinking the database from 16 gb to 6 gb.

Regards,
Alexander

@alexander: thanks for your contribution! my setup was also updated from a piwik instance. mine started in 2013

The january blob has now gained another gigabyte and is now up to 8.1 gb.
the years before where always below 0.5gb.

I am not sure if I am even on the right path. But the mentioned sql statement SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2019_10 GROUP BY idsite, date1, date2, period, name HAVING count > 1; does return next to no results before October 2019. All months after return results way above 100.000 entries. The blob of January has 500.000 duplicate entries. Some exist as often as 1.080 (!) times.

Am I right, that i should be able to delete all duplicate entries without losing any data?

Edit: Looks like the duplicate entries are really duplicates as the values match as well. As I couldn’t find any relations-table they are probably purgeable. I just can’t push myself to delete the rows yet, as I am tracking for more than 500 clients. But once the server runs full I will try a manual delete. fingers crossed! :slight_smile:

Screenshot 2020-04-09 at 10.11.03

Hi, before you begin deleting things, please wait for a bit if possible. I am looking into this with @diosmosis as I can see something similar on my Matomo instance.

1 Like

This fix should solve the issue:

(but as always, better make a backup before)

1 Like

Thank you very much for your help and your contribution to the community!

I just swapped the file and executed ./console core:purge-old-archive-data all again.

The database size somewhat decreased but is far from normal. The line count even increased for some tables. The number of duplicate entries rose even further and some entries are duplicated as often as 1.152 times.

Is there any other command I can run to execute the purging more effectively?

@alexander: did the solution work for you?

Thanks to everyone!

edit: after crawling the code a little bit I noticed that the changed code isn’t even executed in my case. In the piwik_option table the following option is empty. is there an option to force run the code for alle dates/pages?

Screenshot 2020-04-10 at 13.19.52

@manuzoli1,

Indeed, the same is happening for me, so we are investigating how the issue can be solved completely.

1 Like

Thank you very much! If I can provide any information or test data from my install, let me know.

Hi @manuzoli1 we’ve updated the pull request, it should clear up your archive tables (@Lukas did a quick test on his instance and it appears to work). Can you check if it works for you? I would recommend backing up your databases first and trying it on a single table first.

Thank you very much! You are awesome!
My databased shrinked from 37 to 20 gigabytes.

The last four months are still somewhat bigger than the months before, but january alone dropped from 7 to less than 1 gigabyte.

Thanks!!!

Screenshot 2020-04-14 at 11.30.56

Thanks also from me! The patch seems to work, i have cloned my production vm for testing. Database size was close to 20 G, now 6.2 G:
matomo_purged
January was 6.5 G before purging.

Is it safe to replace ArchivePurge.php and Model.php in the production system? Or better waiting for 3.13.5?

It also takes some time for purging:
Purging invalidated archives for 2020_04…Done. [Time elapsed: 512.815s]
Purging invalidated archives for 2020_03…Done. [Time elapsed: 314.431s]
Purging invalidated archives for 2020_01…Done. [Time elapsed: 707.861s]

Regards,
Alexander

Hi there,

our Database is also very huge now and has caused our Matomo Server to not working anymore. We had to upgrade Database space but its now getting full again at a unnormal speed.
Can you recommend to use these archivepurge.php and model.php files for productive systems?

Best regards

Christoph

Hi,

The change above has been merged and is available in the latest beta2 or if you wait a few weeks in the next Matomo update. But as always: Make a backup!

It worked for me. I made a backup, but as far as I could test it, everything works as expected.

The size of April and January increased again by some gigabytes but executing the purge command again brings it back to normal.

2 Likes

Thank you for the bug report. We found that this bug was likely already fixed in our latest beta version.
-> Could you please upgrade to the latest beta version of Matomo to benefit from many fixes and security improvements? You can find out how to easily upgrade to the beta here: https://matomo.org/faq/how-to-update/faq_159/

Please try using the beta and if you still experience the issue let us know.

Thanks,

1 Like

I think you should upgrade your version it will resolved your issue.

Thank you very much. Everything is working fine now!

Hi there,

i have a table matomo_archive_blob_2020_01 which is huge (20G) and the duplicates are not getting removed when i use

./console core:purge-old-archive-data all

in the Matomo version 3.14.0
Any Idea what I should do next?

Thanks!

1 Like