Archive Blob tables suddenly got very big, out of relation to traffic increases

We are running a large installation of Piwik. Tracking approaching 700 sites, relatively low traffic individually.

We have archive script running successfully, until recently we’ve had no problems wit this, and all running smoothly. In the last few months we’ve noticed the Archive_blob report tables have become very large, despite no corresponding increase in traffic.

  • As an example, in May we tracked

Total: 570779 visits, 2747116 pageviews
resulting in the following archive table with total size 1.3G
archive_blob_2013_05 1.1 G 159 M 1155463 1.3 G

  • In June thus far (as at 26th june) we have tracked

Total: 450520 visits, 2057853 pageviews
resulting in the following archive table with total size 7G
archive_blob_2013_06 6.3 G 751.9 M 6882319 7 G

January this year was an even more extreme anomoly in the same respect, we tracked…
Total: 202093 visits, 776893 pageviews
resulting in a vast archive table taking up 19.2GB
archive_blob_2013_01 17.2 G 2.1 G 22422061 19.2 G

How can 776,893 page views result in an archive of 19.2 GB in January, but in May nearly 4 times the traffic 2747116, results in an archive of only approx 5% of the size.

Anyone have any ideas on why the report archival process is producing such vast Archive tables. It’s filling Hard disk space allocated rapidly. We’ve tried removing the report tables affected in the vain hope there was some sort of temporary issue, but they are recreated (as would be expected) at the same size.

Really appreciate any thoughts!

you can enable the “DBStats” plugin, the repotrs in the Admin > DB stats would maybe help give you better idea of what is using the space?

Thanks for your reply.

The DBStats plugin is installed and is how we are getting the reported stats on data usage. This doesn’t give visibility beyond the table sizes, or into the monhtly usage within the tracking tables.

The key problem is the archive tables generated are vastly inflated, and crippling the disk usage. For 776,893 pageviews in January of this year, the archive table generated is 19.2GB. This contrasts with 2,747,116 pageviews in May, resulting in an archive table of just total size 1.3G. This means the archive requirement (based on archive size per page view) is in the region of 4,500% greater from one month to another, it’s such a vast anomoly that something is clearly going seriously awry. Presumably I should be looking at another metric than page views or visit to see where this extra data is coming from.

Just not sure how to get visibility on what is causing the Archive table itself to be so inflated, I’ve tried to look through the data but it’s proving very difficult to isolate where all this extra data is coming from.

This means the archive requirement (based on archive size per page view) is in the region of 4,500% greater from one month to another,

One “explanation” for this difference, is the fact that YEARLY archives are stored in the January table. Could you try "DELETE FROM $january-table WHERE period = 4

This will delete “yearly” archives. Is the table size now “similar” to the other monthly tables?

Thanks for suggestion, I don’t think it is this as the June tables are also huge. When I attempted the delete, it failed with the error message “The total number of locks exceeds the lock table size”, could this be related perhaps?

Screenshots of both the archive data tables settings is below…

The strange thing is that the size of the archive tables seems to be continuing to grow.

Really appreciate any ideas!

Thanks for the tip on the yearly reports, that does seem to be part of the critical issue. The usage of that table had actually risen to in excess of 35gb (seemed to be continuing to grow). Removing yearly reports dropped the usage down to 1.5gb.

Any thoughts on whether yearly reports could be anticipated t be that large, or there was perhaps an issue there.

There seems to be an issue here with the Archive_blob… tables, possible since the 1.12 upgrade?

Some of our archive tables just continue to increase in size. The January table has grown to over 40GB at one point, our July Archive just 5 days in is already taking up 2.2GB. Our total data recorded in the tracker tables is 5.6GB for the whole of the last year.

This increase in archive usage is totally out of line to any increase in traffic. With the archive increase 45x from month to month based on archive storage required against the traffic recorded.

Thanks for the report. On the demo I’m seeing max twice as much data as normal, but nothing abnormal.

one explanation is: the creation of “Segment” that are pre-processed: these add to the database size, in case you are using Custom Segment.