InnoDB table corruptions


(Andreas Schnederle-Wagner) #1

Hello,
On our Piwik installation we have massive problems with InnoDB table corruption.
Every few days an old piwik_archive_blob_xx_xx table gets corrupted. I already have an open case with MariaDB on that (https://jira.mariadb.org/browse/MDEV-12434), but I noticed that these corruptions always happen between 03:00 and 04:00.
May I ask whether any automated tasks run on the archive blob tables at that time? (The file change date on the archive blob tables is always between 3 and 4.)
If there are operations running on them, what is doing it? And why is it still touching such old tables? (Or is it something from MariaDB touching those tables?)
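To check the 03:00-04:00 pattern against the MariaDB error log itself rather than file modification times, a small script can tally InnoDB `[ERROR]` lines by hour of day. This is a minimal sketch that assumes the MariaDB 10.1 log line format shown in this thread (date, space-padded hour, numeric thread id, severity tag); it counts error lines, not distinct incidents.

```python
import re
from collections import Counter
from datetime import datetime

# MariaDB 10.1 error-log lines look like
#   "2017-06-26  3:32:08 140133463947008 [ERROR] InnoDB: ..."
# (the hour is space-padded, followed by a numeric thread id and a severity tag).
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+\d+\s+\[ERROR\]\s+InnoDB:"
)


def corruption_hours(lines):
    """Count InnoDB [ERROR] log lines per hour of day (0-23)."""
    hours = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(f"{m['date']} {m['time']}", "%Y-%m-%d %H:%M:%S")
            hours[ts.hour] += 1
    return hours
```

Usage: open the error log (its path depends on the `log_error` setting, so any path you plug in is your own) and print `sorted(corruption_hours(f).items())`; a pile-up at hour 3 would confirm the observation above.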

Piwik: 3.0.4
[root@piwikdb mariadb]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
[root@piwikdb mariadb]# mysql -V
mysql Ver 15.1 Distrib 10.1.24-MariaDB, for Linux (x86_64) using readline 5.1
[root@piwikdb mariadb]# php -v
PHP 7.0.19 (cli) (built: May 11 2017 10:39:08) ( NTS )
Copyright © 1997-2017 The PHP Group
Zend Engine v3.0.0, Copyright © 1998-2017 Zend Technologies
with Zend OPcache v7.0.19, Copyright © 1999-2017, by Zend Technologies

MariaDB Error:

2017-06-26  3:32:08 140133463947008 [ERROR] InnoDB: Space id and page n:o stored in the page read in are 1358817207:3293897363, should be 62801:13951!
2017-06-26  3:32:08 140133463947008 [Note] InnoDB: Log sequence number at the start 2841626724 and the end 2277888994 do not match.
2017-06-26  3:32:08 140133463947008 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace piwik/piwik_archive_blob_2016_03 page  [page id: space=62801, page number=13951]. You may have to recover from a backup.
2017-06-26 03:32:08 7f735d56f700 InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex e4d4c698c454e29399cb79a5fc8ff2d4563ebcdaa95fc8641c091ca8d3fe5723f53c50fde7b7ebfbbb89987838e9124e241a7efdf13c6d813c7900c8b5bf18050b8122000005295a56697369746f72496e7465726573745f6461797353696e63654c61737456697369740000462645bddc00002664137a0$
InnoDB: End of page dump
2017-06-26 03:32:09 7f735d56f700 InnoDB: uncompressed page, stored checksum in field1 3839149720, calculated checksums for field1: crc32 1464224319, innodb 1101069064, none 3735928559, stored checksum in field2 1288165860, calculated checksums for field2:$
InnoDB: page type 7177 meaning PAGE TYPE CORRUPTED
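The numbers InnoDB complains about come straight from the page's FIL header: the stored checksum at byte 0, the page number at byte 4, the LSN at byte 16, the page type at byte 24 and the space id at byte 34, all big-endian. A minimal sketch that decodes those fields from the start of the hex dump above; it reproduces the checksum 3839149720, page number 3293897363, start LSN 2841626724, page type 7177 and space id 1358817207 from the log, none of which fit the expected space 62801 / page 13951.

```python
import struct

# InnoDB FIL page header, big-endian, 38 bytes:
#   0 checksum | 4 page number | 8 prev page | 12 next page |
#  16 LSN (8 bytes) | 24 page type | 26 flush LSN (8 bytes) | 34 space id
FIL_HEADER = struct.Struct(">IIIIQHQI")


def decode_fil_header(page_hex):
    """Decode the FIL header from the hex text of an InnoDB page dump."""
    raw = bytes.fromhex(page_hex[: FIL_HEADER.size * 2])
    (checksum, page_no, prev_page, next_page,
     lsn, page_type, _flush_lsn, space_id) = FIL_HEADER.unpack(raw)
    return {
        "checksum": checksum,
        "page_no": page_no,
        "prev_page": prev_page,
        "next_page": next_page,
        "lsn_low32": lsn & 0xFFFFFFFF,  # the "start" LSN InnoDB prints
        "page_type": page_type,
        "space_id": space_id,
    }


# First 38 bytes of the dump for piwik_archive_blob_2016_03 page 13951.
DUMP = ("e4d4c698c454e29399cb79a5fc8ff2d4563ebcda"
        "a95fc8641c091ca8d3fe5723f53c50fde7b7")

header = decode_fil_header(DUMP)
print(header["space_id"], header["page_no"])  # 1358817207 3293897363
```

Since every header field is nonsense for this tablespace, the 16 KiB block read from disk does not look like page 13951 of space 62801 at all, which fits the "page corruption on disk or a failed file read" message rather than a logical error inside one table.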

Thank you for any information that could point me in the right direction for stopping these corruptions … :wink:
Andreas Schnederle-Wagner


(Lukas Winkler) #2

Hi,

Unfortunately I have no idea about the corruptions, but maybe this plugin helps answer which tasks Piwik is running: https://plugins.piwik.org/TasksTimetable


(Andreas Schnederle-Wagner) #3

Hi @Findus23,
thx, according to TasksTimetable no task is running at that time.
So can I safely assume that Piwik itself is not touching those tables at those times? (No “hidden” system tasks or something like that?)
Andreas


(Andreas Schnederle-Wagner) #4

Alright, I just got feedback from Virtuozzo support (we are using their software-defined storage):

Answer from Virtuozzo support:

Since vzkernel-2.6.32-042stab122.3 and newer kernels, we changed inode preallocation; before that kernel, drop_preallocation may be skipped if the file is sparse.

Some customers reported that on earlier kernels (before 122.3), in rare cases, a container's data may be corrupted after defragmentation in ploop.

The issue I mentioned is a weird coincidence of ploop defragmentation and MariaDB activity that causes some non-zero blocks to be treated as zero and reused incorrectly.

The exact conditions to reproduce it were never found, but no one has yet reported this issue on new kernels with the new preallocation technique.
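To tell whether a host is already on a kernel with the new preallocation, the release string (as reported by `uname -r`, or `platform.release()` in Python) can be compared with 042stab122.3. A minimal sketch, assuming releases follow the `2.6.32-042stabNNN.M` pattern of the kernel named in the support answer:

```python
import re

# Virtuozzo/OpenVZ kernel releases look like "2.6.32-042stab122.3" (assumed format).
STAB_RE = re.compile(r"stab(\d+)\.(\d+)")


def has_new_preallocation(release, first_fixed=(122, 3)):
    """True if a stab kernel release is at or after 042stab122.3."""
    m = STAB_RE.search(release)
    if m is None:
        raise ValueError(f"not a Virtuozzo stab kernel release: {release!r}")
    # Compare numerically as a tuple, so 122.10 sorts after 122.3.
    return (int(m.group(1)), int(m.group(2))) >= first_fixed
```

Comparing integer tuples rather than strings avoids the trap where "122.10" would sort before "122.3" lexically.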

So it seems the problem is buried here … just in case someone else stumbles over this problem … :slight_smile:

Andreas


(Editions Brandon) #5

Hello, I have the same issue, and I run Arch Linux with the 4.17.3-1.0-ARCH kernel. The table that gets corrupted is piwik_archive_blob_2017_03. When I run mysqld in recovery mode, it is no longer corrupt. Whatever this means :frowning:

[Edit a while later] OK, I got it: I dropped both the blob and the numeric archive tables for March 2017 and let Matomo regenerate the archive (dropping just the blob archive didn’t suffice, although there was nothing wrong with the numeric archive). Now I don’t see errors.
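The fix above (drop both archive tables for the affected month) can be scripted. A minimal sketch that only builds the SQL so it can be reviewed before running; it assumes the `piwik_` table prefix used in this thread and the `archive_blob_YYYY_MM` / `archive_numeric_YYYY_MM` naming seen in the logs. Take a backup first, and note that the reports can only be regenerated if the raw log data for that month still exists.

```python
def archive_drop_statements(year, month, prefix="piwik_"):
    """Build DROP TABLE statements for one month's blob and numeric archive tables."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    suffix = f"{year:04d}_{month:02d}"
    # Both tables are dropped together, as in the report above.
    return [
        f"DROP TABLE IF EXISTS `{prefix}archive_{kind}_{suffix}`;"
        for kind in ("blob", "numeric")
    ]


for stmt in archive_drop_statements(2017, 3):
    print(stmt)
```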


(Andreas Schnederle-Wagner) #6

Seems like the problem is back … I just noticed several crashed tables which are unrepairable … :frowning:

Repairing tables
piwik.piwik_archive_blob_2015_01
Error    : Table 'piwik.piwik_archive_blob_2015_01' doesn't exist in engine
status   : Operation failed
piwik.piwik_archive_blob_2015_02
Error    : Table 'piwik.piwik_archive_blob_2015_02' doesn't exist in engine
status   : Operation failed
piwik.piwik_archive_blob_2016_09
Error    : Table 'piwik.piwik_archive_blob_2016_09' doesn't exist in engine
status   : Operation failed
piwik.piwik_archive_blob_2016_10
Error    : Table 'piwik.piwik_archive_blob_2016_10' doesn't exist in engine
status   : Operation failed
piwik.piwik_archive_blob_2016_12
Error    : Table 'piwik.piwik_archive_blob_2016_12' doesn't exist in engine
status   : Operation failed
piwik.piwik_archive_blob_2018_02
Error    : Table 'piwik.piwik_archive_blob_2018_02' doesn't exist in engine
status   : Operation failed
piwik.piwik_archive_blob_2018_03
Error    : Table 'piwik.piwik_archive_blob_2018_03' doesn't exist in engine
status   : Operation failed
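With many tables in the repair run, the ones that need restoring from backup can be pulled out of the output mechanically. A minimal sketch that assumes exactly the layout shown above (a `db.table` line followed by `Error : …` and `status : …` lines); any other layout is handled only on a best-effort basis.

```python
STATUS_KEYS = {"error", "warning", "note", "info", "status"}


def failed_tables(output):
    """Return the tables whose repair ended with 'status : Operation failed'."""
    failed, current = [], None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in STATUS_KEYS:
            if key == "status" and value.strip() == "Operation failed" and current:
                failed.append(current)
        else:
            # Any other line is treated as a table header, e.g. "piwik.piwik_archive_blob_2015_01".
            current = line.split()[0]
    return failed
```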

------------------------------------------------------------------------------------------

2018-08-15 3:16:37 139983050487552 [ERROR] InnoDB: Read operation failed for tablespace ./piwik/piwik_archive_blob_2018_03.ibd offset 5310 with error Page read from tablespace is corrupted.
2018-08-15 3:16:37 139983050487552 [ERROR] InnoDB: Space id and page n:o stored in the page read in are 1813933666:1650418019, should be 145550:5311!
2018-08-15 3:16:37 139983050487552 [Note] InnoDB: Log sequence number at the start 2412039055 and the end 2412040847 do not match.
2018-08-15 3:16:37 139983050487552 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace piwik/piwik_archive_blob_2018_03 page [page id: space=145550, page number=5311]. You may have to recover from a backup.
2018-08-15 03:16:37 7f5057ffb700 InnoDB: Page dump in ascii and hex (16384 bytes):

*page dump truncated here*

2018-08-15 03:16:37 7f5057ffb700 InnoDB: uncompressed page, stored checksum in field1 359407214, calculated checksums for field1: crc32 214706179, innodb 3078363601, none 3735928559, stored checksum in field2 1044, calculated checksums for field2: crc32 214706179, innodb 549285168, none 3735928559, page LSN 1044 2412039055, low 4 bytes of LSN at page end 2412040847, page number (if stored to page already) 1650418019, space id (if created with >= MySQL-4.1.1 and stored already) 1813933666
InnoDB: page type 50385 meaning PAGE TYPE CORRUPTED
2018-08-15 3:16:37 139983050487552 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html for information about forcing recovery.
2018-08-15 3:16:37 139983050487552 [ERROR] InnoDB: Read operation failed for tablespace ./piwik/piwik_archive_blob_2018_03.ibd offset 5311 with error Page read from tablespace is corrupted.
2018-08-16 11:50:09 139991896667904 [ERROR] Got error 180 when reading table './piwik/piwik_archive_blob_2018_03'
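The same two checks InnoDB reports here (stored space id and page number vs. expected, and the low 32 bits of the LSN in the header vs. the copy in the last 4 bytes of the page) can be repeated offline on a copy of the .ibd file. A minimal sketch, assuming the default 16 KiB page size (the dumps above are 16384 bytes) and uncompressed, unencrypted pages in the classic format used by MariaDB 10.1; the file path you pass in should point at a copy, never the live tablespace.

```python
import struct

PAGE_SIZE = 16384  # default innodb_page_size; matches the 16384-byte page dumps above


def read_page(path, page_no, page_size=PAGE_SIZE):
    """Read one page from an .ibd file (work on a copy, not the live file)."""
    with open(path, "rb") as f:
        f.seek(page_no * page_size)
        page = f.read(page_size)
    if len(page) != page_size:
        raise ValueError(f"short read: page {page_no} is past the end of {path}")
    return page


def check_page(page, expected_space, expected_page):
    """Repeat two of InnoDB's sanity checks on a raw, uncompressed page."""
    (page_no,) = struct.unpack_from(">I", page, 4)
    (lsn_start,) = struct.unpack_from(">I", page, 20)            # low 32 bits of FIL_PAGE_LSN
    (space_id,) = struct.unpack_from(">I", page, 34)
    (lsn_end,) = struct.unpack_from(">I", page, len(page) - 4)   # copy in the page trailer
    problems = []
    if (space_id, page_no) != (expected_space, expected_page):
        problems.append(
            f"space/page {space_id}:{page_no}, expected {expected_space}:{expected_page}")
    if lsn_start != lsn_end:
        problems.append(f"LSN at start {lsn_start} and end {lsn_end} do not match")
    return problems
```

For the page in the log above, the call would be `check_page(read_page(copy_path, 5311), 145550, 5311)`, where `copy_path` is wherever you copied `piwik_archive_blob_2018_03.ibd`.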

(Andreas Schnederle-Wagner) #7

I was able to restore all crashed tables except one from backups … unfortunately piwik_archive_blob_2016_12 is also corrupted in my backup files (I noticed it too late …).
I guess there is no other way to get the old data from 2016 back … ?!?

Andreas


(Peterbo) #8

You can regenerate the archive tables if your raw data tables have not been purged. See the Matomo FAQ “How do I force the reports to be re-processed from the logs?”

Also check whether your server runs any kind of backup job between 3 and 4 o’clock. There are some engines that can corrupt data files.


(Andreas Schnederle-Wagner) #9

Well, since the purge is set to 180 days … I guess the data for this month is lost … damn! :-/
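Whether a given archive month can still be rebuilt from raw logs is simple date arithmetic. A minimal sketch, assuming the purge keeps raw log rows newer than `today - retention_days`; Matomo runs its deletion on its own schedule, so treat the boundary as approximate.

```python
from datetime import date, timedelta


def raw_log_coverage(year, month, today, retention_days=180):
    """Classify a month as 'complete', 'partial' or 'purged' under a raw-data purge window."""
    first_day = date(year, month, 1)
    # First day of the following month, minus one day, is the last day of this month.
    next_month = date(year + month // 12, month % 12 + 1, 1)
    last_day = next_month - timedelta(days=1)
    oldest_kept = today - timedelta(days=retention_days)
    if first_day >= oldest_kept:
        return "complete"
    if last_day >= oldest_kept:
        return "partial"
    return "purged"
```

With a 180-day window counted back from 2018-08-16 (the oldest kept day is then 2018-02-17), December 2016 comes out "purged", February 2018 "partial" and March 2018 "complete", so of the crashed tables above only the 2018 months could be regenerated from logs.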

The DB backup job runs at 05:00 AM using innobackupex. Do you maybe have some more insight into “There are some engines that can corrupt data files”?

thx