Experiences with Piwik on MySQL 5.5.x?

Hey folks,

has anyone had experience with Piwik on MySQL 5.5.x?
I’m actually using MariaDB 5.5.x but it should be similar.

I don’t see any worse behaviour in the tracking process but my archiving time more than doubled from 250 minutes to 650 minutes. Looking at my munin output I see that the number of temporary tables (which is regularly very high during archiving) has gone down as if it can’t handle as many temporary tables per second as before.

I haven’t heard such report before, that is certainly surprising. Maybe it needs more memory allocated?

that the number of temporary tables
What do you mean exactly? what metric is it?

I don’t think that “more memory” is the solution. When the problem started I changed the innodb_buffer_pool from 2GB to 9GB (my tables are in innodb for backup and scalability reasons) and it didn’t give any help.
Temporary tables created is just the mysql variable Created_tmp_tables over time (5 minute interval to be exact).

For now I can just assume that the best performing mysql for piwik in my case was mysql-5.1 but I already upgraded completely :-(.

As a followup.
I see the following query (with different dates and idsites) a lot of time in the processlist when archiving runs.
It takes 0.13 - 0.15 seconds most of the time as you can see below.
But if I remove the DESC on the sorting its running at 0.00 seconds also seen below (repeatable).
As that query returns only 1 item anyway - why is a sorting needed at all? Could it be removed for higher archiving performance?


SELECT idarchive, value, name, date1 as startDate FROM piwik_archive_numeric_2012_04 WHERE idsite = '4982' AND date1 = '2012-04-03' AND date2 = '2012-04-03' AND period = '1' AND ( (name = 'done' AND value = 1) OR (name = 'done' AND value = 3) OR name = 'nb_visits') AND ts_archived >= '2012-04-03 21:59:59' ORDER BY ts_archived DESC
    -> ;
+-----------+-------+------+------------+
| idarchive | value | name | startDate  |
+-----------+-------+------+------------+
|    106890 |     1 | done | 2012-04-03 |
+-----------+-------+------+------------+
1 row in set (0.13 sec)

SELECT idarchive, value, name, date1 as startDate FROM piwik_archive_numeric_2012_04 WHERE idsite = '4982' AND date1 = '2012-04-03' AND date2 = '2012-04-03' AND period = '1' AND ( (name = 'done' AND value = 1) OR (name = 'done' AND value = 3) OR name = 'nb_visits') AND ts_archived >= '2012-04-03 21:59:59' ORDER BY ts_archived;
+-----------+-------+------+------------+
| idarchive | value | name | startDate  |
+-----------+-------+------+------------+
|    106890 |     1 | done | 2012-04-03 |
+-----------+-------+------+------------+
1 row in set (0.00 sec)

Edit: Maybe I can answer the question already from looking at the code: as the function is quite generic and used for lots of different periods and it uses the order to get the most recent entry - no, it can’t just be removed, bummer :(.

Thanks in advance,

thomas

How many rows have you got in these archive tables that show a slow behavior?

Maybe this is caused by the bug: Huge DB Disk space savings! Database purge should trigger on all archive_* tables at least once a day · Issue #3003 · matomo-org/matomo · GitHub

How many entries should I expect? :wink:
The numbers are similar to the other tables seen in the db - check the attached screenshot from phpMyAdmin.

Btw. for testing I changed the code in isArchived to the following - assuming that it would return only one row for daily archiving for one day:


$sqlQuery = "   SELECT idarchive, value, name, date1 as startDate
                                                FROM ".$this->tableArchiveNumeric->getTableName()."
                                                WHERE idsite = ?
                                                        AND date1 = ?
                                                        AND date2 = ?
                                                        AND period = ?
                                                        AND ( (name = '".$done."' AND value = ".Piwik_ArchiveProcessing::DONE_OK.")
                                                                        OR (name = '".$done."' AND value = ".Piwik_ArchiveProcessing::DONE_OK_TEMPORARY.")
                                                                        $sqlSegmentsFindArchiveAllPlugins
                                                                        OR name = 'nb_visits')
                                                        $timeStampWhere";
                // updated TS77 
                // we are guessing here with date1 = date2 and period being 1 (for day) that it returns one row only anyway
                if($bindSQL[1] != $bindSQL[2] || $bindSQL[3] != 1) {
                     $sqlQuery.=" ORDER BY ts_archived DESC";
                }

That brought the archiving time for my 7k sites down to 250 minutes again.

Thanks for your tests. Can you please instead try to leave the ORDER BY but replace it with: ORDER BY idarchive DESC ?

because it is primary key it might store faster somehow, I would be curious to know if it helps and if it is as good as without the ORDER?

Thanks for the suggestion.
I’m gonna try it with the archiving run next night.

From the manual test it also returns in 0.00 seconds but explain shows a filesort:


+------+-------------+-------------------------------+-------+-------------------------------------------------+---------------------------+---------+------+------+----------------------------------------------------+
| id   | select_type | table                         | type  | possible_keys                                   | key                       | key_len | ref  | rows | Extra                                              |
+------+-------------+-------------------------------+-------+-------------------------------------------------+---------------------------+---------+------+------+----------------------------------------------------+
|    1 | SIMPLE      | piwik_archive_numeric_2012_04 | range | index_idsite_dates_period,index_period_archived | index_idsite_dates_period | 24      | NULL |    1 | Using index condition; Using where; Using filesort |
+------+-------------+-------------------------------+-------+-------------------------------------------------+---------------------------+---------+------+------+----------------------------------------------------+

That could give problems with a larger resultset. Probably as its not in the same index - but for innodb that index is a secondary index so it always have to do two index lookups (also for the original problem).

Awesome but could it break something?
I’m now down to 135 minutes for my full daily archiving run!
Great idea - but it looks to good to be true ;-).

Lol 135 min from 250 min ? are you sure about your test protocol ?? :slight_smile:

[quote=matt]
Lol 135 min from 250 min ? are you sure about your test protocol ?? :)[/quote]

Yes, I just forgot that I also disabled optimization of the tables - not much sense in it with innodb ;).

Thanks for the report, I created a bug in API Performance improvements on very large setup: SQL query tweak · Issue #3140 · matomo-org/matomo · GitHub

I fixed the issue in: API Performance improvements on very large setup: SQL query tweak · Issue #3140 · matomo-org/matomo · GitHub

I hope it will help :slight_smile: