Archiving very broken after 0.5.1 update

I have a single Piwik installation used across 23 sites (some inactive) generating a reasonable amount of traffic (10-2000 uniques / site / day). Stats are not accessed daily for all sites so I have the archive script run nightly at 2AM. I clear old archived actions monthly using the query in the FAQ. This was already done for November before this update.

Everything was working great on 0.4.5. After updating to 0.5.1 the archive script runs for hours and hours. I managed to get it to archive all daily stats yesterday but that took at least an hour to run. Last night’s cron was still running at 11AM when I checked it (9 hours). This is on a quad core Xeon server. Here are the number of rows in the main tables:

piwik_log_action: 6267
piwik_log_link_visit_action: 111901
piwik_log_visit: 45021

This wouldn’t be a big deal if it didn’t LOCK TABLES and require extensive use of temp tables. It completely locks out any use of Piwik resulting in issues for anyone accessing a site that uses Piwik or anyone trying to view stats. It also puts a high load on the server slowing general operation.

Here’s a sample of some of the slow queries from my logs:

# Time: 091212 11:03:02
# User@Host: piwik[piwik] @  [10.2.1.239]
# Query_time: 60  Lock_time: 0  Rows_sent: 6  Rows_examined: 44481023
SELECT  name,
                                                        type,
                                                        count(distinct t1.idvisit) as nb_visits,
                                                        count(distinct visitor_idcookie) as nb_uniq_visitors,
                                                        count(*) as nb_hits
                                        FROM (piwik_log_visit as t1
                                                LEFT JOIN piwik_log_link_visit_action as t2 USING (idvisit))
                                                        LEFT JOIN piwik_log_action as t3 ON (t2.idaction_url = t3.idaction)
                                        WHERE visit_server_date = '2009-12-12'
                                                AND idsite = '7'
                                        GROUP BY t3.idaction
                                        ORDER BY nb_hits DESC;

# Time: 091212 11:08:39
# User@Host: piwik[piwik] @  [10.2.1.239]
# Query_time: 36  Lock_time: 0  Rows_sent: 3  Rows_examined: 26729656
SELECT  name,
                                                        type,
                                                        count(distinct t1.idvisit) as nb_visits,
                                                        count(distinct visitor_idcookie) as nb_uniq_visitors,
                                                        count(*) as nb_hits
                                        FROM (piwik_log_visit as t1
                                                LEFT JOIN piwik_log_link_visit_action as t2 USING (idvisit))
                                                        LEFT JOIN piwik_log_action as t3 ON (t2.idaction_name = t3.idaction)
                                        WHERE visit_server_date = '2009-12-12'
                                                AND idsite = '39'
                                        GROUP BY t3.idaction
                                        ORDER BY nb_hits DESC;

44 million rows & 26 million rows examined.

If there’s anything I can do or provide to help solve this let me know.

Thanks for the report. I created a ticket with a proposed solution at: http://dev.piwik.org/trac/ticket/1075 please let’s keep the discussion in there if any

I’m slightly surprised that this is a new issue for you as previously, the query that returns 26M rows was already there in Piwik and didn’t seem to cause you issue, but it should have. However the new query joining on idaction_url is new and produces the report Pages by URL introduced in 0.5

I can’t comment on tickets so I’ll reply here.

I think I found the problem. The index for idvisit in table piwik_log_link_visit_action was disabled. Reindexing this field turned 90 sec queries into 0.1 sec queries. I haven’t fully tested this yet because I can’t risk causing additional problems on our production server but I think this will do it.

I still think the performance issue outlined in the ticket will be an issue at some point. Btw, how did the index get disable?

Also, why can’t you comment on tickets in trac? you have to create a trac username first.

I have an account, DriverDan. I see an error on most of the pages. Here’s what I see on this ticket (#1075):

Warning: <acct_mgr.web_ui.MessageWrapper object at 0x19f4d10>

The object address changes on refreshes.

for trac, maybe try to create a new account and look for the verification email if any - this seems to be a known issue in trac, fixed in the latest version.

We have fixed the trac issue, you should now be able to login and create tickets.