Heavy query is doing nothing?


#1

Hello,

Since version 2.0.X I have had the following query that takes about 40% of my CPU when running, but it looks like it is doing nothing:


INSERT IGNORE INTO piwik_tmp_log_actions_to_keep
SELECT idaction_url
FROM piwik_log_link_visit_action
WHERE idlink_va >= ?
AND idlink_va < ? 

I don’t have the table piwik_tmp_log_actions_to_keep, so are all the results dropped because of the ‘INSERT IGNORE INTO’? But the query keeps running for hours and looks like it never ends. Should the table piwik_tmp_log_actions_to_keep exist in Piwik?

And what is this query doing? It seems to be related to the cleanup of the old visitor log. For now I kill the query daily in order to get full performance back; otherwise my server gets slower and slower.

So can anybody explain why this query runs for hours? And is it needed, given that the table is never there? The table is not created, or at least it does not show up in my phpMyAdmin.


(Matthieu Aubry) #2

This query is part of the “log deletion” feature, where Piwik tries to delete your old data and leave only the useful data in the database.


#3

OK, that is clear. But I don’t have the table ‘piwik_tmp_log_actions_to_keep’; I don’t remember ever seeing it. Should it be there? I also could not find its MySQL schema in the Piwik code to verify.

So I get the feeling that the database is doing a lot of selecting and inserting, but the query has the IGNORE keyword, which I assume allows it to insert into a non-existent table because it just ignores the errors? So I am a bit confused about what it actually does.

I do see the delete queries that go with the same cleanup scheduled task. Those run normally.

So my concern is that it really slows down my MySQL server, and sometimes I am still importing log lines with the log importer while it runs; the import speed then drops by a good 33%, which is quite a lot…

Can you give a bit more insight on this?

Kind regards


#4

Too bad that there is no reply to the above :(.
So I will add some extra information, so that other people can benefit from it.

As it turns out, the cleanup task first deletes all entries in the ‘piwik_log_link_visit_action’ table that are older than X days. That goes fast: a few minutes in my case. But then it collects all valid records from that table into a temporary table. Why this is done is unclear to me, especially since it turns out to do it seven times with almost the same query. The cleanup task runs one query for each of the fields idaction_url, idaction_url_ref, idaction_name, idaction_name_ref, idaction_event_category, idaction_event_action and time_spent_ref_action. That makes seven cleanup queries.
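Each of the seven passes follows the pattern of the query at the top of this thread, just with a different column. Here is a sketch of what I believe is going on; the CREATE statement is my guess at the schema, and note that a TEMPORARY table is only visible to the session that creates it, which would also explain why it never shows up in phpMyAdmin:

```sql
-- My guess at the helper table's schema; TEMPORARY tables are
-- session-scoped, so phpMyAdmin (a different session) never sees them.
CREATE TEMPORARY TABLE IF NOT EXISTS piwik_tmp_log_actions_to_keep (
    idaction INT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (idaction)
);

-- One of the seven near-identical passes; the other six substitute the
-- remaining columns listed above for idaction_url.
INSERT IGNORE INTO piwik_tmp_log_actions_to_keep
SELECT idaction_url
FROM piwik_log_link_visit_action
WHERE idlink_va >= ? AND idlink_va < ?;
```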

And here is why I don’t get why it has to run a new query for every field: the valid range for the selection is always the same. So why can there not be just one single query that selects all seven fields at once? That would be a speed increase of around seven times. And for deleting the invalid records, you can just join the temporary table seven times, so that you can still delete the unreferenced entries from piwik_log_link_visit_action.
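What I have in mind is roughly the following single-scan query. This is only a sketch; the table name piwik_tmp_actions_wide and its layout are my own invention, not actual Piwik code:

```sql
-- Sketch: one scan over piwik_log_link_visit_action instead of seven.
-- piwik_tmp_actions_wide is a hypothetical name for a wide helper table.
CREATE TEMPORARY TABLE piwik_tmp_actions_wide AS
SELECT idaction_url, idaction_url_ref,
       idaction_name, idaction_name_ref,
       idaction_event_category, idaction_event_action,
       time_spent_ref_action
FROM piwik_log_link_visit_action
WHERE idlink_va >= ? AND idlink_va < ?;
```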

In my case I currently have 1,389,125,294 records. The query for one field takes about 2.5 hours, so the cleanup takes 7 × 2.5 = 17.5 hours in total. That is almost the whole day! And that is on a Core i7-4790 @ 3.6 GHz with an SSD drive. My old server could not even handle it.

So could this be changed? I think the same cleanup logic can be achieved with one select query and a bigger join on the delete query. If the whole action then takes 2 to 4 hours in total, that is fine with me. That would still be a speed increase of about four times.
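To make the ‘bigger join’ concrete, here is a sketch of how the delete side could look. I am assuming the single select query fills a seven-column helper table, which I call piwik_tmp_actions_wide here, and that the delete prunes the actions table piwik_log_action; both names are my assumptions, not actual Piwik code. The helper table would have to be a regular table rather than a TEMPORARY one, because MySQL does not allow a TEMPORARY table to be referenced more than once in the same statement:

```sql
-- Sketch: join the wide helper table once per column and delete the
-- actions that none of the seven columns references.
DELETE a
FROM piwik_log_action AS a
LEFT JOIN piwik_tmp_actions_wide t1 ON a.idaction = t1.idaction_url
LEFT JOIN piwik_tmp_actions_wide t2 ON a.idaction = t2.idaction_url_ref
LEFT JOIN piwik_tmp_actions_wide t3 ON a.idaction = t3.idaction_name
LEFT JOIN piwik_tmp_actions_wide t4 ON a.idaction = t4.idaction_name_ref
LEFT JOIN piwik_tmp_actions_wide t5 ON a.idaction = t5.idaction_event_category
LEFT JOIN piwik_tmp_actions_wide t6 ON a.idaction = t6.idaction_event_action
LEFT JOIN piwik_tmp_actions_wide t7 ON a.idaction = t7.time_spent_ref_action
WHERE t1.idaction_url IS NULL
  AND t2.idaction_url_ref IS NULL
  AND t3.idaction_name IS NULL
  AND t4.idaction_name_ref IS NULL
  AND t5.idaction_event_category IS NULL
  AND t6.idaction_event_action IS NULL
  AND t7.time_spent_ref_action IS NULL;
```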

Kind regards


(Matthieu Aubry) #5

why can there not be just one single query that selects all seven fields at once? That would be a speed increase of around seven times

It’s possible that performance optimisations can be made; this is always the case. But we have 400 tickets open and a very full backlog, so we have no time to work on such issues at this stage. If you are a developer, please consider submitting a pull request with such an improvement. Thanks!