Deleting old logs apparently not working


#1

We’re using Piwik 1.12 and had quite a large DB (over 30GB). I finally got a chance to manually clean up the logs by running queries like

DELETE log_visit, log_link_visit_action FROM log_visit INNER JOIN log_link_visit_action WHERE log_visit.idvisit = log_link_visit_action.idvisit AND visit_first_action_time <= '2013-02-04';

which took quite a while, and then running OPTIMIZE on log_visit and log_link_visit_action, which took nearly two hours for each table.

The database size is now down to 24GB and I’ve set up a daily run of “delete old visitor logs” to delete logs more than 365 days old. This should have run some time late last night or early this morning (as I type it’s 09:40, and the admin now says that the next scheduled deletion is in 14 hours 24 min). It doesn’t appear to have run, though, because running

select min(visit_first_action_time) from log_visit;

gives me

+------------------------------+
| min(visit_first_action_time) |
+------------------------------+
| 2013-02-05 00:00:14          |
+------------------------------+

Note that my last manual deletion deleted everything <= “2013-02-04”
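The check I’m doing by eye here can be written down as a small sketch. This is only an illustration: the function name `purge_has_run`, the 365-day default, and the "now" value are assumptions of mine, not anything Piwik provides.

```python
from datetime import datetime, timedelta


def purge_has_run(oldest_visit, now, retention_days=365):
    """Return True if no visit older than the retention window remains.

    oldest_visit is the result of
    `select min(visit_first_action_time) from log_visit;`.
    """
    cutoff = now - timedelta(days=retention_days)
    return oldest_visit >= cutoff


# Value returned by the query above.
oldest = datetime(2013, 2, 5, 0, 0, 14)

# Hypothetical "now", chosen only for illustration.
print(purge_has_run(oldest, datetime(2014, 3, 1, 9, 40)))
```

If this prints False, rows older than the retention window are still in log_visit, which is what I’m seeing.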

What triggers Piwik’s scheduled deletion so that it runs when it should? We are running Piwik as a replacement for Urchin, so we use no real-time tracking. It only post-processes log files, which is done daily at midnight after rotating the Apache logs. We run import_logs.py for each of 6 websites, and then run archive.php --force-all-websites --force-all-periods.

So, if something in the real time tracking is used to kick off scheduled deletion of old log data, that might explain why it apparently didn’t happen.


(Matthieu Aubry) #2

Please try with 2.1 RC.

We’ve made many improvements since your version, and this may work.


#3

While I would like to upgrade to a newer version, I don’t want to do that immediately, so I had decided I’d turn off the “delete old visitor logs” option in Piwik and just run the above SQL query daily.
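For the daily run, the fixed date in the manual query would need to become a rolling cutoff. A minimal sketch of how a cron script might build that statement, assuming the same unprefixed table names as in my first post; `build_purge_query` is a hypothetical helper, and I’ve written the join condition with ON, which should be equivalent to the WHERE form I used before:

```python
from datetime import date, timedelta


def build_purge_query(today, retention_days=365):
    """Build the manual DELETE with a cutoff retention_days before today."""
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    return (
        "DELETE log_visit, log_link_visit_action "
        "FROM log_visit INNER JOIN log_link_visit_action "
        "ON log_visit.idvisit = log_link_visit_action.idvisit "
        f"WHERE log_visit.visit_first_action_time <= '{cutoff}'"
    )


# The script would pass date.today(); a fixed date is used here for clarity.
print(build_purge_query(date(2014, 2, 4)))
```

The resulting string would then be passed to the mysql client or a DB driver. Like the original query, it only removes visits that have at least one matching row in log_link_visit_action.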

However, I hadn’t scheduled this yet, and when I looked at the DB just now I saw that Piwik’s scheduled deletion appears to have worked, as min(visit_first_action_time) from log_visit is now a year ago. But what struck me much more is the huge reduction in database size: down from 24GB to 3GB.

It seems that the “delete old visitor logs” option in Piwik does rather more than the manual query I mentioned above.

I’m a little concerned that I may have deleted more than I really wanted to. However, I do NOT have “Regularly delete old reports from the database” ticked, and I am able to go back and view old reports, so it would seem OK.

Can I be sure that I really didn’t need that 20 or so GB of data?


(Matthieu Aubry) #4

I can’t say, since you’re using an old version and there could be a bug. We can only support the latest version, so please use the latest RC.