We’re using piwik 1.12 and had quite a large DB (over 30GB ). I finally got a chance to manually cleanup the logs by running queries like
DELETE log_visit, log_link_visit_action FROM log_visit INNER JOIN log_link_visit_action WHERE log_visit.idvisit = log_link_visit_action.idvisit AND visit_first_action_time <= “2013-02-04”;
which took quite a while, and then running OPTIMIZE on log_visit & log_link_visit_action which took each time nearly two hours.
The database size is now down to 24GB and I’ve set up a daily run of “delete old visitor logs” to delete logs more than 365 days old. This should have run some time late last night / early this morning (as I type it’s 09:40 and the admin now says that the next scheduled deletion is in 14 hours 24 min) but it doesn’t appear to have run because running
select min(visit_first_action_time) from log_visit;
gives me
±-----------------------------+
| min(visit_first_action_time) |
±-----------------------------+
| 2013-02-05 00:00:14 |
±-----------------------------+
Note that my last manual deletion deleted everything <= “2013-02-04”
What causes the piwik scheduled deletion to run when it should? We are running Piwik as a replacement for Urchin so we use no real time tracking - it only post processes log files, which is done daily at midnight after rotating the apache logs. We run import_logs.py for each of 6 websites, and then run archive.php --force-all-websites --force-all-periods.
So, if something in the real time tracking is used to kick off scheduled deletion of old log data, that might explain why it apparently didn’t happen.