Possible BUG in regular deleting old visitor logs from the database


We have implemented an Apache log based data import, and we have configured Piwik (Piwik 2.9.1) to regularly purge visitor logs older than 31 days.
We use crontab service (24x7) to process visitor logs after the data are imported hourly from Apache logs.

Yesterday we have processed an Apache log file to import data with 40 days ago date (17 December 2014).
We noted the idvisit field in piwik_log* tables associated to these new entries is strictly sequential (let’s say, to the first row is associated idvist=111111).
This means the new entries idvisit (whose date is 17 December 2014) is near the idvist value with current date.

When Piwik starts purging db, it executes this query to delete oldest visitor logs:

DELETE FROM piwik_log_link* WHERE idvisit <= ‘111111’

The process of deleting visitor logs older than xx days (31 in our case) seems to work like this:

  1. Piwik finds on piwik_log* tables the first row with server_time older than xx days (17 December in our case)
  2. ithe idvisit field of the matching row is the referential value (111111 in our case)
  3. Piwik starts deleting all rows whose idvisit is lower than ‘111111’ (in our case that means all visitor logs imported before the import process of yesterday!!!)

The expected behavior should be to delete all rows where server_time is older than 31 days ago (17 December):

DELETE FROM piwik_log_* WHERE server_time < 31 days ago (ie: ‘2014-12-28 00:00:00’)

or something like that.

Is there a way to import past days older than retention period (31 days) and to enable log purging execution without the deletion of visitor logs with recent dates?

Thanks in advance


Anyone on this ??

(Matthieu Aubry) #3

Hi there,

If Piwik deletes WHERE idvisit < X this may be a bug indeed, as we really should deleted based on date.

Can you please create an issue in github for this bug? Issues · matomo-org/piwik · GitHub