Purging Piwik Logs: Documentation Error, or What?

On a site I work on, we have a privacy policy that says we will delete all personally identifiable information older than 12 weeks. I’m trying to make sure I’ve done this in Piwik, but I’m a bit confused by the docs.

This FAQ says that the way to purge logs is to turn on the archive.sh script AND run a custom SQL command [1].

That makes sense to me, but I have archiving set to run once a day, and looking at what the archive code does, it seems the archive.sh script already does what the above SQL does.

For example, when I look at my logs [2], they go back to the first of the current month and then stop. When I look at the code in archive.sh, it seems to indicate logging for the day, week, and month. I also don’t have any other commands that would be truncating the logs at the first of each month.

Is this an error in the documentation, or am I missing something? If the archive.sh script is doing what I want it to, then I’m pleased as punch, and we should update the FAQ mentioned above. If not, then I’m really confused.

Help? Thoughts? Developers?

[1] DELETE piwik_log_visit, piwik_log_link_visit_action FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action WHERE piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit AND visit_server_date <= CURRENT_DATE() - 30;
[2] SELECT visit_server_date FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action WHERE piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit ORDER BY -visit_server_date;

Still no reply here, though I’ve noticed that 67 people have looked at this post. Any ideas?

My next step will be to file a bug report linking to this post, since I do think this is a documentation bug, but I don’t know the Piwik system nearly as well as others here, so I’d like to avoid putting my foot in my mouth if possible…

Any help would be simply lovely.

Log deletion is not done by the archive.sh script; it has to be done manually, as explained in the FAQ.

What would the script look like?
Are there any dependencies on the 3 logfiles apart from the Liveaction plugin?

Best regards,
KBergsoe

[quote=KBergsoe @ Aug 2 2010, 11:57 AM]What would the script look like?
Are there any dependencies on the 3 logfiles apart from the Liveaction plugin?

Best regards,
KBergsoe[/quote]

Well, I use a very simple shell script wrapper around the archiving (run daily here), based on the FAQ information:

/bin/bash <path-to-piwik>/misc/cron/archive.sh >> /tmp/tracking.log
# Note: "CURRENT_DATE() - 30" as written in the FAQ is numeric subtraction in MySQL, so INTERVAL is used here.
mysql -u<mysql-username> -p<mysql-password> -h<mysql-host> <mysql-db> -e "DELETE piwik_log_visit, piwik_log_link_visit_action FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action WHERE piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit AND visit_server_date <= CURRENT_DATE() - INTERVAL 30 DAY"
# Reclaim the space freed by the delete.
mysql -u<mysql-username> -p<mysql-password> -h<mysql-host> <mysql-db> -e "optimize table piwik_log_visit"
mysql -u<mysql-username> -p<mysql-password> -h<mysql-host> <mysql-db> -e "optimize table piwik_log_link_visit_action"
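
For anyone adapting this to a different retention period (the first post mentions 12 weeks, i.e. 84 days), here is a minimal sketch that builds the purge statement with the number of days as a parameter. It assumes the table and column names from the FAQ statement quoted in [1]. The function name build_purge_sql is hypothetical and is not part of Piwik. It uses explicit INTERVAL arithmetic because MySQL evaluates CURRENT_DATE() - 30 as a number rather than as a date 30 days back. The sketch only prints the SQL; it does not connect to a database.

```shell
#!/bin/bash
# Hypothetical helper (not part of Piwik): prints a purge statement for
# visits older than the given number of days. Table and column names are
# taken from the FAQ statement quoted earlier in this thread.
build_purge_sql() {
    local days="$1"
    # Accept digits only, so nothing unexpected ends up inside the SQL.
    case "$days" in
        ''|*[!0-9]*) echo "days must be a whole number" >&2; return 1 ;;
    esac
    printf '%s\n' "DELETE piwik_log_visit, piwik_log_link_visit_action FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action ON piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit WHERE visit_server_date <= CURRENT_DATE() - INTERVAL ${days} DAY"
}

# 12 weeks = 84 days, matching the privacy policy from the first post.
build_purge_sql 84
```

The printed statement could then be passed to mysql -e in a wrapper script in place of a hard-coded one. It is probably worth running it against a copy of the database first.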

I’m still confused then. I am running the archive script, and when I look into my logs with:

SELECT visit_server_date FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action WHERE piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit ORDER BY -visit_server_date;

I see that they have been truncated at the beginning of the month, which makes it seem like the archive.sh script is doing what the FAQ is saying I must do manually…

I tried looking through the source code to find where the archive script is, but I promptly got lost, since I don’t do PHP normally.

The feature to automatically delete logs older than 7/30/N days is now available in Piwik, under Settings > Privacy > Delete old logs from the database.

This is available in the latest 1.5 RC release. Check it out now and report any suggestions directly in this post.