Data Archiving / Data Growth Management

I currently run Piwik to collect metrics from multiple sites that log nearly 300,000 page views per day. The problem is that the database keeps growing. Even though the archive job helps clean up past report data, the database grows by about 300-400 MB every two weeks.
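To put that rate in perspective, here is a rough linear extrapolation in Python. It assumes growth stays at the quoted 300-400 MB per two weeks; real growth will track your traffic.

```python
# Rough projection of database growth from the figures quoted above.
GROWTH_MB_PER_PERIOD = (300, 400)  # observed growth per two-week period
PERIODS_PER_YEAR = 26              # 52 weeks / 2


def yearly_growth_mb(per_period_mb: int, periods: int = PERIODS_PER_YEAR) -> int:
    """Linear extrapolation: assumes the per-period growth stays constant."""
    return per_period_mb * periods


low, high = (yearly_growth_mb(mb) for mb in GROWTH_MB_PER_PERIOD)
print(f"Expected growth: {low / 1000:.1f}-{high / 1000:.1f} GB per year")
```

At that rate the log tables alone would add roughly 8-10 GB a year.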

The tables of concern are piwik_log_link_visit_action and piwik_log_visit. Here are the current stats of these 2 tables:

piwik_log_link_visit_action = 30 million records, 1.2 GB of data
piwik_log_visit = 14 million records, 2.6 GB of data
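Dividing size by row count gives a rough idea of where the space goes. A quick sketch, assuming "GB" here means 1024^3 bytes and the sizes are data only (no indexes):

```python
# Figures quoted above; 1 GB is assumed to be 1024**3 bytes.
TABLES = {
    "piwik_log_link_visit_action": (30_000_000, 1.2),  # (rows, size in GB)
    "piwik_log_visit": (14_000_000, 2.6),
}


def avg_bytes_per_row(rows: int, size_gb: float) -> float:
    """Average bytes of data per row."""
    return size_gb * 1024**3 / rows


for name, (rows, size_gb) in TABLES.items():
    print(f"{name}: ~{avg_bytes_per_row(rows, size_gb):.0f} bytes/row")

actions = TABLES["piwik_log_link_visit_action"][0]
visits = TABLES["piwik_log_visit"][0]
print(f"Actions per visit: {actions / visits:.1f}")
```

So although the action table has about twice as many rows, the visit table takes more than twice the space, which makes old visits the main thing worth pruning.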

I’ve been told that this will be addressed in a future release of the application. However, I need to clean it up now. Here is my question:

Is there a way I can manually clean out this data?

I’m ok with losing some of the historical data if needed.
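If losing raw history is acceptable, one manual approach is to delete old rows from the two log tables directly in MySQL. Below is a Python sketch that only builds the SQL so you can review it before running it by hand. It assumes the default `piwik_` table prefix and that `piwik_log_visit` has `idvisit` and `visit_last_action_time` columns, with `piwik_log_link_visit_action.idvisit` pointing back to the visit; check these against your own schema. Back up the database first, and make sure the months you delete have already been archived, because reports cannot be rebuilt from logs that no longer exist.

```python
from datetime import date

# ASSUMPTION: default table prefix; change it if your install uses another.
PREFIX = "piwik_"


def cleanup_statements(cutoff: date, prefix: str = PREFIX) -> list[str]:
    """Build MySQL statements that delete raw log rows for visits whose
    last action happened before `cutoff`. Action rows are deleted first,
    while their parent visits still exist to join against."""
    day = cutoff.isoformat()  # always 'YYYY-MM-DD', so safe to inline
    visit = f"{prefix}log_visit"
    action = f"{prefix}log_link_visit_action"
    return [
        # Multi-table DELETE: remove actions belonging to old visits.
        f"DELETE a FROM {action} AS a "
        f"JOIN {visit} AS v ON a.idvisit = v.idvisit "
        f"WHERE v.visit_last_action_time < '{day}';",
        # Then remove the old visits themselves.
        f"DELETE FROM {visit} WHERE visit_last_action_time < '{day}';",
        # Rebuild the tables so the freed space can be reclaimed on disk.
        f"OPTIMIZE TABLE {action}, {visit};",
    ]


for stmt in cleanup_statements(date(2008, 7, 1)):
    print(stmt)
```

`OPTIMIZE TABLE` is included because deleting rows does not necessarily shrink the table files on disk; on tables this size it can take a long time and lock the tables while it runs.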

I believe the misc/cron/archive.sh script should handle this disk cleanup for you. On our system it makes a large difference.

A few months ago the script failed (out of memory). I understand that the error was resolved in http://dev.piwik.org/trac/ticket/422, and I have updated to the current SVN.

How can I instruct Piwik to archive the data for prior months? For example, I want to archive data for 2008-01 and 2008-06.
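One possible approach, assuming your Piwik version archives a period the first time its report is requested (browser-triggered archiving), is to request a monthly report for each month through the HTTP API. The sketch below only builds the request URLs. `PIWIK_URL`, the site id and `YOUR_TOKEN` are placeholders, and `VisitsSummary.get` is used simply as a report that covers every visit.

```python
from urllib.parse import urlencode

# HYPOTHETICAL install location; replace with your own Piwik URL.
PIWIK_URL = "http://example.com/piwik/index.php"


def month_report_url(year: int, month: int, id_site: int, token_auth: str) -> str:
    """Build an API request for one month's report. With period=month,
    any date inside the month selects it; the 1st is used here."""
    params = {
        "module": "API",
        "method": "VisitsSummary.get",
        "idSite": id_site,
        "period": "month",
        "date": f"{year:04d}-{month:02d}-01",
        "format": "xml",
        "token_auth": token_auth,
    }
    return f"{PIWIK_URL}?{urlencode(params)}"


for month in (1, 6):  # 2008-01 and 2008-06
    print(month_report_url(2008, month, id_site=1, token_auth="YOUR_TOKEN"))
```

Opening each URL (in a browser or with a tool such as wget) should then trigger archiving for that month, provided the raw logs for it are still in the database.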

Thanks!

I added a new FAQ entry following your question: http://piwik.org/faq/troubleshooting/#faq_42
Let me know if that’s enough.
