Delete historical Matomo Data

Hi,

When configuring how historical data is handled, the following settings are available:

  • Regularly delete old raw data

  • Delete old aggregated report data

  • Schedule old data deletion

This FAQ, https://matomo.org/faq/troubleshooting/faq_42/, covers only the first two options. We don’t know how to use “Schedule old data deletion” or what the purpose of this setting is.

We have set up the following configuration to test how to use it in a production environment:

  • Regularly delete old raw data: 2 days (we will change to six months in production)

  • Delete old aggregated report data: 12 months

  • Schedule old data deletion: week.

We have also set up the script to run every hour to auto-archive our reports.

So, what is the purpose of “Schedule old data deletion”? If the “Schedule old data deletion” interval comes before one of the other settings, will they be affected?

Moreover, in our tests we have seen that there is no difference for the user when reading the reports: they cannot tell whether a report is an old one that has already been processed or one generated directly from the logs.

But we have several doubts about the information in the reports, because we need to be able to explain to our client the differences between reports generated directly from logs and processed ones. From reading the FAQ, these are the main differences:

  • Transitions report: this report is generated directly from the logs. If you have no logs at all, you have no report, even if the data has been processed. But there is a bug with the counter of the views.

  • Unique visitors: this is supposed to behave the same as the Transitions report: if there are no logs, there is no report. But we noticed that some reports still show information about unique visitors, for example, Visits over time:

And for many other reports like Device type:

Why are unique visitors still visible? It is fine for us if they are visible because they are processed by the script, but some information in the FAQ states that logs must exist for them to be visible.

Regards,

Hi,

To clear up the main confusion:

The third section only appears if you have enabled one of the other two options.

Here you don’t configure what data is deleted, but rather how often Matomo schedules a task to apply the above settings and actually delete data.

So if you set it to “month”, data older than the settings above will be deleted only once a month, while if you set it to “day”, Matomo deletes data every day.
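
To illustrate how the schedule setting only controls when the deletion task runs, here is a minimal Python sketch. It is not Matomo's code: the function name `purge_run_dates` and the mapping of “day”, “week” and “month” to 1, 7 and 30 days are assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical mapping of the schedule choices to a number of days (assumed values).
SCHEDULE_INTERVAL_DAYS = {"day": 1, "week": 7, "month": 30}


def purge_run_dates(start: date, end: date, schedule: str) -> list[date]:
    """Return the dates on which a deletion task would run from start to end (inclusive)."""
    step = timedelta(days=SCHEDULE_INTERVAL_DAYS[schedule])
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += step
    return runs


# Compare how often the task would run over the same 30-day window.
window_start = date(2024, 1, 1)
window_end = date(2024, 1, 30)
daily_runs = purge_run_dates(window_start, window_end, "day")
weekly_runs = purge_run_dates(window_start, window_end, "week")
monthly_runs = purge_run_dates(window_start, window_end, "month")
```

In this sketch only the number of runs changes with the schedule; what counts as old data is decided by the other two settings.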

Hi,

Many thanks for your response. But it is still confusing for me.

For example, if you enable “Regularly delete old raw data”, there is a specific configuration field inside this setting where you must specify the maximum age of a log in days.

I guess that this field schedules the data deletion: every two days the logs will be deleted, after the script that auto-archives our reports has run.

So, if “Schedule old data deletion” sets the data deletion to every month, there will be no data older than two days anyway, because the setting above establishes data deletion every two days. So to me, it makes no sense at all.

Moreover, we have to confirm what the main differences are between reports generated directly from logs and processed ones:

  • Transitions report: bug with the counter of the views.

  • Unique visitors: are they still visible after the script has processed the data?

Thanks!

No, if you set “Delete old data every” to month and “Delete logs older than” to 2, then Matomo will run a job once a month that goes through all logs and deletes those that are older than 2 days at that time.

So the “Delete logs older than” setting lets you define which logs Matomo considers old (and should therefore be deleted), but it does not influence how often this deletion happens.
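
As a rough illustration of this, here is a small Python sketch of a single deletion run. It is not Matomo's implementation: `purge_old_logs`, the sample dates, and the choice to keep a log that is exactly 2 days old are all assumptions made for the example.

```python
from datetime import date, timedelta


def purge_old_logs(log_dates: list[date], run_date: date, older_than_days: int) -> list[date]:
    """Return the logs that survive a deletion run on run_date.

    Assumption: a log is deleted only if it is strictly older than the cutoff.
    """
    cutoff = run_date - timedelta(days=older_than_days)
    return [d for d in log_dates if d >= cutoff]


# One raw log per day for January 2024 (hypothetical data).
logs = [date(2024, 1, 1) + timedelta(days=i) for i in range(31)]

# No run happens between scheduled dates, so before the monthly run
# every log since January 1 is still present.
logs_before_run = list(logs)

# The monthly job runs on January 31 with "older than" set to 2 days.
survivors = purge_old_logs(logs, date(2024, 1, 31), older_than_days=2)
```

With a monthly schedule, logs much older than 2 days can therefore exist right up until the job runs; only after the run are they gone.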

OK, now it’s clear to me. Sorry for misunderstanding the settings.

If you could review the other questions about reports, I would greatly appreciate it.

Thanks!