Console core:archive explanation


(Pedro Estevão) #1

Hi,

Is there any documentation on this? I mean how it works, what it does, how, when, etc.

Since we began using Piwik, now Matomo, this has been our major issue. It is almost as heavy as the MySQL server that handles all the queries and data, and it uses almost as many system resources as MySQL does, not to mention the time it takes to process the week and month data (year, forget it, it “is stopped”).

For instance, why does week use lastN=260 by default and month lastN=50? Why do we look at 260 weeks and 50 months? Why not lastN=1 (the same as passing --force-date-last-n=1 to it)?

Another one: does week use the already archived data for day? And does month use the data for day and week? The same for year, using month, week and day archived data? If not, imagine that our Matomo configuration only keeps log_* data for 15 days and some month archive has failed for… 7 days. The month data will never be complete, because between the failure and the restart of the archive process there will be at least 7 days of “empty data”. Why not use the already processed data from the archive_* tables for the periods below it (as said, week would use day, month would use week and day, etc.)?

We can’t find answers to this and some other questions in any documentation or forum post, and we are willing to help, not only to make core:archive and the whole archiving process better but also to explain it. Even with debug and the most verbose level available, there is not much output that can help users understand it.

The errors are also somewhat generic (“MySQL has gone away” or “Error while sending QUERY packet”): they could point to the memory used by PHP, to network issues on the link to the MySQL server, or even to max_allowed_packet exhaustion in the MySQL my.ini configuration.

Hope to hear from you!

Greetings, Pedro


(Matthieu Aubry) #2

Hi Pedro

core:archive should automatically pick the right “last N” and will most of the time use last2. If you’re not seeing this behavior, let us know (maybe create a bug report on GitHub, in matomo-org/matomo).
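To make the “last N” numbers concrete, here is a minimal Python sketch of what a lastN window covers. It assumes lastN counts back from, and includes, the current period, and that weeks start on Monday. The function names are made up for illustration and are not part of Matomo.

```python
from datetime import date, timedelta


def last_n_days(n, today):
    """Return the n most recent days, oldest first, ending with `today`."""
    return [today - timedelta(days=i) for i in range(n - 1, -1, -1)]


def last_n_weeks(n, today):
    """Return the Monday of each of the n most recent weeks, oldest first.

    Assumption: the week containing `today` counts as the most recent one.
    """
    current_monday = today - timedelta(days=today.weekday())
    return [current_monday - timedelta(weeks=i) for i in range(n - 1, -1, -1)]


if __name__ == "__main__":
    today = date(2021, 3, 10)
    # last2 for days: yesterday and today.
    print(last_n_days(2, today))
    # lastN=260 for weeks: 260 week periods, i.e. roughly five years back.
    weeks = last_n_weeks(260, today)
    print(len(weeks), weeks[0], weeks[-1])
```

Under these assumptions, last2 only touches the current and the previous period, while lastN=260 for weeks means 260 separate week periods to check.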

Another one: does week use the already archived data for day? And does month use the data for day and week? The same for year, using month, week and day archived data? If not, imagine that our Matomo configuration only keeps log_* data for 15 days and some month archive has failed for… 7 days?

Yes it does. Matomo will pick the smallest possible number of periods combination to make up for the total period.
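As a rough illustration of “smallest possible number of periods”, the Python sketch below splits one calendar month into full Monday-to-Sunday weeks plus the leftover single days. It is only a sketch of the idea, not Matomo’s actual code, and the function name decompose_month is hypothetical.

```python
import calendar
from datetime import date, timedelta


def decompose_month(year, month):
    """Split a month into ("week" | "day", start, end) parts.

    Full Monday-to-Sunday weeks inside the month are taken as one part;
    every remaining day becomes its own part.
    """
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    parts = []
    day = first
    while day <= last:
        week_end = day + timedelta(days=6)
        if day.weekday() == 0 and week_end <= last:
            # A whole week fits: reuse one week archive instead of 7 day archives.
            parts.append(("week", day, week_end))
            day = week_end + timedelta(days=1)
        else:
            parts.append(("day", day, day))
            day += timedelta(days=1)
    return parts


if __name__ == "__main__":
    for kind, start, end in decompose_month(2021, 4):
        print(kind, start, end)
```

For April 2021 this gives 4 leading days, 3 full weeks and 5 trailing days, so the month would be built from 12 existing archives rather than 30 daily ones.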

We can’t find answers to this and some other questions in any documentation or forum post, and we are willing to help, not only to make core:archive and the whole archiving process better but also to explain it. Even with debug and the most verbose level available, there is not much output that can help users understand it.

Absolutely, it could be better documented. So far not many people have asked questions about it, as it’s supposed to work well most of the time :) So if you have these questions and more, it would be useful if you could create a ticket on GitHub suggesting that we improve the doc “How to Set up Auto-Archiving of Your Reports”, and list the questions you would like to see answered.

Looking forward to continuing the discussion on GitHub and making improvements together,
thanks,