Usage: /path/to/cli/php /var/www/vhost/misc/cron/archive.php <hostname> [<current_periods_timeout> [<reset|forceall>]]
<hostname>: Piwik hostname, e.g. localhost, localhost/piwik
<current_periods_timeout>: Current week/month/year will be processed at most every <current_periods_timeout> seconds. Defaults to 3600.
<reset[window_back_seconds]|forceall>: you can either specify
- reset: the script will run as if it was never executed before, therefore will trigger archiving on all websites with some traffic in the last 7 days.
You can specify a number of seconds to use instead of the 7-day window. For example, call archive.php <hostname> 1 reset 86400 to archive all reports for all sites that had visits in the last 24 hours.
- forceall: the script will trigger archiving on all websites for all periods, sequentially
- reset+forceall: you can also specify both, which is effectively the same behavior
as the slower script archive.sh. The only added optimization: it does not trigger archiving for periods
if the last 52 days have no data at all.
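The look-back window used by reset can be illustrated with a short Python sketch. This is not the script's actual PHP code: the function name sites_in_window and the last_visit_by_site mapping are hypothetical, and only the 7-day default and the optional window in seconds are taken from the description above.

```python
SEVEN_DAYS = 7 * 86400  # default look-back window described for "reset"


def sites_in_window(last_visit_by_site, now, window_back_seconds=SEVEN_DAYS):
    """Return the IDs of sites whose latest visit falls inside the window.

    last_visit_by_site is a hypothetical mapping of site ID -> Unix
    timestamp of that site's most recent visit.
    """
    cutoff = now - window_back_seconds
    return sorted(
        site_id
        for site_id, last_visit in last_visit_by_site.items()
        if last_visit >= cutoff
    )


now = 1_000_000
visits = {1: now - 3600, 2: now - 3 * 86400, 3: now - 30 * 86400}
print(sites_in_window(visits, now))         # default 7-day window
print(sites_in_window(visits, now, 86400))  # as with "reset 86400"
```

With the default window, sites 1 and 2 qualify; with an 86400-second window, only site 1 does.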
This script should be executed every hour, or as a daemon.
For more help and documentation, try $ /path/to/cli/php /var/www/analytics.partcommunity.com/misc/cron/archive.php help
= Description =
This script will automatically process all reports for websites tracked in Piwik.
For more information, see "How to Set up Auto-Archiving of Your Reports" in the Matomo documentation.
= Example usage =
$ /usr/bin/php /path/to/piwik/misc/cron/archive.php localhost/piwik 6200
This call will archive all website reports by calling the API on http://localhost/piwik/index.php?.…
It will only reprocess the current week / current month / current year if the existing reports are older than 6200 seconds (about 1 hour 43 minutes).
Setting a large timeout for periods ensures best performance when Piwik tracks thousands of websites or a few very high traffic sites.
$ /usr/bin/php /path/to/piwik/misc/cron/archive.php localhost/piwik 1
Setting <current_periods_timeout> to 1 ensures that whenever today’s reports are processed, the current week/month/year will
also be reprocessed. This is less efficient than setting a timeout, but ensures that all reports are kept up to date as often as possible.
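The effect of <current_periods_timeout> in the two calls above can be sketched in Python. This is an illustration under assumptions, not the script's own code: should_process_periods and the last_period_run timestamp are hypothetical names, and the exact comparison used by archive.php may differ.

```python
def should_process_periods(last_period_run, now, current_periods_timeout=3600):
    """Decide whether week/month/year reports are due for reprocessing.

    last_period_run is a hypothetical bookkeeping value: the Unix timestamp
    of the previous period archiving for a site, or None if it never ran.
    """
    if last_period_run is None:
        return True  # nothing archived yet, so process the periods
    return now - last_period_run >= current_periods_timeout


now = 1_000_000
one_hour_ago = now - 3600
print(should_process_periods(one_hour_ago, now, 6200))  # still within the timeout
print(should_process_periods(one_hour_ago, now, 1))     # timeout of 1 second has passed
```

A timeout of 1 means the check succeeds on practically every run, which matches the "always reprocess" behavior described for the second call.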
= Sample output =
See this link for a sample output:
= Requirements =
= More information =
This script is an optimized rewrite of archive.sh in PHP, allowing for more flexibility
and better near real-time performance when Piwik tracks thousands of websites.
When executed, this script does the following:
- Fetches Super User token_auth from config file
- Calls API to get the list of all website IDs with new visits since the last successful archive.php run
- Calls API to get the list of segments to pre-process
The script then loops over these websites & segments and calls the API to pre-process these reports.
At the end, some basic metrics and processing time are logged on screen.
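The loop over websites and segments can be pictured with the following Python sketch. It only builds the list of requests that would be made; plan_requests and its arguments are hypothetical names, the segment string is just an example value, and the sketch ignores any parallel execution.

```python
def plan_requests(site_ids, segments, process_periods):
    """List (site_id, period, segment) jobs in the order they would run.

    An empty segment string stands for the unsegmented report.
    """
    periods = ["day"]
    if process_periods:
        periods += ["week", "month", "year"]
    jobs = []
    for site_id in site_ids:
        for period in periods:
            jobs.append((site_id, period, ""))  # all visits
            for segment in segments:
                jobs.append((site_id, period, segment))
    return jobs


jobs = plan_requests([1, 2], ["visitorType==returning"], process_periods=False)
for job in jobs:
    print(job)
print(len(jobs), "requests planned")
```

Each planned job corresponds to one API call that pre-processes a report. Passing process_periods=True adds the week, month and year requests for every site.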
Notes about the algorithm:
- The first time it runs, all websites with traffic in the last 7 days will be processed
- To improve performance, API is called with date=last2 (to query yesterday and today) whenever possible, instead of last52.
To do so, the script logs the last time it executed correctly.
- The script tries to achieve near real time for “today” reports, processing “period=day” as frequently as possible.
- The script will only process (or re-process) reports for Current week / Current month
or Current year at most once per hour. To do so, the script logs last execution time for each website.
You can change this <current_periods_timeout> timeout as a parameter when calling archive.php script.
The concept is to archive daily reports as often as possible, to stay near real time on “daily” reports,
while allowing more stale data for the current week/month/year reports.
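The choice between date=last2 and date=last52 in the notes above can be sketched in Python. The one-day threshold used here is an assumption made for illustration, since the notes only say "whenever possible"; date_parameter and last_successful_run are hypothetical names.

```python
ONE_DAY = 86400  # assumed threshold, not taken from the script


def date_parameter(last_successful_run, now):
    """Pick the API date range: a short one when a recent run succeeded.

    last_successful_run is the logged Unix timestamp of the last correct
    execution, or None if the script has never completed.
    """
    if last_successful_run is not None and now - last_successful_run < ONE_DAY:
        return "last2"  # yesterday and today are enough
    return "last52"     # fall back to the long range


now = 1_000_000
print(date_parameter(now - 3600, now))         # ran an hour ago
print(date_parameter(now - 3 * ONE_DAY, now))  # last run was days ago
print(date_parameter(None, now))               # never ran
```

This is why the script logs the time of its last correct execution: without that timestamp, it cannot tell whether the short range is safe to use.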
= Ideas for improvements =
- Once an hour at most, and on request: run archiving for previousN for websites whose days have just finished in the last 2 hours in their timezones (TODO: uncomment when full archiving is implemented)
- Bug: when adding new segments to preprocess, the script will assume that data was processed for this segment in the past
- FAQ + doc update, for using this archive.php instead of archive.sh/.ps1, in order to deprecate them
- FAQ for daemon-like process. Run 2 separate processes for days and week/month/year?
- ‘reset’ is not compatible with concurrent threads
- Scheduled tasks send multiple reports when run in concurrent threads
- Prepare script to start multiple processes
- Run website archiving in parallel; currently only segments are run in parallel
- Queue period archiving to be executed after today’s reports, with lower priority
- Core: check that on the first day of the month, if last month is requested from the UI, it returns the last temporary monthly report generated, if last month hasn’t yet been processed / finalized
- Optimization: run the most often requested websites first, weighted by visits in the site (and/or time to generate the report), to run more often websites that are faster to process while processing often for power users who use Piwik frequently
- UI: add ‘report last processed X s ago’ in the UI grey box ‘About’