We are using something similar to “Log Analytics” to upload data to Piwik. The import runs every half hour (at minutes 10 and 40) for 15 sites with 150k visitors in total. Archiving is done by a cron job every hour, as described in the FAQ.
We have noticed that some ecommerce data is missing in the overview. The aggregation seems to fail, as the data itself shows up fine in the ecommerce log. After further investigation, it turned out that the missing data for that day occurred between 23:40 and 0:00. So the cron job running at midnight archived the site, but the import running at 0:10 then delivered more data for the period between 23:40 and 0:10, which the archiving at 1:00 did not properly recognize as “new visit”. We assume this affects not only ecommerce but also page views, searches etc., but we were not able to verify that properly.
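To illustrate what we think is happening, here is a minimal Python sketch of the schedule described above. It is only a model of our setup, not Piwik code: the function names are made up, the example date is arbitrary, and it rests on our assumption that the first archive run after a day has ended finalizes that day and later runs do not pick up visits imported afterwards.

```python
from datetime import datetime, timedelta

IMPORT_MINUTES = (10, 40)  # our import runs at minute 10 and 40 of every hour


def first_import_after(visit):
    """Return the first import run at or after the visit time."""
    t = visit.replace(second=0, microsecond=0)
    while t.minute not in IMPORT_MINUTES or t < visit:
        t += timedelta(minutes=1)
    return t


def finalizing_run(day, archive_hours):
    """Return the first archive run (at minute 0) after `day` has ended.

    Assumption: this run finalizes the reports for `day`.
    """
    t = datetime.combine(day, datetime.min.time()) + timedelta(days=1)
    while t.hour not in archive_hours:
        t += timedelta(hours=1)
    return t


def is_missed(visit, archive_hours):
    """True if the visit is imported only after its day was finalized."""
    return first_import_after(visit) > finalizing_run(visit.date(), archive_hours)


late_visit = datetime(2015, 3, 1, 23, 45)        # arbitrary example date
print(is_missed(late_visit, range(24)))          # hourly cron including midnight
print(is_missed(late_visit, range(1, 24)))       # midnight run removed
```

Under these assumptions, a visit at 23:45 is imported at 0:10 and therefore misses the midnight run, but it is covered when the first run of the new day happens at 1:00.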
If I understand the output of “./console core:archive” correctly, only the visits after the last run are archived. However, this does not apply to day processing. The FAQ linked above states: “Piwik archiving for today’s reports is not incremental:[…]Piwik will read all logs for the full day to process a report for that day.” This would explain our issue and also why we have those gaps only at the end of the day. Is this assumption correct?
After removing the cron job run at midnight and continuing to archive from 1am, it seems to work. However, I would be interested in how to force reprocessing of the reports for a specific date. If I execute the reprocessing command mentioned in the FAQ, the data is still missing. Is the only option here to drop the “piwik_archive” tables and force a reprocessing?
Would it be possible to mark archived rows in the database, so that the archiving process can check for unprocessed data and, after purging the archive for that day, process the whole day again?
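To make the idea a bit more concrete, here is a rough Python sketch of the kind of bookkeeping I have in mind. This is purely hypothetical and not how Piwik works internally: the class `ArchiveTracker` and its methods are invented for illustration, and the real check would of course have to happen against the log and archive tables.

```python
from datetime import date, datetime


class ArchiveTracker:
    """Hypothetical bookkeeping: remember archived days, flag late data."""

    def __init__(self):
        self.archived_days = set()  # days whose reports have been archived
        self.dirty_days = set()     # archived days that received new visits

    def mark_archived(self, day):
        """Called after a day has been (re)processed."""
        self.archived_days.add(day)
        self.dirty_days.discard(day)

    def record_import(self, visit_times):
        """Flag every already archived day that gets visits from an import."""
        for t in visit_times:
            if t.date() in self.archived_days:
                self.dirty_days.add(t.date())

    def days_to_reprocess(self):
        """Days whose archive should be purged and rebuilt."""
        return sorted(self.dirty_days)


tracker = ArchiveTracker()
tracker.mark_archived(date(2015, 3, 1))          # midnight run archived the day
tracker.record_import([
    datetime(2015, 3, 1, 23, 45),                # late visit for the archived day
    datetime(2015, 3, 2, 0, 5),                  # visit for the new day
])
print(tracker.days_to_reprocess())
```

In this sketch only the already archived day that received late visits is flagged, so the next archive run would know to purge and rebuild exactly that day.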