I’m the admin for hundreds of client websites. We’ve outgrown Urchin, and we like Piwik as an alternative for analytics and reporting on our sites. So far it looks awesome and meets all of our needs for features and API. I need to write a script that reproduces the functionality of Urchin’s scheduler, importing the Apache logs for each site with import_logs.py on a scheduled basis. I already have Piwik installed, and import_logs.py is working well for selected sites and logs. But there are a few things I need to understand before making import_logs.py our long-term strategy for analytics data.
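A minimal sketch of the scheduler loop described above, meant to be run from cron. It assumes a hypothetical mapping of Piwik site IDs to Apache log paths, a placeholder Piwik URL and import_logs.py path, and the `--url`/`--idsite` options shown in import_logs.py’s usage examples (check `import_logs.py --help` on your install). By default it only builds the commands, so you can inspect them before running anything.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List

# Hypothetical values: replace with your Piwik URL and the real path to import_logs.py.
PIWIK_URL = "http://piwik.example.com"
IMPORT_LOGS = "/path/to/piwik/misc/log-analytics/import_logs.py"


@dataclass
class Site:
    idsite: int    # Piwik site ID
    log_path: str  # Apache access log for this site


def build_command(site: Site) -> List[str]:
    """Build the import_logs.py invocation for one site."""
    return [
        sys.executable,
        IMPORT_LOGS,
        f"--url={PIWIK_URL}",
        f"--idsite={site.idsite}",
        site.log_path,
    ]


def run_all(sites: List[Site], dry_run: bool = True) -> List[List[str]]:
    """Import each site's log in turn; with dry_run=True, only return the commands."""
    commands = [build_command(s) for s in sites]
    if not dry_run:
        for cmd in commands:
            # check=False so one failing site doesn't stop the remaining imports.
            result = subprocess.run(cmd, check=False)
            if result.returncode != 0:
                print(f"import failed ({result.returncode}): {' '.join(cmd)}",
                      file=sys.stderr)
    return commands
```

Running the imports sequentially keeps the load on the Piwik server predictable; with hundreds of sites you could later add a small worker pool instead.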
- When I import the same apache log later to collect new data, I get:
Purging Piwik archives for dates: 2012-11-27 2012-11-25 2012-11-28 2012-11-26 2012-11-29
To re-process these reports with your new update data, execute the piwik/misc/cron/archive.php script, or see: How to Set up Auto-Archiving of Your Reports - Analytics Platform - Matomo for more info.
Does purging archives mean that Piwik is deleting data for duplicate dates? In other words, if Piwik encounters a date in a log that it already has data for, will it delete all data for that date? Or does purging archives mean that it leaves the data in place and purges only Piwik’s internal logs for those dates, from which it creates the reports?
- In general what is the behavior of Piwik if we import a log that has some dates that have already been imported for a site? Will duplicate dates be ignored, or will they be imported as normal, adding more data to the same dates?
If our Apache logs are rotated once a week, we don’t want to wait a week before being able to import new data for a site. But if importing the same log again to pick up new entries causes all the previously imported data to be duplicated, then we’d have to wait until a log has been rotated and is no longer being written to before importing it into Piwik, correct?
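One way to avoid handing import_logs.py lines it has already seen, without waiting for rotation, is to remember how far into each log the previous run got. A minimal sketch, assuming a hypothetical per-log state file that stores a byte offset; this is not a Piwik feature, just a pre-processing step whose output you would write to a temporary file and pass to import_logs.py.

```python
import os


def read_new_lines(log_path: str, state_path: str) -> bytes:
    """Return complete lines appended to log_path since the last call, and save the new offset."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)

    if os.path.getsize(log_path) < offset:
        # The file shrank since last time: assume it was rotated and start over.
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Stop at the last complete line, so a line Apache is still writing
    # is picked up on the next run instead of being split in two.
    end = data.rfind(b"\n") + 1
    data = data[:end]

    with open(state_path, "w") as f:
        f.write(str(offset + end))
    return data
```

The size check only catches rotation if the new file is still smaller than the saved offset when the script runs; comparing the file’s inode (`os.stat(log_path).st_ino`) as well would make that detection more reliable.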
Good job on piwik! Thanks!