The Piwik install we are using tracks about 13,000 sites, which together receive about 12.5 million hits a year. Each site we create has a relatively short life (usually six months or less), and we are usually interested in only one date range: from the time the site is “activated” (after it has been published and QA tested) until the time the site is “deactivated” (before the domain is revoked). There are two pieces of functionality that we would like to achieve with Piwik:
[ul]
[li] Pre-process custom date ranges that are specific to each site. If a site is activated on 6/10 and deactivated on 6/20, we would like the core:archive command to process that range automatically.
[/li][li] Once a site has been deactivated, move it to a separate database / piwik instance.
[/li][/ul]
The former point seems to have been looked into, but this seems to give the option of Piwik-wide pre-processed ranges rather than ranges that are unique to each site. Is having unique pre-processed ranges per site currently possible? Is it something that could be added?
The latter point may be something that needs to be done by a script outside of Piwik, but maybe you all have some insight. We still want to track the deactivated sites (to see whether people are still using them after they should have stopped), but it seems like a waste of time to process them as often as the active sites. All we really need to know is “has this site been hit within the last week”. Splitting them into a separate database would theoretically decrease the archive time for the active sites, which would be a boon. Has anyone seen a similar setup?
Anything would be possible, but it would take some time to fully understand your use case and to implement a solution with tests etc. At the moment we are fully booked with lots of work and I can’t help further, sorry about that.
For the time being, we’re in the process of setting up a Python script, run as a cron job, that accepts a CSV of site IDs and the dates on which each site was deactivated. If the deactivation date is more than a year ago, it runs “console core:delete-logs-data --idsite=ID”. If the deactivation date is between three weeks and a year ago, we’ll use the “console migration:site ID” plugin command to move the data to a secondary Piwik instance. The secondary instance can go as slow as it likes, while the primary instance will be greatly sped up by lightening the data load.
Then on the tracked sites we’ll have a slightly longer JavaScript snippet that says “Is it after the deactivation date? If so, report to the secondary server. Otherwise, report to the primary server”.
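For anyone wanting to try the same thing, here is a rough sketch of how the cron script's decision logic might look. It is only an illustration under several assumptions: the CSV is assumed to have two columns (site ID, then deactivation date as YYYY-MM-DD), the function names are made up for this example, and the command strings are copied from the post as written, so any extra options the real commands need (target database details for the migration, for instance) are not shown. The sketch only builds the commands and does not run them.

```python
import csv
import io
from datetime import date, timedelta

# Thresholds taken from the post: three weeks and one year.
MIGRATE_AFTER = timedelta(weeks=3)
DELETE_AFTER = timedelta(days=365)


def plan_action(deactivated, today):
    """Return 'delete', 'migrate', or 'keep' for one site (hypothetical helper)."""
    age = today - deactivated
    if age > DELETE_AFTER:
        return "delete"    # deactivated more than a year ago
    if age >= MIGRATE_AFTER:
        return "migrate"   # deactivated between three weeks and a year ago
    return "keep"          # still active or only recently deactivated


def build_command(site_id, action):
    """Return the console command as an argument list, or None for 'keep'.

    The command strings are copied from the post; additional options the
    real commands may require are deliberately left out.
    """
    if action == "delete":
        return ["console", "core:delete-logs-data", f"--idsite={site_id}"]
    if action == "migrate":
        return ["console", "migration:site", str(site_id)]
    return None


def plan_from_csv(csv_text, today):
    """Parse 'idsite,YYYY-MM-DD' rows (assumed layout) into a list of plans."""
    plans = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        site_id = int(row[0])
        deactivated = date.fromisoformat(row[1].strip())
        action = plan_action(deactivated, today)
        plans.append((site_id, action, build_command(site_id, action)))
    return plans
```

Keeping the planning step separate from execution means the cron job can log or dry-run the plan first; each returned argument list could then be handed to something like `subprocess.run` once the commands have been checked by hand.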
Honestly, whatever works for you sounds good. Ideally Piwik would work well out of the box even for your “crazy” use case, but it will take us a few years to address all those use cases (we are making heaps of progress with regard to performance, but it’s a long road).