I want to move from Urchin to Piwik. There are about a dozen websites for which I'd like to import roughly the last year of historical data.
One of them is quite busy (in Piwik terms, anyway): I have about 230 million lines of historical log data to read in, and the import takes more than a day. No doubt a huge contributor is that it all goes through the Piwik HTTP API; I expect it would be orders of magnitude faster if there were an offline importer, but I guess that would mean a lot more code having to be written.
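For reference, I'm feeding the logs in with the importer bundled with Piwik; my invocation looks roughly like this (the URL, site id, and log path are placeholders for my actual setup, and I've simplified the options):

```shell
# Import one site's historical access logs through the Piwik HTTP API.
# misc/log-analytics/import_logs.py ships with Piwik; the URL, idsite,
# and log file below are placeholders.
python misc/log-analytics/import_logs.py \
    --url=http://piwik.example/ \
    --idsite=1 \
    /var/log/apache2/access.log
```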
Having read in the data, Piwik is unusably slow, and I gather the answer is to archive it, which is where I run into problems. The archiver fails with out-of-memory errors, even though I've set PHP's memory limit to 1 GB. Watching htop, the biggest I've seen a PHP process get was just over 400 MB, but I could have missed it growing larger, as the process takes a while.
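For what it's worth, I'm raising the limit explicitly when invoking the archiver from the command line as well as in php.ini, since CLI PHP can read a different php.ini than the web server does (the script path and URL are placeholders for my install):

```shell
# Run the Piwik archiver with an explicit memory limit; -d overrides
# the memory_limit ini setting for this invocation only.
php -d memory_limit=1024M misc/cron/archive.php --url=http://piwik.example/
```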
I thought I could read in a day's data, archive it, and then move on to the next day's data, but I have to use --force-all-periods to archive the old data, which means every run reprocesses all the old data. I also have to use --force-all-websites, so it repeatedly reprocesses all data for all websites too.
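To make the plan concrete, the per-day workflow I was hoping for would look something like the sketch below (paths, URL, and site id are placeholders; it needs GNU date, and it's written as a dry run that only prints the commands it would execute):

```shell
# Per-day import-then-archive loop (sketch; dry run only).
# START is inclusive, END is exclusive; requires GNU date for -d.
START=2011-01-01
END=2011-01-03
day=$START
while [ "$day" != "$END" ]; do
    # Import just this day's log, then archive, before moving on.
    echo "python misc/log-analytics/import_logs.py --url=http://piwik.example/ --idsite=1 logs/access-$day.log"
    echo "php misc/cron/archive.php --url=http://piwik.example/"
    day=$(date -d "$day + 1 day" +%Y-%m-%d)
done
```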
What’s the correct approach to read in a reasonably large amount of historical data and then archive it?
One workaround I did think of was to fire up a new VM, set its date to day N+1, read in the data for day N, run archive.php, reboot, and repeat. But that seems very convoluted, and I don't even know for sure that it would work, though it seems plausible.