Fastest way to import and archive historical data

Kev · September 29, 2016, 4:54pm

Hi,

I have a few websites that we want to import historical data for, but it is taking a really long time. I’m importing one day at a time with this command:

import_logs.py --url=THEURL --idsite=THEID --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots LOGFILE --output=RESULTLOG

After importing both logs (one from each load-balanced server) into Piwik, I run the archiver for them:

console core:archive -vvv --force-idsites=THEID --force-all-periods=315576000 --force-date-last-n=1000 --url='THEURL' >> RESULTLOG

As a result, each individual log takes about 9 minutes to import, and the archive run then takes about 50 minutes.
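
To put rough numbers on it, here is a back-of-the-envelope estimate built only from the timings above. The 30-day backlog is a made-up figure for illustration, and the sketch assumes two logs per day with one archive run after each day's import:

```shell
# Rough cost of the per-day workflow, using the timings quoted above.
logs_per_day=2        # one log per load-balanced server
import_minutes=9      # approximate import time per log
archive_minutes=50    # approximate time per core:archive run
days=30               # hypothetical backlog size, for illustration only

# Each day costs two imports plus one archive run.
per_day=$(( logs_per_day * import_minutes + archive_minutes ))
total=$(( per_day * days ))

echo "Per day: ${per_day} minutes"
echo "For ${days} days: ${total} minutes (about $(( total / 60 )) hours)"
```

With these figures, that is 68 minutes per day of logs, or roughly 34 hours for a single month of history on one site.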

Is it better to import a month of logs and then run that archive command once? Should the archive command only be run with --force-date-last-n=30? We tried throwing more hardware at it, but that only helped marginally. I should mention that we are using an Amazon EC2 instance for the Piwik server and Amazon RDS for the MySQL database. Is there a faster way to import large amounts of archived log data?

Thanks!

-Kevin