I have a few websites for which we want to import historical data, but it is taking a really long time. I’m importing individual days with this:
import_logs.py --url=THEURL --idsite=THEID --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots LOGFILE --output=RESULTLOG
After I import both logs (from load-balanced servers) into Piwik, I run the archive for them:
console core:archive -vvv --force-idsites=THEID --force-all-periods=315576000 --force-date-last-n=1000 --url='THEURL' >> RESULTLOG
As a result, each individual log takes about 9 minutes to import, and the archive run then takes about 50 minutes.
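To put rough numbers on it, here is a back-of-the-envelope estimate in shell arithmetic. It assumes two logs per day and that the timings above stay constant; the 30-day backlog and the guess that a single batched archive run would still take about 50 minutes are illustrative assumptions, not measurements.

```shell
# Rough estimate only; assumes the observed timings hold for every day.
logs_per_day=2          # one log per load-balanced server
import_min_per_log=9    # observed import time per log (minutes)
archive_min_per_run=50  # observed core:archive time per run (minutes)
days=30                 # hypothetical backlog size

# Current workflow: import both logs, then archive, once per day of history.
per_day=$(( logs_per_day * import_min_per_log + archive_min_per_run ))
total_per_day_archive=$(( per_day * days ))

# Batched workflow: import every day first, then archive a single time.
# Optimistic assumption (unverified): one archive run still takes ~50 minutes.
total_batched=$(( logs_per_day * import_min_per_log * days + archive_min_per_run ))

echo "archive after each day: ${total_per_day_archive} min"
echo "archive once at end:    ${total_batched} min"
```

Under these assumptions the archive step, not the import, dominates the per-day workflow, which is why I am asking whether it can be run less often.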
Is it better to import a month of logs and then run that archive command once? Should the archive command only be run with --force-date-last-n=30? We tried throwing more hardware at it, and it helped only marginally. I should mention that we are using an Amazon EC2 instance for the Piwik server and Amazon RDS for the MySQL database. Is there a faster way to import large amounts of archived logs?