Loading a year's worth of data: how?


#1

How does one load an entire year’s worth of data in a reasonable time? Is there an option to bulk-load it?

I’ve got 16 clustered web hosts.

We have the logs on disk, at /var/logs/2014/hostname

What I’m doing now is using import_logs.py, pointing it to the logs on disk, but that’s pretty slow …

  • About 3 hours for one month’s worth of data from one server. That’s pretty slow, and it is not making anyone happy.

  • I can run two imports at once, which in theory halves the time.

  • When I tried running a third, I started getting deadlock errors in SQL, and eventually the third process died.
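The parallel setup described above can be sketched as a dry run with `xargs -P`, which caps the number of concurrent jobs (here at two, since a third import triggered deadlocks). The host names and Piwik URL are placeholders, and the command is only echoed rather than executed:

```shell
# Dry run: print the import command for each host, at most 2 in parallel.
# Replace the echo with the real invocation once the commands look right.
printf '%s\n' host1 host2 host3 \
  | xargs -P 2 -I {} echo "python import_logs.py --url=http://piwik.example/ /var/logs/2014/{}"
```

Dropping the `echo` runs the imports for real; `-P 2` then keeps exactly two `import_logs.py` processes going until the host list is exhausted.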

So … what am I doing wrong? Is there a better way?


(Matthieu Aubry) #2

Hi there,

It’s hard to answer this question since you don’t provide any numbers. Generally it takes time to import a lot of data, and the import time can be greatly reduced by having a very powerful, well-tuned Piwik DB server.
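For a sense of what "well tuned" might mean here, these are commonly adjusted InnoDB settings in `my.cnf` for write-heavy imports. The values are illustrative assumptions, not recommendations from this thread; size them to your own hardware:

```ini
[mysqld]
# Biggest single win: keep the working set in memory.
# A common rule of thumb is 50-75% of RAM on a dedicated DB host.
innodb_buffer_pool_size = 8G

# Trade a little durability for much faster commits during bulk import
# (flush the log to disk roughly once per second instead of per commit).
innodb_flush_log_at_trx_commit = 2

# A larger redo log reduces checkpoint pressure under sustained writes.
innodb_log_file_size = 512M
```

Durability-relaxing settings like `innodb_flush_log_at_trx_commit = 2` are usually worth reverting once the historical import is done.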