How does one load an entire year’s worth of data in a reasonable time? Is there an option to bulk-load it?
I’ve got 16 clustered web hosts.
We have the logs on disk, at /var/logs/2014/hostname
What I’m doing now is pointing import_logs.py at the logs on disk, but that’s pretty slow … about 3 hours for one month’s worth of data from one server. That is not making anyone happy.
I can run two imports at once, which theoretically cuts the time in half.
When I tried to run a third, I started getting deadlock errors in SQL, and eventually the third process crapped out.
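For reference, the parallel setup described above could be driven like this. This is only a sketch: the hostnames, the Piwik URL, and the `--url` flag usage are assumptions filled in for illustration, and the leading `echo` makes it a dry run that just prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: fan one import_logs.py job per host out to at most
# two concurrent processes (the level that worked without deadlocks).
# host01..host16, the Piwik URL, and the log layout are placeholders;
# remove the leading "echo" in the xargs command to run for real.
printf 'host%02d\n' $(seq 1 16) |
  xargs -P 2 -I{} \
    echo python import_logs.py --url=http://piwik.example.com/ \
      /var/logs/2014/{}
```

`xargs -P 2` caps concurrency at two, so adding more hosts to the list queues them instead of piling on a third simultaneous import.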
So … what am I doing wrong? Is there a better way?