Logfile import question

After reading the instructions on importing old logfiles, I am still unclear on some of the steps in running the access_log import script.

My Piwik install is not on the same server as some of the websites I’m analyzing, so I would need to copy the logfiles from one server to another, then ask the python script to import them. Do I need to put the logfile in the same directory with the python script, or someplace else specific?

I also think I have a huge stress test for log importing on deck. One of the logfiles I want to import hasn’t been rolled over in over a year. Not sure if it was missed when someone else did mods on logrotate or what, but it’s currently sitting at 25 GB and still growing.

I’m toying with the idea of splitting it into 500 MB or 1 GB chunks, but part of me just wants to see what happens if I ask it to crunch a 25 GB file all at once :)
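For the chunking idea, here is a minimal sketch of one way to split a large logfile at line boundaries, so no log entry is cut in half. The function name `split_log`, the `.partNNNN` naming scheme, and the default chunk size are all illustrative assumptions, not anything that ships with Piwik.

```python
import os

def split_log(path, out_dir, max_bytes=1024 ** 3):
    """Split `path` into numbered chunk files of at most roughly `max_bytes`
    each, breaking only at line boundaries. Returns the chunk paths in order.
    (Hypothetical helper; names and sizes are assumptions.)"""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    chunks = []
    out = None
    written = 0
    try:
        # Binary mode streams the file line by line, so a 25 GB input
        # is never held in memory at once.
        with open(path, "rb") as src:
            for line in src:
                # Open a new chunk if none is open yet, or if adding this
                # line would push a non-empty chunk past the size limit.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    chunk_path = os.path.join(
                        out_dir, "%s.part%04d" % (base, len(chunks)))
                    out = open(chunk_path, "wb")
                    chunks.append(chunk_path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return chunks
```

Calling `split_log("access_log", "chunks", max_bytes=500 * 1024 ** 2)` would produce files of about 500 MB each. The chunks keep the original line order, so importing them in sequence should be equivalent to importing the whole file.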

But the basic question remains: where do I put these files on the Piwik server in order to run the import script?

It’s recommended to put the logfiles on the same server as Piwik and run the importer there, to avoid transferring data over the network between the Python importer and the Piwik API.

However, that’s not required, so you can also run the import from a separate server.
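On the question of where the files go: the importer takes the logfile path as a command-line argument, so the file can sit in any readable directory. A small sketch of building that command follows. The script location, the Piwik URL, and the log path are placeholders to adapt, and the helper name `build_import_command` is made up for illustration.

```python
def build_import_command(script_path, piwik_url, log_paths):
    """Return the argument list for running the log importer.
    `--url` points the importer at the Piwik install; the logfile
    paths follow as positional arguments, wherever the files live.
    (Hypothetical helper; all paths and URLs are assumptions.)"""
    return ["python", script_path, "--url=" + piwik_url] + list(log_paths)

# Example with placeholder values:
cmd = build_import_command(
    "/path/to/piwik/misc/log-analytics/import_logs.py",
    "http://piwik.example.com/",
    ["/home/user/logs/access_log"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.call(cmd)`, or the printed line can be pasted into a shell.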

It will be interesting to see what importing 25 GB does; we have never tried that yet ;)

If I were running the import on my own server, I would try it, but my Piwik install is on a shared Hostgator server, and I don’t know whether that would make the machine cry.

I will try to split the file and import it in chunks, but some of the more recent log entries will duplicate information already tracked and processed by Piwik (about 3-4 weeks’ worth of data).

Will Piwik ignore any access_log data for dates it’s already processed, or will it just be better to not include those dates in the import at all?

Piwik will not exclude anything, so it’s better to remove that data before importing.
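One way to drop the overlapping dates before importing is to filter the log by its timestamps. The sketch below assumes the Apache common/combined log format, where each line carries a timestamp like `[10/Oct/2012:13:55:36 -0700]`. The function name `lines_before` and the cutoff date are illustrative, and the UTC offset is ignored, so entries close to midnight on the cutoff day may land on either side.

```python
import re
from datetime import date

# Bracketed Apache timestamp, e.g. [10/Oct/2012:13:55:36 -0700].
TIMESTAMP = re.compile(
    r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):\d{2}:\d{2}:\d{2} [+-]\d{4}\]")

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}

def lines_before(lines, cutoff):
    """Yield only the lines whose log date is strictly before `cutoff`.
    Lines without a recognizable timestamp are dropped.
    (Hypothetical helper; assumes Apache-style timestamps.)"""
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None or match.group(2) not in MONTHS:
            continue
        day, month, year = match.groups()
        if date(int(year), MONTHS[month], int(day)) < cutoff:
            yield line

def filter_log(src_path, dst_path, cutoff):
    """Copy `src_path` to `dst_path`, keeping only entries before `cutoff`."""
    with open(src_path, "r", errors="replace") as src, \
         open(dst_path, "w") as dst:
        dst.writelines(lines_before(src, cutoff))
```

For example, `filter_log("access_log", "access_log.old", date(2013, 2, 1))` would keep everything up to and including 31 January 2013, assuming that is the day before Piwik's own tracking started.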