Creating an import script for logs and I have some questions

Hello everyone,

I am in the process of creating a desktop application in VB.Net for importing a lot of my old log files into Piwik. I have a few questions, so I hope someone wouldn’t mind lending me some knowledge, as my understanding of Python is very minimal.

I’ve reviewed the Python script and, to simplify, it looks like I just need to create a JSON array and POST it to http://myinstall/piwik.php. Is this correct? And is there an example $_POST line that shows what one of the requests looks like?

I also see in import_logs.py that hits are placed into queues so that “they are added in the right order”… is this a Piwik-specific thing, or just the way that Python works? I’m assuming the former, but I just wanted to be sure.

I’ll probably have more questions later, but any help that anyone can give would be much appreciated.

You should use the Python script to do this rather than redoing it yourself (call the Python script from your VB app).

see also: http://piwik.org/docs/tracking-api/reference/
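
On the question of what a request looks like, here is a rough sketch (in Python, since that is what import_logs.py uses) of a bulk tracking body as I understand it from the Tracking API reference: a JSON object with a "requests" list of query strings plus a "token_auth" value, sent as the POST body to piwik.php. The site id, URLs, and token are placeholders, and build_bulk_body is a made-up helper name, not something taken from the script.

```python
import json
from urllib.parse import urlencode


def build_bulk_body(hits, token_auth):
    """Return the JSON text for a bulk tracking POST (hypothetical helper)."""
    # Each hit becomes one query string, exactly as it would appear in a GET request.
    requests = ["?" + urlencode(hit) for hit in hits]
    return json.dumps({"requests": requests, "token_auth": token_auth})


# Placeholder hits: idsite, rec and url are the basic tracking parameters.
hits = [
    {"idsite": 1, "rec": 1, "url": "http://example.org/page-a"},
    {"idsite": 1, "rec": 1, "url": "http://example.org/page-b"},
]

body = build_bulk_body(hits, "YOUR_TOKEN_AUTH")
print(body)  # this text would be the POST body sent to http://myinstall/piwik.php
```

The sketch only builds the body and does not send anything, so it is safe to run without a Piwik install.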

Matt,

Thanks for the link, I knew I had seen the documentation somewhere.

So I’m going to look into running the script from my VB interface, but I have some additional questions. The whole reason for the interface is to give some additional feedback and allow for an easier setup of pulling log files to be imported (we have a lot of files going back quite a few years that need to be transferred). In setting up my strategy for doing these imports, should I be running the archive script after every import? I feel like that is the right thing to do, but I’m still catching up on how everything works.

The other big question is this: while we’re playing catch-up, we’re going to try to use the JavaScript/image tracking to capture real-time stats, but then plan on parsing the server logs every night to get more information. What is the easiest path for deleting the information collected during the day before parsing the log files? I know I can run a MySQL delete command with the siteid and date range, but I also read about deleting the piwik_archive_numeric table associated with the year_month that I’m replacing… is that a necessary step given that we will be tracking hundreds of sites?

Sorry if these are pretty basic questions, but I’m trying to get answers with real backing instead of my suppositions before we start doing our mass imports.

Thanks!

Ok, unfortunately I’ve had to go back to my VB solution, as I wasn’t having much luck reading the output of the Python script in my VB app in real time (the client has requested this as mandatory). So I’m back to doing my parsing of the logs in VB and then sending the information off to /piwik.php.
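
For anyone hitting the same wall, here is a minimal sketch of the general technique of reading a child process's output line by line as it is produced. It is written in Python rather than VB, and it assumes (without knowing whether this was the actual cause here) that output buffering in the child is what delays the lines; the -u flag asks the Python interpreter for unbuffered output. stream_lines is a hypothetical helper, and the child command is a stand-in, not import_logs.py itself.

```python
import subprocess
import sys


def stream_lines(command):
    """Yield each output line of the child process as soon as it arrives."""
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the same stream
        text=True,
        bufsize=1,  # line-buffered on the reading side
    ) as process:
        for line in process.stdout:
            yield line.rstrip("\n")


# Stand-in child process; a real run would pass the script path and its arguments.
demo = [sys.executable, "-u", "-c", "print('first batch'); print('second batch')"]
for line in stream_lines(demo):
    print(line)
```

The equivalent in a .NET app would be redirecting the process's standard output and handling lines as they arrive, rather than waiting for the process to exit.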

I’ve read through the Tracking API reference page and I’m trying to construct the payload that goes to the server. What I’m stumbling on right now is how to send the IP address from the logs as opposed to my own IP address. Every time I run my script I get the response “Visitor IP 192.168.0.111 is excluded from being tracked”, which is what we want for our regular visiting traffic. But how do I send this to the server so the import says “this was the visitor, so record this instead”? I thought it would have been the URL variable “cip”, but that didn’t seem to work.
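
One thing worth checking, going by my reading of the Tracking API reference: the "cip" override (and the "cdt" date/time override for hits in the past) is only honoured when a valid "token_auth" is sent with the request, which may be why cip alone appeared to do nothing. Below is a small sketch in Python of how such a request URL could be assembled. build_hit_url is a made-up helper, and the IP address, timestamp, and token are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_hit_url(base_url, site_id, page_url, visitor_ip, hit_time, token_auth):
    """Build one tracking request that overrides the visitor IP and hit time."""
    params = {
        "idsite": site_id,
        "rec": 1,
        "url": page_url,
        "cip": visitor_ip,         # IP address taken from the log line
        "cdt": hit_time,           # date and time taken from the log line
        "token_auth": token_auth,  # assumed to be required for cip/cdt to apply
    }
    return base_url + "?" + urlencode(params)


hit = build_hit_url(
    "http://myinstall/piwik.php",
    1,
    "http://example.org/page-a",
    "203.0.113.7",
    "2011-04-05 00:11:42",
    "YOUR_TOKEN_AUTH",
)
print(hit)

# Decode the query string again to confirm the values survived URL encoding.
print(parse_qs(urlsplit(hit).query)["cip"])
```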

Thanks!

REALLY you should NOT redo this yourself…