import_logs.py fails on Debian Apache2 install

Hi,

I am very new to Piwik, and so far it seems really cool. I would like to import all my apache2 weblogs into the database, but I am having issues with it.

For one thing, I don’t know how to go through all of them, since I have access.log plus the rotated .1, .2, .3…

I started with this:


weinraub:/var/www/html/piwik/misc/log-analytics# ./import_logs.py --url=http://www.weinraub.net/piwik/ /var/log/apache2/access.log.1
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/log/apache2/access.log.1...
Fatal error: Cannot guess the logs format. Please give one using either the --log-format-name or --log-format-regex option

Then, when I give it a format name, it rejects every line as invalid:


weinraub:/var/www/html/piwik/misc/log-analytics# ./import_logs.py --url=http://www.weinraub.net/piwik/ --log-format-name=common_vhost /var/log/apache2/access.log.1
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/log/apache2/access.log.1...

Logs import summary
-------------------

    0 requests imported successfully
    0 requests were downloads
    290 requests ignored:
        290 invalid log lines
        0 requests done by bots, search engines, ...
        0 HTTP errors
        0 HTTP redirects
        0 requests to static resources (css, js, ...)
        0 requests did not match any known site
        0 requests did not match any requested hostname

Website import summary
----------------------

    0 requests imported to 0 sites
        0 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 0.0 requests per second

Here is a sample line from the access log itself:


12.34.567.89 - - [01/Jun/2011:13:21:05 -0400] "GET /html/ HTTP/1.1" 200 480 "http://foo.weinraub.net/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC; .NET4.0C; .NET4.0E)"

Any help is greatly appreciated!

Thanks so much!

After some digging through the httpd.conf files and such, I found that while I do have the common log, Apache was actually writing to the vhost log, which is the one with most of the data. Importing that file did the trick.
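For anyone trying to work out which of their log files has the vhost prefix, here is a rough Python sketch that checks the layout of a line. The two regexes are my own simplified approximations and are not the patterns import_logs.py uses internally. The function name guess_layout and the www.example.org:80 prefix are made up for illustration. As far as I can tell, the sample line above has no vhost in front of the IP, which would explain why forcing common_vhost on access.log.1 rejected every line.

```python
import re

# Simplified patterns for illustration only (assumption: these are NOT the
# regexes that import_logs.py itself uses).
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)
# Same fields, but with a "vhost" or "vhost:port" token in front of the IP.
VHOST_PREFIXED = re.compile(
    r'^(?P<vhost>[\w.\-]+)(?::\d+)? (?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def guess_layout(line):
    """Return 'vhost-prefixed', 'combined', or 'unknown' for one log line."""
    line = line.strip()
    if VHOST_PREFIXED.match(line):
        return "vhost-prefixed"
    if COMBINED.match(line):
        return "combined"
    return "unknown"


# The sample line from my access.log.1 (user agent shortened).
sample = (
    '12.34.567.89 - - [01/Jun/2011:13:21:05 -0400] "GET /html/ HTTP/1.1" '
    '200 480 "http://foo.weinraub.net/" "Mozilla/4.0 (compatible; MSIE 8.0)"'
)

print(guess_layout(sample))
# Hypothetical vhost-style line: the same request with a host:port prefix.
print(guess_layout("www.example.org:80 " + sample))
```

Running it against the first line of each file in /var/log/apache2 should show quickly which file carries the vhost information.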

Since I hate it when other people never say what they did, I will post the solution that worked for me.

This is on a virtually hosted Debian box where I have root access but no control over the kernel.


./import_logs.py --url=http://www.weinraub.net/piwik/ /var/log/apache2/other_vhosts_access.log --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots

I liked how it worked on the gzipped logs too. Now I need to get this into cron and I’ll be set!
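On my original question of how to go through the .1, .2, .3… files: below is a minimal sketch of how I would order them before putting this in cron. It assumes logrotate-style names (other_vhosts_access.log, .log.1, .log.2.gz and so on) and that feeding the oldest file first is the sensible order, which I have not verified against the Piwik docs. The helper names rotation_index, oldest_first and build_command are my own, and the command only uses options that already appear in this post.

```python
import re


def rotation_index(path):
    """Return the logrotate suffix number; the live log counts as 0."""
    match = re.search(r'\.log(?:\.(\d+))?(?:\.gz)?$', path)
    if match is None:
        raise ValueError("not a rotated log name: %s" % path)
    return int(match.group(1) or 0)


def oldest_first(paths):
    """Sort so the highest rotation number (the oldest file) comes first."""
    return sorted(paths, key=rotation_index, reverse=True)


def build_command(path, url="http://www.weinraub.net/piwik/", idsite=1):
    """Build one import_logs.py call, using only options shown in this thread."""
    return ["./import_logs.py", "--url=" + url, "--idsite=%d" % idsite, path]


# Example file names (assumed, following the usual logrotate pattern).
logs = [
    "/var/log/apache2/other_vhosts_access.log",
    "/var/log/apache2/other_vhosts_access.log.10.gz",
    "/var/log/apache2/other_vhosts_access.log.2.gz",
    "/var/log/apache2/other_vhosts_access.log.1",
]

for log_path in oldest_first(logs):
    print(" ".join(build_command(log_path)))
```

To actually run the imports, the file list could come from glob.glob and each command could be passed to subprocess.call. I left that out so the sketch only prints what it would do.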