Live log analytics from Apache - format error

Hi,

I just set up a custom log format in Apache to directly pipe log output to Piwik.

However, I’m getting a Python error from the import_logs.py tool.

My relevant Apache config looks as follows:


To apache.conf I added:
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" piwik_log


To my vhost config I added:
CustomLog "|/var/www/vhosts/stats.michi.su/misc/log-analytics/import_logs.py --url=https://stats.michi.su --idsite=4 --debug --recorders=2 --output=/tmp/piwik.log -" piwik_log
and
CustomLog /tmp/horde-temp.log piwik_log


In my debug log I’m getting the following:

2016-06-07 09:10:05,052: [DEBUG] Accepted hostnames: all
2016-06-07 09:10:05,052: [DEBUG] Piwik URL is: https://stats.michi.su
2016-06-07 09:10:05,052: [DEBUG] No token-auth specified
2016-06-07 09:10:05,052: [DEBUG] No credentials specified, reading them from "/var/www/vhosts/stats.michi.su/config/config.ini.php"
2016-06-07 09:10:05,096: [DEBUG] Authentication token token_auth is: removed-manually
2016-06-07 09:10:05,096: [DEBUG] Resolver: static
2016-06-07 09:10:05,213: [DEBUG] Launched recorder
2016-06-07 09:10:05,214: [DEBUG] Launched recorder
2016-06-07 09:10:29,851: [DEBUG] Detecting the log format
2016-06-07 09:10:29,852: [DEBUG] Check format shoutcast
2016-06-07 09:10:29,852: [DEBUG] Error in format checking: Traceback (most recent call last):
  File "/var/www/vhosts/stats.michi.su/misc/log-analytics/import_logs.py", line 1957, in check_format
    match = candidate_format.check_format(lineOrFile)
  File "/var/www/vhosts/stats.michi.su/misc/log-analytics/import_logs.py", line 241, in check_format
    file.seek(0)
IOError: [Errno 29] Illegal seek

2016-06-07 09:10:29,852: [DEBUG] Format shoutcast does not match
2016-06-07 09:10:29,852: [DEBUG] Check format iis
2016-06-07 09:10:30,167: [DEBUG] Error in format checking: Traceback (most recent call last):
  File "/var/www/vhosts/stats.michi.su/misc/log-analytics/import_logs.py", line 1957, in check_format
    match = candidate_format.check_format(lineOrFile)
  File "/var/www/vhosts/stats.michi.su/misc/log-analytics/import_logs.py", line 241, in check_format
    file.seek(0)
IOError: [Errno 29] Illegal seek

2016-06-07 09:10:30,167: [DEBUG] Format iis does not match
2016-06-07 09:10:30,168: [DEBUG] Check format common_complete


Besides piping the log directly to the Piwik Python log parser, Apache also writes the same log to a file. When I feed this file to the Python script, everything works fine:

$ /var/www/vhosts/stats.neurohr.at/misc/log-analytics/import_logs.py --url=https://stats.michi.su --idsite=4 --debug --recorders=2 /tmp/horde-temp.log
2016-06-07 09:55:35,904: [DEBUG] Accepted hostnames: all
2016-06-07 09:55:35,904: [DEBUG] Piwik URL is: https://stats.michi.su
2016-06-07 09:55:35,904: [DEBUG] No token-auth specified
2016-06-07 09:55:35,904: [DEBUG] No credentials specified, reading them from "/var/www/vhosts/stats.neurohr.at/config/config.ini.php"
2016-06-07 09:55:35,950: [DEBUG] Authentication token token_auth is: manually-removed
2016-06-07 09:55:35,950: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2016-06-07 09:55:35,996: [DEBUG] Launched recorder
2016-06-07 09:55:35,996: [DEBUG] Launched recorder
Parsing log /tmp/horde-temp.log…
2016-06-07 09:55:35,996: [DEBUG] Detecting the log format
2016-06-07 09:55:35,996: [DEBUG] Check format shoutcast
2016-06-07 09:55:35,996: [DEBUG] Format shoutcast does not match
2016-06-07 09:55:35,996: [DEBUG] Check format iis
2016-06-07 09:55:35,996: [DEBUG] Format iis does not match
2016-06-07 09:55:35,996: [DEBUG] Check format common_complete
2016-06-07 09:55:35,996: [DEBUG] Format common_complete matches
2016-06-07 09:55:35,996: [DEBUG] Format match contains 10 groups
2016-06-07 09:55:35,996: [DEBUG] Check format amazon_cloudfront
2016-06-07 09:55:35,996: [DEBUG] Format amazon_cloudfront does not match
2016-06-07 09:55:35,996: [DEBUG] Check format w3c_extended
2016-06-07 09:55:35,996: [DEBUG] Format w3c_extended does not match
2016-06-07 09:55:35,996: [DEBUG] Check format icecast2
2016-06-07 09:55:35,996: [DEBUG] Format icecast2 does not match
2016-06-07 09:55:35,996: [DEBUG] Check format nginx_json
2016-06-07 09:55:35,996: [DEBUG] Format nginx_json does not match
2016-06-07 09:55:35,996: [DEBUG] Check format s3
2016-06-07 09:55:35,996: [DEBUG] Format s3 does not match
2016-06-07 09:55:35,996: [DEBUG] Check format common
2016-06-07 09:55:35,996: [DEBUG] Format common does not match
2016-06-07 09:55:35,996: [DEBUG] Check format common_vhost
2016-06-07 09:55:35,996: [DEBUG] Format common_vhost matches
2016-06-07 09:55:35,996: [DEBUG] Format match contains 8 groups
2016-06-07 09:55:35,996: [DEBUG] Check format ncsa_extended
2016-06-07 09:55:35,996: [DEBUG] Format ncsa_extended does not match
2016-06-07 09:55:35,996: [DEBUG] Format common_complete is the best match

Logs import summary

4 requests imported successfully
0 requests were downloads
3 requests ignored:
    0 HTTP errors
    2 HTTP redirects
    0 invalid log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    1 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

4 requests imported to 1 sites
    1 sites already existed
    0 sites were created:
0 distinct hostnames did not match any existing site:

Performance summary

Total time: 0 seconds
Requests imported per second: 32.08 requests per second

Processing your log data

In order for your logs to be processed by Piwik, you may need to run the following command:
 ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='https://stats.michi.su'

What can I do to get the live feeding working?

Thanks,
newpipe

This looks like it may be a bug in the log analytics tool. Could you please create a bug report on the log analytics issue tracker? https://github.com/piwik/piwik-log-analytics/issues
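
For the bug report, it may help to show that the error is not specific to Apache: the traceback ends in file.seek(0), and a pipe (which is what stdin is when Apache pipes the log to the script) cannot be rewound. The sketch below is only an illustration under that assumption. The function names seek_error_on_pipe and peek_first_line are hypothetical and not part of import_logs.py, and I have not checked how the tool structures format detection beyond the two frames in the traceback.

```python
import errno
import itertools
import os


def seek_error_on_pipe():
    """Try to rewind the read end of a pipe and return the errno raised."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, b"one log line\n")
        # Rewinding to offset 0 is what file.seek(0) asks the OS to do.
        os.lseek(read_fd, 0, os.SEEK_SET)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None  # only reached if the seek unexpectedly succeeded


def peek_first_line(stream):
    """Hypothetical workaround: read one line for format detection and
    return it together with an iterator that still yields it first,
    so the stream never has to be rewound."""
    first = stream.readline()
    head = [first] if first else []  # nothing to replay on empty input
    return first, itertools.chain(head, stream)


if __name__ == "__main__":
    print(seek_error_on_pipe() == errno.ESPIPE)
```

On Linux, errno.ESPIPE is 29, which matches the "[Errno 29] Illegal seek" in the debug log. Until this is fixed in the tool, passing --log-format-name=common_complete in the CustomLog command might sidestep the automatic detection step; I have not verified that against this version.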

Ok, issue submitted.
https://github.com/piwik/piwik-log-analytics/issues/142

Thanks