Adding page generation time - regex format


(James) #1

Hi all,

I’m currently using the LogFormat below for my Apache logs:


LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" piwik

Now I know I can add %D to the end to get the page generation time, like so:


LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" piwik

But how do I then write the regex so that my piwik import script imports it correctly? I know I can pass it through --log-format-regex="blah", but I’ve got no idea how to write the regex itself. Can anyone help?

Cheers,
Mooash


#2

I have this nginx log configuration:


log_format piwik '$host $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time';

File output is roughly this:


my_tracked_domain.org 178.122.228.76 - - [03/Nov/2013:06:22:41 +0400] "GET /backend/service/index HTTP/1.1" 200 2464 "http://my_tracked_domain.org/backend/news/index" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36" 0.100

I first ran the following command:


python /var/www/s.rent-shop.org/misc/log-analytics/import_logs.py --url=http://s.rent-shop.org/ /var/www/s.rent-shop.org/tmp/logs/test777-31-01nov.log --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-regex='(?P<host>[\w\-\.]*)(?::\d+)? (?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)" (?P<generation_time_milli>\S+)'
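Before running a full import, the regex can be checked against the sample line on its own. This is only a rough sketch using Python's standard `re` module; the names `LOG_FORMAT_REGEX`, `SAMPLE_LINE` and `fields` are mine for illustration and are not part of import_logs.py.

```python
import re

# The regex from the command above, split across lines for readability.
LOG_FORMAT_REGEX = (
    r'(?P<host>[\w\-\.]*)(?::\d+)? (?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)" '
    r'(?P<generation_time_milli>\S+)'
)

# The sample nginx log line from this post.
SAMPLE_LINE = (
    'my_tracked_domain.org 178.122.228.76 - - [03/Nov/2013:06:22:41 +0400] '
    '"GET /backend/service/index HTTP/1.1" 200 2464 '
    '"http://my_tracked_domain.org/backend/news/index" '
    '"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/30.0.1599.101 Safari/537.36" 0.100'
)

match = re.match(LOG_FORMAT_REGEX, SAMPLE_LINE)
# An empty dict here means the regex did not match the line at all.
fields = match.groupdict() if match else {}

for name, value in sorted(fields.items()):
    print(name, '=', value)
```

Note that the last group captures the raw text 0.100, which is a value in seconds rather than an integer number of milliseconds.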

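But there was an error: nginx writes $request_time in seconds (0.100 above), while the script expects an integer number of milliseconds, so I had to make a patch: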


$  git diff misc/log-analytics/import_logs.py
diff --git a/misc/log-analytics/import_logs.py b/misc/log-analytics/import_logs.py
index 64f3738..8acf924 100755
--- a/misc/log-analytics/import_logs.py
+++ b/misc/log-analytics/import_logs.py
@@ -1539,7 +1539,7 @@ class Parser(object):
                 hit.length = 0

             try:
-                hit.generation_time_milli = int(format.get('generation_time_milli'))
+                hit.generation_time_milli = int(float(format.get('generation_time_milli')) * 1000)
             except BaseFormatException:
                 try:
                     hit.generation_time_milli = int(format.get('generation_time_micro')) / 1000

Now the import works fine for me (just slowly)…
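
To show what the patch changes, here is a small standalone sketch of the two conversions. The function names `seconds_to_milli` and `micro_to_milli` are hypothetical and do not exist in import_logs.py. I also use round() here, which is my own choice; the patch above simply truncates with int().

```python
def seconds_to_milli(value):
    """Convert a seconds string such as '0.100' (nginx $request_time)
    to a whole number of milliseconds."""
    # round() avoids losing a millisecond to floating-point error.
    return int(round(float(value) * 1000))


def micro_to_milli(value):
    """Convert a microseconds string (Apache %D) to whole milliseconds."""
    return int(value) // 1000


# Calling int() directly on a fractional string fails, which is
# why the unpatched line cannot handle the nginx value.
try:
    int('0.100')
    plain_int_works = True
except ValueError:
    plain_int_works = False

print(seconds_to_milli('0.100'))
print(micro_to_milli('100000'))
print(plain_int_works)
```

For the Apache format in #1, %D reports microseconds, and the context lines of the diff show the script falling back to a generation_time_micro group. So naming the last regex group generation_time_micro instead of generation_time_milli should presumably work without any patch, though I have not tested that.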