Problems analyzing IIS logs

Hi,

I am trying to set up log analysis for our IIS logs as well.

Historically, our logs have had these fields:

#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken

As a result, import_logs.py does not recognize the log format, and with --log-format-name=iis I get this error: AttributeError: 'IisFormat' object has no attribute 'regex'.

I added this to import_logs.py:


_OWN_IIS_LOG_FORMAT = (
    r'(?P<date>^[\d+]+[-\d+]+[\s]+[:\d+]+) \S+ \S+ (?P<path>\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>([\d.]|[\S:])*) (?P<user_agent>\S+) (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \d+ \d+ (?P<length>\S+) \S+ \d+'
)

and added this line to FORMATS:


      'iis2':  RegexFormat('iis2', _OWN_IIS_LOG_FORMAT, '%Y-%m-%d %H:%M:%S'),

Then I run the import with --log-format-name=iis2. Unfortunately, I don't know how to pass the regex with the --log-format-regex parameter instead.
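
A quick way to check a pattern like this outside import_logs.py is to run it against one log line with Python's re module. The following is only a minimal sketch: the sample line is made up to match the #Fields header above, and parse_line is a hypothetical helper, not part of import_logs.py.

```python
import re
from datetime import datetime

# Pattern from the post, split across lines only for readability.
OWN_IIS_LOG_FORMAT = (
    r'(?P<date>^[\d+]+[-\d+]+[\s]+[:\d+]+) \S+ \S+ '
    r'(?P<path>\S*) (?P<query_string>\S*) \S+ \S+ '
    r'(?P<ip>([\d.]|[\S:])*) (?P<user_agent>\S+) (?P<referrer>\S+) '
    r'(?P<host>\S+) (?P<status>\d+) \d+ \d+ (?P<length>\S+) \S+ \d+'
)
DATE_FORMAT = '%Y-%m-%d %H:%M:%S'

# Made-up sample line following the #Fields header (assumption).
SAMPLE_LINE = (
    '2013-05-14 08:15:30 10.0.0.5 GET /index.php - 80 - 192.168.1.20 '
    'Mozilla/5.0+(Windows+NT+6.1) http://example.com/ www.example.com '
    '200 0 0 5120 340 15'
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = re.match(OWN_IIS_LOG_FORMAT, line)
    if match is None:
        return None
    fields = match.groupdict()
    # Parse the combined "date time" text with the same format string
    # that is given to RegexFormat in the post.
    fields['date'] = datetime.strptime(fields['date'], DATE_FORMAT)
    return fields


if __name__ == '__main__':
    print(parse_line(SAMPLE_LINE))
```

If parse_line returns None for a real line from the log, the pattern and the #Fields header are out of step and the field order should be rechecked.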

Another problem is that IIS writes a - (dash) to the cs-uri-query field when a page is requested without a query string. Because the page then shows up in Piwik as index.php?-, I had to change this:


if hit.query_string and not config.options.strip_query_string:
    path += config.options.query_string_delimiter + hit.query_string

to this:


if hit.query_string and not config.options.strip_query_string and hit.query_string != "-":
    path += config.options.query_string_delimiter + hit.query_string

Another option would be to change the regex.

I am not a Python developer, and I worry that my changes will be lost when Piwik is updated …
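
The regex route could look roughly like the sketch below. It is untested against import_logs.py itself: the shortened three-field pattern and the build_path helper are made up for illustration. The idea is that the query_string group captures an empty string when the field is a lone dash, so no change to the path-building code would be needed.

```python
import re

# The group stays empty when the field is a lone "-"; the dash itself is
# consumed by the "-?" that follows the group. Any other value, including
# one that merely starts or ends with a dash, is captured whole.
QUERY_FRAGMENT = r'(?P<query_string>(?:(?!-\s)\S+)?)-?'

# Hypothetical shortened pattern (path, query string, port), demo only.
DEMO_PATTERN = re.compile(
    r'^(?P<path>\S+) ' + QUERY_FRAGMENT + r' (?P<port>\d+)$'
)


def build_path(line, delimiter='?'):
    """Return path plus query string, or None if the line does not match."""
    match = DEMO_PATTERN.match(line)
    if match is None:
        return None
    path = match.group('path')
    query_string = match.group('query_string')
    if query_string:
        path += delimiter + query_string
    return path


if __name__ == '__main__':
    print(build_path('/index.php - 80'))
    print(build_path('/index.php id=5 80'))
```

In the full pattern, QUERY_FRAGMENT would take the place of (?P<query_string>\S*); whether import_logs.py handles the result the same way is an assumption to verify.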

Jiri

Please ask your question in: Log analytics list of improvements · Issue #3163 · matomo-org/matomo · GitHub

There is an example regex in: http://dev.piwik.org/svn/trunk/misc/log-analytics/README