I have my Windows system logs, including IIS logs in W3C format, sent to a syslog-ng collector.
The syslog-ng collector uses a child process to import them automatically via the Python script.
I've downloaded the latest import_logs.py from git for the syslog server, but my Piwik installation is 2.4.1.
The syslog server's Python is 2.7.5.
The log fields are: date time cs-method cs-uri-stem cs-uri-query cs-username c-ip cs(User-Agent) cs(Referer) cs-host sc-status sc-bytes time-taken
My syslog-ng script is:
exec python /tools/piwik/scripts/import_logs.py
--idsite-fallback=13 --url=https://myinternal.server.yo/piwik/
--config=/tools/piwik/config/config.ini.php --enable-http-errors
--enable-http-redirects --enable-static --enable-bots --token-auth=****
-dd
--log-format-name=w3c_extended --w3c-time-taken-millisecs
--log-format-regex='(?P<date>^\d+[-\d+]+[\d+:]+) \S+ (?P<path>/\S*) (?P<query_string>\S*) (?P<userid>\S+) (?P<ip>[\d*.]*) (?P<user_agent>".*?"|\S+) (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) (?P<length>\S+) (?P<generation_time_secs>[.\d]+)' \
That regex is what I pieced together from the latest import_logs.py script looking at the w3c section.
And the error output is:
cat iis.log |./test_iis.sh
2015-01-07 16:21:34,188: [DEBUG] Accepted hostnames: all
2015-01-07 16:21:34,189: [DEBUG] Piwik URL is: https://myinternal.server.yo/piwik/
2015-01-07 16:21:34,189: [DEBUG] Authentication token token_auth is: *****
2015-01-07 16:21:34,189: [DEBUG] Resolver: dynamic
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2015-01-07 16:21:34,190: [DEBUG] Launched recorder
Parsing log (stdin)…
2015-01-07 16:21:34,190: [DEBUG] Invalid line detected (line did not match): 2015-01-07 19:47:39 POST /onebanana/sp3/rifd/ query=fuseaction=planData.manf_d johndoe 192.168.86.240 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/20100101+Firefox/31.0 https://myinternal.server.yo/oranges/sp3/rifd/?fuseaction=planData.incView myinternal.server.yo 200 37080 370
2015-01-07 16:21:34,191: [DEBUG] Invalid line detected (line did not match): 2015-01-07 19:47:39 GET /twobanana/sp3/rifd/_scripts/showGrid.js _=1420660061830 johndoe 192.168.86.240 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/20100101+Firefox/31.0 https://myinternal.server.yo/oranges/sp3/rifd/?fuseaction=planData.incView myinternal.server.yo 200 8602 17
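A candidate regex can be checked against one of the rejected lines without running the importer at all. The sketch below is only an illustration: the helper `first_failing_field` is hypothetical and not part of import_logs.py. The group names (`date`, `path`, `userid`, `ip`, `referrer`, `host`, `status`, `length`) are assumed from the W3C section of the script. The `VARIANT` head, which captures date and time in a single group, is an assumption about what the importer wants rather than a confirmed fix.

```python
import re

# One of the lines rejected in the debug output (13 space-separated fields).
LINE = (
    "2015-01-07 19:47:39 GET /twobanana/sp3/rifd/_scripts/showGrid.js "
    "_=1420660061830 johndoe 192.168.86.240 "
    "Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/20100101+Firefox/31.0 "
    "https://myinternal.server.yo/oranges/sp3/rifd/?fuseaction=planData.incView "
    "myinternal.server.yo 200 8602 17"
)

# Patterns for cs-uri-stem through time-taken, in field-list order.
# The group names are assumptions, not confirmed against the script.
TAIL = [
    r'(?P<path>/\S*)',
    r'(?P<query_string>\S*)',
    r'(?P<userid>\S+)',
    r'(?P<ip>[\d*.]*)',
    r'(?P<user_agent>".*?"|\S+)',
    r'(?P<referrer>\S+)',
    r'(?P<host>\S+)',
    r'(?P<status>\d+)',
    r'(?P<length>\S+)',
    r'(?P<generation_time_secs>[.\d]+)',
]

# Head as posted: the date group cannot contain a space, so the single
# \S+ after it has to cover the time field, leaving nothing for cs-method.
POSTED = [r'(?P<date>^\d+[-\d+]+[\d+:]+)', r'\S+'] + TAIL

# Assumed variant: date and time captured together, \S+ skips cs-method.
VARIANT = [r'(?P<date>^\d+[-\d+]+ [\d+:]+)', r'\S+'] + TAIL


def first_failing_field(fields, line):
    """Return the index of the first pattern at which the space-joined
    prefix of `fields` stops matching `line`, or None if all match."""
    for count in range(1, len(fields) + 1):
        if re.match(" ".join(fields[:count]), line) is None:
            return count - 1
    return None


print(first_failing_field(POSTED, LINE))
print(first_failing_field(VARIANT, LINE))
print(re.match(" ".join(VARIANT), LINE).group("date"))
```

With the sample line, the posted head stops at index 2 (the path pattern, which meets `GET` instead of a `/`), while the variant matches every field and yields `2015-01-07 19:47:39` as the date group.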
In an earlier attempt using the Python script from 2.4.1, I got close, except that it reported “invalid date”. That regex was:
--log-format-regex='(?P<date>^\d+[-\d+]+[\d+:]+) \S+ (?P<path>.*?) (?P<query_string>\S*) (?P<userid>\S+) (?P<ip>[\d*.]*) (?P<user_agent>.*?) (?P<referrer>.*?) ((?P<host>[\w\-\.]*)(?::\d+)?) (?P<status>\d+) (?P<length>\d+) (?P<generation_time_secs>\d+)' - \
So… What I was wondering is… what am I doing wrong?
Or is the --log-format-regex method even able to process the W3C date field? I looked at the Python script and I see entries like these:
self.date_format = '%d/%b/%Y:%H:%M:%S'
self.date_format = '%Y-%m-%dT%H:%M:%S'
super(W3cExtendedFormat, self).__init__('w3c_extended', None, '%Y-%m-%d %H:%M:%S')
And you can see I tried to tell it to expect the w3c_extended format before the regex… but I’m at a loss.
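The effect of those date formats can be tried in isolation. This is a minimal sketch, assuming the script hands the text captured by the date group to `datetime.strptime` with one of the formats quoted above. The helper `try_parse` and the constant names are hypothetical.

```python
from datetime import datetime

W3C_FORMAT = '%Y-%m-%d %H:%M:%S'      # format passed to W3cExtendedFormat
COMMON_FORMAT = '%d/%b/%Y:%H:%M:%S'   # first format quoted above


def try_parse(value, date_format):
    """Return a datetime if `value` fits `date_format`, else None."""
    try:
        return datetime.strptime(value, date_format)
    except ValueError:
        return None


# Date and time captured together, as they appear in the IIS log.
print(try_parse('2015-01-07 19:47:39', W3C_FORMAT))
# Date only, which is all a date group without a space can capture.
print(try_parse('2015-01-07', W3C_FORMAT))
# Full timestamp against the Apache-style format.
print(try_parse('2015-01-07 19:47:39', COMMON_FORMAT))
```

Only the first call parses. A date group that holds just `2015-01-07`, or a full W3C timestamp checked against the Apache-style format, would both be rejected, which may be where the “invalid date” message comes from.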
When I tried using just --log-format-name=w3c_extended, I got a whole different set of errors:
Parsing log (stdin)…
Traceback (most recent call last):
  File "/tools/piwik/scripts/import_logs.py", line 1900, in <module>
    main()
  File "/tools/piwik/scripts/import_logs.py", line 1871, in main
    parser.parse(filename)
  File "/tools/piwik/scripts/import_logs.py", line 1689, in parse
    resolver.check_format(format)
  File "/tools/piwik/scripts/import_logs.py", line 1234, in check_format
    elif 'host' not in format.regex.groupindex and not config.options.log_hostname:
AttributeError: 'NoneType' object has no attribute 'groupindex'
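The last frame says that `format.regex` was `None` when `.groupindex` was read, which fits the `None` passed as the second argument in the `W3cExtendedFormat` constructor call quoted earlier. The sketch below reproduces that failure mode with a hypothetical stand-in class. `LogFormat` and `has_group` are illustrative names, not the script's own code, and why the regex was never filled in is not something this sketch can show.

```python
import re


class LogFormat(object):
    """Hypothetical stand-in for a format whose regex may be unset."""

    def __init__(self, name, regex=None):
        self.name = name
        # Mirrors a format constructed with None in place of a regex.
        self.regex = re.compile(regex) if regex is not None else None


def has_group(fmt, group):
    """Return True or False if the compiled regex defines `group`,
    or None when the format has no regex at all."""
    if fmt.regex is None:
        return None
    return group in fmt.regex.groupindex


unset = LogFormat('w3c_extended')
custom = LogFormat('custom', r'(?P<host>\S+) (?P<status>\d+)')

print(has_group(unset, 'host'))
print(has_group(custom, 'host'))

try:
    'host' in unset.regex.groupindex   # same access as in the traceback
except AttributeError as exc:
    print(exc)
```

The unguarded access raises the same `AttributeError` as the traceback, while the guarded helper reports that there is no regex to inspect.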