Log import works, but only for non-URI-specific site definitions



Hello,
I am using a syslog-ng server to collect web logs from multiple web app servers. We use a reverse proxy to keep everything on a minimal number of front-end hostnames for the users:
www.example.com (with app1, app2, app3, app4)
app.example.com (with app5 and app6)
web.example.com (with piwik, app2, app4)

The logs come in cleanly from all hosts using the Apache-style virtual host log format with a few minor tweaks:
%v %{X-Forwarded-For}i appservername %u %t "%r" %>s %b %D "%{Referer}i" "%{User-agent}i"
I have added the microsecond time-taken field (%D), and in place of the unused identd field I put the application server’s name.
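
For anyone who wants to check lines against this format outside the importer, here is a minimal Python sketch of a regex for the LogFormat above. The group names and the sample line are illustrative assumptions with made-up values; they are not taken from import_logs.py or from real traffic.

```python
import re

# Hypothetical pattern mirroring the LogFormat:
# %v %{X-Forwarded-For}i appservername %u %t "%r" %>s %b %D "%{Referer}i" "%{User-agent}i"
# Limitation: an X-Forwarded-For value holding several comma-separated
# addresses contains spaces and will NOT match the single-token ip group.
VHOST_LINE = re.compile(
    r'(?P<host>\S+) '
    r'(?P<ip>\S+) '
    r'(?P<appserver>\S+) '
    r'(?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
    r'(?P<length>\d+|-) '
    r'(?P<time_micro>\d+) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = VHOST_LINE.match(line.rstrip("\r\n"))
    return match.groupdict() if match else None


# Made-up sample line, for illustration only.
SAMPLE = (
    'www.example.com 203.0.113.7 appsrv01 - [10/Oct/2013:13:55:36 +0000] '
    '"GET /app1/index.html HTTP/1.1" 200 2326 15320 '
    '"https://www.example.com/app1/" "Mozilla/5.0"'
)

fields = parse_line(SAMPLE)
print(fields["host"], fields["path"], fields["time_micro"])
```

The pattern is deliberately strict (anchored at both ends), so a line that fails here is worth a closer look before blaming the importer.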

I have the syslog-ng server running the import script continually. This is what it looks like:
exec python /tools/piwik/misc/log-analytics/import_logs.py --url=https://web.example.com/piwik/ --idsite-fallback=13 --config=/tools/piwik/config/config.ini.php --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-name=common_vhost --output=/tools/piwik/script.log -

The piwik site admin looks like this:
ID NAME URLS EXCLUDED IPS EXCLUDED PARAMETERS SITE SEARCH
13 Foo bar - - - Yes
8 web web.example.com - - - Yes
1 app1 www.example.com/app1 - - - Yes
(another app1 url): */app1
2 app www.example.com/app2 - - - Yes
3 piwik web.example.com/piwik - - - Yes
(another piwik url): */piwik

It sort of works, but only for hostname-only matches such as “web” and “app”, plus the default site “foo” set by the import script’s --idsite-fallback option. The problem is that none of the URI-specific definitions ever match. Any hit that has no hostname-only definition goes into “foo”, and nothing lands in the more URI-specific sites.

What am I doing wrong?

I thought the format might be the problem, so I also tried a regex format:
exec python /tools/piwik/misc/log-analytics/import_logs.py --url=https://web.example.com/piwik/ --idsite-fallback=13 --config=/tools/piwik/config/config.ini.php --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-regex='(?P[\w-.])(?::\d+)? (?P[(\d.)]+) (.?) (.?) [(?P.?) (?P.?)] "(?P.?) \S+" (?P\d+) (?P\d+) (?P<generation_time_milli>\S+) (?P.?) (?P<user_agent>.?)' --debug --debug --output=/tools/piwik/script.log -

But the result is pretty much the same. I also mysteriously get an error stating that a line does not match the format, even though the line looks perfect. I’ve even replayed those lines from the logfile with tail/grep and they import fine… Very strange.
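
One way to narrow down a "does not match the format" error is to look at the exact characters of the rejected lines. Below is a minimal sketch, assuming a hypothetical helper `find_unmatched` and a deliberately loose pattern that only checks the first two fields. Printing with `repr()` exposes invisible characters that tail/grep output would hide. The demo lines are made up.

```python
import re

# Loose, hypothetical check: a hostname, then a single dotted IPv4 address.
# Swap in whatever regex is actually passed to --log-format-regex.
PATTERN = re.compile(r'^[\w.-]+ [\d.]+ ')


def find_unmatched(lines, pattern=PATTERN):
    """Yield (line_number, repr_of_line) for every line the pattern rejects."""
    for number, line in enumerate(lines, start=1):
        if not pattern.match(line):
            yield number, repr(line)


# Made-up demo input: one clean line and two that look fine at a glance.
demo = [
    'www.example.com 203.0.113.7 appsrv01 - rest of line\n',
    '\ufeffwww.example.com 203.0.113.7 appsrv01 - rest of line\n',  # hidden BOM
    'www.example.com 203.0.113.7, 198.51.100.2 appsrv01 - rest\n',  # two IPs
]

for number, shown in find_unmatched(demo):
    print(number, shown)
```

The same generator accepts any iterable of lines, so it could be fed an open log file or standard input instead of the demo list.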