Tomcat import log issue

Hi all,
I am new to this forum. I installed Piwik on an Apache web server and tried to import a log file from a Tomcat web server, but I get the following error:
Fatal error: Cannot guess the logs format. Please give one using either the --log-format-name or --log-format-regex option
This is the command that I used:
python /var/www/piwik/misc/log-analytics/import_logs.py --url=http://192.168.1.100/piwik/ /home/user/app1/catalina.2012-12-10.log --idsite=1 --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --enable-bots
And this is what the log file contains:
Dec 10, 2012 12:02:50 AM org.apache.catalina.core.StandardWrapperValve invoke
INFO: 2012-12-10 00:02:50,000 - DEBUG InOutCallableStatementCreator# - Call: AdminReports.GETAPPLICATIONINFO(?)

I tried googling it but didn't find much, and searching this forum turned up nothing either. Can you help me? What parameter should I use with the --log-format-name or --log-format-regex option?

Please see the ticket (and report the bug there with sample log file to reproduce): Log analytics list of improvements · Issue #3163 · matomo-org/matomo · GitHub

(if possible, provide a patch or documentation update)

I got no response from there. This thing is urgent. Can somebody help me?

what server version? PHP version? Python version?

Server version: Ubuntu 10.04.4 LTS
Python version: Python 2.6.5
PHP version: PHP 5.3.2-1ubuntu4.18

I tried to install Piwik on the Tomcat server, but Piwik needs PHP 5 and Tomcat does not support PHP 5. Can you help?

see here

http://piwik.org/docs/requirements/

I would update Python to at least v2.7.x; your PHP version should be OK. Try that and see if it alone helps.

I updated it to 2.7 but I get the same result:
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /home/asentinel/applogs/app1/catalina.2012-12-10.log…
Fatal error: Cannot guess the logs format. Please give one using either the --log-format-name or --log-format-regex option

We can help you, contact us at: http://piwik.org/contact-professional-support

It isn't easy but you could try this… get ready to be a trailblazer…

Thanks for the reply but why do I have to make tomcat run php applications? How will this help?

I am far from an expert in Tomcat; in fact, I would say I'm a poor user of it. My thought was that if you could make PHP run on Tomcat, Piwik could be installed locally instead of on another web server. That might also help with your import issues (server-to-server formatting or permission issues). I have no idea of your setup, but it seemed that combining the two would simplify bridging logs from Tomcat into Piwik…

regards

I tried this, but Piwik's minimum requirement is PHP version 5.1.3 or greater, while Tomcat runs with PHP version 4 or less. It does not run on PHP 5.

Maybe I can use the --log-format-name or --log-format-regex option. Can someone tell me what parameters these options take? I tried looking at the Python import_logs.py file but didn't find anything. I am not that good a programmer, so maybe I missed something?

I hope this is helpful! (This is still a work in progress and I encourage feedback or help!)

This live regex tester was the most useful thing for working out the custom log format option:

http://ksamuel.pythonanywhere.com/

If you know the valve pattern variables from server.xml (Tomcat), such as:

common - %h %l %u %t "%r" %s %b
combined - %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i"

In my case I used:
pattern='%h %S %t %s %b %D %m %U "%{User-Agent}i"'
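For anyone who wants to derive the regex from the valve pattern instead of writing it by hand, here is a minimal sketch. The TOKEN_REGEX table and the pattern_to_regex helper are hypothetical names made up for this post, not part of import_logs.py. The mapping mirrors the hand-written regex later in this post, which captures the %D column as "length" and skips %b. The sample line uses a shortened user agent.

```python
import re

# Hypothetical mapping from Tomcat valve tokens to regex fragments.
# Group names follow the ones that appear in the groupdict output in this thread.
TOKEN_REGEX = {
    '%h': r'(?P<host>\S+)',
    '%S': r'\S+',                                   # session ID, not captured
    '%t': r'\[(?P<date>.*?) (?P<timezone>.*?)\]',
    '%s': r'(?P<status>\S+)',
    '%b': r'\S+',                                   # skipped, as in the hand-written regex
    '%D': r'(?P<length>\S+)',                       # captured as "length", as in the hand-written regex
    '%m': r'(?P<request>\S+)',
    '%U': r'(?P<path>\S+)',
    '%{User-Agent}i': r'(?P<user_agent>.*?)',
}

# Matches either a %{name}x token or a single-letter %x token.
TOKEN = re.compile(r'%\{[^}]+\}\w|%\w')


def pattern_to_regex(pattern):
    """Translate a valve pattern into a regex; raises KeyError on unknown tokens."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text between tokens
        parts.append(TOKEN_REGEX[m.group()])
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return ''.join(parts)


regex = pattern_to_regex('%h %S %t %s %b %D %m %U "%{User-Agent}i"')
sample = '10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / "Mozilla/5.0"'
print(re.match(regex, sample).groupdict())
```

The generated regex escapes the literal spaces, so it looks noisier than a hand-written one, but it matches the same lines.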

To get a clue about what I was attempting, I first looked at what import_logs.py currently defines:

_HOST_PREFIX = '(?P<host>[\w\-\.]*)(?::\d+)? '
_COMMON_LOG_FORMAT = (
    '(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    '"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+)'
)
_NCSA_EXTENDED_LOG_FORMAT = (_COMMON_LOG_FORMAT +
    ' "(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)
_S3_LOG_FORMAT = (
    '\S+ (?P<host>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] (?P<ip>\S+) '
    '\S+ \S+ \S+ \S+ "\S+ (?P<path>.*?) \S+" (?P<status>\S+) \S+ (?P<length>\S+) '
    '\S+ \S+ \S+ "(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)
_ICECAST2_LOG_FORMAT = (_NCSA_EXTENDED_LOG_FORMAT +
    ' (?P<session_time>\S+)'
)

FORMATS = {
    'common': RegexFormat('common', _COMMON_LOG_FORMAT),
    'common_vhost': RegexFormat('common_vhost', _HOST_PREFIX + _COMMON_LOG_FORMAT),
    'ncsa_extended': RegexFormat('ncsa_extended', _NCSA_EXTENDED_LOG_FORMAT),
    'common_complete': RegexFormat('common_complete', _HOST_PREFIX + _NCSA_EXTENDED_LOG_FORMAT),
    'iis': IisFormat(),
    's3': RegexFormat('s3', _S3_LOG_FORMAT),
    'icecast2': RegexFormat('icecast2', _ICECAST2_LOG_FORMAT),
}
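To show why the "Cannot guess the logs format" error comes up, here is a rough sketch of format guessing: try each known regex against a line and take the first that matches. This is my own illustration, not the actual detection code in import_logs.py. The guess_format name and the access-log sample line are made up, and the two patterns are my reading of the "common" and "ncsa_extended" definitions. The Catalina line is the one from the first post.

```python
import re

# My reading of the 'common' pattern; the real script may differ in detail.
COMMON = (
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+)'
)
NCSA_EXTENDED = COMMON + r' "(?P<referrer>.*?)" "(?P<user_agent>.*?)"'

# Most specific format first, so a combined line is not reported as 'common'.
CANDIDATES = [
    ('ncsa_extended', re.compile(NCSA_EXTENDED)),
    ('common', re.compile(COMMON)),
]


def guess_format(line):
    """Return the name of the first candidate regex that matches, else None."""
    for name, regex in CANDIDATES:
        if regex.match(line):
            return name
    return None


# Made-up access-log line in common format.
access_line = '127.0.0.1 - - [10/Dec/2012:00:02:50 +0000] "GET /index.html HTTP/1.1" 200 512'
# Line from the catalina log in the first post: an application log, not an access log.
catalina_line = 'Dec 10, 2012 12:02:50 AM org.apache.catalina.core.StandardWrapperValve invoke'

print(guess_format(access_line))    # common
print(guess_format(catalina_line))  # None
```

The Catalina line has no IP, bracketed timestamp, or request string, so none of these access-log patterns can match it.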

Then pieced this together:

(?P<host>[\w\-\.]*)(?::\d+)? \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] (?P<status>\S+)? \S+ (?P<length>\S+) (?P<request>\S+) (?P<path>.*?) "(?P<user_agent>.*?)"

and looking at one log line:

Raw:
10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"

match.group():
u'10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"'

match.groupdict():
{u'date': u'15/May/2013:19:55:38', u'host': u'10.88.168.198', u'length': u'64', u'path': u'/', u'request': u'GET', u'status': u'302', u'timezone': u'+0000', u'user_agent': u'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31'}
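If you would rather check this locally than in the online tester, a small script like the one below should reproduce the groupdict. It is only a sketch: the group names are read back from the groupdict output above, and it is written for Python 3, so the u'' prefixes disappear.

```python
import re

# The custom regex, with the named groups taken from the groupdict output.
LOG_REGEX = (
    r'(?P<host>[\w\-\.]*)(?::\d+)? \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'(?P<status>\S+)? \S+ (?P<length>\S+) (?P<request>\S+) (?P<path>.*?) '
    r'"(?P<user_agent>.*?)"'
)

# The raw log line from this post.
line = (
    '10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / '
    '"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 '
    '(KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"'
)

match = re.match(LOG_REGEX, line)
if match is None:
    print('no match: adjust the regex before running the import')
else:
    # Print each captured field on its own line for easy checking.
    for name, value in sorted(match.groupdict().items()):
        print(name, '=', value)
```

Running it against a handful of real lines before the import saves a lot of trial and error with the importer itself.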

and adding it back as --log-format-regex='(?P<host>[\w\-\.]*)(?::\d+)? \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] (?P<status>\S+)? \S+ (?P<length>\S+) (?P<request>\S+) (?P<path>.*?) "(?P<user_agent>.*?)"'

BOOM … logs imported, although I'm having an issue with the actual browser type. I'll update with the final version once I have it.
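One thing that can trip people up is shell quoting, since the regex is full of parentheses, backslashes, and double quotes. Below is a small Python 3 sketch that builds the command line and quotes each argument with shlex.quote. The script path, --url, and --idsite values are the ones from the first post; the log file path is a placeholder I made up, so replace it with your own access log.

```python
import shlex

LOG_REGEX = (
    r'(?P<host>[\w\-\.]*)(?::\d+)? \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'(?P<status>\S+)? \S+ (?P<length>\S+) (?P<request>\S+) (?P<path>.*?) '
    r'"(?P<user_agent>.*?)"'
)

command = [
    'python',
    '/var/www/piwik/misc/log-analytics/import_logs.py',
    '--url=http://192.168.1.100/piwik/',
    '--idsite=1',
    '--log-format-regex=' + LOG_REGEX,
    '/path/to/access_log.txt',  # placeholder path, not from the thread
]

# Quote every argument so the shell passes the regex through untouched.
shell_line = ' '.join(shlex.quote(arg) for arg in command)
print(shell_line)
```

The printed line can be pasted into a terminal as-is. Passing the list straight to subprocess.run(command) would avoid the shell entirely.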

If the regular expression works for the "default" log type, or if you think this example would make a nice addition to our README file, please consider submitting a pull request with your addition! :)