import_logs.py custom format with --log-format-regex

Hello,

I am trying to create a custom log-format-regex for my Apache SSL log, configured as follows:

CustomLog /var/webs/hesperia-web/log/ssl_access_log \
   "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

Here are some log entries, for your reference:

[29/Apr/2022:18:42:01 +0000] 2001:648:2ffc:1115::30 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET / HTTP/1.1" 302 460 "-" "curl/7.61.1"
[29/Apr/2022:18:42:01 +0000] 2001:648:2ffc:1115::30 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET /release.php HTTP/1.1" 302 471 "-" "curl/7.61.1"
[29/Apr/2022:18:42:43 +0000] 193.190.144.8 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET /images/umasep/realtime/result500.gif HTTP/1.1" 200 10750 "-" "Wget/1.20.3 (linux-gnu)"
[29/Apr/2022:18:42:45 +0000] 131.176.243.10 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /UpdateAgentCacheServlet?shortcircuit=false HTTP/1.1" 200 2 "-" "Java/1.8.0_222"
[29/Apr/2022:18:44:48 +0000] 167.99.209.234 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET / HTTP/1.1" 302 460 "https://www.hesperia2.astro.noa.gr" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)"
[29/Apr/2022:18:44:51 +0000] 138.197.150.151 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET / HTTP/1.1" 302 460 "https://www.hesperia2.astro.noa.gr" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)"
[29/Apr/2022:18:45:01 +0000] 2001:648:2ffc:1115::30 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET /release.php HTTP/1.1" 302 471 "-" "curl/7.61.1"
[29/Apr/2022:18:45:01 +0000] 2001:648:2ffc:1115::30 TLSv1.3 TLS_AES_256_GCM_SHA384 "GET / HTTP/1.1" 302 460 "-" "curl/7.61.1"

I don’t know how to match the %{SSL_PROTOCOL}x and %{SSL_CIPHER}x fields of each log line.

I am trying with:

"\[(?P<date>.*?) (?P<timezone>.*?)\] (?P[\w-.])(?::\d+)? \S+ \S+ \"\S+ (?P<path>.*?) \S+\" (?P<status>\S+) (?P<length>\S+) \"(?P<referrer>.*?)\" \"(?P<user_agent>.*?)\""

i.e. using a plain \S+ for each of the two fields in question, but it does not work.

Here is the output from running import_logs.py:

Traceback (most recent call last):
  File "/var/webs/matomo/www/misc/log-analytics/import_logs.py", line 2678, in <module>
    config = Configuration()
  File "/var/webs/matomo/www/misc/log-analytics/import_logs.py", line 1037, in __init__
    self._parse_args(self._create_parser(), argv)
  File "/var/webs/matomo/www/misc/log-analytics/import_logs.py", line 985, in _parse_args
    self.format = RegexFormat('custom', self.options.log_format_regex, self.options.log_date_format)
  File "/var/webs/matomo/www/misc/log-analytics/import_logs.py", line 201, in __init__
    self.regex = re.compile(regex)
  File "/usr/lib64/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/usr/lib64/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib64/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib64/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib64/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib64/python3.6/sre_parse.py", line 669, in _parse
    len(char) + 2)
sre_constants.error: unknown extension ?P[ at position 37

Can anyone please let me know how to create a valid --log-format-regex for parsing this log file?

Of course, any other corrections to my log-format-regex are welcome!

Thanks a lot,
Nick

Anyone? No suggestions / hints whatsoever?

@innocraft, any suggestion?
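
For anyone landing here with the same traceback, here is a minimal sketch of one possible fix, not a verified Matomo recipe. The error "unknown extension ?P[" points at the group written as (?P[...]): Python requires a name there, as in (?P<name>...). The sketch assumes import_logs.py expects the client address in a group called ip (the other group names are taken from the pattern above), moves the hyphen to the end of the character class, since Python also rejects [\w-.] as a bad range, and adds a colon so that IPv6 addresses match. The helper names compiles and parse_line are hypothetical and exist only to check the pattern against two of the sample lines before handing it to import_logs.py.

```python
import re

# Sketch of a pattern for the CustomLog format shown above.
# Assumption: the group names match what import_logs.py expects.
LOG_FORMAT_REGEX = (
    r'\[(?P<date>.*?) (?P<timezone>.*?)\] '       # %t
    r'(?P<ip>[\w.:-]+) '                          # %h (IPv4 or IPv6)
    r'\S+ \S+ '                                   # SSL protocol and cipher, skipped
    r'"\S+ (?P<path>.*?) \S+" '                   # "%r": method, path, protocol
    r'(?P<status>\d+) (?P<length>\S+) '           # %>s %b
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"'  # Referer and User-agent
)

# Two of the log entries quoted in the post.
SAMPLE_LINES = [
    '[29/Apr/2022:18:42:01 +0000] 2001:648:2ffc:1115::30 TLSv1.3 '
    'TLS_AES_256_GCM_SHA384 "GET / HTTP/1.1" 302 460 "-" "curl/7.61.1"',
    '[29/Apr/2022:18:42:45 +0000] 131.176.243.10 TLSv1.2 '
    'ECDHE-RSA-AES256-GCM-SHA384 '
    '"POST /UpdateAgentCacheServlet?shortcircuit=false HTTP/1.1" '
    '200 2 "-" "Java/1.8.0_222"',
]


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def parse_line(line):
    """Return the named groups for one log line, or None if it does not match."""
    match = re.match(LOG_FORMAT_REGEX, line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # The unnamed group from the traceback is rejected by the re module.
    print("original group compiles:", compiles(r'(?P[\w-.])'))
    for line in SAMPLE_LINES:
        print(parse_line(line))
```

If the pattern is passed on the command line via --log-format-regex, wrapping it in single quotes is one way to keep the shell from interpreting the backslashes and double quotes.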