Capturing custom log params and capturing only records that contain the custom param


#1

I plan to use the Python log import script to import Apache logs. I want to import only lines that contain a custom parameter (I’ll call it ‘username’ for the sake of this example).

(1) Can I SKIP importing any lines that don’t include the username parameter?
(2) Can I assign the username parameter to a Piwik custom variable (for later use in various reports)?

Thanks to anyone who can help me with this.


(Matthieu Aubry) #2

So far we have the following parameters:


  --exclude-path=EXCLUDED_PATHS
                        Paths to exclude. Can be specified multiple times
  --exclude-path-from=EXCLUDE_PATH_FROM
                        Each line from this file is a path to exclude
  --include-path=INCLUDED_PATHS
                        Paths to include. Can be specified multiple times. If
                        not specified, all paths are included.
  --include-path-from=INCLUDE_PATH_FROM
                        Each line from this file is a path to include

I’m not sure if they match only the path or also the query string. Where is your custom param specified?
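To make the open question concrete: if these options behave like glob patterns applied to the request path alone, then anything in the query string or in extra fields at the end of the log line is never examined. That path-only behaviour is an assumption here, and the actual matching in import_logs.py may differ. A hypothetical filter along those lines (`path_is_included` is an illustrative name, not part of the script):

```python
import fnmatch


def path_is_included(full_path, include_patterns=(), exclude_patterns=()):
    """Hypothetical path filter: glob patterns matched against the path only.

    The query string (everything after '?') is stripped first, so values in
    the query string or in trailing log fields never take part in the match.
    """
    path = full_path.split("?", 1)[0]
    # An exclude match always wins.
    if any(fnmatch.fnmatch(path, pattern) for pattern in exclude_patterns):
        return False
    # With include patterns given, the path must match at least one of them.
    if include_patterns:
        return any(fnmatch.fnmatch(path, pattern) for pattern in include_patterns)
    # No include patterns: every path is included.
    return True
```

Under this assumption, a pattern such as `*t=*` could never select lines by a value that Apache appends after the user agent.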


#3

Here is one line from the log, though I’ve obscured a few peripheral details.


nnn.nnn.nnn.nnn - - [19/May/2014:22:37:53 -0700] "GET /path/to/subdirectory/ HTTP/1.1" 200 22179 "http://members.domainobscured.com/path/subdirectory/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; Tablet PC 2.0)" i=annamaria64770:2cf7cf5cccbb69c73f49e56aa8364a5a; t=annamaria64770:2cf7cf5cccbb69c73f49e56aa8364a5a

The “t” parameter is the one I’m interested in. A CGI script reads it from a first-party cookie and exports it as an environment variable, which Apache then writes to the log as a custom field. Unfortunately, I don’t see how the --exclude-path switch would help me.
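A minimal sketch of filtering and extracting outside the importer. It assumes the lines use Apache's combined format followed by the extra `i=...; t=...` field shown above. The regex, its group names and the helpers `parse_extra` and `hits_with_param` are hypothetical and are not part of import_logs.py:

```python
import re

# Apache "combined" format followed by a trailing custom field such as
# "i=...; t=...". The group names are illustrative only.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<length>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
    r'(?P<extra>.*)$'
)


def parse_extra(extra):
    """Split a trailing 'k1=v1; k2=v2' field into a dict."""
    params = {}
    for item in extra.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # ignore fragments without an '='
            params[key] = value
    return params


def hits_with_param(lines, name="t"):
    """Yield (path, value) for lines carrying `name`; skip all other lines."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a combined-format line
        value = parse_extra(match.group("extra")).get(name)
        if not value:
            continue  # custom parameter missing or empty: skip the line
        yield match.group("path"), value
```

Lines without a non-empty `t` value are dropped, which covers question (1). Each yielded value is what would be handed on as a custom variable for question (2).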

I’m willing to solve this problem and propose a solution, although at this point I’m completely new to your log import script.


(Matthieu Aubry) #4

What you want to achieve is currently not possible, but I believe this is a very useful use case that we can address.

Maybe we could define a custom log format element, as is done for tracking “Page generation time”.
See the documentation: https://github.com/piwik/piwik/tree/master/misc/log-analytics#import-page-speed-metric-from-logs

To track your custom parameter=value, we could do something similar: set it as a custom variable with “page” or “visit” scope and pass it in the Piwik Tracking API request (http://developer.piwik.org/api-reference/tracking-api).
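A minimal sketch of that second step. It assumes the Tracking API's `cvar` (page scope) and `_cvar` (visit scope) parameters, each a JSON object that maps a slot index to a `[name, value]` pair. The piwik.php URL, site ID, slot 1 and the variable name `username` are placeholders. `tracking_url` is a hypothetical helper, not something in import_logs.py:

```python
import json
from urllib.parse import urlencode


def tracking_url(piwik_url, idsite, page_url, var_name, var_value,
                 scope="page", slot=1):
    """Build a hypothetical Tracking API request carrying one custom variable."""
    if scope not in ("page", "visit"):
        raise ValueError("scope must be 'page' or 'visit'")
    # Page-scope custom variables go in 'cvar', visit-scope ones in '_cvar';
    # both are JSON objects mapping a slot index to [name, value].
    param = "cvar" if scope == "page" else "_cvar"
    query = {
        "idsite": idsite,
        "rec": 1,  # required for the request to be recorded
        "url": page_url,
        param: json.dumps({str(slot): [var_name, var_value]}),
    }
    return piwik_url + "?" + urlencode(query)


url = tracking_url("http://piwik.example.com/piwik.php", 1,
                   "http://members.example.com/path/subdirectory/",
                   "username", "annamaria64770", scope="visit")
```

A real import would also need to send each hit's original timestamp, IP and user agent. Those fields are left out here to keep the sketch short.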

If you are not a developer, the best next step may be to create a ticket in our issue tracker for this feature request.

Here is the list of all currently open Log Analytics tickets: http://dev.piwik.org/trac/query?status=assigned&status=new&status=reopened&component=Log+Analytics+(import_logs.py)&group=priority&col=id&col=summary&col=component&col=owner&col=type&col=priority&col=time&order=priority


#5

Matt, I’m a developer but new to Piwik. I’ll look at how you handled “Page generation time” and how custom variables are managed, and I’ll try to implement this myself first. Should I add a ticket anyway?


(Matthieu Aubry) #6

A ticket is very useful so we can keep track of the technical discussion around this topic (and other team members or users may join the discussion).


#7

I’ve added a new feature request ticket. I’m looking at the code this week, hoping I can solve this myself. I’ll post to Trac if I make progress. Thanks, Matt.


(Matthieu Aubry) #8

Here is the ticket for anyone interested in this feature: Issue #5252 in matomo-org/matomo on GitHub, “capture custom log param(s); capture only records with custom param(s)”.