Custom Dimension: capture fails

I am trying to analyse the use of filters in an online search. Matomo is used as a logfile analysis tool, so the way to go is to analyse the URLs with their parameters.

Here’s a typical URL that is formed for a very simple search with just one filter:

DOMAIN/de/suchergebnisse?q=&tags=inspireidentifiziert

The expression to capture this:

.*/suchergebnisse\?.*(tags=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

Matomo does a good job here and the report reads:

tags=inspireidentifiziert

However, it gets more complicated when more than one filter of the same class is used. The URL that is formed then reads:

DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgelte&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden

To capture these tags I use multiple expressions because only one capture group per expression is possible:

.*/suchergebnisse\?.*(tags%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

In this instance, Matomo only uses the first one and disregards the rest. It only captures:

tags[0]=beschäftigte

The other 4 uses of tags are disregarded and thus lost for the report.
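
The report shows the decoded form of the percent-encoded parameter from the URL. As a minimal sketch in Python (not Matomo's own code), the standard library reproduces that decoding for two of the parameters quoted above:

```python
from urllib.parse import unquote_plus

# Percent-encoded parameters as they appear in the URL
raw_first = "tags%5B0%5D=besch%C3%A4ftigte"
raw_last = "tags%5B4%5D=geleistete+arbeitsstunden"

# unquote_plus decodes %XX sequences as UTF-8 and turns "+" into a space
decoded_first = unquote_plus(raw_first)
decoded_last = unquote_plus(raw_last)

print(decoded_first)  # tags[0]=beschäftigte
print(decoded_last)   # tags[4]=geleistete arbeitsstunden
```

So %5B and %5D are simply the brackets [ and ], and the expressions have to match the encoded form that is present in the logged URL.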

I have tested the expressions in RegEx 101 (https://regex101.com/) and the capture works. So I can only surmise that there is something in Matomo that stops the analysis if the string is too similar.

Does anyone here have more information?

Hi @vanye
I suggest you use the site search feature, by defining the search query param:

The search dashboard then looks like this:

I am afraid that is not an option, as the site search tracking makes defining goals around the search an impossibility. The search query parameter cannot be used in the formulation of a goal if the site search tracking is activated.

Also, I do not see how this would enable me to capture the filter queries, as those have different parameters. The search terms are not a problem.

Best
Volker

Hi @vanye
Sorry, I read too quickly.
Your search parameters appear to take the form tags[0] to tags[4]. Do you have only 5 search parameters?

There are six different search parameters, three of which can have multiple values. I thought that if I capture the first five or so of those I get like 99 per cent, as it is not very likely that more than 5 are used in normal searches.

What really puzzles me is that of those 6 search parameters for filters, the first is always captured, but as soon as they get “too similar” (?), they seem to be discarded. Here are all the expressions that I use for the custom dimension. If a user uses only one instance of a filter (e.g. “groups”), no number in brackets is added (first expression). But as soon as more instances are added, the number in brackets ([0], [1] etc.) is added (the following expressions).

.*/suchergebnisse\?.*(groups=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(publisher_name=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(res_format=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(license_id=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(transparency_law=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

The ones set up to capture the same tag in different instances, like

res_format[0] (=res_format%5B0%5D)
res_format[1] (=res_format%5B1%5D)
res_format[2] (=res_format%5B2%5D)
... etc.

only differ with regard to the number in [ ]. Is this “too similar” for Matomo to differentiate?

Best
Volker

Hi @vanye
I am not sure I understand what you mean…
In the dimension capture hint, I can read:

If multiple extractions are defined, the first extraction that matches is used.

So your first match will be used.
You have to assign one tag per dimension (so 5 dimensions), one groups per dimension (so 5 dimensions), and one res_format per dimension (so 5 dimensions).

I also think you can extract with a simpler expression, e.g. tags%5B0%5D=([^&]+)
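
The rule quoted above can be illustrated outside Matomo. The following is a minimal Python sketch, assuming a dimension simply walks its list of expressions in order and keeps the first capture. The function name first_match is hypothetical, Python's regex engine stands in for whatever Matomo uses, and this is not Matomo's implementation:

```python
import re

# Character class taken from the expressions in this thread
CHARS = r"[A-Za-z0-9äÄöÖüÜß%.,\-_+]+"

# One expression per tag index, as configured in the single dimension
patterns = [rf".*/suchergebnisse\?.*(tags%5B{i}%5D={CHARS}).*" for i in range(5)]

url = ("DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte"
       "&tags%5B1%5D=entgelte&tags%5B2%5D=umsatz")

def first_match(url, patterns):
    """Return the capture of the first expression that matches, then stop."""
    for pattern in patterns:
        m = re.match(pattern, url)
        if m:
            return m.group(1)
    return None

# Every expression that matches on its own (what regex101 shows per pattern)
all_matches = [m.group(1) for p in patterns if (m := re.match(p, url))]

print(len(all_matches))            # 3
print(first_match(url, patterns))  # tags%5B0%5D=besch%C3%A4ftigte
```

Each expression matches when tested alone, which is consistent with the regex101 result, but under a first-match rule only one value is kept per dimension.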

Have you found any solution?

No solution yet, I am afraid.

About “first extraction that matches” …

Yes, I understand that. That is why I have one expression for each parameter that I want to capture:

.*/suchergebnisse\?.*(res_format%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
to capture

res_format%5B0%5D=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fauthority%2Ffile-type%2FHTML

and

.*/suchergebnisse\?.*(res_format%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
to capture

res_format%5B1%5D=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fauthority%2Ffile-type%2FWMS_SRVC

and so on …

This should work, because in the same dimension I also have

.*/suchergebnisse\?.*(publisher_name=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
to capture
publisher_name=Staatskanzlei

or

.*/suchergebnisse\?.*(license_id=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
to capture
license_id=http%3A%2F%2Fdcat-ap.de%2Fdef%2Flicenses%2Fodbl

The latter two work. Of the first two, only the first one works. Why?

And no, the simplification of the expressions would not work, as they would capture other links that I do not want in the report.
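
If the concern is that the shorter expression drops the path prefix, the two ideas can be combined. This is a minimal Python sketch, assuming the /suchergebnisse\? prefix is kept and only the value's character class is replaced by [^&]+. The test URL is assembled from the two res_format values quoted above, other_url is hypothetical, and whether Matomo accepts this syntax is not tested here:

```python
import re

# Assumption: keep the path prefix, use [^&]+ for the parameter value
PATTERN = r".*/suchergebnisse\?.*(res_format%5B1%5D=[^&]+)"

search_url = (
    "DOMAIN/de/suchergebnisse"
    "?res_format%5B0%5D=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fauthority%2Ffile-type%2FHTML"
    "&res_format%5B1%5D=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fauthority%2Ffile-type%2FWMS_SRVC"
)
other_url = "DOMAIN/de/andere-seite?res_format%5B1%5D=x"  # hypothetical non-search page

def capture(url):
    """Return the captured parameter, or None if the URL is not a search URL."""
    m = re.match(PATTERN, url)
    return m.group(1) if m else None

print(capture(search_url))
print(capture(other_url))  # None
```

The capture stops at the next ampersand, and URLs without the search path are not matched at all.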

Here’s a screengrab of some of the expressions for the dimension in Matomo:

EDIT: Another question: I see you used an exclusion (^). The last time I tried one of those, Matomo wouldn’t accept them. Has that been corrected or is this simply a difference between goals and dimensions (like e.g. +)?

Hi @vanye
If a regular expression doesn’t work as expected (e.g. the exclusion is rejected), you should open a ticket in the GitHub repo:

But first you should test whether this works, as capturing everything except the ampersand should work better than capturing a complicated list of characters :wink:

If I understand your problem correctly, you created 15 action dimensions that capture URL parameters via regular expressions.
Did you test whether each custom dimension works on its own?
E.g. just try to track:

  • .../suchergebnisse\?not-to-be-captured&groups%5B0%5D=to-be-captured
  • then .../suchergebnisse\?not-to-be-captured&groups%5B0%5D=to-be-captured&not-to-be-captured…?
  • then …

Hi @heurteph-ei

I created 20 expressions for this one dimension. I did not try them individually yet, as I think that the pattern by which they work or don’t is clear.

But you are right: to properly test is always key, so I’ll try this and delete some of the expressions to see if the others work then.

You see, this is the thing that I find so irritating: The expressions are all basically the same, but only some of them work. To me it looks as if Matomo cannot tell them apart if they are too similar.

Hi @vanye
As previously written:

If multiple extractions are defined, the first extraction that matches is used.

Then if you have .../suchergebnisse\?not-to-be-captured&groups%5B0%5D=to-be-captured-0&groups%5B1%5D=to-be-captured-1&groups%5B2%5D=to-be-captured-2 (and I think that whenever groups%5B2%5D is present, 1 and 0 are always present too), then only groups%5B0%5D=to-be-captured-0 will be captured…

@heurteph-ei ,

but there is only one expression that can match, because I use one per capture. So, if I have the following link:

DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgelte&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden

the expressions should work as follows:

.*/suchergebnisse\?.*(tags%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
captures
tags[0]=beschäftigte

.*/suchergebnisse\?.*(tags%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
captures
tags[1]=entgelte

.*/suchergebnisse\?.*(tags%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
captures
tags[2]=umsatz

I do not see why this should not work. The condition

If multiple extractions are defined, the first extraction that matches is used.

is met as there is only one extraction per expression.

Or do you mean that per dimension the parsing of the link stops as soon as the first match is found?

If so, that would be bad! Do you have any idea how to get around this (apart from using 20 dimensions just for this)?

That’s it! Don’t forget that for a given custom dimension (for example custom dimension 1), you can track only one single value per tracked action. In your case, which value should be tracked when there are several? Please share the value you would like Matomo to store for DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgelte&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden?

@heurteph-ei

For your example link

DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgelte&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden

I would like Matomo to store the following in my custom dimension:

tags[0]=beschäftigte

tags[1]=entgelte

tags[2]=umsatz

tags[3]=betriebe

tags[4]=geleistete arbeitsstunden

That is why I added an expression for each of those in the custom dimension:

.*/suchergebnisse\?.*(tags%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

To my thinking, this would be logical: I have 20 expressions in my dimension and Matomo should run the link through each of those instead of stopping as soon as it finds one match. I cannot see a good reason why Matomo should stop there.

So now I am at a loss how to track the use of those filters.

Hi @vanye
As previously written, you can store only one value per action and custom dimension. You want to store 5 different values at the same time, and this is not possible.
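
To make the constraint concrete: a URL with five tags carries five values, while the dimension holds one. The following minimal Python sketch only illustrates that mismatch, outside Matomo. The helper filter_values and the "|" separator are assumptions for illustration, not a Matomo feature:

```python
from urllib.parse import urlsplit, parse_qsl

url = ("DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte"
       "&tags%5B1%5D=entgelte&tags%5B2%5D=umsatz")

def filter_values(url, name):
    """Collect the decoded values of `name` and of `name[0]`, `name[1]`, ..."""
    pairs = parse_qsl(urlsplit(url).query)
    return [v for k, v in pairs if k == name or k.startswith(name + "[")]

values = filter_values(url, "tags")
print(values)  # ['beschäftigte', 'entgelte', 'umsatz']

# One slot, one value: several values only fit if they are combined into
# a single string first (the separator here is an arbitrary assumption).
single_value = "|".join(values)
print(single_value)  # beschäftigte|entgelte|umsatz
```

A combined string would be reported as one value, so the individual tags could not be counted separately in the dimension report.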

Hi @heurteph-ei ,

OK, I tested some more and my tests show that you are right. It is always only the first parameter per dimension that is parsed. After that the dimension just stops.

I just cannot fathom why! Surely I am not the first user to analyse a search function with Matomo.

Is there a reason the dimensions were designed this way? Or is this something you need a premium plug-in for?

By the way, the pages report has a similar problem. Trying to filter out the parameter for the pages report with e.g.

tags.*
(Excluded parameters)

gives for the following link

https://open.rlp.de/de/suchergebnisse?q=mainz&tags%5B0%5D=mainz.+stadt&tags%5B1%5D=museum&tags%5B2%5D=universit%C3%A4tsmedizin&tags%5B3%5D=trier.+stadt&tags%5B4%5D=landau+in+der+pfalz.+stadt

It excludes all of them, but only one per instance. And it writes all of them into the pages report.

Hi @vanye

Because custom dimension can only store one single value at the same time. Like traffic lights that can be red or green, not red and green at the same time…

How did you proceed?

Hi @heurteph-ei

Because custom dimension can only store one single value at the same time. Like traffic lights that can be red or green, not red and green at the same time…

My question meant that software is developed to specifications. And that I cannot understand why this limitation was a requirement in the specifications, without even giving users the choice (e.g. via toggle). We use Matomo on premise, so data storage would not be a problem.

How did you proceed?

I added

tags.*

to the parameters to be excluded (among others) from the pages report. I added a screenshot of the result above.

But maybe this point is moot, as the lacking support of multiple values in custom dimensions might mean that I’ll have to analyse these via the pages report. Which is decidedly uncomfortable, but better than having no data at all about filter use …

I probably sound somewhat frustrated, but I hope you know that it has nothing to do with you. Quite the opposite - I really appreciate your help!

Hi @vanye

Sorry, but I don’t see how you can remove query parameters in the pages report. Is it the Global list of Query URL parameters to exclude on the :gear: > Measurables > Settings page? If so, I think you have to surround your regular expression with slashes (if I read the input hint correctly), e.g. /tags.*/
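
What an exclusion pattern such as tags.* is meant to achieve can be sketched in Python. This only illustrates the intended result, assuming the pattern is applied to the decoded parameter name. strip_params is a hypothetical helper, and Matomo's own matching may differ:

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, name_pattern):
    """Drop every query parameter whose decoded name fully matches the pattern."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not re.fullmatch(name_pattern, name)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

url = ("https://open.rlp.de/de/suchergebnisse?q=mainz"
       "&tags%5B0%5D=mainz.+stadt&tags%5B1%5D=museum")

print(strip_params(url, r"tags.*"))
# https://open.rlp.de/de/suchergebnisse?q=mainz
```

Note that the decoded names are tags[0], tags[1] and so on, so a plain entry "tags" without a wildcard would not cover them in this sketch.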

To come back to your initial need, I suggest you use custom variables, as they can store multiple values… Even if their use will be deprecated in the future.

For the tracking itself, I suggest you track custom variable client side (not sure if it is possible server side).
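
For orientation, here is a hedged Python sketch of how several values could be placed in page-scope custom variables through the cvar parameter of Matomo's HTTP Tracking API. The base URL, the site ID and the helper names are assumptions, the slot naming is arbitrary, the Custom Variables plugin must be available, and the sketch only builds the request URL without sending it:

```python
import json
from urllib.parse import urlencode

def cvar_payload(values):
    """Map each value to its own slot: {"1": [name, value], "2": ...}."""
    return {str(slot): [f"tags[{slot - 1}]", value]
            for slot, value in enumerate(values, start=1)}

def tracking_request(base, site_id, page_url, values):
    """Build (not send) a tracking request carrying page-scope custom variables."""
    query = urlencode({
        "idsite": site_id,
        "rec": 1,
        "url": page_url,
        "cvar": json.dumps(cvar_payload(values)),
    })
    return f"{base}?{query}"

request = tracking_request(
    "https://matomo.example.org/matomo.php",  # hypothetical Matomo instance
    1,                                        # hypothetical site ID
    "https://open.rlp.de/de/suchergebnisse",
    ["beschäftigte", "entgelte", "umsatz"],
)
print(request)
```

The number of custom variable slots is limited, so this would cover only a handful of filter values per page view.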

Hi @heurteph-ei

yes, I already had a look at Custom Variables but discarded the idea because it is marked as deprecated. But if it’ll be available a little longer, it might be a good idea to look into it. Thank you!

As to the exclusion of query parameters for the pages report, go to

:gear: > Measurables > Manage

and then edit the website. It looks like this (switched my account to English for the screen-shot):

As you can see, I have tried to generalize and have everything after the “?” truncated. Let’s see how that works out tomorrow (logfiles are imported overnight).

From my experience it is not necessary to use regex syntax here, but you can. My long list of every parameter to exclude did not work out, as mentioned above.