Custom Dimension: capture fails

I am trying to analyse the use of filters in an online search. Matomo is used as a logfile analysis tool, so the way to go is to analyse the URLs with their parameters.

Here’s a typical URL that is formed for a very simple search with just one filter:

DOMAIN/de/suchergebnisse?q=&tags=inspireidentifiziert

The expression to capture this:

.*/suchergebnisse\?.*(tags=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
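As a quick sanity check, the expression can be run against the example URL outside Matomo. This is a minimal sketch that uses Python's `re` module as a stand-in for the PHP regex engine Matomo runs on (an assumption; the two behave the same for a simple pattern like this one):

```python
import re

# Expression and URL from the post; Python's re stands in for Matomo's PHP regex engine
PATTERN = r".*/suchergebnisse\?.*(tags=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*"
URL = "DOMAIN/de/suchergebnisse?q=&tags=inspireidentifiziert"

match = re.match(PATTERN, URL)
# Group 1 holds the parameter name together with its value
captured = match.group(1) if match else None
print(captured)  # tags=inspireidentifiziert
```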

Matomo does a good job here and the report reads:

tags=inspireidentifiziert

However, it gets more complicated when more than one filter of the same class is used. The URL that is formed then reads:

DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgel-te&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden

To capture these tags I use multiple expressions because only one capture group per expression is possible:

.*/suchergebnisse\?.*(tags%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
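Evaluated one at a time, the way a regex tester such as regex101 does, each of the five expressions does match the multi-filter URL. A small sketch to reproduce that (Python `re` again used as an assumed stand-in for Matomo's engine):

```python
import re

# Multi-filter URL from the post (brackets are percent-encoded as %5B / %5D)
URL = (
    "DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgel-te"
    "&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden"
)

# The five expressions from the post; they differ only in the index
PATTERNS = [
    rf".*/suchergebnisse\?.*(tags%5B{i}%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*"
    for i in range(5)
]

# Evaluate each expression on its own, as a regex tester does
results = []
for pattern in PATTERNS:
    m = re.match(pattern, URL)
    results.append(m.group(1) if m else None)

for value in results:
    print(value)
```

So the expressions themselves are fine in isolation; the question is how Matomo combines them.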

In this instance, Matomo only uses the first one and disregards the rest. It only captures:

tags[0]=beschäftigte

The other four tag values are disregarded and thus lost for the report.

I have tested the expressions in regex101 (https://regex101.com/), and the capture works there. So I can only surmise that there is something in Matomo that stops the analysis if the string is too similar.

Does anyone here have more information?

Hi @vanye
I suggest you use the site search feature by defining the search query parameter:

Then the Search dashboard looks like this:

I am afraid that is not an option, as site search tracking makes it impossible to define goals around the search. The search query parameter cannot be used in the definition of a goal if site search tracking is activated.

Also, I do not see how this would enable me to capture the filter queries, as those have different parameters. The search terms are not a problem.

Best
Volker

Hi @vanye
Sorry, I read too quickly.
Your search seems to take the form tags[0] to tags[4]. Do you have only 5 search parameters?

There are six different search parameters, three of which can have multiple values. I assumed that capturing the first five or so instances of each would cover about 99 per cent of searches, as it is not very likely that more than five are used in a normal search.

What really puzzles me is that of those six filter parameters, the first is always captured, but as soon as the expressions get “too similar” (?), they seem to be discarded. Here are all the extraction expressions that I use for the custom dimension. If a user applies only one instance of a filter (e.g. “groups”), no number in brackets is added (first expression). As soon as more instances are added, the number in brackets ([0], [1] etc.) is added (the following five expressions).

.*/suchergebnisse\?.*(groups=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(groups%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(tags=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(tags%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(publisher_name=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(res_format=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B0%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B1%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B2%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B3%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
.*/suchergebnisse\?.*(res_format%5B4%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(license_id=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*

.*/suchergebnisse\?.*(transparency_law=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*
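The list is entirely mechanical: a plain form per parameter, plus five indexed forms for the three multi-value parameters. A sketch with a hypothetical helper `build_patterns` (not part of Matomo) that regenerates the same 21 expressions, which makes it easier to keep them consistent if the character class ever changes:

```python
# Hypothetical helper that rebuilds the extraction list from the post.
# PARAMS order matches the post; True marks parameters that can carry [0], [1], ... suffixes.
PREFIX = r".*/suchergebnisse\?.*"
CHARS = r"[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+"

PARAMS = [
    ("groups", True),
    ("tags", True),
    ("publisher_name", False),
    ("res_format", True),
    ("license_id", False),
    ("transparency_law", False),
]


def build_patterns(params, max_index=5):
    patterns = []
    for name, multi in params:
        # Plain form, used when only one instance of the filter is set
        patterns.append(f"{PREFIX}({name}={CHARS}).*")
        if multi:
            # Indexed forms name%5B0%5D ... name%5B4%5D
            for i in range(max_index):
                patterns.append(f"{PREFIX}({name}%5B{i}%5D={CHARS}).*")
    return patterns


for p in build_patterns(PARAMS):
    print(p)
```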

The ones set up to capture the same tag in different instances, like

res_format[0] (=res_format%5B0%5D)
res_format[1] (=res_format%5B1%5D)
res_format[2] (=res_format%5B2%5D)
... etc.

only differ with regard to the number in [ ]. Is this “too similar” for Matomo to differentiate?
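For reference, the bracket notation and the encoded form are the same thing: `[` and `]` are percent-encoded as `%5B` and `%5D`. To a regex engine the encoded names are simply different literal strings, so similarity alone should not cause a mismatch. A sketch using Python's standard `urllib.parse`:

```python
from urllib.parse import parse_qs, quote, unquote

# Square brackets are percent-encoded in URLs: "[" -> %5B, "]" -> %5D
encoded = quote("res_format[0]")
decoded = unquote("res_format%5B1%5D")
print(encoded)  # res_format%5B0%5D
print(decoded)  # res_format[1]

# Decoding a query string shows that the indexed parameters are distinct keys
query = "tags%5B0%5D=besch%C3%A4ftigte&tags%5B4%5D=geleistete+arbeitsstunden"
params = parse_qs(query)
print(params)  # {'tags[0]': ['beschäftigte'], 'tags[4]': ['geleistete arbeitsstunden']}
```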

Best
Volker

Hi @vanye
I'm not sure I understand what you mean…
The help text for dimension extractions says:

If multiple extractions are defined, the first extraction that matches is used.

So only the first expression that matches is used.
You would have to assign one tags instance per dimension (i.e. five dimensions for tags), one groups instance per dimension (another five dimensions) and one res_format instance per dimension (another five dimensions).
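The effect of that rule can be sketched outside Matomo. The function `first_match` below is a hypothetical model of the quoted help text, not Matomo's actual code: with all five expressions in one dimension only `tags[0]` survives, while one expression per dimension yields every value.

```python
import re

URL = (
    "DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgel-te"
    "&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden"
)

PATTERNS = [
    rf".*/suchergebnisse\?.*(tags%5B{i}%5D=[A-Za-z0-9äÄöÖüÜß%.,\-\_\+]+).*"
    for i in range(5)
]


def first_match(patterns, url):
    """Models the help-text rule: the first extraction that matches is used."""
    for pattern in patterns:
        m = re.match(pattern, url)
        if m:
            return m.group(1)
    return None


# All five expressions in ONE dimension: only the tags[0] value is reported
single_dimension = first_match(PATTERNS, URL)

# One expression per dimension (five dimensions): every index is reported
per_dimension = [first_match([p], URL) for p in PATTERNS]

print(single_dimension)
print(per_dimension)
```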

I think you can also extract with a simpler expression, e.g. tags%5B0%5D=([^&]+)
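A sketch of what the simpler expression captures: `[^&]+` takes everything after the parameter name up to the next `&`, so no character list (umlauts, `%`, `+` and so on) is needed, and the group holds only the value rather than `name=value`. The report in the original post shows decoded text, so Matomo appears to decode the value itself; `unquote_plus` here only mimics that for readability.

```python
import re
from urllib.parse import unquote_plus

URL = (
    "DOMAIN/de/suchergebnisse?tags%5B0%5D=besch%C3%A4ftigte&tags%5B1%5D=entgel-te"
    "&tags%5B2%5D=umsatz&tags%5B3%5D=betriebe&tags%5B4%5D=geleistete+arbeitsstunden"
)

# Simpler expression: capture everything after "tags%5B0%5D=" up to the next "&"
m = re.search(r"tags%5B0%5D=([^&]+)", URL)
raw = m.group(1) if m else None
print(raw)  # besch%C3%A4ftigte

# Same idea for every index; decoded here only to make the output readable
values = []
for i in range(5):
    m = re.search(rf"tags%5B{i}%5D=([^&]+)", URL)
    if m:
        values.append(unquote_plus(m.group(1)))
print(values)
```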

Have you found any solution?