I have a question about content grouping with RegEx extraction from URLs. (We use Matomo On-Premise.)
The goal: we have a multilingual site, and I want to create a custom action dimension so I can group the traffic of publications that are spread across different language pages.
The URLs are built like this:
Here are three example URLs covering a single article published in three different languages:
So I need to be able to group and report content by any of the following:
- Language (en, de, fr, etc)
- Content category (section name, like news, articles, videos, etc)
- Content ID (various digits)
and combine these groupings in reports like the following:
ID 12305 (63 visits)
- en (30 visits)
- fr (12 visits)
- de (21 visits)
fr (french) (135000 visits)
- ID 1263 (123 visits)
- ID 124 (1241 visits)
- ID 4236 (1114 visits)
- en (3500 visits)
- fr (12000 visits)
- de (213400 visits)
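To make the report shape concrete, here is a minimal Python sketch of the nesting I have in mind. It is only an illustration: the `views` rows are made-up page views rather than real visit data, and `nested_counts` is a hypothetical helper, not anything Matomo provides.

```python
from collections import defaultdict

# Hypothetical page views as (language, category, content_id) tuples.
# These are made-up rows for illustration, not data from our site.
views = [
    ("en", "news", "12305"),
    ("fr", "news", "12305"),
    ("de", "news", "12305"),
    ("en", "news", "12305"),
    ("fr", "articles", "1263"),
]

# Positions of each field inside a row.
LANG, CATEGORY, CONTENT_ID = 0, 1, 2


def nested_counts(rows, outer, inner):
    """Count rows grouped by the `outer` field, then by the `inner` field."""
    report = defaultdict(lambda: defaultdict(int))
    for row in rows:
        report[row[outer]][row[inner]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {key: dict(sub) for key, sub in report.items()}


# "ID 12305 -> en / fr / de" style report.
by_id_then_lang = nested_counts(views, CONTENT_ID, LANG)
# "fr -> ID 1263 / ID 12305" style report.
by_lang_then_id = nested_counts(views, LANG, CONTENT_ID)

print(by_id_then_lang)
print(by_lang_then_id)
```

The same three values per page view would have to be available in Matomo, one way or another, for both report shapes to be possible.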
I’ve successfully used the following RegEx (grouping only by content ID) to extract data from URLs (containing the news section) in Google Analytics, and it worked fine:
When I tested this in Matomo as a Custom Dimension (action scope), after invalidating the past reports of course, it worked in some cases in January. Even then, the results weren’t grouped by the value the regex captures, but by “translated article name”, which means there were as many entries as there are translations. This is bad. Maybe a bug or a syntax error? Then after a few weeks it stopped working altogether, even though I know there is related traffic.
I then continued testing and extended my regex to capture the two-letter language code (group 1), the content category (group 2) and the content ID (group 3), like this:
Could this approach with multiple capturing groups (subexpressions) work for the goal described above, or do I need to create a separate custom dimension for each of these groupings?
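To illustrate what I mean by three capturing groups, here is a minimal Python sketch. It is not my actual regex or URL scheme: the path layout `/<lang>/<category>/<id>-<slug>`, the pattern and the `extract` helper are all assumptions for illustration, and Python's `re` module may not behave the same way as the extraction in Matomo.

```python
import re

# Assumed (hypothetical) path layout: /<two-letter language>/<category>/<numeric id>-<slug>
# group 1 = language, group 2 = content category, group 3 = content ID
URL_PATTERN = re.compile(r"^/([a-z]{2})/([a-z]+)/(\d+)(?:[-/]|$)")


def extract(path):
    """Return (language, category, content_id), or None if the path does not match."""
    match = URL_PATTERN.match(path)
    return match.groups() if match else None


# Hypothetical paths for one article in three languages.
paths = [
    "/en/news/12305-some-title",
    "/fr/news/12305-un-titre",
    "/de/news/12305-ein-titel",
]

for path in paths:
    print(path, extract(path))
```

In this sketch all three translations share the same group 3 value, which is the value I would like to group by, while groups 1 and 2 would feed the language and category groupings.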
Would you please advise me how to solve this issue?
Is there a specific RegEx syntax to be used here?