Writing Archiver for Page Actions, grouping by Custom variable

We want to write a custom Report for one of our customers.

Our customer delivers articles of an epaper in a webApp. The WebApp opens a dedicated Window for displaying such articles. The Page/Window Title of the Page displaying the article is always the same, e.g. “ArticleOutput”. The actual Page Title is tracked via a Custom Variable with scope ‘page’.

Our goal is, to have a “Page Titles” Report with the same metrics in the stock “Page Titles” Report, but, rather than reporting by page title, we want to report by the value of a certain custom variable.

Our first -successful- approach, was to get all values of this custom variable and for each of these values, get the ‘Page Titles’ report segmenting it by the value of this custom variable, and - after that - merge them together into the final report. We did this by calling API functions of the CustomVariables and Actions Plugin. Here is the code executed in our Reports API:

if ( $segment ) {
        $segment = $segment . ";" . $add_segment; 
      }
      else {
        $segment = $add_segment;
      }

      $newTable = new DataTable();

      $articleTable = \Piwik\Plugins\CustomVariables\API::getInstance()->getCustomVariablesValuesFromNameId(
        $idSite, $period, $date, $idSubtable = 4, $segment, $_leavePriceViewedColumn = false);

      foreach ($articleTable->getRows() as $articleRow) {
        
        $articleName = $articleRow->getColumn('label');
        $articleName = urlencode($articleName);
        
        $locSegment = $segment . ";" . "customVariablePageValue4%3D%3D" . $articleName;
        
        $pageTable = \Piwik\Plugins\Actions\API::getInstance()->getPageTitles(
          $idSite, $period, $date, $locSegment);
        $pageTable->filter('ReplaceColumnNames');
        
        foreach ($pageTable->getRows() as $newRow) {  // Merge rows
          foreach ($newRow as $key => $value) {
            if (!isset($articleRow[$key])) {
              $articleRow->addColumns(array($key => $value));
            }
          }
        } // end foreach $pageTable
        $newTable->addRow($articleRow);
      } // end foreach $articleTable...
      // echo $newTable;
      return $newTable;

However the Report is unacceptably slow. It takes minutes to load a day report, even when the data is prearchived. Reports over a longer periods (week,months,years) take even longer. This is perfectly understandable, as multiple, complex queries for each page title are executed.

Our approach to tackle the performance issue, is to write an Archiver, that aggregates Reports in the same way the Actions-plugin archiver does, except it aggregates this data grouped by the value of a certain custom variable rather than the page title.

Theoritcally, this approach should give me the exact same results as in the first approach, but a lot faster, as the archiver already has made the hard work.

I was successful in writing an archiver that does this. Except I can’t reproduce the exact values for the metrics that the original, slow report produced. Especially the number of exits, and thus the exit rate gave me headaches.

After investigating, what SQL request actually get executed on the Database, I found out that Piwik changes and encapsulates the original queries written in the archiving code before executing them, especially when using segmentation in the API requests.

I still can’t get my head round of how piwik changes these Queries.
Can I group by certain Custom Variables for certain Metrics?
How do Ranking Queries change the behaviour of my Queries?
How do Segmentations change my Queries?

Additionally, after doing some tests, the function updateActionsTableWithRowQuery(…) seems to be doing some logic, i still fail to understand.
I’d love to discuss what exactly is going on in the Actions Archiver face to face or via chat.

A colleague of mine has posted in the german subforums a similar question some time earlier:
https://forum.matomo.org/t/eigener-report-viel-zu-langsam/18468?u=tst_at_ppi