Memory exhausted (?) during Archive

FYI: we plan to make Piwik easier to use with XHProf; see this ticket for future reference.

In your httpd.conf, can you paste the line pertaining to RLimitMEM (i.e. the number it is set to)?

I think you are hitting an Apache memory limitation.

@lesjokolat: As I described at the beginning of the thread, I am not using Apache but nginx. The MySQL limits you mentioned earlier had already been adjusted in the past, and no issues seem to arise from there.

@matt, thomas-piwik: I have added the code you described to core/CronArchive.php; here it is (line 169 onwards, the part of the code that most closely matches your description):


    public function runScheduledTasksInTrackerMode()
    {
        xhprof_enable();

        $this->initPiwikHost();
        $this->initLog();
        $this->initCore();
        $this->initTokenAuth();
        $this->logInitInfo();
        $this->checkPiwikUrlIsValid();
        $this->runScheduledTasks();

        $data = xhprof_disable();

        $XHPROF_ROOT = '/usr/share/xhprof';

        include_once $XHPROF_ROOT . "/xhprof_lib/xhprof_lib.php";
        include_once $XHPROF_ROOT . "/xhprof_lib/xhprof_runs.php";

        $xhprof_runs = new XHProfRuns_Default();

        // Save the run under a namespace "xhprof".
        $run_id = $xhprof_runs->save_run($data, "xhprof");
        echo "http://wstat1.noa.gr/xhprof/index.php?run={$run_id}&source=xhprof_testing\n";
    }

I have run Archiving with the new code, but I don’t see any link at the end:


# /usr/bin/php /var/webs/wwwpiwik/www/console core:archive --url=http://wstat1.noa.gr
INFO CoreConsole[2014-08-14 12:46:31] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 12:46:31] [c685b] INIT
INFO CoreConsole[2014-08-14 12:46:31] [c685b] Piwik is installed at: http://wstat1.noa.gr/index.php
INFO CoreConsole[2014-08-14 12:46:31] [c685b] Running Piwik 2.4.1 as Super User
INFO CoreConsole[2014-08-14 12:46:33] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 12:46:33] [c685b] NOTES
INFO CoreConsole[2014-08-14 12:46:33] [c685b] - Reports for today will be processed at most every 21600 seconds. You can change this value in Piwik UI > Settings > General Settings.
INFO CoreConsole[2014-08-14 12:46:33] [c685b] - Reports for the current week/month/year will be refreshed at most every 3600 seconds.
INFO CoreConsole[2014-08-14 12:46:33] [c685b] - Archiving was last executed without error 22 hours 45 min ago
INFO CoreConsole[2014-08-14 12:47:03] [c685b] - Will process 1 websites with new visits since 22 hours 45 min , IDs: 1
INFO CoreConsole[2014-08-14 12:47:03] [c685b] - Will process 1 other websites because some old data reports have been invalidated (eg. using the Log Import script) , IDs: 1
INFO CoreConsole[2014-08-14 12:47:03] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 12:47:03] [c685b] START
INFO CoreConsole[2014-08-14 12:47:03] [c685b] Starting Piwik reports archiving...
INFO CoreConsole[2014-08-14 12:54:37] [c685b] Archived website id = 1, period = day, 644386 visits in last last52 days, 1498 visits today, Time elapsed: 453.337s
INFO CoreConsole[2014-08-14 13:26:27] [c685b] Archived website id = 1, period = week, 1816001 visits in last last35 weeks, 36454 visits this week, Time elapsed: 1909.915s
ERROR CoreConsole[2014-08-14 14:04:32] [c685b] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=month&date=last35&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 8208 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1155 '
ERROR CoreConsole[2014-08-14 14:04:32] [c685b] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=month&date=last35&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 8208 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1155 '
INFO CoreConsole[2014-08-14 14:04:33] [c685b] Archived website id = 1, period = month, 0 visits in last last35 months, 0 visits this month, Time elapsed: 2285.563s
ERROR CoreConsole[2014-08-14 14:04:49] [c685b] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=year&date=last7&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. The response was empty. This usually means a server error. This solution to this error is generally to increase the value of 'memory_limit' in your php.ini file. Please check your Web server Error Log file for more details.
ERROR CoreConsole[2014-08-14 14:04:49] [c685b] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=year&date=last7&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. The response was empty. This usually means a server error. This solution to this error is generally to increase the value of 'memory_limit' in your php.ini file. Please check your Web server Error Log file for more details.
INFO CoreConsole[2014-08-14 14:04:49] [c685b] Archived website id = 1, period = year, 0 visits in last last7 years, 0 visits this year, Time elapsed: 15.873s
INFO CoreConsole[2014-08-14 14:04:49] [c685b] Archived website id = 1, 4 API requests, Time elapsed: 4664.864s [1/1 done]
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Done archiving!
INFO CoreConsole[2014-08-14 14:04:51] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 14:04:51] [c685b] SUMMARY
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Total visits for today across archived websites: 1498
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Archived today's reports for 1 websites
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Archived week/month/year for 1 websites
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Skipped 0 websites: no new visit since the last script execution
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Skipped 0 websites day archiving: existing daily reports are less than 21600 seconds old
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Skipped 0 websites week/month/year archiving: existing periods reports are less than 3600 seconds old
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Total API requests: 4
INFO CoreConsole[2014-08-14 14:04:51] [c685b] done: 1/1 100%, 1498 vtoday, 1 wtoday, 1 wperiods, 4 req, 4667166 ms, 2 errors.
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Time elapsed: 4667.166s
INFO CoreConsole[2014-08-14 14:04:51] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 14:04:51] [c685b] SCHEDULED TASKS
INFO CoreConsole[2014-08-14 14:04:51] [c685b] Starting Scheduled tasks... 
INFO CoreConsole[2014-08-14 14:06:07] [c685b] task,output
Piwik\Plugins\CoreAdminHome\Tasks.purgeOutdatedArchives,Time elapsed: 38.058s
Piwik\Plugins\PrivacyManager\Tasks.deleteReportData,Time elapsed: 0.050s
Piwik\Plugins\PrivacyManager\Tasks.deleteLogData,Time elapsed: 0.001s
Piwik\Plugins\CorePluginsAdmin\Tasks.clearAllCacheEntries,Time elapsed: 0.103s
Piwik\Plugins\CorePluginsAdmin\Tasks.sendNotificationIfUpdatesAvailable,Time elapsed: 0.020s
Piwik\Plugins\CoreAdminHome\Tasks.optimizeArchiveTable,Time elapsed: 35.141s
Piwik\Plugins\CoreUpdater\Tasks.sendNotificationIfUpdateAvailable,Time elapsed: 0.021s
INFO CoreConsole[2014-08-14 14:06:07] [c685b] done
INFO CoreConsole[2014-08-14 14:06:07] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 14:06:07] [c685b] ---------------------------
INFO CoreConsole[2014-08-14 14:06:07] [c685b] SUMMARY OF ERRORS
INFO CoreConsole[2014-08-14 14:06:07] [c685b] Error: Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=month&date=last35&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 8208 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1155 '
INFO CoreConsole[2014-08-14 14:06:07] [c685b] Error: Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=year&date=last7&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. The response was empty. This usually means a server error. This solution to this error is generally to increase the value of 'memory_limit' in your php.ini file. Please check your Web server Error Log file for more details.
ERROR CoreConsole[2014-08-14 14:06:07] [c685b] 2 total errors during this script execution, please investigate and try and fix these errors.
ERROR CoreConsole[2014-08-14 14:06:07] [c685b] 2 total errors during this script execution, please investigate and try and fix these errors.

As you can see above, now only daily reports are built successfully. Week/Month/Year report building is infeasible.

I tried visiting http://wstat1.noa.gr/xhprof/index.php?run=c685b&source=xhprof_testing, assuming that run_id=c685b, but the web page states:

Run Report
Run #c685b: Invalid Run Id = c685b

Please correct anything you think is set up badly, and advise on how to proceed with profiling/debugging the archiving procedure.

By the way, how do I determine the run_id if it’s not displayed?

Thanks,
Nick

It shouldn’t be in runScheduledTasksInTrackerMode(). Try:


    public function __construct($piwikUrl = false)
    {
        xhprof_enable();

        $this->initLog();
        $this->initPiwikHost($piwikUrl);
    }

    /**
     * Initializes and runs the cron archiver.
     */
    public function main()
    {
        $this->init();
        $this->run();
        $this->runScheduledTasks();

        $data = xhprof_disable();

        $XHPROF_ROOT = '/usr/share/xhprof';

        include_once $XHPROF_ROOT . "/xhprof_lib/xhprof_lib.php";
        include_once $XHPROF_ROOT . "/xhprof_lib/xhprof_runs.php";

        $xhprof_runs = new \XHProfRuns_Default();

        // Save the run under a namespace "xhprof".

        $run_id = $xhprof_runs->save_run($data, "xhprof");
        echo "http://wstat1.noa.gr/xhprof/index.php?run={$run_id}&source=xhprof_testing\n";

        $this->end();
    }

What is the path of the php.ini in which you raised the memory limit?

It is /etc/php.ini, which is used for standalone PHP scripts (CLI), such as the archiving procedure:


memory_limit = 5120M

For web PHP scripts (run under nginx through PHP-FPM), the setting


php_admin_value[memory_limit] = 512M

in /etc/php-fpm.d/www.conf is being used.
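A quick unit check, as a sketch: PHP reads shorthand sizes such as “5120M” with 1024-based multipliers, so converting both configured values to bytes shows which limit the failing API requests hit. The helper name php_size_to_bytes is hypothetical. By this arithmetic 5120M is 5368709120 bytes, exactly the figure in the “Allowed memory size” errors above, which suggests the archiver’s API requests ran under the CLI limit from /etc/php.ini rather than the 512M PHP-FPM limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a PHP memory_limit shorthand value
# ("5120M", "1G", "512k", or a plain byte count) into bytes,
# using PHP's 1024-based K/M/G multipliers. "-1" (no limit) is passed through.
php_size_to_bytes() {
    local value="$1"
    local number="${value%[KkMmGg]}"   # drop one trailing unit letter, if any
    local unit="${value:${#number}}"   # the letter that was dropped (or "")
    case "$number" in
        -1) echo -1; return 0 ;;        # PHP uses -1 for "unlimited"
        ''|*[!0-9]*) return 1 ;;        # not a plain non-negative integer
    esac
    case "$unit" in
        K|k) echo $(( 10#$number * 1024 )) ;;
        M|m) echo $(( 10#$number * 1024 * 1024 )) ;;
        G|g) echo $(( 10#$number * 1024 * 1024 * 1024 )) ;;
        *)   echo $(( 10#$number )) ;;  # already a byte count
    esac
}
```

For example, “php_size_to_bytes 512M” prints 536870912, the PHP-FPM limit. To see which value the CLI actually uses, “php -r 'echo ini_get("memory_limit"), PHP_EOL;'” prints the effective setting, and “php --ini” lists the configuration files it loaded.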

Who is your hosting provider?

Can you also check whether the following exists on your server?

/etc/php5/apache2

If so, can you raise the memory setting there?

The VPS service is provided by GRNET (you can read more on the GRNET website).

There is no such file. The files I mentioned are the only ones with PHP settings (memory-related and other).

OK, I have made the suggested changes and upgraded to 2.5.0 as well.

Now the archive run prints an XHProf link, but it still does not work; visiting the link, the web page shows:

Run Report
Run #53f09b28ece24: Invalid Run Id = 53f09b28ece24

Nevertheless, I found the file 53f09b28ece24.xhprof.xhprof in /tmp/xhprof/. You can download it (gzipped) here:

http://iweb.noa.gr/files/53f09b28ece24.xhprof.xhprof.gz

(I guess I need to configure the XHProf web GUI somewhere to display this data? Where, and how?)

In any case, the XHProf output would not be of much use to me; I hope it helps you understand what is happening.

I would also like to point out that the archiving process is very resource-intensive, especially if this behavior is not caused by a bug. I would urge you to change it so that it is less resource-intensive and can process any number of visits/pageviews with the resources available on the host, e.g. at the expense of running time. Alternatively, you could support two modes of operation: 1) the current one, optimized for performance; 2) the suggested one, minimizing system resource usage.

Here is the whole output of the Archive process:


# /usr/bin/php /var/webs/wwwpiwik/www/console core:archive --url=http://wstat1.noa.gr
INFO CoreConsole[2014-08-17 08:09:08] [a7fae] ---------------------------
INFO CoreConsole[2014-08-17 08:09:08] [a7fae] INIT
INFO CoreConsole[2014-08-17 08:09:08] [a7fae] Piwik is installed at: http://wstat1.noa.gr/index.php
INFO CoreConsole[2014-08-17 08:09:08] [a7fae] Running Piwik 2.5.0 as Super User
INFO CoreConsole[2014-08-17 08:09:10] [a7fae] ---------------------------
INFO CoreConsole[2014-08-17 08:09:10] [a7fae] NOTES
INFO CoreConsole[2014-08-17 08:09:10] [a7fae] - Reports for today will be processed at most every 21600 seconds. You can change this value in Piwik UI > Settings > General Settings.
INFO CoreConsole[2014-08-17 08:09:10] [a7fae] - Reports for the current week/month/year will be refreshed at most every 3600 seconds.
INFO CoreConsole[2014-08-17 08:09:10] [a7fae] - Archiving was last executed without error 3 days 18 hours ago
INFO CoreConsole[2014-08-17 08:09:27] [a7fae] - Will process 1 websites with new visits since 3 days 18 hours , IDs: 1
INFO CoreConsole[2014-08-17 08:09:27] [a7fae] - Will process 1 other websites because some old data reports have been invalidated (eg. using the Log Import script) , IDs: 1
INFO CoreConsole[2014-08-17 08:09:27] [a7fae] ---------------------------
INFO CoreConsole[2014-08-17 08:09:27] [a7fae] START
INFO CoreConsole[2014-08-17 08:09:27] [a7fae] Starting Piwik reports archiving...
INFO CoreConsole[2014-08-17 08:19:35] [a7fae] Archived website id = 1, period = day, 648340 visits in last last52 days, 1462 visits today, Time elapsed: 607.360s
INFO CoreConsole[2014-08-17 10:55:29] [a7fae] Archived website id = 1, period = week, 1878590 visits in last last37 weeks, 66583 visits this week, Time elapsed: 9354.430s
ERROR CoreConsole[2014-08-17 11:31:31] [a7fae] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=month&date=last38&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 71 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1154 '
ERROR CoreConsole[2014-08-17 11:31:31] [a7fae] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=month&date=last38&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 71 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1154 '
INFO CoreConsole[2014-08-17 11:31:31] [a7fae] Archived website id = 1, period = month, 0 visits in last last38 months, 0 visits this month, Time elapsed: 2161.587s
ERROR CoreConsole[2014-08-17 12:07:04] [a7fae] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=year&date=last7&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 8208 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1154 '
ERROR CoreConsole[2014-08-17 12:07:04] [a7fae] Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=year&date=last7&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 8208 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1154 '
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Archived website id = 1, period = year, 0 visits in last last7 years, 0 visits this year, Time elapsed: 2133.476s
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Archived website id = 1, 4 API requests, Time elapsed: 14256.985s [1/1 done]
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Done archiving!
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] ---------------------------
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] SUMMARY
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Total visits for today across archived websites: 1462
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Archived today's reports for 1 websites
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Archived week/month/year for 1 websites
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Skipped 0 websites: no new visit since the last script execution
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Skipped 0 websites day archiving: existing daily reports are less than 21600 seconds old
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Skipped 0 websites week/month/year archiving: existing periods reports are less than 3600 seconds old
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Total API requests: 4
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] done: 1/1 100%, 1462 vtoday, 1 wtoday, 1 wperiods, 4 req, 14257983 ms, 2 errors.
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Time elapsed: 14257.983s
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] ---------------------------
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] SCHEDULED TASKS
INFO CoreConsole[2014-08-17 12:07:05] [a7fae] Starting Scheduled tasks... 
INFO CoreConsole[2014-08-17 12:08:08] [a7fae] task,output
Piwik\Plugins\CoreAdminHome\Tasks.purgeOutdatedArchives,Time elapsed: 24.244s
Piwik\Plugins\PrivacyManager\Tasks.deleteReportData,Time elapsed: 0.021s
Piwik\Plugins\PrivacyManager\Tasks.deleteLogData,Time elapsed: 0.001s
Piwik\Plugins\CorePluginsAdmin\Tasks.clearAllCacheEntries,Time elapsed: 0.092s
Piwik\Plugins\CorePluginsAdmin\Tasks.sendNotificationIfUpdatesAvailable,Time elapsed: 0.016s
Piwik\Plugins\CoreAdminHome\Tasks.optimizeArchiveTable,Time elapsed: 29.025s
Piwik\Plugins\CoreUpdater\Tasks.sendNotificationIfUpdateAvailable,Time elapsed: 0.024s
INFO CoreConsole[2014-08-17 12:08:08] [a7fae] done
INFO CoreConsole[2014-08-17 12:08:08] [a7fae] ---------------------------
http://wstat1.noa.gr/xhprof/index.php?run=53f09b28ece24&source=xhprof_testing
INFO CoreConsole[2014-08-17 12:08:09] [a7fae] ---------------------------
INFO CoreConsole[2014-08-17 12:08:09] [a7fae] SUMMARY OF ERRORS
INFO CoreConsole[2014-08-17 12:08:09] [a7fae] Error: Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=month&date=last38&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 71 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1154 '
INFO CoreConsole[2014-08-17 12:08:09] [a7fae] Error: Got invalid response from API request: http://wstat1.noa.gr/index.php?module=API&method=API.get&idSite=1&period=year&date=last7&format=php&token_auth=a0709b20762bae2088c915aa19a461d2&trigger=archivephp. Response was 'PHP Fatal error:  Allowed memory size of 5368709120 bytes exhausted (tried to allocate 8208 bytes) in /var/webs/wwwpiwik/www/core/DataTable.php on line 1154 '
ERROR CoreConsole[2014-08-17 12:08:09] [a7fae] 2 total errors during this script execution, please investigate and try and fix these errors.
ERROR CoreConsole[2014-08-17 12:08:09] [a7fae] 2 total errors during this script execution, please investigate and try and fix these errors.

Your feedback will be appreciated.

Actually, the link should be http://wstat1.noa.gr/xhprof/index.php?run=53f09b28ece24&source=xhprof, which has a different “source” parameter (it has to match the namespace passed to save_run()). It should work then. However, that run only records time, not memory usage; see the solution in the code below. I also just remembered that the archiver triggers several other URLs, via the web or the command line, which are more interesting to profile. This can be tricky, because those scripts do not end normally (they run out of memory), so the recorded XHProf data would never be written. That is why you probably have to wrap the saving code in a register_shutdown_function.

Try putting this in index.php, before the line “require_once PIWIK_INCLUDE_PATH . '/core/dispatch.php';”


if(!defined('PIWIK_PRINT_ERROR_BACKTRACE')) {
    define('PIWIK_PRINT_ERROR_BACKTRACE', false);
}

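// XHPROF START
// Record memory usage in addition to wall time.
xhprof_enable(XHPROF_FLAGS_MEMORY);

// Shutdown functions also run when the request dies with a fatal error
// (such as "Allowed memory size ... exhausted"), so the profile is still saved.
register_shutdown_function(function () {
    $data = xhprof_disable();

    $XHPROF_ROOT = '/usr/share/xhprof';

    include_once $XHPROF_ROOT . "/xhprof_lib/xhprof_lib.php";
    include_once $XHPROF_ROOT . "/xhprof_lib/xhprof_runs.php";

    $xhprof_runs = new \XHProfRuns_Default();

    // The namespace "xhprof" must match the "source" parameter of the viewer URL.
    $run_id = $xhprof_runs->save_run($data, "xhprof");
    echo "http://wstat1.noa.gr/xhprof/index.php?run={$run_id}&source=xhprof\n";
});
// XHPROF END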

require_once PIWIK_INCLUDE_PATH . '/core/dispatch.php';

I can’t test it at the moment, but it should work. If you don’t get the link, run “ls -al /tmp/xhprof” to find the run IDs. I recommend removing this code again after you have recorded one tracker run, as it would otherwise record all your requests, which can take up a fair amount of disk space.
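Along the lines of the “ls -al /tmp/xhprof” suggestion, here is a small sketch that turns the saved files into viewer links. It assumes the naming that XHProfRuns_Default uses, “<run_id>.<namespace>.xhprof” (as with 53f09b28ece24.xhprof.xhprof above), and that the namespace passed to save_run() is what the viewer expects in the “source” parameter; the function name xhprof_urls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an XHProf viewer URL for every run file in a directory.
# Assumes XHProfRuns_Default naming "<run_id>.<namespace>.xhprof", where
# <namespace> is the second argument given to save_run() and has to be
# passed to the viewer as the "source" parameter.
xhprof_urls() {
    local dir="$1" base_url="$2" f name
    for f in "$dir"/*.xhprof; do
        [ -e "$f" ] || continue            # no match: the glob stays literal
        name="$(basename "$f" .xhprof)"    # "<run_id>.<namespace>"
        case "$name" in
            *.*) ;;                        # expected shape
            *) continue ;;                 # skip files without a namespace
        esac
        echo "${base_url}?run=${name%%.*}&source=${name#*.}"
    done
}
```

For example, “xhprof_urls /tmp/xhprof http://wstat1.noa.gr/xhprof/index.php” prints one link per recorded run, each with the matching “source” value.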

OK, I’ve run today’s archive script using the above code and it generated 4 files, which you can find in the following tgz file:

http://iweb.noa.gr/files/xprof-wstat1-20140818.tgz

I hope this can help troubleshooting.

Please let me know your findings and provide guidance in solving the problems.

Thanks,
Nick

This profile looks good! I can see a memory peak of 3 GB in “Piwik\Plugins\Actions\Archiver::aggregateMultipleReports”. It looks like you have many actions (different URLs, for instance) in your Piwik, and many directories in your URLs? Do you know roughly how many different URLs you have?

Maybe you can execute the database query “select count(*) from piwik_log_action where type = 1;” where you might have to replace “piwik_” with your Piwik table prefix.

Output from SQL command:


select count(*) from piwik_log_action where type = 1;

89733

What now?

I suspect that this number of URLs stems from the fact that the website includes a file store with a huge (and ever-increasing) number of files, each of which may be treated by the archiver as a separate URL.
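One way to test this hypothesis before touching the database, as a rough sketch: count the distinct request paths in an access log, overall and below /services/GPSData/. It assumes the common or combined log format, where the path (including any query string) is the 7th whitespace-separated field; the function name is hypothetical, and the numbers only approximate what ends up in piwik_log_action, since that table accumulates URLs from every log imported so far.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<distinct paths> <distinct paths below prefix>".
# Assumes common/combined log format: the request line is quoted as
# "METHOD /path HTTP/x.y", so the path is field 7.
count_distinct_paths() {
    local log="$1" prefix="$2"
    awk -v prefix="$prefix" '
        { seen[$7] = 1 }
        END {
            total = 0; under = 0
            for (p in seen) {
                total++
                if (index(p, prefix) == 1) under++   # path starts with prefix
            }
            print total, under
        }' "$log"
}
```

For example, “count_distinct_paths /var/webs/wwwgein2/log/access_log.1 /services/GPSData/” shows what share of one day’s distinct URLs comes from the file store.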

To be more exact, here is the base website: http://www.gein.noa.gr.

One part of the site is a microsite, NOANET GREECE GNSS NETWORK, which includes a file store here:

http://www.gein.noa.gr/services/GPSData/

Can we exclude this part of the site from reporting/archiving as separate URLs? Alternatively, it would be enough to report this part of the site as aggregate numbers of visits for all URLs starting with /services/GPSData/.

In general, can we exclude parts of a website from archiving or define such URL aggregations?

Please let me know.

From what I know, it is possible to exclude URL query parameters (see the FAQ “How do I exclude URL query parameters from the URLs tracked, and from Pages reports?”), and you could avoid tracking some folders by disabling tracking in your code (see the FAQ “How do I set some of my website directories or pages to not be tracked?”).

There might be more solutions but I don’t know. Maybe someone else knows?

I’m afraid the former seems to apply only to URL query parameters, not to the URLs themselves.

The latter applies only to websites that use the JavaScript-based tracking method, and NOT (as in our case) the log analysis method.

However, I could try the log importer option “--exclude-path” (as explained in the FAQ “How to use Log Analytics tool”). Ideally, though, I would still like to know the aggregate number of visits/pageviews in the excluded path, but I don’t know if that is possible.
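If the path is excluded at import time, a rough aggregate can still be taken from the raw log before importing it. The sketch below (hypothetical function name, same common/combined log format assumption: client IP in field 1, path in field 7) counts requests below a prefix and the distinct client IPs that made them. These are only proxies: they are not Piwik visits, which the tracker groups into sessions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<requests> <distinct client IPs>" for requests
# whose path starts with the given prefix (common/combined log format assumed).
summarize_prefix() {
    local log="$1" prefix="$2"
    awk -v prefix="$prefix" '
        index($7, prefix) == 1 { hits++; ips[$1] = 1 }
        END {
            n = 0
            for (ip in ips) n++
            print hits + 0, n
        }' "$log"
}
```

Running it on access_log.1 before an import that uses “--exclude-path” would keep a daily total for /services/GPSData/ outside Piwik.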

Please let me know if there are any alternative/additional suggestions!

Let me ask for a clarification: can such a memory peak occur due to a large number of URLs, even if these URLs are not visited at all during the period being archived? If so, isn’t that a bit strange, since the log import works fine, and I would expect reporting not to be overloaded by URLs that have not been visited (or, in fact, have only been visited by search engine spiders)?

Please, let me ask for a clarification: can such a memory peak occur due to a large number of URLs, even if these URLs are not visited at all during the period being archived?

A memory peak caused by URLs will only occur if these URLs have actually been visited at some point.

However, I could try the archive option “--exclude-path” (as explained in the Piwik documentation). Yet, ideally, I would like to know the aggregate number of visits/pageviews in the excluded path as well, but I don’t know if it’s possible.

It is not possible with the Piwik log importer alone. However, it’s quite possible to do it manually. What I would do is:

As a result, you will have fewer URLs, which may lower memory requirements in the future.
You will, however, need to code this extra step of modifying the log file to “simplify” the URLs. Hope it helps!

Thanks Matt,

  1. I’ve specified in robots.txt that the location /services/GPSData/ should not be crawled by search engine robots. This will reduce unnecessary requests.

  2. I will use the following script for importing data:

Note: I am importing data from the rotated (previous day’s) access_log, hence the name of the original file: “access_log.1”


cp /var/webs/wwwgein2/log/access_log.1 /var/webs/wwwgein2/log/access_log.sedtmp

sed -i "s,/services/GPSData/[^\s]*\s,/services/GPSData/ ,g" /var/webs/wwwgein2/log/access_log.sedtmp

python /var/webs/wwwpiwik/www/misc/log-analytics/import_logs.py -dd \
--url=http://wstat1.noa.gr --login=AdminUser --password=admin_pass \
--idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects \
--enable-static --enable-bots --enable-reverse-dns \
/var/webs/wwwgein2/log/access_log.sedtmp

rm -f /var/webs/wwwgein2/log/access_log.sedtmp

/usr/bin/php /var/webs/wwwpiwik/www/console core:archive --url=http://wstat1.noa.gr >> /var/webs/wwwpiwik/www/archive-log/piwik-archive1.log

With this approach, however, the huge number of paths imported up to today will stay in the DB, so archiving will keep processing them, especially since all past archiving runs have been unsuccessful (they cannot archive month/year); I believe the problem may persist.

Is there a way (e.g. a query or a set of queries you could suggest) to modify the data already stored in the DB accordingly?

Or, can you provide any alternative suggestion to avoid the problems which may be caused by those already stored paths?

Thanks.

For completeness, the above sed regex does not work correctly if a path contains dots. So, I modified it as follows:


sed -i "s,/services/GPSData/[a-zA-Z0-9\/\.\-\_]*\s,/services/GPSData/ ,g" /var/webs/wwwgein2/log/access_log.sedtmp

Hope that might help someone…
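A possible explanation for the original failure, offered with some caution: in GNU sed a backslash inside a bracket expression is, as far as I know, taken literally (apart from a few escapes such as \n and \t), so [^\s] means “any character except a backslash or the letter s” rather than “non-whitespace”, and any path containing an “s” after the prefix would not be rewritten. The modified regex avoids that, but only matches characters it explicitly lists. A variant using the POSIX class [:space:] needs no such list; it also stops at a double quote so that a quoted referrer pointing into the file store keeps its closing quote. This is a hypothetical alternative (function name included), not the command used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse every path below /services/GPSData/ to the
# prefix itself, so the importer sees a single URL for the whole file store.
# [^[:space:]"]* stops at whitespace (end of the request path) or at a double
# quote (end of a quoted referrer), so each log line keeps its structure.
simplify_gpsdata_paths() {
    sed 's,/services/GPSData/[^[:space:]"]*,/services/GPSData/,g' "$@"
}
```

It reads the files given as arguments (or stdin), so “simplify_gpsdata_paths /var/webs/wwwgein2/log/access_log.1 > /var/webs/wwwgein2/log/access_log.sedtmp” could replace the cp + sed -i pair in the script.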