Archiving fails



Hi,

Not sure if this is the correct place to post this, but here it goes.
We’re currently evaluating whether Piwik can replace Google Analytics. We have been using it since March. We have approx. 500k requests per day and rely heavily on events using custom page variables. Everything is great on the tracker side: I validated that all events are sent, with only small differences between GA and Piwik. I also pulled the access logs and confirmed that everything is in the DB.

Our issues lie with the API. We have approx. 70M rows in piwik_log_link_visit_action, and we plan on deleting records older than 30 days. Without archiving, pulling out events based on a custom var for one day takes approx. 70s using the API. I noticed that by default Piwik only retrieves the first 500 custom var values, so I changed those values in the config file. I then deleted all data from the piwik_archive tables for October and restarted the archiving process. Once data is archived, the API responds in 25 seconds for one day, which I still think is a lot.
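For illustration, a request of this kind might look like the sketch below. The post does not say which API method is being called, so `CustomVariables.getCustomVariables` is an assumption, and the host, site id and token are placeholders rather than real values. `filter_limit=-1` is the Reporting API's way of asking for all rows instead of the default page size.

```python
from urllib.parse import urlencode


def build_custom_vars_url(base_url, id_site, date, token_auth):
    """Build a Reporting API URL for one day's custom variable report.

    The API method used here is an assumption; the original post does
    not name the exact method that was timed.
    """
    params = {
        "module": "API",
        "method": "CustomVariables.getCustomVariables",
        "idSite": id_site,
        "period": "day",
        "date": date,
        "format": "JSON",
        "filter_limit": -1,  # -1 requests every row, not just the first page
        "token_auth": token_auth,
    }
    return base_url.rstrip("/") + "/index.php?" + urlencode(params)


# Placeholder host, site id and token (hypothetical values).
url = build_custom_vars_url("https://piwik.example.org", 1, "2015-10-01", "anonymous")
print(url)
```

Timing this URL with and without archived data for the same day would reproduce the 70s versus 25s comparison described above.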

The problem is that now, when I run the archiver, it seems to ignore the input params, e.g.:
bash-4.1$ /var/www/piwik/console core:archive --url=… --force-date-range=2015-10-01,2015-10-02 -vvv

I can see that it’s creating data for all dates, not only the 1st and 2nd of October. It then fails with this:

[Exception]
1 total errors during this script execution, please investigate and try and fix these errors.

Exception trace:
() at /var/www/piwik/core/CronArchive.php:417
Piwik\CronArchive->logFatalError() at /var/www/piwik/core/CronArchive.php:410
Piwik\CronArchive->end() at /var/www/piwik/core/CronArchive.php:269
Piwik\CronArchive->Piwik{closure}() at /var/www/piwik/core/Access.php:456
Piwik\Access::doAsSuperUser() at /var/www/piwik/core/CronArchive.php:270
Piwik\CronArchive->main() at /var/www/piwik/plugins/CoreConsole/Commands/CoreArchiver.php:27
Piwik\Plugins\CoreConsole\Commands\CoreArchiver->execute() at /var/www/piwik/vendor/symfony/console/Symfony/Component/Console/Command/Command.php:257
Symfony\Component\Console\Command\Command->run() at /var/www/piwik/vendor/symfony/console/Symfony/Component/Console/Application.php:874
Symfony\Component\Console\Application->doRunCommand() at /var/www/piwik/vendor/symfony/console/Symfony/Component/Console/Application.php:195
Symfony\Component\Console\Application->doRun() at n/a:n/a
call_user_func() at /var/www/piwik/core/Console.php:79
Piwik\Console->Piwik{closure}() at /var/www/piwik/core/Access.php:456
Piwik\Access::doAsSuperUser() at /var/www/piwik/core/Console.php:80
Piwik\Console->doRun() at /var/www/piwik/vendor/symfony/console/Symfony/Component/Console/Application.php:126
Symfony\Component\Console\Application->run() at /var/www/piwik/console:27

Any idea why? We have the latest Piwik version.

Would this scale for our needs? The plan is to run the archiver every hour and pull daily data out, keeping raw data for only 30 days (since we can also easily recreate it from the access logs) and deleting archives older than 60 days.
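As a rough back-of-the-envelope check of what the 30-day retention would mean for table size, here is a small sketch. It assumes that each tracked request ends up as roughly one row in piwik_log_link_visit_action, which is a simplification, and it uses only the figures quoted in this post.

```python
REQUESTS_PER_DAY = 500_000    # approx. daily volume quoted in the post
CURRENT_ROWS = 70_000_000     # approx. rows currently in piwik_log_link_visit_action
RAW_RETENTION_DAYS = 30       # planned raw-data retention

# Assumption: one tracked request is roughly one row in the action log table.
rows_after_purge = REQUESTS_PER_DAY * RAW_RETENTION_DAYS

# How much smaller the table would be once old rows are purged.
reduction_factor = CURRENT_ROWS / rows_after_purge

print(rows_after_purge)            # 15000000
print(round(reduction_factor, 2))  # 4.67
```

Under that assumption the raw table would settle at around 15M rows, a bit under a quarter of its current size, so both the archiver and any direct queries would have far less data to scan.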

Otherwise, I was also thinking of disabling the archiving process completely, adding some indexes on the custom var columns, keeping the raw data for only 30 days, and using the API directly on it. I’m just a bit reluctant to start adding indexes.
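To make the index idea concrete, here is a hedged sketch that only generates the DDL string and does not run it. The index name, the prefix length of 50 and the column order are all assumptions: equality columns (site, custom var name and value) come first and the `server_time` range column last, which is the usual ordering for "one custom var value over one day" lookups. Which custom variable slot is actually in use is not stated in this post, so it is a parameter.

```python
def custom_var_index_ddl(slot, prefix_len=50, table="piwik_log_link_visit_action"):
    """Return an ALTER TABLE statement adding an index for one custom var slot.

    The index name and prefix length are hypothetical choices; the
    prefix keeps the index key short for the VARCHAR custom var columns.
    """
    if slot not in range(1, 6):
        raise ValueError("expected a custom variable slot between 1 and 5")
    key_col = f"custom_var_k{slot}"
    val_col = f"custom_var_v{slot}"
    index_name = f"index_custom_var_{slot}"  # hypothetical name
    # Equality columns first, range column (server_time) last.
    return (
        f"ALTER TABLE {table} "
        f"ADD INDEX {index_name} "
        f"(idsite, {key_col}({prefix_len}), {val_col}({prefix_len}), server_time)"
    )


ddl = custom_var_index_ddl(1)
print(ddl)
```

On a table with tens of millions of rows, building such an index is itself a long operation, so it would be safer to try it after the purge of old raw data or on a copy first.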

Regards,