Very slow archiving after upgrading to 2.2(.2)

Hi,

I’m running a large Piwik instance with more than 6k sites.
I usually do daily archiving via a cronjob, which took around 4 hours before (also with Piwik 2.1.0).
After upgrading to Piwik 2.2.2 I ran into problems with the archiving time.
I killed the first archiving cronjob because it was still running when the second one started (24 hours later).
From the performance I see now, it will take around 48 hours to process all the sites once.

Is there anything that needs to be done now to have the old performance back? Or is it “only” for the first run after upgrading?
I have already switched from archive.php to console core:archive.
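For reference, the cronjob now calls the console entry point roughly like this (PHP binary, paths and URL are placeholders, not my real setup):

# runs once a day; 2>&1 keeps errors in the same log file
5 0 * * * /usr/bin/php /var/www/piwik/console core:archive --url=https://piwik.example.org/ > /var/log/piwik-archive.log 2>&1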

Edit:
I see that most of the time is spent archiving the year period:


INFO CoreConsole[2014-05-11 10:36:26] [6ddb9] Archived website id = 1826, period = day, 0 visits in last 4 days, 0 visits today, Time elapsed: 0.747s
INFO CoreConsole[2014-05-11 10:36:27] [6ddb9] Archived website id = 1826, period = week, 0 visits in last 4 weeks, 0 visits this week, Time elapsed: 1.105s
INFO CoreConsole[2014-05-11 10:36:29] [6ddb9] Archived website id = 1826, period = month, 0 visits in last 4 months, 0 visits this month, Time elapsed: 2.096s
INFO CoreConsole[2014-05-11 10:36:50] [6ddb9] Archived website id = 1826, period = year, 0 visits in last 4 years, 0 visits this year, Time elapsed: 21.055s
INFO CoreConsole[2014-05-11 10:36:50] [6ddb9] Archived website id = 1826, 4 API requests, Time elapsed: 25.013s [347/4533 done]
INFO CoreConsole[2014-05-11 10:36:51] [6ddb9] Archived website id = 1827, period = day, 0 visits in last 4 days, 0 visits today, Time elapsed: 0.745s
INFO CoreConsole[2014-05-11 10:36:52] [6ddb9] Archived website id = 1827, period = week, 0 visits in last 4 weeks, 0 visits this week, Time elapsed: 1.102s
INFO CoreConsole[2014-05-11 10:36:54] [6ddb9] Archived website id = 1827, period = month, 0 visits in last 4 months, 0 visits this month, Time elapsed: 2.097s
INFO CoreConsole[2014-05-11 10:37:15] [6ddb9] Archived website id = 1827, period = year, 0 visits in last 4 years, 0 visits this year, Time elapsed: 21.072s

Regards,

Thomas

Hi Thomas,

A user has reported a regression in Tracker performance (see ticket: Tracking servers CPU saturation after upgrade from 2.1.0 to 2.2.0 · Issue #5019 · matomo-org/matomo · GitHub), but I haven’t heard of archiving problems with the latest 2.2.2.

If you revert the code back to 2.1, do you see much better performance?

Hopefully we can find the cause of your issue, as it sounds major!

What is your Python version? Maybe the new Piwik would benefit from a Python update for optimal speed to help the cron?

What does Python have to do with archiving?
Keep in mind that I’m not importing logs here.

@Matt: I’ll see when the next run finishes. The first archiving run has now finally finished:


INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Done archiving!
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] ---------------------------
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] SUMMARY
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Total visits for today across archived websites: 22
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Archived today's reports for 4533 websites
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Archived week/month/year for 4533 websites
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Skipped 1632 websites: no new visit since the last script execution
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Skipped 0 websites day archiving: existing daily reports are less than 3600 seconds old
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Skipped 0 websites week/month/year archiving: existing periods reports are less than 3600 seconds old
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Total API requests: 18132
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] done: 4533/4533 100%, 22 vtoday, 4533 wtoday, 4533 wperiods, 18132 req, 121025868 ms, no error
INFO CoreConsole[2014-05-12 17:42:25] [6ddb9] Time elapsed: 121025.868s

Was it previously also archiving for 4 years? If I’m not mistaken (just looking at mytop during an archiving run), it goes through all 4 years for every month and first checks whether there is some “lockend_” entry for that month, or something like that. That means going through 4 × 12 tables before any real work is done. Not all of these ancient tables are always loaded into memory already.
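The number of those monthly archive tables can be checked directly on the MySQL server, for example (the database name piwik and table prefix piwik_ are just the defaults, adjust for your install):

# -N suppresses the column header, so wc -l returns the exact table count
mysql -N piwik -e "SHOW TABLES LIKE 'piwik_archive_numeric_%'" | wc -l
mysql -N piwik -e "SHOW TABLES LIKE 'piwik_archive_blob_%'" | wc -l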

Was just an idea with python…

Here’s another idea, not even sure it’s applicable to your setup: see if cron priority will help things…

You can change the priority a command runs at by preceding it with “nice”.

For example, if the command running from your cron job looks like this:

php -q /home/guru54gt5/public_html/sys/auto_cb_update.php

You can precede it with a nice command such as:

nice -n N php -q /home/guru54gt5/public_html/sys/auto_cb_update.php

Where “N” is a number between -20 and 19.

Running at -20 means “highest priority”, 19 means “lowest priority”, and you have all the numbers in between to play with too :slight_smile:

Unless specified, I believe tasks default to “0”, or normal priority.

You can read more about nice by typing “man nice” from the command line.
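Applied to the Piwik archiver, the crontab line could look something like this (paths and URL are only examples, not taken from your setup):

# lowest CPU priority, so the archiver doesn't compete with the tracker
5 0 * * * nice -n 19 /usr/bin/php /var/www/piwik/console core:archive --url=https://piwik.example.org/ > /dev/null 2>&1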

-Eric

It should only archive the period of time since the last time archiving ran successfully. So technically, the next time it runs it should only archive a few days / a couple of weeks / a couple of months / a couple of years (i.e. last2 or so).

Let me know if you find something

The latest archiving run took 8 hours. I’ll look into a downgrade over the weekend.
Could it be because of the change to CLI archiving? I was using archive.php before.

It should be faster after migrating to CLI as it’s not using HTTP anymore… But if you find the older codebase runs faster, let me know; maybe there is a regression.
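To check whether the CLI switch itself plays a role, you could time both entry points on the same version (paths and URL are placeholders):

# old HTTP-based entry point vs. new CLI entry point
time php /path/to/piwik/misc/cron/archive.php --url=https://piwik.example.org/
time php /path/to/piwik/console core:archive --url=https://piwik.example.org/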