Multiple archive cron jobs started in parallel

We have a cron job that starts every 30 minutes. While diagnosing a performance issue that appeared after we modified a segment (which triggered a major re-archiving), we realised that most of the hourly cron jobs were still running (weirdly enough, one is not there). This is obviously not helping the processes to finish. After a similar operation last week, the whole Piwik server crashed about 30 hours later.

ps -eaf |grep piwik | grep /opt/piwik/tmp/logs/piwik-archive.log
root 9446 9442 0 09:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
root 5931 5927 0 10:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
root 31124 31114 0 11:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
root 28657 28644 0 12:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
root 14137 14132 0 13:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
root 2471 2455 0 14:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
root 17159 17142 0 16:00 ? 00:00:00 /bin/sh -c /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1

Is this the intended behaviour? Should we introduce a check on the process (whether it still runs or not) ourselves in the cron call?

Thanks in advance!

Patrick

PS: This Discourse forum is great!

Could you check whether the processes are blocked at a particular place?

Should we introduce a check on the process (whether it still runs or not) ourselves in the cron call?

This is a good idea, I think.

Thanks! Sorry I had missed your message!

No, they were not blocked anywhere. It was just the load increasing (more sites tracked), so a job had not finished by the time the next one was due to start.

[quote][quote]
Should we introduce a check on the process (whether it still runs or not) ourselves in the cron call?
[/quote]
this is a good idea I think
[/quote]

We have implemented a semaphore with flock. But shouldn’t Piwik itself check for archive process re-entry?
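For anyone landing here with the same problem, a guard of this kind can be sketched roughly as follows. This is only an illustration, not necessarily our exact setup: it assumes the `flock` utility from util-linux is installed, and the lock file path and the `run_exclusive` helper name are made up for the example.

```shell
#!/bin/sh
# Hypothetical lock file location (an assumption; any path writable by the cron user works).
LOCK_FILE="${LOCK_FILE:-/tmp/piwik-archive.lock}"

# Run the given command only if nobody else currently holds the lock.
# With -n, flock does not wait: if the lock is taken it fails right away
# (exit status 1 by default) and the command is not started.
run_exclusive() {
    flock -n "$LOCK_FILE" "$@"
}

# Intended use from cron (command and paths as in the ps output above):
# run_exclusive /usr/bin/php /opt/piwik/console core:archive -v --url=http://127.0.0.1/blah >> /opt/piwik/tmp/logs/piwik-archive.log 2>&1
```

The same thing can be written directly in the crontab entry by prefixing the existing command with `flock -n /tmp/piwik-archive.lock`. The lock is released automatically when the archive process exits, so a crashed run does not leave a stale lock behind.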

As far as I know, the archiver should not archive the same website twice. I suppose all your cron jobs were archiving different websites. It may still be an issue if it overloads the server. We could propose a new option or mechanism. Feel free to create a bug report on the tracker.