Archive Too Slow


#1

I have an install that has:
7 sites
~1000 visitors per day
~100,000 unique URLs
~3 months old
DB is on a different system from Piwik.

I have turned off the automatic archiving, and am currently running the archive.sh manually at the shell.
It pegs my CPU, grinds at my swap, and takes over an hour to complete, if it completes at all.

I assume it's because I have too many unique URLs and Piwik doesn't like that.
So I put in some JS magic to group the 100k URLs so that, over time, I'll only have about 1k of them.
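For reference, here's a sketch of the kind of grouping I mean. The grouping rule (collapsing numeric path segments) is just an example, not my actual rule; `setCustomUrl` is the Piwik JS tracker method for overriding the reported URL:

```javascript
// Sketch: collapse URL variants into groups before they reach Piwik.
// The rule below (numeric path segments -> ":id") is illustrative;
// adapt it to whatever makes your 100k URLs collapse to ~1k.
function groupUrl(url) {
  return url.replace(/\/\d+(?=\/|$)/g, '/:id');
}

// With Piwik's async tracker, report the grouped URL instead of the real one:
// _paq.push(['setCustomUrl', groupUrl(window.location.href)]);
// _paq.push(['trackPageView']);

console.log(groupUrl('http://example.com/article/12345/comments/678'));
```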

But I am left with a problem: how can I rework my old logs so that I keep the number of visitors etc. while dropping or grouping the unique URLs they visited, so that archive.sh can work normally again?
Simply put: how can I make archive.sh work properly again with a minimum of lost data?
Thanks!
Kyle


(Matthieu Aubry) #2

When you run archive.sh:

  1. set a timeout of 2 hours
  2. run archive.sh once until completion
  3. wait 2 hours and run it again
    => how long does the second run take? (it will only RE-archive data for today)
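To measure the two runs, a small shell wrapper like this works (the archive.sh path is a placeholder; point it at the script in your Piwik install):

```shell
#!/bin/sh
# Time one archiver run and report the wall-clock duration.
time_run() {
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo "elapsed: $((end - start))s"
}

# Real usage (path is a placeholder for your install):
# time_run /path/to/piwik/archive.sh

time_run true   # demo invocation so the script runs standalone
```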

#3

I just let it go.
It never gets to completion before I run out of memory (PHP memory_limit set as high as I can go).
It never finishes “Archiving period = year for idsite = 7…”, which is the big one.
Are you saying I might need a bigger machine just to get past this problem :S ?


(Matthieu Aubry) #4

How high was the memory limit? Usually it should work with 2G or even 4G. If not, we will work on memory improvements soon, but not for 2 months or more.
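Note that the PHP CLI often reads a different php.ini than the web server, so the limit you raised may not be the one archive.sh actually uses. A sketch of the relevant setting (4G per the numbers above):

```ini
; php.ini loaded by the PHP CLI (locate it with: php --ini)
; Raise the limit for the archiving run; try 2G or 4G.
memory_limit = 4G
```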


#5

Cloud resize to the rescue: I got a clean archive.sh run. Too bad that only works while my system is ballooned up to 8 times its normal size.
Can I safely not run the yearly archive until your memory improvements are in place?