Archive Too Slow


#1

I have an install that has:
7 sites
~1000 visitors per day
~100,000 unique URLs
~3 months old
DB is on a different system from Piwik.

I have turned off the automatic archiving, and am currently running the archive.sh manually at the shell.
It pegs my CPU, grinds at my swap, and takes over an hour to complete, if it completes at all.

I assume it's because I have too many unique URLs and Piwik doesn't like that.
So I put in some JS magic to group the 100k URLs so that, over time, I'll only have about 1k of them.
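For reference, here's a sketch of the kind of grouping I mean. The grouping rule (collapsing numeric path segments) is just an example, not my actual rule; `setCustomUrl` is the Piwik JS tracker method for overriding the reported URL:

```javascript
// Sketch: collapse URL variants into groups before they reach Piwik.
// The rule below (numeric path segments -> ":id") is illustrative;
// adapt it to whatever makes your 100k URLs collapse to ~1k.
function groupUrl(url) {
  return url.replace(/\/\d+(?=\/|$)/g, '/:id');
}

// With Piwik's async tracker, report the grouped URL instead of the real one:
// _paq.push(['setCustomUrl', groupUrl(window.location.href)]);
// _paq.push(['trackPageView']);

console.log(groupUrl('http://example.com/article/12345/comments/678'));
```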

But I am left with a problem: how can I rework my old logs so that I keep the number of visitors etc. while dropping or grouping the unique URLs they visited, so that archive.sh can work normally again?
Simply put: how can I make archive.sh work properly again with a minimum of lost data?
Thanks!
Kyle


(Matthieu Aubry) #2

When you run archive.sh:

  1. set a timeout of 2 hours
  2. run archive.sh once until completion
  3. wait 2 hours and run it again
    => how long does the second run take? (it will only RE-archive data for today)
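To measure the two runs, a small shell wrapper like this works (the archive.sh path is a placeholder; point it at the script in your Piwik install):

```shell
#!/bin/sh
# Time one archiver run and report the wall-clock duration.
time_run() {
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo "elapsed: $((end - start))s"
}

# Real usage (path is a placeholder for your install):
# time_run /path/to/piwik/archive.sh

time_run true   # demo invocation so the script runs standalone
```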

#3

I just let it go.
It never gets to completion before I run out of memory (PHP memory_limit set as high as I can go).
It never finishes “Archiving period = year for idsite = 7…”, which is the big one.
Are you saying I might need a bigger machine just to get past this problem :S ?


(Matthieu Aubry) #4

How high was the memory limit? Usually it should work with 2G or even 4G. If not, we will work on memory improvements soon, but not for 2 months or more.
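Note that the PHP CLI often reads a different php.ini than the web server, so the limit you raised may not be the one archive.sh actually uses. A sketch of the relevant setting (4G per the numbers above):

```ini
; php.ini loaded by the PHP CLI (locate it with: php --ini)
; Raise the limit for the archiving run; try 2G or 4G.
memory_limit = 4G
```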


#5

Cloud resize to the rescue: I got a clean archive.sh run. Too bad that only works while my system is ballooned up to 8 times its normal size.
Can I safely not run the yearly archive until your memory improvements are in place?