What are the benefits of archiving?

(geniosity) #1


I would just like a little clarity on the benefits of archiving. I don’t have the ability to increase memory due to being on a shared hosting account.

So, I’d like to know what exactly I’m missing out on with the Archiving process not completing due to “out of memory” issues.

It looks like it’s completing the weekly archive, but crashes on the yearly.

Will it clear out some DB tables? Will it just make reporting quicker? (And if so, how/why)?

Sorry for the questions, but if Matt or vipsoft could chime in with their supreme knowledge regarding this, I’d REALLY appreciate it.


(kolchak) #2

We are using Piwik on a cluster of websites, and when I set up the archiving, the performance improved dramatically! Depending on your traffic volumes, you may get similar outcomes.

Other than performance, I don’t think you lose access to any features by not archiving…

(geniosity) #3

Thanks for your reply.

Now I just have to figure out how to get it to actually archive since it’s gotten too big to do so now.

(erinwentworth) #4

[quote=geniosity @ Aug 6 2009, 09:33 AM]Thanks for your reply.

Now I just have to figure out how to get it to actually archive since it’s gotten too big to do so now.[/quote]

Could you let us know when you figure out how to do it?

(yieldonred) #5

Archiving will reduce the disk space needed and also make for a faster application. Archiving will also permit you to perform adequate backups.

(geniosity) #6

Ok, I’ve been playing around, and ended up moving my stats to a different host where I’m in charge.

I upped the memory to 384MB and that was the only way to get the archiving to complete.

I even set it up to run every hour, and when I changed it back to 128MB, it failed.

So obviously the memory issue does not care about how much needs to be archived (if I understand it correctly, during the day it should only need to archive the last hour’s data).

AND, my DB tables are not “shrinking”. They’re up to 400MB. Is that supposed to be correct? Does archiving not get rid of unnecessary data, or will all data be stored for all time, only the totals will be quicker due to them being “archived”?

(Matthieu Aubry) #7
  • Archiving several times per day will only result in Today’s reports being updated more often. It will not change the memory requirement for other periods.

  • There is an issue with memory and Piwik archiving. We already spent time looking for leaks and fixing some, but there is still some work to do obviously.

  • Growing data size is expected. Piwik will delete archives that are for incomplete periods (i.e. when you archived a week in the middle of that week), but will not delete other archives. You will therefore have archives for every day, every week, every month and every year in the MySQL tables. They ensure very fast UI response and data access, but obviously they require disk space. In the future one can imagine a plugin that would delete some of the old data (for example, only keep the top 50 keywords for each day, etc.).

  • At this point, archiving doesn’t delete logs. It is expected that, at some point, archiving daily reports means that we can then get rid of the logs. However, before Piwik 1.0, this wasn’t a requirement for us and we decided to leave all logs in the DB until this feature is implemented. In the future, these logs will either be deleted or rotated in other tables or files.

  • If you don’t set up archiving to run automatically, archiving will occur when a report is requested. This will often be slow and a bad user experience (you have to wait N seconds), which is why we recommend setting up auto archiving for medium to large websites: http://piwik.org/docs/setup-auto-archiving/
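
To make the points above concrete, here is a toy Python model of the behaviour described: archives built for a period that was still in progress are thrown away and rebuilt on the next run, archives for finished periods are kept, and the raw logs are never deleted. This is only an illustrative sketch, not Piwik's actual code or schema; the names `run_archiving`, `week_start`, and the `archives` dictionary layout are all made up for the example.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week containing d (hypothetical helper)."""
    return d - timedelta(days=d.weekday())


def run_archiving(logs, archives, today):
    """One toy archiving pass (not Piwik's real implementation).

    logs:     list of visit dates, standing in for raw log rows
    archives: dict mapping (period, start_date) to
              {"visits": int, "complete": bool}
    today:    the date on which the pass runs
    """
    # Archives built while their period was still in progress are discarded.
    for key in [k for k, v in archives.items() if not v["complete"]]:
        del archives[key]

    daily = Counter(logs)
    weekly = Counter(week_start(d) for d in logs)

    # Finished periods are archived once and then kept; only missing
    # archives are (re)computed from the logs.
    for day, visits in daily.items():
        key = ("day", day)
        if key not in archives:
            archives[key] = {"visits": visits, "complete": day < today}

    for start, visits in weekly.items():
        key = ("week", start)
        if key not in archives:
            finished = start + timedelta(days=7) <= today
            archives[key] = {"visits": visits, "complete": finished}

    # The logs are left untouched, so total storage only grows.
    return archives
```

In this model, running the pass more often only refreshes the archives for periods still in progress; the finished day and week archives accumulate, which matches the growing table size described above.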

(geniosity) #8

Now THAT’S the information I was looking for.

I’d fully recommend that this gets added to the “Archiving” help/info page.

Thanks for the confirmation…

(Matthieu Aubry) #9

Good idea, I added the info to http://piwik.org/docs/setup-auto-archiving/

Is it clear enough? Do you have other questions?

(geniosity) #10

That page is helpful, and it has all the info on “HOW”, but I really think you should add the info about what to expect from the archive.

That’s the REALLY useful information you provided earlier. (Unless the page I’m seeing is cached and you already added that)

(Matthieu Aubry) #11

I put the same content at the end of the page.

(geniosity) #12

That’s awesome. Thanks, I’m sure it will save a lot of people a lot of time.