TokuDB as storage engine for Piwik

Many people with huge trafic are already using or considering TokuDB engine as a replacement for InnoDB. Please use this thread to share knowledge and suggestions about TokuDB.

There’s already a Piwik FAQ about installing TokuDB:

Few questions already :
In theory using Toku should take less disk space, is it the case, how much smaller ?
Is there a difference in Crontab archiving time ?
Will users see a difference ? (mainly in using segments)
If disk space and speed are way better, can we add more pre-processed segments in cronjob instead of real-time slow ones ?

Let the sharing starts.

"In theory using Toku should take less disk space, is it the case, how much smaller ? "
On 3 servers exactly 4 times space on disk, so it’s awesome.

So 200GB in mysql it’s 50GB on disk.

Will try to make some archiving and speed tests. (I have 10 more piwik servers to convert)

What do you mean by “mainly in using segments” any particular URL or situation to test ?

To speed up segments archiving you can use this new option added in 2.15:

Complain I got about is speed from interface for end users on my current InnoDB 34 GB, 3 years of logs setup (we keep everything) :

  1. Date range, using current year (so, not completed).

Ex: Current year => Actions => Pages => Transition on the first link
Piwik Apache with 4 CPU is at 10% CPU usage and MySQL at 50% while processing.

Single day transition = 3 sec. (from 13k pageviews)
Month of September transition = 44 sec.(from 700k pageviews)
Year 2015 so far = nothing yet after 45 minutes (4.8 million pageviews, Apache server (4 CPU, 8 GB ) is at 20% CPU and MySQL 2 CPU 6 GB used on 8 GB available)

  1. Segments.

We got 2 segments (real-time, not under cron pre-crunching) where we include or exclude our own traffic.
External, make this kind of URL:

And the Internal traffic only produce this kind of URL:

Even just for a single day, this kind of segments takes ages.

I made a quick test for 1 full month :

August => No segment => Actions: Pages = 5 sec.
August => internal (visitIp>%3D215.216.3.0%3BvisitIp<%3D215.216.3.255) => Actions: Pages = Got 2 Chrome timout but results after 1 min 15 sec.
July => internal => Actions: Pages = 1 min. 6 sec. no Chrome timout this time.

@Quba, My understanding is that Caching id actions in general cache for faster segmented archive SQL queries by mattab · Pull Request #8861 · matomo-org/piwik · GitHub is good for pre-processed/cron segments with no impact on real-time one, right ?

I’ve maintained a no-pre-processed segments policiy so far because of the DB size. Turning to TokuDB with compression enable (and saving 75% of disk space) would allow me to make some popular segments pre-process in the future… That’s the plan at least…