Scalability

Hi,

I’m trying to make Piwik scale to multiple machines, and I would really appreciate your help or opinions. The main goal is to improve the speed at which results are returned from an API call.

The approaches I’m considering are:

  1. Having one writer and multiple reader Piwik instances that point to the same MySQL database, with archiving disabled on all the readers and enabled only on the writer. I haven’t found a way to disable it yet.
  2. MySQL replication
  3. Memcached, either on the Piwik server or on a separate server.

Other questions:
Are there any crontabs in the default Piwik installation?
When do the Archive functions get executed?

I’m thinking of implementing all of the above, but I’m not sure where to start with some of them. Would it be enough? Are there any disadvantages? Thanks.

Any other suggestions for scaling it are welcome.

[quote=zeta @ Feb 18 2009, 08:01 PM]I’m trying to make Piwik scale to multiple machines, and I really want your help or opinion.[/quote]

Hi Zeta, could you please post your questions on the piwik-hackers mailing list (see dev.piwik.org)? There are more technical people reading it, and we’ll be able to help you better. Thanks.

Hi Matt,

I’m not sure whether I posted the issue on the piwik-hackers mailing list correctly. I sent an email to piwik-hackers@piwik.org about a month ago and haven’t received a reply. Do I need to subscribe to the mailing list first? I checked the February archives, but there is no post from me there.

Thanks.

#1 should be doable so long as your readers (web sites hosting the Piwik UI) don’t share the same config files as the writer (web site handling the logging).

See the documentation on setting up archiving. On the writer, modify misc/cron/archive.sh to suit your environment or requirements, and then add it to your crontab. (There’s an updated cron job script in SVN if you’re interested.) On the readers, set enable_browser_archiving_triggering = false.

I don’t have any thoughts on the other two scenarios you proposed.

Hi,

I’ve given this some thought, since I plan on putting Piwik in production if I get the frontend to work.

We have 2 million+ page views daily, and this really stresses the machine doing the data collection. I plan on having 2 or 3 servers as collectors in a load-balanced cluster (I use LVS) with a single DB server. I’ve tweaked the Apache servers and the MySQL server, and they should be able to handle the load. I plan on running the frontend on a separate server.

Cheers,

Johan

Hi,

In a few days I will write a plugin that allows having one or more read-only databases (slaves), and one database for writing (master).

This plugin is designed for use with several replication servers.
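
The idea behind the read/write split can be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: the class name `ReadWriteRouter`, the prefix check, and the in-memory SQLite connections standing in for MySQL servers are all assumptions made up for the example.

```python
import itertools
import sqlite3


class ReadWriteRouter:
    """Route writes to one master connection and reads to replicas (round-robin)."""

    # Statement prefixes treated as read-only (a deliberate simplification).
    READ_PREFIXES = ("select", "show")

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [master])

    def connection_for(self, sql):
        """Pick the connection that should run this statement."""
        is_read = sql.lstrip().lower().startswith(self.READ_PREFIXES)
        return next(self._replicas) if is_read else self.master

    def execute(self, sql, params=()):
        return self.connection_for(sql).execute(sql, params)


# Three in-memory SQLite databases stand in for one master and two replicas.
# No actual replication happens between them; only the routing is shown.
master = sqlite3.connect(":memory:")
replica_a = sqlite3.connect(":memory:")
replica_b = sqlite3.connect(":memory:")
router = ReadWriteRouter(master, [replica_a, replica_b])
```

One thing to keep in mind with this kind of routing: with asynchronous replication a replica can lag behind the master, so a read that must see a write made a moment ago should go to the master instead.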

Apart from this, I am working on a TrackerSecondaryDb plugin, which logs visits to files when the master database is not available (you can then replay the logs and switch back to the master when it is up again).
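
The log-to-file-and-replay idea could look something like this. Again, it is a Python sketch under my own assumptions, not the plugin itself: `FallbackTracker`, the JSON-lines spool file, and the fake `save_to_db` function are hypothetical names used only for illustration.

```python
import json
import os
import tempfile


class FallbackTracker:
    """Write visits to the database; spool them to a file while it is down."""

    def __init__(self, save_to_db, spool_path):
        # save_to_db is any callable that raises ConnectionError when the DB is down.
        self.save_to_db = save_to_db
        self.spool_path = spool_path

    def track(self, visit):
        try:
            self.save_to_db(visit)
        except ConnectionError:
            # Master unavailable: append the visit as one JSON line.
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(json.dumps(visit) + "\n")

    def replay(self):
        """Re-send spooled visits once the database is back; return how many."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as spool:
            visits = [json.loads(line) for line in spool if line.strip()]
        for visit in visits:
            # If this raises, the spool file is kept, so a later replay
            # may re-send visits that already went through.
            self.save_to_db(visit)
        os.remove(self.spool_path)
        return len(visits)


# Demo with a fake database that can be switched on and off.
db_rows, db_up = [], {"ok": False}

def save_to_db(visit):
    if not db_up["ok"]:
        raise ConnectionError("master database unavailable")
    db_rows.append(visit)

tracker = FallbackTracker(save_to_db, os.path.join(tempfile.mkdtemp(), "visits.log"))
tracker.track({"idsite": 1, "url": "/home"})   # DB down: spooled to the file
db_up["ok"] = True
tracker.track({"idsite": 1, "url": "/about"})  # DB up: written directly
replayed = tracker.replay()                     # re-sends the spooled visit
```

Note that replayed visits arrive after visits tracked in the meantime, so anything that depends on insertion order would need the original timestamp stored with each visit.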

Once I have a demo of these plugins, you will find them at: http://piwik.org/faq/plugins/

Cheers,

Maciej Zawadziński