Scaling Piwik

Hi, logging data in MongoDB is something we are very interested in seeing for Piwik :)

Have you made further progress? We may want to reorganize the core data code so that such NoSQL drivers can work smoothly within Piwik. Please let us know the status of the task and what help you would need. Feel free to email me at matt at piwik.org

Thanks!

KBergsoe, do you have (or can you share) any details of what you did to get Piwik to work with MongoDB in your environment?

We are considering using it in a similar scenario, with millions of page hits per day (or at least that is what we hope we will need to scale to).

We can throw hardware at it, so we are not too concerned on that front, but we need reliability and speed.

Currently we are looking at Redis for some of our stats, but that means writing a solution from scratch (more or less).

Argon0

Any news on MongoDB?

Hi guys,
anybody interested in MongoDB who can help with development or testing (e.g. on a high-traffic site), please contact me at matt at piwik.org with your details, thanks!

See High traffic Piwik tips

We used MongoDB, but our solution did not do all the processing that Piwik does for every request. The challenge is the plug-in structure that is invoked for each tracking request. The way I see it, supporting MongoDB while keeping this plug-in structure would require a major rewrite.

Handling the volume of tracking requests was the priority for us, so we changed the code to do only the tracking: we removed everything else and wrote each track directly to MongoDB. We then had a process that rebuilt the data structure for Piwik and ran batch imports. It worked, but then the archiving process became the next barrier, together with MySQL saturation. We handled that by sharding the database with the Spider storage engine; it worked, but it increased complexity.
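For anyone trying to picture the setup described above, here is a minimal Python sketch of the "record the track only, import in batches later" pattern. It is not KBergsoe's actual code: an in-memory deque stands in for MongoDB, sqlite3 stands in for MySQL, and the `raw_hit` table and the `track`/`batch_import` names are hypothetical.

```python
import sqlite3
import time
from collections import deque
from typing import Deque, Dict, List

# Hypothetical in-memory buffer standing in for the fast store (MongoDB in the post).
# The tracker only appends; no plug-in hooks or aggregation run at request time.
raw_hits: Deque[Dict] = deque()


def track(site_id: int, url: str, visitor_id: str) -> None:
    """Record a hit as cheaply as possible: one append, nothing else."""
    raw_hits.append({"site_id": site_id, "url": url,
                     "visitor_id": visitor_id, "ts": time.time()})


def batch_import(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Drain buffered hits into the relational store using bulk inserts.

    Returns the number of rows imported. sqlite3 stands in for MySQL here.
    """
    imported = 0
    while raw_hits:
        batch: List[Dict] = []
        while raw_hits and len(batch) < batch_size:
            batch.append(raw_hits.popleft())
        conn.executemany(
            "INSERT INTO raw_hit (site_id, url, visitor_id, ts) VALUES (?, ?, ?, ?)",
            [(h["site_id"], h["url"], h["visitor_id"], h["ts"]) for h in batch],
        )
        conn.commit()  # one commit per batch instead of one per hit
        imported += len(batch)
    return imported


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_hit (site_id INTEGER, url TEXT, visitor_id TEXT, ts REAL)")
    for i in range(2500):
        track(1, f"/page/{i % 10}", f"v{i % 300}")
    print(batch_import(conn, batch_size=1000))  # prints 2500
```

The point of the split is that the request path does almost no work, while the expensive inserts happen in large, infrequent batches. A real deployment would also need to avoid losing hits when an insert fails (for example, removing them from the buffer only after the commit succeeds); the sketch skips that.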

Sorry to say, we have left Piwik as the platform for our analytics. We built an integration for GA but would like to include Piwik at a later stage. One thing in our new solution that I would recommend is a real-time archiving process, so that all stats are constantly updated. This removes the load spikes entirely, and the archiving process runs with the lowest priority on the servers, so it simply does not run when the load is high.
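A minimal sketch of that continuous, low-priority archiving idea, assuming a simplified hit record of (day, URL) and counting pageviews only. The `IncrementalArchiver` class, the `load_fn` hook and the `max_load` threshold are hypothetical illustrations, not part of Piwik or of the poster's system.

```python
from collections import Counter, deque
from typing import Callable, Deque, Tuple

Hit = Tuple[str, str]  # (day, url): a simplified hit record, an assumption


class IncrementalArchiver:
    """Keeps per-day, per-URL pageview counts continuously up to date.

    Instead of recomputing reports from raw logs in one big batch, each
    pending hit is folded into the running aggregates. Work is skipped
    while the system is busy, mimicking a lowest-priority background job.
    """

    def __init__(self, load_fn: Callable[[], float], max_load: float) -> None:
        self.pending: Deque[Hit] = deque()
        self.pageviews: Counter = Counter()
        self.load_fn = load_fn    # returns the current system load
        self.max_load = max_load  # above this, the archiver backs off

    def enqueue(self, day: str, url: str) -> None:
        self.pending.append((day, url))

    def step(self, budget: int = 100) -> int:
        """Process up to `budget` pending hits; return how many were processed."""
        if self.load_fn() > self.max_load:
            return 0  # back off: leave the backlog for a quieter moment
        done = 0
        while self.pending and done < budget:
            self.pageviews[self.pending.popleft()] += 1
            done += 1
        return done
```

On a Unix server, `load_fn` could be `lambda: os.getloadavg()[0]`, and the worker process could lower its own scheduling priority with `os.nice(19)`; both are Unix-only functions in Python's `os` module. Calling `step()` in a loop with a small budget keeps each unit of work short, so the archiver yields quickly when traffic picks up.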

I know there are things happening with Piwik on the MongoDB front, but I am not up to date.

Matt?

Thanks for the update. Sorry to see you go for now, but hopefully Piwik will be the right tool for you in six months or so. If you sent me an email about MongoDB, you should be on the list for updates; if not, send me one. I don’t expect a working prototype before March. Cheers

Has anyone tried using Xeround for the MySQL layer? As an autoscaling, cloud-based, MySQL-compatible database, it seems like it may help push the numbers beyond the current limit of 500k or so pages per day.

flexbean, not sure, but you are welcome to try it and report back here on whether it works. It will probably require some changes in Piwik, though.

Matt, any progress with the Mongo port?
Even though MySQL Cluster’s performance has just increased significantly, I bet we will need a NoSQL option to take Piwik to the next level.

Hi Matt, KBergsoe,

InfiniDB 4 is now GPL V2 with no restrictions on scale or syntax.

http://infinidb.org/forum/4-announcements/3769-infinidb-4-now-gpl-v2

Cheers,
Jim

I have re-opened our ticket to discuss NoSQL tools; please comment there if you have any thoughts or would like to participate in this project!

Hi Guys,

I love Piwik, but when it comes to scaling it is fundamentally flawed. The plug-in structure that makes Piwik flexible, and its reliance on MySQL, also mean that it will never scale in its current design; we are talking about major refactoring of the entire system to move it onto technologies that actually scale.

If we want scalability, we could focus on building a Piwik-like UI on top of Snowplow. That would take Piwik into an entirely different class of systems and open its target group up toward enterprise organizations. Snowplow currently has no UI and relies on paid BI interfaces on top of the Snowplow core service. You can of course use Excel, but that is not for the average web admin. This is Snowplow’s weakest link, but a Piwik-like interface could change that.

I have no idea how much work would be required, but it seems like a totally different project. However, I suspect that scaling Piwik will be another project in any case.

Thanks for your message. Snowplow seems like a very good project; maybe we can work with it in the future. We certainly have ideas for scaling Piwik, and yes, it will require major refactoring and months of work. We are up for it, though (if we can find clients to pay for our engineers). See also the FAQ: How do I use another database like Postgresql, SQLite, Oracle? Will you support Nosql databases like Hadoop, Mongodb?

Get in touch if you can sponsor the Piwik team to make progress on scaling Piwik.

I would like to contribute to moving the solution to InfiniDB, if that is on the radar.

Cheers
Vivek Singh

Hi Vivek Singh,

Sure, this would be very interesting. How do you see yourself moving the needle on this project? Feel free to create an issue on GitHub titled ‘Piwik compatible with Infinidb’ and post your ideas there.

Update: InfiniDB actually went bankrupt.

Thanks KBergsoe for the great technical summary of what is entailed.

I just looked over the Piwik source on GitHub and see no movement on MongoDB, even though there seemed to be a lot of interest in it. Has there been any movement on scaling the inserts mentioned in the OP four years ago?

The problem, essentially, is that clients love Piwik, but 1) it is highly engineered PHP, which effectively shuts out part-time, in-house enterprise application administrators, and 2) it is fundamentally flawed when it comes to scaling. Piwik the company sells its cloud services and its professional services, so I don’t believe they have any interest in making Piwik workable for the in-house guys. It’s been four years; entire companies have come and gone. Is Piwik going to address ordinary, simple scalability? Why would they?

Given the bankruptcy of InfiniDB, I can appreciate the business needs Piwik must meet to survive. Is there anything the in-house guy can do with Piwik beyond using it as a log file analyzer on steroids, without making it a career? I have already fulfilled stats requests by falling back to Webalizer, which can crunch through in 30 seconds what would take Piwik a week to load.

Thank you
-R

@kucerarichard We at Piwik are actually working actively on improving Piwik’s performance for the long term. The thing is, MongoDB or any other specific solution is not the only answer. Especially not MongoDB, but that’s a long discussion (and it may change, as things move so fast). In general, we manage high-availability Piwik systems tracking hundreds of millions of pageviews per month. Piwik scales to 500M events per month, possibly more. Right now, for example, we are working on a Redis driver for scaling the Piwik Tracker. It should be open sourced. We want to make Piwik the best analytics platform, and for everyone. So join the team, participate, and let’s get great things done as a community!

I’m extremely surprised nobody has mentioned the free/libre PostgreSQL, which keeps evolving its NoSQL features.

IMO it would be the perfect scaling solution, better than relying on several different DBMS products. Also, the MySQL open-source community is migrating to MariaDB because of Oracle’s classic messy management of open-source projects.

I’m not speaking from experience (I don’t need that kind of performance; we get 100k pages/day), but has anyone looked at TokuDB?

Obviously, the numbers come from the vendor itself, but they claim an “insertion rate at 10 000/second” (800M+/day).
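As a quick sanity check of the vendor’s figure, assuming the 10 000/second rate is sustained around the clock (a day has 86 400 seconds):

```latex
10\,000~\frac{\text{inserts}}{\text{s}} \times 86\,400~\frac{\text{s}}{\text{day}}
  = 864\,000\,000~\frac{\text{inserts}}{\text{day}} \approx 864\text{M/day}
```

So the quoted daily figure is consistent with the claimed per-second rate. Real traffic is rarely flat over a day, though, so the peak rate the database must absorb matters more than the daily average.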

If anyone in our community tries a TokuDB setup, I would be interested to hear about it.

Dali

Regarding TokuDB, I haven’t found any reference to a free software license; the only paragraph I found was this copyright notice about third-party software included in it: http://docs.tokutek.com/tokudb/tokudb-index-appendix.html#tokudb-3rd-party-libraries

So I bet it is a no-go for us.

In the meantime, I’ve found an issue explaining that Postgres isn’t an option for now because it would require too much refactoring at the SQL level. So this change would need to be crowd-sponsored or privately sponsored, or someone would have to take it on themselves.

https://github.com/piwik/piwik/issues/500

So a middle-ground option would be integrating a NoSQL store alongside the current MySQL support.