Piwik and Large Sites


(raj) #1

Hi all,

Great job on Piwik! It’s really slick!

I tested Piwik on a subsite of archive.org this weekend, and thought I would send a report of what worked and what didn’t.

In order to get Piwik running at scale, I started with the tips from the auto-archiving page: http://piwik.org/docs/setup-auto-archiving/

I set up Piwik 0.4.3 on its own Athlon64 dual core 3800+ machine with 1GB of memory.

I set up auto-archiving to run every 10 minutes, and set enable_browser_archiving_triggering to false as suggested. I kept bumping up the PHP CLI memory_limit until archive.sh stopped failing, and ended up with memory_limit = 1024M, which was the amount of physical memory in the box.

I set up Piwik on a portion of our site that gets 100K uniques/day on the weekends, and 140K on weekdays. Things ran great on Saturday and Sunday. On Monday, the stats machine locked up with the increased load. Before it locked up, CPU usage was at 100%, and I saturated the MySQL connection limit. That would be the next thing to tweak when I revisit Piwik in the future. Here are some stack traces, which might be useful to Piwik developers:

SQLSTATE[00000] [1040] Too many connections
Backtrace: 
#0 /var/www/ol/libs/Zend/Db/Adapter/Abstract.php(228): Zend_Db_Adapter_Pdo_Abstract->_connect()
#1 /var/www/ol/core/Piwik.php(1345): Zend_Db_Adapter_Abstract->getConnection()
#2 /var/www/ol/core/FrontController.php(209): Piwik::createDatabaseObject()
#3 /var/www/ol/index.php(64): Piwik_FrontController->init()
#4 {main}
SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
Backtrace:
#0 /var/www/ol/libs/Zend/Db/Adapter/Abstract.php(228): Zend_Db_Adapter_Pdo_Abstract->_connect()
#1 /var/www/ol/core/Piwik.php(1345): Zend_Db_Adapter_Abstract->getConnection()
#2 /var/www/ol/core/FrontController.php(209): Piwik::createDatabaseObject()
#3 /var/www/ol/index.php(64): Piwik_FrontController->init()
#4 {main}

Thanks again for your project! If you need someone to volunteer to test scaling fixes, let me know. The main site gets several million hits/day, and around 1M uniques/day.


(vipsoft) #2

Thanks for the feedback. Getting PHP to return freed memory back to the pool is an ongoing issue for archiving…

Watch for Piwik 0.5. Performance improvements are coming…


(Xetra-Max) #3

Hi,

I am planning to use Piwik on a large site with approx. 1.3 million page requests a day.
Thx for reporting your experience.

"and I saturated the mysql connection limit" ;(

So am I right that I should wait until the 0.5 release?


(Jerry21) #4

Hello,

We tested Piwik on our website, which is ranked about 10,000 on Alexa and reaches 150,000 unique visitors per day.

For the moment, after a few minutes and with 5000 users connected, Piwik appears to be down. The stats graph doesn’t load and displays “No datas for this graph”…
The same happens with the JS loaders, which don’t display data.

Sometimes it works, but not always.

It seems there are too many queries… maybe?
115,618 saves in the archive table, in just 2 hours…

Is the problem on our side? Or is your script not written for big websites?

Our database is hosted on an i7 8x 2.66+ GHz with 12 GB RAM…


(CreativeNotice) #5

you may find this thread interesting forum.piwik.org/index.php?showtopic=1435


(Sorius) #6

I tried to use Piwik on a big site and experienced the same problems described here.

I even had a better server than Jerry21.

I have read through the postings here, and one of the developers (vipsoft) mentioned they are working on performance issues for the 0.5 release of Piwik. I also read the ticket for that issue; it’s quite old and does not seem to be high on the priority list.

I think it’s one of the key features. Small blog sites do not care about Google Analytics and their data policy, but many companies do, and I know some who are willing to move away from GA. But I can’t recommend Piwik right now.

Perhaps the developers can give us a hint about how they are going to handle Piwik on large sites. Perhaps they don’t see Piwik as an enterprise solution…

greetings Sorius


(vipsoft) #7

In 0.5, we’re tweaking the table structures and indices to optimize the sql queries for the tracker. I think that should be our first priority before we try to identify other bottlenecks.

Please don’t judge priority by the age of the ticket. Oftentimes there are dependencies on other tickets.


(Jerry21) #8

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 53 bytes) in /var/www/piwik/core/ArchiveProcessing/Day.php on line 181

memory_limit is already set to 128M…


(Sorius) #9

@vipsoft: great, good to know. Will 0.5 come directly after 0.4.4?
Regarding the posts here and the comments from users: do you think many of the problems will be gone after optimizing the table structures? Or is it only a small piece of a big cake?

let’s assume we are talking about the following scenario:

100,000 unique visitors a day
10 pages for each visitor
which makes around 1,000,000 page impressions a day
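
To put that scenario into numbers, here is a minimal sketch of the arithmetic. It assumes traffic is spread evenly over 24 hours, which real sites never are, so the actual peak rate will be noticeably higher than this average. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(unique_visitors_per_day, pages_per_visitor):
    """Return (page impressions per day, average tracker requests per second)."""
    page_impressions = unique_visitors_per_day * pages_per_visitor
    # Assumption: one tracker request per page impression, evenly spread over the day.
    return page_impressions, page_impressions / SECONDS_PER_DAY


impressions, avg_rps = average_requests_per_second(100_000, 10)
print(impressions, round(avg_rps, 1))
```

So the average is only about a dozen tracker requests per second; how sharp the daily peak is matters much more than the average.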

thanks a lot


(vipsoft) #10

Yes, the current plan is for 0.5 to follow after 0.4.4.

0.5 won’t address all the performance/scalability issues that we’ve identified. See ticket #386 for an overview.

Ticket #708 is the biggest change planned for 0.5. It’s a blocker to reporting enhancements in the UI, and is probably performance neutral on the tracker-side.


(Matthieu Aubry) #11

It is not performance neutral: websites whose page names often start with the same characters will benefit from much faster MySQL queries during logging. There are more improvements we can make to the code, the main one being a way to log visits in a buffer and then do mass inserts rather than connecting on each request…
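
To illustrate the buffering idea, here is a rough sketch in Python. It is not Piwik code: the `VisitBuffer` class and the `demo_visit` table are hypothetical, and sqlite3 stands in for MySQL only so that the example runs on its own. Visits are collected in memory and written in one batch inside a single transaction.

```python
import sqlite3


class VisitBuffer:
    """Collects visit rows in memory and writes them in one batch."""

    def __init__(self, conn, flush_size=100):
        self.conn = conn
        self.flush_size = flush_size
        self.rows = []

    def log(self, site_id, url):
        self.rows.append((site_id, url))
        if len(self.rows) >= self.flush_size:
            self.flush()

    def flush(self):
        if not self.rows:
            return
        with self.conn:  # one transaction for the whole batch
            self.conn.executemany(
                "INSERT INTO demo_visit (site_id, url) VALUES (?, ?)", self.rows
            )
        self.rows = []


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE demo_visit (site_id INTEGER, url TEXT)")

buf = VisitBuffer(conn, flush_size=100)
for i in range(250):
    buf.log(1, f"/page/{i}")

pending = len(buf.rows)  # rows still waiting in the buffer
buf.flush()              # write the remainder
total = conn.execute("SELECT COUNT(*) FROM demo_visit").fetchone()[0]
print(pending, total)
```

The trade-off is that anything still in the buffer is lost if the process dies before a flush, so a real implementation would probably need a durable buffer (a file or a queue) rather than a plain list.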

If any of you would like to help with performance, you are welcome to contribute :)


(Applejax33) #12

In order to have Piwik handle the amount of traffic we are throwing at it we have made the following changes:

1. Two dedicated front-end web servers with local MySQL, only collecting data.
(we did this to offload any overhead from generating displays)
2. A back-end SQL server pulling the data from the two front-end collectors.

Of course we have to show delayed results and not real time traffic.

Currently we are pushing 150k hits an hr to each server.

Servers:
Both front-end web servers are dedicated dual-core 2.8, 4 GB RAM, with SSD drives
(we made this design so we can add more as needed due to server load)

PHP has been reconfigured to use the full RAM as needed.
MySQL connections have been pushed up.

Problems:

For some reason some clicks are missed… they are just not showing up, either as a unique or as a repeat visitor.

Also, we are still working on the load balancing issue of how to direct traffic to each box. (For the time being we separated it based on class A address.)

Question:

Has anyone seen this disappearing click issue?
And is anyone playing with the performance side of the tracker?
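
For readers wondering what a split by class A address could look like, here is a minimal sketch. It is only a guess at the approach, not the actual setup described above: the `pick_collector` function and the collector names are hypothetical, and it handles IPv4 addresses only. The first octet of the visitor's IP decides which collector receives the hit, so the same visitor always lands on the same box.

```python
def pick_collector(ip_address, collectors):
    """Choose a collector from the first octet of an IPv4 address."""
    first_octet = int(ip_address.split(".")[0])
    # Same first octet -> same collector, which keeps a visitor on one box.
    return collectors[first_octet % len(collectors)]


collectors = ["collector-a", "collector-b"]  # hypothetical host names
print(pick_collector("10.1.2.3", collectors))
print(pick_collector("11.9.8.7", collectors))
print(pick_collector("10.200.0.1", collectors))
```

Note that address space is not evenly populated, so a split like this can leave one box much busier than the other.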


(vipsoft) #13

I’m not aware of a disappearing clicks bug. There are, however, conditions where clicks (or aspects of a click) might be ignored, e.g., smart bots, campaign referer, etc. See http://piwik.org/faq/troubleshooting/#faq_51

There’s a potential bottleneck for sites with highly similar URLs. See the index patch in http://forum.piwik.org/index.php?showtopic=1123. We’ll be addressing this in 0.5 (ticket #708).


(Xetra-Max) #14

Hi,

I read in the Trac roadmap that you will provide mysqli support in 0.4.4.

I have read a little bit about mysqli and its advantages.

Very often it is said that you will get a performance gain by using mysqli…

Is that so with Piwik too? Perhaps somebody could write a short description, from a non-technical perspective, of why mysqli is better and why Piwik will also gain from its use…

thx a lot!


(tquakulinsky) #15

Hi,

We want to use Piwik for our company: we have about 500 sites, 500,000 visits/day and 1.1 million pageviews/day. Now we want to design the infrastructure and size the hardware. We have the following questions:

  • Is it possible to have multiple servers with Piwik which write into the same database? A load balancer will be in front of the Piwik servers...
  • What kind of machines (x86- or x64-based) do we probably need for the Apache servers?
  • Does anybody use Apache + mod_proxy in front of the Apache with Piwik for security reasons? What was your experience?

I hope someone can help us and provide information concerning our questions.

Thanks a lot
Tim Quakulinsky


(vipsoft) #16

At this time, mysqli support is simply to accommodate servers without PDO and pdo_mysql extensions (e.g., as might be experienced with some third party web hosting providers).


(Matthieu Aubry) #17

Tim, the main bottleneck in Piwik is the MySQL database, which needs to be the most powerful box. Having a load balancer in front of two cheap front-end servers that record logs in a powerful DB server is a good idea. Please let us know how it is working for you.


(Applejax33) #18

So, a little update for you guys: the issue we were running into is that when you try to include Piwik from a JS file, the new updates from Microsoft will not allow it. In essence, since our Piwik servers sit on a different domain than that of all the web sites out there, the new patches from MSFT will not allow you to pull that data. So in all our templates we have set the base coding and made a document.write for the id of the site.

On another note, we have pushed Piwik to amazing cps (4+ million records a day in a test environment), but we don’t run the web interface at the same time… not 1 break. LOVE U GUYS!

We moved to dedicated tracking boxes and a separate box for displaying information.


(KBergsoe) #19

[quote=Applejax33 @ Oct 2 2009, 07:15 PM]… We moved to dedicated tracking boxes and a separate box for displaying information.[/quote]

This sounds interesting! How have you separated the tracking servers from the main Piwik DB?
For some time we have been using OpenX with a Distributed Stats setup: two front-end servers with local MySQL servers take the inbound traffic. They regularly do batch inserts into the main DB, which reduces the load significantly and makes it possible to scale out to a large extent.

We are trying to set up something similar with Piwik but are challenged with maintaining the session information, just like you describe. Do you have any details on the progress made here?
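
As a rough illustration of that collector-to-main-DB pattern (not the OpenX or Piwik implementation), here is a sketch in Python. The `drain_collector` function and the `demo_hits` table are hypothetical, and two in-memory sqlite3 databases stand in for the local and main MySQL servers so the example runs on its own.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE demo_hits ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, site_id INTEGER, url TEXT)"
)


def drain_collector(local, main, batch_size=1000):
    """Move all rows from a collector's local table to the main table in batches."""
    moved = 0
    while True:
        rows = local.execute(
            "SELECT id, site_id, url FROM demo_hits ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return moved
        with main:  # one transaction per batch on the main DB
            main.executemany(
                "INSERT INTO demo_hits (site_id, url) VALUES (?, ?)",
                [(site_id, url) for _, site_id, url in rows],
            )
        with local:  # delete only what was just copied
            local.execute("DELETE FROM demo_hits WHERE id <= ?", (rows[-1][0],))
        moved += len(rows)


local = sqlite3.connect(":memory:")  # stand-in for a front-end collector DB
main = sqlite3.connect(":memory:")   # stand-in for the main DB
for db in (local, main):
    db.execute(SCHEMA)
with local:
    local.executemany(
        "INSERT INTO demo_hits (site_id, url) VALUES (?, ?)",
        [(1, f"/page/{i}") for i in range(2500)],
    )

moved = drain_collector(local, main)
left = local.execute("SELECT COUNT(*) FROM demo_hits").fetchone()[0]
central = main.execute("SELECT COUNT(*) FROM demo_hits").fetchone()[0]
print(moved, left, central)
```

This sketch is not crash-safe: if the process stops between the insert on the main DB and the delete on the collector, the same batch would be copied twice on the next run. It also says nothing about the session problem mentioned here, since visits that span both collectors would still need to be reconciled.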

Best regards


#20

AppleJax33 - if you are still around, do you have any details of how you were able to push Piwik to 4+ million cps? And how you split it up to have separate tracking and “display” boxes?