Full MySQL table scan without index?


(AndrewDB) #1

Hi,
Still testing Piwik to get the highest performance possible (right now I am hitting a wall at around 200 hits/s style_emoticons/<#EMO_DIR#>/ohmy.gif ), it seems Piwik does a full table scan of table piwik_log_visit for each visit. Also it seems this table is not indexed on the hash code (column config_md5config) or the cookie (visitor_idcookie). Would it be possible to add these two indexes to the Piwik PHP code and so speed up the system ? style_emoticons/<#EMO_DIR#>/blink.gif
Where do I need to patch the Piwik source code to include these two indexes ?
Thanks for any answer,
Cheers,
Andrew


(vipsoft) #2

Let us know what the performance improvement is after you create those indexes.

If you are submitting a patch, the schema is defined in core/Piwik.php, while update scripts are placed in core/Updates.


(AndrewDB) #3

[quote=vipsoft @ Jan 22 2010, 04:45 PM]Let us know what the performance improvement is after you create those indexes.
…[/quote]
Hi everybody,
OK, here is what I did :

  1. I created an index, which I called [b]idx_hash[/b], on the [b]first 8 characters[/b] of the [b]config_md5config[/b] field of the [b]piwik_log_visit[/b] table.
  2. I searched for and found the SELECT statement in the piwik/core/Tracker/Visit.php file. I believe this SELECT statement was the cause of rather expensive table scans.
  3. I modified the SELECT statement by adding FORCE INDEX (idx_hash) just before the WHERE...

The performance of the Piwik tracker before this change was around 50 requests /second with an average response time of 100ms. After the change, I got 450 requests / second with an average response time of 25ms!
Actually I am now hitting a different wall, in the sense that MySQL is not the system bottleneck anymore, but CPU performance on the Apache/PHP/Piwik server : my 8-core Linux server is running at 75% CPU utilization on all 8 cores.


(vipsoft) #4

A cursory glance suggests we might be able to replace the existing index, e.g.,

DROP INDEX index_idsite_date ON piwik_log_visit;
CREATE INDEX index_idsite_date_config ON piwik_log_visit (idsite, visit_server_date, config_md5config);

(AndrewDB) #5

[quote=vipsoft @ Jan 26 2010, 05:51 AM]A cursory glance suggests we might be able to replace the existing index, e.g.,

DROP INDEX index_idsite_date ON piwik_log_visit;
CREATE INDEX index_idsite_date_config ON piwik_log_visit (idsite, visit_server_date, config_md5config);

[/quote]
Thanks, I am going to try that.
However, I have noticed that indexing on the full config_md5config column is extremely expensive, so I would suggest indexing on only the first 8 characters :

DROP INDEX index_idsite_date ON piwik_log_visit;
CREATE INDEX index_idsite_date_config ON piwik_log_visit (idsite, visit_server_date, config_md5config(8));

OK, tested that, it works, I am getting 450 requests / second with an average response time of 25ms which seems to be the CPU bottleneck I mentioned above.


(Matthieu Aubry) #6

Andrew, this is an interesting experiment, please let us know the results.

Do you have the mysql slow logs for this query? Your index piwik_log_visit (idsite, visit_server_date, config_md5config(8)) looks good indeed - I’m a bit surprised it wasn’t already defined like this style_emoticons/<#EMO_DIR#>/blink.gif thanks for pointing it out.

What are the conditions of your tests? a real test would include serving that much traffic over the course of many days as the performance decrease with increased data size (ie 450 req/s would be come 100 req/s after a few millions rows probably?).


(vipsoft) #7

If you’re using Zend_Server, there might be some improvement by caching the config settings in shared memory instead of reloading/reparsing these with each tracker instance.

Here’s a modified version of core/Tracker/Config.php that you can try out.


(AndrewDB) #8

[quote=vipsoft @ Jan 26 2010, 05:12 PM]If you’re using Zend_Server, there might be some improvement by caching the config settings in shared memory instead of reloading/reparsing these with each tracker instance.

Here’s a modified version of core/Tracker/Config.php that you can try out.[/quote]
Thanks !
That modified version of Config.php has doubled the tracking performance of the Apache server : I am now able to service 800 requests / second with an average response time which is down to 15ms. CPUs are at 75%.
I’ll be generating a few million visits and checking the performance of the MySQL database next.
I am using JMeter on a separate 8-core server to generate the load on the Piwik/Apache server, and I am running the MySQL database on a separate 8-core server too.


(AndrewDB) #9

style_emoticons/<#EMO_DIR#>/sad.gif
Hi everybody,
It seems I spoke too soon. The new Config.php works wonders for performance but completely breaks the tracking code, apparently. Or do I have to restart Apache, or something, to make it work ? Quite simply, visits are not recorded in the database anymore.
Any help would be much appreciated.


(vipsoft) #10

I’ll take another look at it.


(vipsoft) #11

Andrew

Sorry, that file is based on the dev trunk.

In Config.php, change two instances of _parse_ini_file to simply parse_ini_file (removing the leading underscore).


(AndrewDB) #12

[quote=vipsoft @ Jan 29 2010, 04:20 AM]Andrew

Sorry, that file is based on the dev trunk.

In Config.php, change two instances of _parse_ini_file to simply parse_ini_file (removing the leading underscore).[/quote]
Hi ! Thanks for your help. I really need to get as much performance as I can out of Piwik.
In Piwik 0.5.4, the code already has parse_ini_file without the leading underscore (see attached file).
Does this mean it is already optimised ? style_emoticons/<#EMO_DIR#>/blink.gif


(vipsoft) #13

For the upcoming 0.5.5, we added a function that emulates parse_ini_file if the built-in function has been disabled for security purposes. _parse_ini_file is a wrapper that calls either the built-in function or the emulated one. The built-in parse_ini_file is faster, of course.

Make the changes to the Config.php I posted because _parse_ini_file doesn’t exist in 0.5.4.