Full MySQL table scan without index?

Hi,
Still testing Piwik to get the highest performance possible (right now I am hitting a wall at around 200 hits/s style_emoticons/<#EMO_DIR#>/ohmy.gif ), it seems Piwik does a full table scan of table piwik_log_visit for each visit. Also it seems this table is not indexed on the hash code (column config_md5config) or the cookie (visitor_idcookie). Would it be possible to add these two indexes to the Piwik PHP code and so speed up the system ? style_emoticons/<#EMO_DIR#>/blink.gif
Where do I need to patch the Piwik source code to include these two indexes ?
Thanks for any answer,
Cheers,
Andrew

Let us know what the performance improvement is after you create those indexes.

If you are submitting a patch, the schema is defined in core/Piwik.php, while update scripts are placed in core/Updates.

[quote=vipsoft @ Jan 22 2010, 04:45 PM]Let us know what the performance improvement is after you create those indexes.
…[/quote]
Hi everybody,
OK, here is what I did :

  1. I created an index, which I called [b]idx_hash[/b], on the [b]first 8 characters[/b] of the [b]config_md5config[/b] field of the [b]piwik_log_visit[/b] table.
  2. I searched for and found the SELECT statement in the piwik/core/Tracker/Visit.php file. I believe this SELECT statement was the cause of rather expensive table scans.
  3. I modified the SELECT statement by adding FORCE INDEX (idx_hash) just before the WHERE...

The performance of the Piwik tracker before this change was around 50 requests /second with an average response time of 100ms. After the change, I got 450 requests / second with an average response time of 25ms!
Actually I am now hitting a different wall, in the sense that MySQL is not the system bottleneck anymore, but CPU performance on the Apache/PHP/Piwik server : my 8-core Linux server is running at 75% CPU utilization on all 8 cores.

A cursory glance suggests we might be able to replace the existing index, e.g.,

DROP INDEX index_idsite_date ON piwik_log_visit;
CREATE INDEX index_idsite_date_config ON piwik_log_visit (idsite, visit_server_date, config_md5config);

[quote=vipsoft @ Jan 26 2010, 05:51 AM]A cursory glance suggests we might be able to replace the existing index, e.g.,

DROP INDEX index_idsite_date ON piwik_log_visit;
CREATE INDEX index_idsite_date_config ON piwik_log_visit (idsite, visit_server_date, config_md5config);

[/quote]
Thanks, I am going to try that.
However, I have noticed that indexing on the full config_md5config column is extremely expensive, so I would suggest indexing on only the first 8 characters :

DROP INDEX index_idsite_date ON piwik_log_visit;
CREATE INDEX index_idsite_date_config ON piwik_log_visit (idsite, visit_server_date, config_md5config(8));

OK, tested that, it works, I am getting 450 requests / second with an average response time of 25ms which seems to be the CPU bottleneck I mentioned above.

Andrew, this is an interesting experiment, please let us know the results.

Do you have the mysql slow logs for this query? Your index piwik_log_visit (idsite, visit_server_date, config_md5config(8)) looks good indeed - I’m a bit surprised it wasn’t already defined like this style_emoticons/<#EMO_DIR#>/blink.gif thanks for pointing it out.

What are the conditions of your tests? a real test would include serving that much traffic over the course of many days as the performance decrease with increased data size (ie 450 req/s would be come 100 req/s after a few millions rows probably?).

If you’re using Zend_Server, there might be some improvement by caching the config settings in shared memory instead of reloading/reparsing these with each tracker instance.

Here’s a modified version of core/Tracker/Config.php that you can try out.

[quote=vipsoft @ Jan 26 2010, 05:12 PM]If you’re using Zend_Server, there might be some improvement by caching the config settings in shared memory instead of reloading/reparsing these with each tracker instance.

Here’s a modified version of core/Tracker/Config.php that you can try out.[/quote]
Thanks !
That modified version of Config.php has doubled the tracking performance of the Apache server : I am now able to service 800 requests / second with an average response time which is down to 15ms. CPUs are at 75%.
I’ll be generating a few million visits and checking the performance of the MySQL database next.
I am using JMeter on a separate 8-core server to generate the load on the Piwik/Apache server, and I am running the MySQL database on a separate 8-core server too.

style_emoticons/<#EMO_DIR#>/sad.gif
Hi everybody,
It seems I spoke too soon. The new Config.php works wonders for performance but completely breaks the tracking code, apparently. Or do I have to restart Apache, or something, to make it work ? Quite simply, visits are not recorded in the database anymore.
Any help would be much appreciated.

I’ll take another look at it.

Andrew

Sorry, that file is based on the dev trunk.

In Config.php, change two instances of _parse_ini_file to simply parse_ini_file (removing the leading underscore).

[quote=vipsoft @ Jan 29 2010, 04:20 AM]Andrew

Sorry, that file is based on the dev trunk.

In Config.php, change two instances of _parse_ini_file to simply parse_ini_file (removing the leading underscore).[/quote]
Hi ! Thanks for your help. I really need to get as much performance as I can out of Piwik.
In Piwik 0.5.4, the code already has parse_ini_file without the leading underscore (see attached file).
Does this mean it is already optimised ? style_emoticons/<#EMO_DIR#>/blink.gif

For the upcoming 0.5.5, we added a function that emulates parse_ini_file if the built-in function has been disabled for security purposes. _parse_ini_file is a wrapper that calls either the built-in function or the emulated one. The built-in parse_ini_file is faster, of course.

Make the changes to the Config.php I posted because _parse_ini_file doesn’t exist in 0.5.4.