Parsing logs is very slow when using a multithreaded import script

Hello,

I am doing some tests because I have high-traffic website logs to parse and I need it to be fast, so I want to parse 2 different logs at the same time.

My server is a 12 vCPU, 16 GB RAM nginx/php-fpm/mysql box and I usually use this command line without problems:
./import_logs.py --url=piwik_url --idsite=ID --debug --debug --recorders=10 --recorder-max-payload-size=150 --enable-http-errors --enable-http-redirects /my_log

For my tests I used a single site and changed the number of recorders, but it seems very slow when parsing 2 logs at the same time:

  • parsing a single log for the site, I use the command:
    ./import_logs.py --url=piwik_url --idsite=ID --debug --debug --recorders=5 --recorder-max-payload-size=150 --enable-http-errors --enable-http-redirects /my_log
    the execution time seems "correct":
    Performance summary

Total time: 36 seconds
Requests imported per second: 373.17 requests per second

  • parsing two logs for the same site at the same time, the commands used are:
    ./import_logs.py --url=piwik_url --idsite=ID --debug --debug --recorders=5 --recorder-max-payload-size=150 --enable-http-errors --enable-http-redirects /my_log1
    ./import_logs.py --url=piwik_url --idsite=ID --debug --debug --recorders=5 --recorder-max-payload-size=150 --enable-http-errors --enable-http-redirects /my_log2

results for the log1 :
Performance summary

Total time: 369 seconds
Requests imported per second: 37.93 requests per second

results for the log2 :
Performance summary

Total time: 370 seconds
Requests imported per second: 37.97 requests per second

I even caught a 500 error when parsing the logs at the same time, but the auto retry handled it.

So what did I do wrong? Can we parse multiple logs at the same time? Why is the parsing so very slow (36 seconds parsing one log => 370 seconds when parsing 2 logs simultaneously)?

Hello,
do you get any errors while parsing the two files at the same time (in the nginx error logs)?
Maybe the queries lock each other?
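In case it helps, a quick way to check for lock contention while both imports are running (this is just a sketch, assuming you can reach MySQL as an admin user) is:

```shell
# List running queries; two imports blocking each other show up as
# long-running INSERTs stuck in a waiting/updating state
mysql -u root -p -e "SHOW FULL PROCESSLIST;"

# The TRANSACTIONS section of the InnoDB status report lists any lock waits
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"
```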

Hello,

I only have one error in nginx, but I think it's from when I got the 500 error:
2016/04/25 09:46:23 [error] 12328#0: *316619 FastCGI sent in stderr: "PHP message: Error in Piwik (tracker): Error query: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction In query: INSERT INTO piwik_option (option_name, option_value, autoload) VALUES (?, ?, ?) ON DUPLICATE KEY UPDATE option_value = ? Parameters: array ( 0 => 'report_to_invalidate_31_2016-04-21', 1 => '1', 2 => 0, 3 => '1', )" while reading response header from upstream, client: 127.0.0.1, server: piwik_url, request: "POST /piwik.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "piwik_url"

In mysql I have very few slow queries, but nothing explaining the slow parsing when importing the logs at the same time.

Regards,

Can you try reducing the max-payload? (With two processes you have doubled this value.)
Maybe reducing the number of recorders will also help.
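For example (a sketch only, reusing the commands from your post, with the payload halved to 75 so the two processes together stay near your original single-process value of 150):

```shell
# run both imports in parallel, each with half the original payload size
./import_logs.py --url=piwik_url --idsite=ID --recorders=5 --recorder-max-payload-size=75 --enable-http-errors --enable-http-redirects /my_log1 &
./import_logs.py --url=piwik_url --idsite=ID --recorders=5 --recorder-max-payload-size=75 --enable-http-errors --enable-http-redirects /my_log2 &
wait
```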

You can always try how it works with the QueuedTracking plugin and Redis.
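If you want to try that route, the rough setup looks like the following (the queuedtracking command name is from memory, so please double-check it against the plugin's docs; the Redis connection itself is configured in the Piwik admin UI under the plugin's settings):

```shell
# activate the plugin from the Piwik root directory
./console plugin:activate QueuedTracking

# process the queued tracking requests, e.g. from a cron job
./console queuedtracking:process
```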

Hello jlubzinski,

I reduced the max-payload from 150 to 100 and I don't have the errors any more.

Thx for your answer.

Regards,