import_logs.py - Error when connecting to Piwik

Unfortunately no, I have never managed to get logs into Piwik from either Apache or IIS. It just stalls with the above error. I tried everything suggested; it appears to be a bug, but I'm not a Python expert, so I got stuck at that point. I did send the logs on to the developers but heard nothing back. So I stopped using it… or trying to use it, I should say.

Very disappointed!!! :X

“You should probably have a look into adjusting your server config”

Adjust in which way?

I am not willing to desperately try out some log formats without knowing why…

Consider that importing logs will take more time than a typical web request, given the size of the log to process. If you import daily, then the server must be able to process the day's log without timing out in NGINX or in the FastCGI PHP server.

Most default configurations have modest values for timeouts; the server has to be specifically allowed to let a process run longer and use more memory.
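To see what your current limits actually are before raising anything, you can grep the usual config locations (the paths below are typical Debian/Ubuntu defaults and may differ on your distro):

```shell
# Show current timeout/memory settings; paths are common defaults, adjust for your system.
grep -R "fastcgi_read_timeout" /etc/nginx/ 2>/dev/null || echo "fastcgi_read_timeout not set (NGINX default is 60s)"
grep -R -E "max_execution_time|memory_limit" /etc/php/ 2>/dev/null || true
```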

Adjust the timeouts to a higher threshold for NGINX and PHP FastCGI. If they are set really high and the import still fails, there is some other problem.

NGINX vhost config for piwik

Set this value high:

fastcgi_read_timeout 14400; # 4 hrs

Module ngx_http_fastcgi_module: “Defines a timeout for reading a response from the FastCGI server.”

By setting this high we can ensure NGINX will wait long enough for the processing to complete in order to get a response.

FastCGI / PHP pool config for piwik

; make sure php can log errors if something is wrong
php_admin_flag[log_errors] = on
php_admin_value[error_log] = </path/to/log>

; set to something reasonable, make sure there are no out of memory errors in log otherwise increase value with respect to server’s ram
php_admin_value[memory_limit] = 512M

; set these values high, 1 hr
php_admin_value[max_execution_time] = 3600
php_admin_value[max_input_time] = 3600

You should read and understand NGINX’s configuration as well as PHP FastCGI’s.
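For context, here is a minimal sketch of where that directive sits in an NGINX vhost; the hostname, root, and socket path are assumptions, adjust them to your setup:

```nginx
server {
    server_name piwik.example.com;          # hypothetical hostname
    root /var/www/piwik;                    # hypothetical install path

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/var/run/php-fpm-piwik.sock;  # hypothetical socket
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_read_timeout 14400;         # 4 hrs, as above
    }
}
```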

That corresponds to setting “max_execution_time” in php.ini, right?

Btw. I am using Apache 2.4 with mod_php
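(For mod_php there is no FastCGI timeout to tune; a rough sketch of the equivalents of the settings above, placed in the vhost, would be the Timeout directive plus php_admin_value. Values here just mirror the earlier suggestions:)

```apache
# Apache 2.4 + mod_php sketch; place in the vhost config
Timeout 14400                            # connection timeout, 4 hrs
php_admin_value memory_limit 512M
php_admin_value max_execution_time 3600
php_admin_value max_input_time 3600
```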

The logfile is NOT processed by the webserver but by the Python script, and the results are sent to the webserver (in small chunks?).

Nothing changed. Still this error :frowning:

[quote=rdux]
Consider that importing logs will take more time than a typical web request, given the size of the log to process. If you import daily, then the server must be able to process the day's log without timing out in NGINX or in the FastCGI PHP server.

Most default configurations have modest values for timeouts, the server has to be specifically allowed to let a process run longer and use more memory.

Adjust the timeouts to a higher threshold for NGINX and PHP FastCGI. If they are set really high and the import still fails, there is some other problem.

NGINX vhost config for piwik

Set this value high:

fastcgi_read_timeout 14400; # 4 hrs

Module ngx_http_fastcgi_module: “Defines a timeout for reading a response from the FastCGI server.”

By setting this high we can ensure NGINX will wait long enough for the processing to complete in order to get a response.

FastCGI / PHP pool config for piwik

; make sure php can log errors if something is wrong
php_admin_flag[log_errors] = on
php_admin_value[error_log] = </path/to/log>

; set to something reasonable, make sure there are no out of memory errors in log otherwise increase value with respect to server’s ram
php_admin_value[memory_limit] = 512M

; set these values high, 1 hr
php_admin_value[max_execution_time] = 3600
php_admin_value[max_input_time] = 3600

You should read and understand NGINX’s configuration as well as PHP FastCGI’s.[/quote]

[quote=vovando]
Nothing changed. Still this error :frowning:

/mnt/data/importlog/ …
[/quote]

Are you reading the log file from an NFS mount? If so, try copying it locally.

[quote=rdux]

[quote=vovando]
Nothing changed. Still this error :frowning:

/mnt/data/importlog/ …
[/quote]

Are you reading the log file from an NFS mount? If so, try copying it locally.[/quote]

No NFS mount. The same situation locally…

@vovando

Read and understand the documentation for usage and configuration for Piwik, NGINX, and PHP FastCGI / PHP FPM depending on what you are using.

python /mnt/data/www/piwik/misc/log-analytics/import_logs.py -d --url=http:///mnt/data/importlog//*******_access.log.2013080510 --idsite=4 --recorders=12 --enable-http-errors --enable-http-redirects --enable-static --enable-bots

Your command is wrong. You are specifying too many recorders, but more importantly the --url is for the Piwik server address, not the path to the log file.

Remember, you cannot import any log data older than what is already indexed (imported previously).

There is a README.md in the same folder as the import_logs.py script. Read that.

From it:

How to use this script?

The most simple way to import your logs is to run:

./import_logs.py --url=piwik.example.com /path/to/access.log

[…]

To improve performance,

  1. by default, the script uses one thread to parse and import log lines.
    you can use the --recorders option to specify the number of parallel threads which will
    import hits into Piwik. We recommend to set --recorders=N to the number N of CPU cores
    that the server hosting Piwik has. The parsing will still be single-threaded,
    but several hits will be tracked in Piwik at the same time.

[quote=rdux]
@vovando

Read and understand the documentation for usage and configuration for Piwik, NGINX, and PHP FastCGI / PHP FPM depending on what you are using.

python /mnt/data/www/piwik/misc/log-analytics/import_logs.py -d --url=http:///mnt/data/importlog//*******_access.log.2013080510 --idsite=4 --recorders=12 --enable-http-errors --enable-http-redirects --enable-static --enable-bots

Your command is wrong. You are specifying too many recorders, but more importantly the --url is for the Piwik server address, not the path to the log file.

There is a README.md in the same folder as the import_logs.py script. Read that.

From it:

How to use this script?

The most simple way to import your logs is to run:

./import_logs.py --url=piwik.example.com /path/to/access.log

[…]

To improve performance,

  1. by default, the script uses one thread to parse and import log lines.
    you can use the --recorders option to specify the number of parallel threads which will
    import hits into Piwik. We recommend to set --recorders=N to the number N of CPU cores
    that the server hosting Piwik has. The parsing will still be single-threaded,
    but several hits will be tracked in Piwik at the same time.[/quote]

My command is right. The Piwik path is correct, but I changed it to stars, so maybe it looks wrong. --url is the Piwik server address and not the path to the log file, sure.
It has worked for more than 6 months, but at some moment it stopped…


OK, I see now. How big is the log file, and why are you specifying so many recorders?
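A quick sanity check before picking --recorders: the README's advice is one recorder per CPU core on the Piwik host, which you can read off directly:

```shell
# README advice: set --recorders to the number of CPU cores on the Piwik host.
CORES=$(nproc)
echo "suggested: --recorders=$CORES"
# and check the log size too, e.g.:  du -h /path/to/access.log
```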

[quote=rdux]

My command is right. The Piwik path is correct, but I changed it to stars, so maybe it looks wrong. --url is the Piwik server address and not the path to the log file, sure.
It has worked for more than 6 months, but at some moment it stopped…

OK, I see now. How big is the log file, and why are you specifying so many recorders?[/quote]

One log file is 100 KB to 300 MB in size; this happens with all the files. Recorders? I don't remember. Maybe my mistake.