LOAD DATA INFILE in chroot environment

Hi!

I’m using Nginx with PHP-FPM to keep my Piwik installation in a chrooted environment. I’d like to use the “LOAD DATA INFILE” feature because the normal archiving methods are way too slow for processing my > 700.000 daily visits. I used Apache2, but today I moved to Nginx and stumbled over this problem:


Try #1: LOAD DATA INFILE : SQLSTATE[HY000]: General error: 13 
Can't get stat of '/web/tmp/assets/piwik_option-bacca2a4e382aafddbd4e02d4cef7617.csv' (Errcode: 2)

I already figured out the cause:

My Piwik installation is stored in


"/srv/www/web/sites/domainxy/data/web"

So the CSV file is stored here:


"/srv/www/web/sites/domainxy/data/web/tmp/assets/piwik_option-bacca2a4e382aafddbd4e02d4cef7617.csv"

PHP is chrooted to this directory:


"/srv/www/web/sites/domainxy/data"

The CSV file is successfully generated, but when Piwik advises MySQL to load the file, it sends the chrooted and not the real path.
As a workaround, I modified “core/Db/BatchInsert.php” and changed the following line:


//$rc = self::createTableFromCSVFile($tableName, $fields, $filePath, $fileSpec);
$rc = self::createTableFromCSVFile($tableName, $fields, '/srv/www/web/sites/domainxy/data/' . $filePath, $fileSpec);

By doing so, Piwik creates the CSV file like it always does, but provides MySQL with the non-chrooted path.

My question is: Does someone had the same problem and found a better way to handle this without modifying the code?
I don’t want to keep this workaround, because I don’t want to “repair” this line after every update.

Would it be worth to open a request for enhancement to add a config option for chrooted environments or absolute paths?

Cheers,
Thomas

it sends the chrooted and not the real path.

can you explain further this problem?
Maybe we can provide “good” solution to this but I admit not being familiar with this setup

Hi Matt,

[quote=matt]
can you explain further this problem?
Maybe we can provide “good” solution to this but I admit not being familiar with this setup[/quote]

great! Okay, I’ll try to explain the setup. Let’s start with the basic filesystem. Imagine a domain named “stats.domain.xy” which is used for Piwik:


/srv/www/web/sites/domain.xy_stats/
├── conf
│   ├── nginx.conf
│   └── php-fpm.conf
├── data
│   ├── home
│   │   └── web000
│   ├── sock
│   │   └── php-fpm
│   ├── tmp
│   └── web
│       └── piwik-data-is-here.php
└── log
    ├── access.log
    ├── error.log
    └── php-fpm_access.log

Everything related to this Piwik installation is in “/srv/www/web/sites/domain.xy_stats/”. The traditional so-called DocumentRoot here is “/srv/www/web/sites/domain.xy_stats/data/web” as defined in the nginx configuration. “/srv/www/web/sites/domain.xy_stats/data” normally contains a lot more files that are required for the chroot environment (libs, etc.), but I excluded them for easier understanding.


        server
        {
                listen x.x.x.x:80;
                server_name stats.domain.xy;

                access_log /srv/www/web/sites/domain.xy_stats/log/access.log;
                error_log /srv/www/web/sites/domain.xy_stats/log/error.log;

                root /srv/www/web/sites/domain.xy_stats/data/web/;
                index index.php index.html index.htm;

                location /
                {
                        try_files $uri $uri/ /index.php?$args;
                        autoindex  off;
                }

                # pass the PHP scripts to FastCGI server listening on a socket
                location ~ \.php$ {
                        include /etc/nginx/fastcgi_params;
                        fastcgi_pass unix:/srv/www/web/sites/domain.xy_stats/data/sock/php-fpm;
                        fastcgi_param SCRIPT_FILENAME /web$fastcgi_script_name;
                        fastcgi_param PATH_INFO $fastcgi_script_name;
                        fastcgi_param DOCUMENT_ROOT /web;
                }
        }

The interesting part here is the PHP-FPM configuration. I’m using a socket to communicate with the FastCGI Process Manager. Nginx passes every request to a .php file to the socket.

PHP-FPM config:


[global]
pid = /var/run/php-fpm.pid
error_log = /var/log/php-fpm.log

[domain.xy_stats]
prefix = /srv/www/web/sites/$pool/data
user = webxy
group = webxy
listen = /srv/www/web/sites/$pool/data/sock/php-fpm
listen.owner = root
listen.group = root
listen.mode = 0666
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
access.log = /srv/www/web/sites/$pool/log/php-fpm_access.log
slowlog = /srv/www/web/sites/$pool/log/php-fpm.slow.log
chroot = $prefix
chdir = /

PHP-FPM is available through the socket “/srv/www/web/sites/domain.xy_stats/data/sock/php-fpm”. This instance is configured as “chrooted”, so PHP believes that “/srv/www/web/sites/domain.xy_stats/data” would be the root filesystem “/”. A security advantage of this is that PHP is never able to access directories outside the chrooted jail.

Let’s go back now to the problem. Piwik’s system check page shows the following error message:


Try #1: LOAD DATA INFILE : SQLSTATE[HY000]: General error: 13 
Can't get stat of '/web/tmp/assets/piwik_option-bacca2a4e382aafddbd4e02d4cef7617.csv' (Errcode: 2)

Since PHP-FPM is running in the described changed-root environment, it doesn’t know that it lives in a deep subdirectory, so it tells MySQL the absolute path ‘/web/tmp/assets/piwik_option-bacca2a4e382aafddbd4e02d4cef7617.csv’ for loading the “DATA INFILE”. On the other hand, the MySQL daemon doesn’t run chrooted, so it know about all system directories. MySQL receives the “LOAD DATA INFILE” command and tries to access the absolute path ‘/web/tmp/assets/piwik_option-bacca2a4e382aafddbd4e02d4cef7617.csv’… and fails, because this directory isn’t existing. “/srv/www/web/sites/domain.xy_stats/data” has to be prepended to path that PHP tells to get the real absolute path. This is what my workaround does:


//$rc = self::createTableFromCSVFile($tableName, $fields, $filePath, $fileSpec);
$rc = self::createTableFromCSVFile($tableName, $fields, '/srv/www/web/sites/domain.xy_stats/data/' . $filePath, $fileSpec);

Conclusion:
MySQL’s “LOAD DATA INFILE” fails because of mixing chrooted and non-chrooted components. A possible way so solve this would IMO be to define a config option like “absolute_chroot_path” which would usually stay empty if no chroot environment is used.


$absoluteChrootPath = '/srv/www/web/sites/domain.xy_stats/data'; // defined via global.ini.php
$rc = self::createTableFromCSVFile($tableName, $fields, $absoluteChrootPath . $filePath, $fileSpec);

The above line should then work with and without chrooted environments, but I don’t know about Piwik’s architecture rules and configuration guidelines.
I’m sure that there should be a way to let Piwik know about it’s un-chrooted, absolute path. Without this, Piwik won’t be able to communicate with external services about files.

If you need further information, just let me know. I’d really appreciate your help here.

Cheers,
Thomas

Thanks for the info! I created ticket & fixed at: LOAD DATA INFILE in chroot environment · Issue #4442 · matomo-org/matomo · GitHub

can you please confirm the fix works?

Wow, that was fast! It works like a charm! :slight_smile:

This helps me a lot, thank you very much, Matt.

Regards,
Thomas