Use SitesManager API from PHP?

Hello!

I have developed a PHP script to transfer statistics from a legacy stats system for WordPress Multiuser to Piwik. Now I want to use it to transfer our historical stats (about 1,300,000 actions), but it is running too slowly (about 3,000 items per hour).

I think the reason is that I look up the site ID by calling SitesManager.getSitesIdFromSiteUrl over HTTP.

So, for every action I am performing this:


        $piwik_api_call = $piwik_url . '?module=API&method=SitesManager.getSitesIdFromSiteUrl&url=' . urlencode($site_url) . '&token_auth=' . urlencode($token_auth);
        $reply = file_get_contents($piwik_api_call);

Is there a way to include the SitesManager API into my own script?

I didn’t check the documentation closely enough. Below is an example that works when the file is placed in the root directory of your Piwik install.

However, the script is still running very slowly. Does anyone have any tips for speeding it up? I am using the Tracking API for inserting hits, and it makes an HTTP request for every new hit. I’d like to call it directly, the same way as the SitesManager API below. Is that possible?


    /*
    define('PIWIK_INCLUDE_PATH', realpath('..'));
    define('PIWIK_USER_PATH', realpath('..'));
    */
    define('PIWIK_ENABLE_DISPATCH', false);
    define('PIWIK_ENABLE_ERROR_HANDLER', false);
    define('PIWIK_ENABLE_SESSION_START', false);

    function piwik_id_by_url($site_url,$token_auth)
    {

        require_once "index.php";
        require_once "core/API/Request.php";
        Piwik_FrontController::getInstance()->init();
 
        // This inits the API Request with the specified parameters
        $request = new Piwik_API_Request('
        			method=SitesManager.getSitesIdFromSiteUrl
        			&url='.urlencode($site_url).'
        			&format=XML
        			&token_auth='.$token_auth.'
        ');
        // Call the API and fetch the XML data back
        $reply = $request->process();
        preg_match('/\<idsite\>(.+)\<\/idsite\>/', $reply, $matches);
        if(isset($matches[1]))
            return $matches[1];
        else
            return null;           
    }

    $site_id = piwik_id_by_url('http://my.site.com','TOKEN AUTH');
    echo $site_id;


You should call the API only once for every website and cache it in memory in your script.
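As a minimal sketch of that caching idea: the helper name `cached_site_id()` below is hypothetical (it is not part of Piwik), and it assumes you pass in whatever lookup you already use, for example a wrapper around `piwik_id_by_url()`. The lookup runs only the first time a given URL is seen.

```php
<?php
// Hypothetical helper, not part of Piwik: remembers the result of a
// site-ID lookup per URL so the expensive call runs once per site.
function cached_site_id($site_url, $lookup)
{
    // Static array persists across calls for the lifetime of the script.
    static $cache = array();

    if (!array_key_exists($site_url, $cache)) {
        // First time this URL is seen: perform the real lookup.
        $cache[$site_url] = call_user_func($lookup, $site_url);
    }
    return $cache[$site_url];
}
```

In the import loop, the lookup argument could be something like `function ($url) use ($token_auth) { return piwik_id_by_url($url, $token_auth); }`, assuming PHP 5.3 or later for closures.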

In our tests we measured 100-200 requests per second.

I have tweaked the previous code. Site IDs are now cached in the script.

I also process the old logs on a site-by-site basis, so I only create a new PiwikTracker a couple of hundred times.

In benchmarks I have found that the code below is causing the delay. (I do this for every hit.)


                $t->setIp(long2ip($row['ip_int2']));
                $t->setURL($row['url']);
                $t->setForceVisitDateTime((strtotime($row['timestamp'])-3600)); //Timezone offset
                $t->doTrackPageView($row['title']);

I am still getting only 3,000-4,000 items per hour with this.

Can you advise on how I can add visits more quickly than with the PiwikTracker.php method?

Edit: After benchmarking, it seems that the query that creates new visitors is running very slowly. If I file all users under the same IP, I get speeds similar to what you describe (~200/second), but whenever a new IP is processed everything halts for 2-10 seconds.
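For anyone wanting to reproduce this kind of measurement, here is a rough sketch of timing a single call. The helper name `timed()` is hypothetical, and the idea of wrapping the tracker call in a closure is an assumption about how the import loop is structured.

```php
<?php
// Hypothetical helper: runs a callable, returns its result, and stores
// the wall-clock duration in seconds in $elapsed.
function timed($fn, &$elapsed)
{
    $start = microtime(true);          // float seconds since the epoch
    $result = call_user_func($fn);
    $elapsed = microtime(true) - $start;
    return $result;
}
```

Wrapping the per-hit tracker call in `timed()` and logging the IP whenever `$elapsed` exceeds, say, one second should show whether the stalls line up with first-time IPs.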

> whenever a new IP is processed everything halts for 2-10 seconds.

This is because of the Provider plugin, which does DNS lookups.

I should have told you that for high performance, it is better to disable the Provider plugin.

That seems so simple now. I hadn’t considered the DNS lookups.

Thanks for the assistance, I will make the adjustments you suggested.