"403: Forbidden" with import_logs.py


#1

I’m getting a 403 error when attempting to import my (nginx) logs using import_logs.py. I’ve verified:
- The web UI is working fine and collecting statistics.
- The token_auth is correct (checked by accessing the API over HTTP).
- The same error occurs whether I use token_auth or login/password with import_logs.py.
- The server can access the web UI, so it’s not a weird firewall rule preventing it from accessing itself.
Curiously, there are no records of requests in either the access or error log for the nginx vhost serving piwik (that is, it logs my web UI access just fine, but as far as I can tell, import_logs.py doesn’t even contact it in the first place).


root@harrenhal:~# python /home/piwik/piwik/misc/log-analytics/import_logs.py --url='http://piwik.petterhaggholm.net' --enable-static --enable-bots --enable-http-errors --enable-http-redirects /var/log/nginx/phnet.access.log --idsite='1' --debug --token-auth='********************************'
2013-08-06 16:02:37,180: [DEBUG] Accepted hostnames: all
2013-08-06 16:02:37,180: [DEBUG] Piwik URL is: http://piwik.petterhaggholm.net
2013-08-06 16:02:37,180: [DEBUG] Authentication token token_auth is: ********************************
2013-08-06 16:02:37,181: [DEBUG] Resolver: static
2013-08-06 16:02:37,216: [DEBUG] Error when connecting to Piwik: HTTP Error 403: Forbidden
2013-08-06 16:02:39,250: [DEBUG] Error when connecting to Piwik: HTTP Error 403: Forbidden
2013-08-06 16:02:41,291: [DEBUG] Error when connecting to Piwik: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File "/home/piwik/piwik/misc/log-analytics/import_logs.py", line 1573, in <module>
    resolver = config.get_resolver()
  File "/home/piwik/piwik/misc/log-analytics/import_logs.py", line 529, in get_resolver
    return StaticResolver(self.options.site_id)
  File "/home/piwik/piwik/misc/log-analytics/import_logs.py", line 872, in __init__
    'SitesManager.getSiteFromId', idSite=self.site_id
  File "/home/piwik/piwik/misc/log-analytics/import_logs.py", line 854, in call_api
    return cls._call_wrapper(cls._call_api, None, None, method, **kwargs)
  File "/home/piwik/piwik/misc/log-analytics/import_logs.py", line 843, in _call_wrapper
    raise Piwik.Error(message)
__main__.Error: Forbidden

Anyone have any advice/ideas?


#2

Well, I looked at the requests sent via urllib2 and discovered what the problem is, though I don’t actually know where it comes from: import_logs.py does not set a User-Agent header, and all requests to the Piwik API on my website that omit the User-Agent header are rejected with 403 Forbidden.

Although nginx does have the ability to reject blank User-Agents, I do not have this option configured. As such, I can only suppose that the requests are rejected by Piwik itself, though if so, I don’t know where in its source this may be.

(The workaround is obvious, of course: Set a User-Agent header in import_logs.py.)
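For anyone who wants to check whether their own server behaves this way, here is a minimal Python 3 sketch that compares the status code with and without an explicit User-Agent. The function names build_request and probe and the URL in the usage note are made up for illustration. One caveat: when no header is set, Python's urllib sends its own default value (Python-urllib/x.y), so the comparison is really between that default and an explicit string.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent=None):
    """Build a GET request, optionally with an explicit User-Agent header."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    # With no header set here, urllib adds its default "Python-urllib/x.y"
    # value at send time.
    return urllib.request.Request(url, headers=headers)


def probe(url, user_agent=None, timeout=10):
    """Return the HTTP status code the server answers with."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is all we want here.
        return err.code
```

Usage would be something like calling probe("http://piwik.example.invalid/index.php") and probe("http://piwik.example.invalid/index.php", "Piwik Bulk Log Import") against your own Piwik URL. If only the first call returns 403, the User-Agent is what is being rejected.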


#3

had the exact same problem importing logs, except running an apache webserver.

my workaround was placing an .htaccess file in the piwik directory containing:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteRule ^.* - [L]

once that was there, the python script worked.

I’ll delete the .htaccess now that I’ve successfully imported the old log.


#4

I am experiencing the same problem, but @ecruhling’s .htaccess solution didn’t work for me.

@haggholm – could you post the relevant patch to set the user-agent? I’m not much of a python hacker, and fumbling attempts to set the user agent have not been successful.

thanks.


#5

My patch was pretty damned trivial; I don’t recall the version, alas, but it’s just:


766d765
<         headers['User-Agent'] = 'Piwik Bulk Log Import'

I.e. in Piwik._call(),


        if data is None:
            # If Content-Type isn't defined, PHP do not parse the request's body.
            headers['Content-type'] = 'application/x-www-form-urlencoded'
            data = urllib.urlencode(args)
        elif not isinstance(data, basestring) and headers['Content-type'] == 'application/json':
            data = json.dumps(data)

        # Make sure we set a User-Agent header!
        headers['User-Agent'] = 'Piwik Bulk Log Import'
        request = urllib2.Request(url + path, data, headers)
        response = urllib2.urlopen(request)
        result = response.read()
        response.close()
        return result
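
The snippet above is Python 2 (urllib2, basestring). For readers on Python 3, a rough equivalent of just the request-building part might look like the following. This is a sketch, not the actual import_logs.py code: the function name build_api_request is hypothetical, and the extra encoding step is an assumption added because Python 3's urllib expects the body as bytes.

```python
import json
import urllib.parse
import urllib.request

USER_AGENT = "Piwik Bulk Log Import"  # string taken from the patch above


def build_api_request(url, path, args, headers=None, data=None):
    """Build (but do not send) a POST request that always carries a User-Agent."""
    headers = dict(headers or {})
    if data is None:
        # Form-encode the arguments, as the original code does.
        headers["Content-type"] = "application/x-www-form-urlencoded"
        data = urllib.parse.urlencode(args)
    elif (not isinstance(data, (str, bytes))
          and headers.get("Content-type") == "application/json"):
        data = json.dumps(data)
    if isinstance(data, str):
        # Python 3's urllib wants the request body as bytes.
        data = data.encode("utf-8")
    # Make sure a User-Agent header is always set.
    headers["User-Agent"] = USER_AGENT
    return urllib.request.Request(url + path, data, headers)
```

Sending it is then urllib.request.urlopen(request).read(), same as in the original. Keeping the build step separate just makes it easy to inspect the headers without touching the network.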


(Matthieu Aubry) #6

this was included in core a few months ago: https://github.com/piwik/piwik/blob/master/misc/log-analytics/import_logs.py#L897-897

so maybe that’s not the issue if you are already using 2.1 or 2.2


#7

(It was my problem back at the time of the original post, and my reply suggesting the solution that was [presumably independently] committed a few days later, but indeed it’s unlikely to be foobard’s problem now.)


#8

Hi,

I have just installed piwik but still get the same error (1/4/2015).


/var/www/html/piwik/misc/log-analytics# python import_logs.py --url=mindset-tool.businessgrowthservice.co.uk  --enable-http-errors --enable-http-redirects --enable-static --enable-bots /var/log/apache2/access.log --idsite=1
Traceback (most recent call last):
  File "import_logs.py", line 2099, in <module>
    resolver = config.get_resolver()
  File "import_logs.py", line 880, in get_resolver
    return StaticResolver(self.options.site_id)
  File "import_logs.py", line 1221, in __init__
    'SitesManager.getSiteFromId', idSite=self.site_id
  File "import_logs.py", line 1204, in call_api
    return cls._call_wrapper(cls._call_api, None, None, method, **kwargs)
  File "import_logs.py", line 1193, in _call_wrapper
    raise Piwik.Error(message)
__main__.Error: Forbidden


I have checked that the code has the recommended fix from above and I have also added the .htaccess file, but no luck …
running it as root for testing …
server is apache2, vanilla install …
thanks

Oliver


#9

fixed it, needed to add host/vhost details for piwik so it posted to the correct url …
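
The post above does not say exactly what was changed. One thing that can be checked locally, offered purely as an assumption about what "the correct url" might involve, is whether the --url value parses as a full URL with a scheme and a host. The helper name url_problems is hypothetical and this is only a sanity check, not something import_logs.py itself does.

```python
from urllib.parse import urlsplit


def url_problems(url):
    """Return a list of human-readable problems with a --url value."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parts.hostname:
        # Without a scheme and "//", urlsplit treats the whole string as a path.
        problems.append("no hostname could be parsed")
    return problems
```

For example, url_problems("http://piwik.petterhaggholm.net") comes back empty, while a bare hostname with no scheme is flagged on both counts. Whether the request then reaches the right vhost still has to be checked in the server's access logs.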