Import of IIS 8.5 log missing page data


#1

When I import an IIS 8.5 log file in the default W3C format into Piwik, the Actions tab says there is no data for Pages (Entry pages, Exit pages, and Page titles also show no data):
[attachment 1661 piwik-page-titles.png]

The web server is Windows Server 2012 R2 running IIS 8.5 with the default W3C log format. In the attached console log ([attachment 1660 piwik-log-import-results.pdf]), the server was accessed from Chrome on Mac, Firefox on Mac, and Internet Explorer on Windows. After importing and archiving, the Visitor Log shows only a single entry, for Chrome:
[attachment 1662 piwik-visitors.png]

The import results also report 0 requests to static resources, even though most of the 54 requests are for CSS and JPG files.

This is the command I used to import the log file:
python /Users/cdaniels/Documents/Sites/linux/piwik/misc/log-analytics/import_logs.py --url=https://ildcdaniels01/piwik/ /Users/cdaniels/Documents/Sites/WS2012/u_ex140326.log --idsite=15 --token-auth=<> --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --add-sites-new-hosts -d

Is there something obvious that I’ve missed in the import process?


#2

I added ‘action_name’ to the args dictionary in the import_logs.py script, and now all of the items show up correctly under Actions > Pages.


(Matthieu Aubry) #3

@craig_fisv, could you tell me exactly what you changed? I would like to fix it in the script in git.


#4

The first change treats ‘-’ as an empty query string:


Around line 1575:
----------------
try:
    hit.query_string = format.get('query_string')
    hit.path = hit.full_path
except BaseFormatException:
    hit.path, _, hit.query_string = hit.full_path.partition(config.options.query_string_delimiter)


Changed to treat '-' as an empty query string, because IIS writes '-' in the query string field when a request has no query string.
----------------
try:
    hit.query_string = format.get('query_string')
    if hit.query_string == '-':
        hit.query_string = ''
    hit.path = hit.full_path
except BaseFormatException:
    hit.path, _, hit.query_string = hit.full_path.partition(config.options.query_string_delimiter)
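To make the '-' behaviour easier to reproduce outside import_logs.py, here is a minimal standalone sketch (not the script's actual parser) that reads W3C extended log lines, takes the column names from the "#Fields:" directive, and maps the '-' placeholder to an empty string. The function name parse_w3c_lines and the trimmed sample lines are hypothetical; a real IIS 8.5 default log has more fields, but the header-driven parsing handles that the same way.

```python
# Hypothetical sketch, not import_logs.py code: parse IIS W3C extended
# log lines using the "#Fields:" directive and normalise "-" to "".

def parse_w3c_lines(lines):
    """Yield one dict per request line, keyed by the W3C field names."""
    fields = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as "#Software:", "#Version:", "#Fields:".
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split(" ")
        record = dict(zip(fields, values))
        # IIS logs "-" for an empty field, e.g. cs-uri-query with no query string.
        yield {k: ("" if v == "-" else v) for k, v in record.items()}


# Trimmed, made-up sample lines for illustration only.
sample = [
    "#Software: Microsoft Internet Information Services 8.5",
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port c-ip sc-status",
    "2014-03-26 15:00:01 10.0.0.1 GET /index.html - 80 10.0.0.2 200",
    "2014-03-26 15:00:02 10.0.0.1 GET /page.aspx content=Home 80 10.0.0.2 200",
]

for rec in parse_w3c_lines(sample):
    print(rec["cs-uri-stem"], repr(rec["cs-uri-query"]))
```

With the sample above, the first request ends up with an empty query string instead of '-', so no stray "?-" would be appended to its URL.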

The second change adds ‘action_name’ to the args, which makes the pages show up under Actions > Pages. I set it to the value of our ‘content’ query parameter if it is present, and to the URL otherwise.


Around line 1243:
----------------
if config.options.replay_tracking:
    # prevent request to be force recorded when option replay-tracking
    args['rec'] = '0'
args.update(hit.args)


Changed to include action_name so Pages and Page Titles are populated
with our 'content' query parameter or the URL.
----------------
query_arguments = urlparse.parse_qs(hit.query_string)

# Page title
if u'content' in query_arguments:
    args['action_name'] = str(query_arguments[u'content'][0])
else:
    args['action_name'] = url.encode('utf8')

if config.options.replay_tracking:
    # prevent request to be force recorded when option replay-tracking
    args['rec'] = '0'
args.update(hit.args)
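The page-title logic in the patch above can be isolated into a small function for testing. This is a sketch under assumptions: the name page_title and the param argument are hypothetical, and it targets Python 3, where parse_qs lives in urllib.parse and returns str values (the patch itself is Python 2 code using urlparse.parse_qs, str() and .encode('utf8')).

```python
from urllib.parse import parse_qs


def page_title(query_string, url, param="content"):
    """Return the first value of `param` in the query string, else the URL.

    Hypothetical helper mirroring the patch: `param` defaults to the
    site-specific 'content' query parameter.
    """
    query_arguments = parse_qs(query_string)
    if param in query_arguments:
        # parse_qs maps each name to a list of values; use the first one.
        return query_arguments[param][0]
    # No such parameter: fall back to the full URL as the title.
    return url


print(page_title("content=Home", "http://example.com/page.aspx?content=Home"))
print(page_title("", "http://example.com/index.html"))
```

Falling back to the URL means every distinct URL becomes its own entry under Page titles. Sites without a title-like query parameter may prefer a different fallback.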


(Matthieu Aubry) #5

This looks interesting. I need your help to fix the issue properly:

  • Can you create a ticket on our issue tracker: http://dev.piwik.org/
  • Attach to the ticket a small log file (e.g. 2 lines or so) that shows both bugs.
  • Then put your proposed changes in the ticket (i.e. what you have written here).

I will replicate the problem, add it to our test cases, and fix it. Thanks!