Issue with logfile tracking in replay mode

Hello,

I am trying to recover some Piwik data by reading the Apache log files, which contain the recorded JS tracking requests.
Unfortunately, I get the following error:


PS D:\www\piwik\misc\log-analytics> python ./import_logs.py --url="http://my.piwik.domain/piwik" --replay-tracking "D:/www/sort.log"


Traceback (most recent call last):
  File "./import_logs.py", line 1737, in <module>
    resolver = config.get_resolver()
  File "./import_logs.py", line 653, in get_resolver
    return DynamicResolver()
  File "./import_logs.py", line 1017, in __init__
    self._cache['sites'] = piwik.call_api('SitesManager.getAllSites')
  File "./import_logs.py", line 974, in call_api
    return cls._call_wrapper(cls._call_api, None, None, method, **kwargs)
  File "./import_logs.py", line 963, in _call_wrapper
    raise Piwik.Error(message)
__main__.Error: Piwik returned an invalid response: <!DOCTYPE html>
<!--[if lt IE 9 ]>
<html class="old-ie"> <![endif]-->
<!--[if (gte IE 9)|!(IE)]><!-->
<html><!--<![endif]-->
<head>
    <meta charset="utf-8">
        <meta http-equiv="x-ua-compatible" content="IE=EDGE,chrome=1" >
    <title>Piwik &rsaquo; Sign in</title>

        <link rel="shortcut icon" href="plugins/CoreHome/images/favicon.ico"/>
        <link rel="stylesheet" type="text/css" href="index.php?module=Proxy&action=getCss&cb=ad8ba31f2b25f3a7e23
a838720a65b47" />

        <link rel="stylesheet" type="text/css" href="plugins/Login/stylesheets/login.css?cb=ad8ba31f2b25f3a7e23a838720a6
5b47" />
    <meta name="description" content="free/libre analytics platform"/>
    <meta name="apple-itunes-app" content="app-id=737216887" />
    <meta name="google-play-app" content="app-id=org.piwik.mobile2">
        <link rel="stylesheet" type="text/css" href="index.php?module=Proxy&action=getCss&cb=ad8ba31f2b25f3a7e23
a838720a65b47" />

    <script type="text/javascript">
var translations = {"CorePluginsAdmin_NoZipFileSelected":"Please select a ZIP file.","General_InvalidDateRange":"Invalid
 Date Range, Please Try Again","General_Loading":"Loading...","General_Show":"show","General_Hide":"hide","General_YearS
hort":"yr","General_MultiSitesSummary":"All Websites","CoreHome_YouAreUsingTheLatestVersion":"You are using the latest v
ersion of Piwik!","CoreHome_IncludeRowsWithLowPopulation":"Rows with low population are hidden %s Show all rows","CoreHo
me_ExcludeRowsWithLowPopulation":"All rows are shown %s Exclude low population","CoreHome_DataTableIncludeAggregateRows"
:"Aggregate rows are hidden %s Show them","CoreHome_DataTableExcludeAggregateRows":"Aggregate rows are shown %s Hide the
m","CoreHome_Default":"default","CoreHome_PageOf":"%1$s of %2$s","CoreHome_FlattenDataTable":"The report is hierarchical
 %s Make it flat","CoreHome_UnFlattenDataTable":"The report is flat %s Make it hierarchical","CoreHome_ExternalHelp":"He
lp (opens in new tab)","SitesManager_NotFound":"No websites found for","Annotations_ViewAndAddAnnotations":"View and add
 annotations for %s...","General_RowEvolutionRowActionTooltipTitle":"Open Row Evolution","General_RowEvolutionRowActionT
ooltip":"See how the metrics for this row changed over time","Annotations_IconDesc":"View notes for this date range.","A
nnotations_IconDescHideNotes":"Hide notes for this date range.","Annotations_HideAnnotationsFor":"Hide annotations for %
s...","General_LoadingPopover":"Loading %s...","General_LoadingPopoverFor":"Loading %s for","General_ShortMonth_1":"Jan"
,"General_ShortMonth_2":"Feb","General_ShortMonth_3":"Mar","General_ShortMonth_4":"Apr","General_ShortMonth_5":"May","Ge
neral_ShortMonth_6":"Jun","General_ShortMonth_7":"Jul","General_ShortMonth_8":"Aug","General_ShortMonth_9":"Sep","Genera
l_ShortMonth_10":"Oct","General_ShortMonth_11":"Nov","General_ShortMonth_12":"Dec","General_LongMonth_1":"January","Gene
ral_LongMonth_2":"February","General_LongMonth_3":"March","General_LongMonth_4":"April","General_LongMonth_5":"May","Gen
eral_LongMonth_6":"June","General_LongMonth_7":"July","General_LongMonth_8":"August","General_LongMonth_9":"September","
General_LongMonth_10":"October","General_LongMonth_11":"November","General_LongMonth_12":"December","General_ShortDay_1"
:"Mon","General_ShortDay_2":"Tue","General_ShortDay_3":"Wed","General_ShortDay_4":"Thu","General_ShortDay_5":"Fri","Gene
ral_ShortDay_6":"Sat","General_ShortDay_7":"Sun","General_LongDay_1":"Monday","General_LongDay_2":"Tuesday","General_Lon
gDay_3":"Wednesday","General_LongDay_4":"Thursday","General_LongDay_5":"Friday","General_LongDay_6":"Saturday","General_
LongDay_7":"Sunday","General_DayMo":"Mo","General_DayTu":"Tu","General_DayWe":"We","General_DayTh":"Th","General_DayFr":
"Fr","General_DaySa":"Sa","General_DaySu":"Su","General_Search":"Search","General_Clear":"Clear","General_MoreDetails":"
More Details","General_Help":"Help","General_Id":"Id","General_Name":"Name","General_JsTrackingTag":"JavaScript Tracking
 Code",

What am I doing wrong?
Do the log files need to be in a special format?

Thanks in advance for your kind help.

Greetings,

Stephan

Hi there, after running the log import, can you check your server access log file and look at the HTTP requests that were made by the log importer tool? One of these URLs should return JSON (as it is calling the API), but on your server it returns the HTML login page instead. That is what is causing the error. Not sure why.
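To make that check concrete, here is a minimal sketch that tells a JSON API reply apart from an HTML page such as the "Piwik › Sign in" login page in the traceback above. The function name `classify_api_response` is made up for illustration and is not part of `import_logs.py`; it only inspects a response body that you have already saved or copied from the error message.

```python
import json


def classify_api_response(body):
    """Return 'json', 'html' or 'unknown' for a response body string.

    Hypothetical helper for illustration; not part of import_logs.py.
    """
    text = body.strip()
    try:
        json.loads(text)
        return "json"
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError
        pass
    lowered = text.lower()
    if lowered.startswith("<!doctype html") or lowered.startswith("<html"):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # The body in the traceback above starts with a doctype, so it is a page, not API data.
    print(classify_api_response("<!DOCTYPE html>\n<html><head></head></html>"))
```

If this reports `html` for the body returned by the API call, the importer was most likely redirected to or served the login page instead of the API response.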

I have just run into this problem with my setup. I had it working and then it broke. In my case it turned out to be a change that I had made to my configuration. I wanted to enforce https, so I added "force_ssl=1" to my "config.ini.php" file. That change broke the log importing script. It seems that the issue is a protocol mismatch. If you reach your site via https, or you have it configured to force SSL, then you need to make sure that you also use https in your import command. So your command above should look like this:


PS D:\www\piwik\misc\log-analytics> python ./import_logs.py --url="https://my.piwik.domain/piwik" --replay-tracking "D:/www/sort.log"

Just added one character. Let me know if this helps.
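If you script the import, the same one-character change can be applied to the `--url` value automatically. This is only a sketch using the Python standard library; `force_https` is a hypothetical helper name, and it assumes your Piwik is reachable under the same host and path over https.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http scheme upgraded to https; other URLs are returned unchanged.

    Hypothetical helper for illustration; an explicit port such as :80 is left as it is.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(force_https("http://my.piwik.domain/piwik"))
```

The result can then be passed as the value of `--url` when calling `import_logs.py`.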

Hi @tlhogan, I think this is a bug in Piwik. Could you please create an issue in the matomo-org/piwik issue tracker on GitHub and explain how to reproduce it? Thanks

Hey guys,

thanks for the helpful answers.

Indeed, I enforce https on my server. After adding this to the command, the script executes, but it seems rather slow to me. Also, I cannot see any requests in the Apache log files so far. The import has not finished yet: every 600 lines or so it crashes with an internal server error and needs to be restarted.

Is there anything to be done here?


7202 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
7202 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
7202 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)

Hi @matt. Are you sure this is a bug? The tracking code generated by the GUI seems to be correct, and my thinking is that if you have enforced SSL then you should automatically be using https in the script to do a log import. I can open an issue if you think that is best, but could you give me an idea of what to title it?

Regards,
Tim

Hi @Sm4ster. You got me on that one. All I can think of suggesting is: what does the web server log file show? Are you running out of memory? What is the web server? Are you sure the logs are written in the correct format? Sorry I don't have a better answer for you.

Regards,
Tim

Hi,

What format do the logs have to be in? I am running Apache, and the log format should be the default. I don't think I am running out of memory or anything, because the server is quite powerful and not very busy at the moment.


    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    LogFormat "%h %l %u %t \"%r\" %>s %b" common

The web server requests don't show anything important. It is as if the script isn't talking to the server at all. But I might be wrong, since its requests are mixed up with normal tracking requests and I clicked around in the Piwik GUI in between.

regards,

Stephan

Hello?!

Maybe even a hint on whether the logfile format is correct?

I think that is correct. I just followed the information at https://github.com/piwik/piwik/tree/master/misc/log-analytics#readme. About midway through there is a section about setting up the Apache log format. If it is crashing, I am wondering whether you are running into a memory issue from running on Windows. That would be way beyond my skill set to figure out, though.
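One way to double-check the format question yourself is to test a few lines of the log file against the "combined" LogFormat quoted earlier in this thread. The sketch below is not how `import_logs.py` parses logs; the regular expression, the function names and the sample line are my own, and the check for `piwik.php` in the request path is an assumption about what a recorded tracking request looks like.

```python
import re

# Regex for the "combined" LogFormat quoted above:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Simplification: quoted fields containing escaped quotes will not match.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_combined(line):
    """Return a dict of fields if the line matches the combined format, else None."""
    match = COMBINED.match(line.strip())
    return match.groupdict() if match else None


def is_tracking_request(line):
    """True if the line is in combined format and requests a path containing piwik.php (assumption)."""
    fields = parse_combined(line)
    if fields is None:
        return False
    parts = fields["request"].split()
    return len(parts) >= 2 and "piwik.php" in parts[1]


if __name__ == "__main__":
    # Made-up sample line, not taken from the original log file.
    sample = (
        '203.0.113.5 - - [10/Oct/2014:13:55:36 +0200] '
        '"GET /piwik/piwik.php?idsite=1&rec=1 HTTP/1.1" 200 43 '
        '"http://example.org/" "Mozilla/5.0"'
    )
    print(parse_combined(sample) is not None, is_tracking_request(sample))
```

If most lines in sort.log fail `parse_combined`, the file was probably written with a different LogFormat (for example "common", which lacks the referer and user agent fields).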