Download files


#1

Dear All,

I am using the following regular expression to import logFiles:

python /var/www/piwik/misc/log-analytics/import_logs.py --url=https://XYZ /media/ezproxy/ezp20150729.log --idsite=35 --dry-run --log-format-regex='(?P.)\s-\s[a-zA-Z0-9-].[(?P.?) (?P.?)] "(?P.?)"\s(?P\S+) (?P\S+)\s"(?P<user_agent>.?)"\s"(?P.*?)"' --recorders=4 --enable-http-errors --enable-http-redirects --download-extensions=csd,ccs,dmg,enf,ens,enz,7z,aac,arc,arj,asf,asx,avi,bin,csv,deb,dmg,doc,docx,exe,gzip,hqx,jar,mpg,mp2,mp3,mp4,mpeg,mov,movie,msi,msp,odb,odf,odg,odp,ibooks,jar,mpg,mp2,mp3,mp4,mpeg,mov,movie,msi,msp,odb,odf,odg,odp,ods,odt,ogg,ogv,pdf,phps,ppt,pptx,qt,qtm,ra,ram,rar,rpm,sea,sit,tar,tbz,bz2,tgz,torrent,txt,wav,wma,wmv,wpd,xls,xlsx,xml,xsd,z,zip,azw3,epub,mobi,apk,flv,gz

This works fine, except that downloads are not counted in the Piwik backend (the download report is empty). In contrast, in --dry-run mode the downloads are recognized.

When I use --log-format-name=common instead of the regular expression, about one third of the lines are reported as unknown, but downloads are counted. Furthermore, with --log-format-name=common the browser types are not analyzed, whereas they are with the regular expression.
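To see which lines a custom pattern rejects before running the importer, the regular expression can be tried on a few sample lines in plain Python. The following is only a rough sketch: the pattern, the sample lines, and the group names (ip, date, timezone, path, status, length, referrer, user_agent) are assumptions for illustration and are not the expression used above. Note that a pattern without a user_agent group cannot supply any browser information.

```python
import re

# Hypothetical pattern for a combined-style access log line.
# The group names are assumptions, not copied from the command above.
PATTERN = re.compile(
    r'(?P<ip>\S+)\s+\S+\s+\S+\s+'
    r'\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+'
    r'"\S+\s+(?P<path>.*?)\s+\S+"\s+'
    r'(?P<status>\S+)\s+(?P<length>\S+)'
    # Optional tail: without it, no referrer or user agent is captured.
    r'(?:\s+"(?P<referrer>.*?)"\s+"(?P<user_agent>.*?)")?'
)


def split_matches(lines, pattern=PATTERN):
    """Return (parsed group dicts, lines the pattern did not match)."""
    matched, unmatched = [], []
    for line in lines:
        m = pattern.match(line)
        if m:
            matched.append(m.groupdict())
        else:
            unmatched.append(line)
    return matched, unmatched


# Made-up sample data, only for trying the pattern.
SAMPLE = [
    '192.0.2.1 - user1 [29/Jul/2015:10:00:00 +0200] '
    '"GET /files/report.pdf HTTP/1.1" 200 1234 '
    '"http://example.org/" "Mozilla/5.0"',
    'not a log line',
]

matched, unmatched = split_matches(SAMPLE)
for fields in matched:
    print(fields['path'], fields['user_agent'])
print(len(unmatched), 'line(s) did not match')
```

Running this on a handful of real lines from the log shows quickly whether the unknown lines come from the pattern itself.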

Does anyone have an idea how to solve these problems?

Thank you!

Best
mucctecc


#2

This is the log import summary from last night. 24132 downloads were recognized, but I cannot see any download statistics in Piwik, i.e. zero downloads are shown for yesterday:


880754 requests imported successfully
24132 requests were downloads
25079 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    0 invalid log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    316 requests done by bots, search engines...
    24763 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

880754 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 2563 seconds
Requests imported per second: 343.61 requests per second

(Matthieu Aubry) #3

Hi there,

Can you try without the --download-extensions parameter? Maybe this will work better.

If you think this is a bug in Log Analytics, please create a bug report in the matomo-org/piwik-log-analytics issue tracker on GitHub, with a small log file of a few lines that can be used to reproduce the issue, the commands used, etc.
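One way to get such a small log file is to pull a few download requests out of the full log. The sketch below is only an assumption about how one might do that: the helper names (is_download, sample_download_lines) are made up, and the extension check is a simple filter on the requested path, not the logic that import_logs.py itself applies.

```python
import re
from urllib.parse import urlsplit

# Finds the quoted request part of a line, e.g. "GET /a/b.pdf HTTP/1.1".
REQUEST = re.compile(r'"(?:[A-Z]+)\s+(?P<path>\S+)\s+\S+"')


def is_download(path, extensions):
    """True if the last path segment ends in one of the given extensions."""
    name = urlsplit(path).path.rsplit('/', 1)[-1]
    if '.' not in name:
        return False
    return name.rsplit('.', 1)[-1].lower() in extensions


def sample_download_lines(lines, extensions, limit=5):
    """Collect up to `limit` log lines whose request looks like a download."""
    out = []
    for line in lines:
        m = REQUEST.search(line)
        if m and is_download(m.group('path'), extensions):
            out.append(line)
            if len(out) >= limit:
                break
    return out
```

For example, `sample_download_lines(open('/media/ezproxy/ezp20150729.log'), {'pdf', 'enz'})` would return the first five matching lines, which can then be anonymized and attached to the report.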


#4

I definitely need the additional file types csd, ccs, dmg, enf, ens and enz. I think I will add these file types directly in import_logs.py …

If this is not working, I will create a bug report.

Thank you for your help.


#5

Okay, I reproduced the problem with some sample data. I will create a bug report.

It makes no difference whether I use the --download-extensions parameter or not (I extended the download list in import_logs.py).