Trying to create a segment for downloads from imported logs


#1

Hi,

I am running PIWIK locally on a VM and have successfully imported some apache2 logs into the PIWIK database using the log analytics tool tutorial available here:
http://piwik.org/docs/log-analytics-tool-how-to/

I am doing this for a number of reasons, but the first one is that the server hosting the website I want to track with PIWIK does not have the required PHP version.
So I am instead exporting the Apache logs from that server and importing them into a local PIWIK instance running on a machine which has the right version of PHP. It means there is no PHP anchor on my website pages either.

So far, so good, everything works…

Now, I have noticed with joy that PIWIK also reports on some files that can be downloaded from the website (some .PDF and .ZIP files), as they appear in the Apache logs.
I can see the total number of downloads for each file under ACTIONS -> DOWNLOADS.
But things get more complicated when I want to know WHO/WHICH visitor has downloaded those files, rather than just a numeric value.

I thought it would be easy enough to do by creating a new SEGMENT under VISITOR -> VISITOR LOGS.
I selected “Actions->Page URL”, then “contains” and entered ".pdf".
I also tried different regular expressions ((*.)pdf, etc.) and also replaced “contains” with “is”, using the full URL as it appears in ACTIONS -> DOWNLOADS and in the Apache logs.

I made sure to select the right time window (basically the whole year) but this segment/filter never works… it never returns a result!
It does however work if I replace the reference to the file (pdf or zip) with an html file… so I know my query is correct.

Is this a bug? Or am I really missing something obvious?
I tried all the different filter options possible and none can return the details of the visitors who have downloaded those files… even though I can tell PIWIK sees references to those files being downloaded from the ACTIONS -> DOWNLOADS entry.

The log format I am working with is as follows:

THIS ENTRY I CANNOT REPORT ON, if I have contains "myfile.pdf"
125.45.43.1 - - [18/Mar/2015:20:24:41 +0000] "GET /files/myfile.pdf HTTP/1.1" 200 3326 "mywebsite.com - This website is for sale! - mywebsite Resources and Information." "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36"

THIS ENTRY I CAN REPORT ON, if I have contains "page1.html"
125.45.43.1 - - [18/Mar/2015:24:20:54 +0000] "GET /core/page1.html HTTP/1.1" 200 3692 "mywebsite.com - This website is for sale! - mywebsite Resources and Information." "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36"
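
Since these entries follow the usual Apache combined log layout, the "who downloaded what" question can also be answered directly from the raw logs, outside PIWIK. Below is a minimal Python sketch of that idea. It assumes straight-quoted log lines as Apache writes them, treats the client IP as the "visitor", and counts only requests answered with status 200. The function name, the extension list and the sample lines are illustrative assumptions, not anything PIWIK provides.

```python
import re
from collections import defaultdict

# Leading fields of an Apache combined-format line:
# IP, identity, user, [timestamp], "METHOD path protocol", status, size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: only these extensions count as downloads.
DOWNLOAD_EXTENSIONS = (".pdf", ".zip")


def downloads_by_visitor(lines):
    """Map each client IP to the (timestamp, path) pairs of files it downloaded."""
    result = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        # Ignore any query string when checking the file extension.
        path = match.group("path").split("?", 1)[0]
        if match.group("status") == "200" and path.lower().endswith(DOWNLOAD_EXTENSIONS):
            result[match.group("ip")].append((match.group("time"), path))
    return dict(result)


# Made-up sample lines modelled on the entries above (referer and user agent shortened).
sample = [
    '125.45.43.1 - - [18/Mar/2015:20:24:41 +0000] "GET /files/myfile.pdf HTTP/1.1" 200 3326 "-" "Mozilla/5.0"',
    '125.45.43.1 - - [18/Mar/2015:20:24:54 +0000] "GET /core/page1.html HTTP/1.1" 200 3692 "-" "Mozilla/5.0"',
]

for ip, hits in downloads_by_visitor(sample).items():
    for when, path in hits:
        print(ip, when, path)
```

To run it on a real log, pass an open file object instead of the sample list, for example `downloads_by_visitor(open("access.log"))`, where `access.log` is a placeholder file name. An IP address is only a rough stand-in for a visitor, so the result will not line up exactly with PIWIK's own visitor log.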

If anyone can help, that would be great! :)
garbs


(Matthieu Aubry) #2

Hi there,

What you want is not yet implemented in Piwik, but we’d like to do it. Please comment on GitHub issue #4103 in matomo-org/piwik: “New segments: Download file URL ‘downloadUrl’ and Outlink URL ‘outlinkUrl’”.


#3

Hi Matt,

Apparently there is a way of getting detailed visitor reports on file downloads by using “goals”:

http://www.elysiumsecurity.com/blog/Guides/post8.html

I tried it and it works, but as stated in the article above, it only applies to new “hits”, which is a shame.

Thanks for a great tool by the way!
Garbs.