Content-type of reporting API csv files


#1

Is there currently a way to change how Piwik and/or the Reporting API encodes CSV files and/or assigns their content-type?

I’ve been having difficulty with CSV files produced by the Piwik Reporting HTTP API. I’ve searched the forums, the FAQs, and the issue tickets on the Piwik GitHub repo and didn’t find anything resembling this particular situation.

I’m trying to phase out Excel in favor of the statistical language R. CSV files from the API work fine in Excel, but not in R. Attempts to read them into memory as an object fail. My webmaster and I hypothesize that it’s a difficulty with the encoding and/or content-type native to the API’s CSV option.

Here are the response headers and a truncated portion of the first row of the CSV, obtained from a call to Actions.getPageUrls in my HTTP client:

[i]HTTP/1.1 200 OK
Date: Thu, 24 Sep 2015 16:40:10 GMT
Server: Apache/2.2.26 (Unix) mod_ssl/2.2.26 OpenSSL/1.0.1e-fips mod_auth_passthrough/2.1 mod_bwlimited/1.4
X-Powered-By: PHP/5.4.23
Content-Disposition: attachment; filename="Piwik Export _ Page URLs _ 1 Jan 15 - 31 Jan 15.csv"
Pragma:
Expires:
Cache-Control: must-revalidate
Connection: close
Transfer-Encoding: chunked
Content-Type: application/vnd.ms-excel

ÿþlabel,nb_visits,nb_hits,sum_time_spent,nb_hits_with_time_generation,min_time_generation …[/i]

I recognize the “ÿþ” at the start of the CSV’s first row as a byte order mark (the bytes FF FE displayed as Latin-1 characters). It’s visible in the file preview pane on a Macintosh. R appears to interpret it as an “invalid multibyte string.”

After opening the CSV in a text editor, the file comes up as UTF-16 Little-Endian. The “ÿþ” is not immediately visible, but if you fiddle with the encoding it will show. You can also see that each character is separated by an invisible character expressed as an inverted question mark, which R interprets as “embedded nulls.” The invisible characters can be seen in the attachment to this posting (a screenshot of a CSV from a different API call).
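
To illustrate the two symptoms described above, here is a minimal Python sketch (not R, and not taken from the API itself). It assumes the file is UTF-16 Little-Endian with a leading byte order mark, as the text editor reports, and uses only the first two column names from the header row as sample text.

```python
import codecs

# First two column names from the header row quoted above.
sample = "label,nb_visits"

# UTF-16 Little-Endian with a leading byte order mark (assumed, per the text editor).
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# The BOM is the two bytes FF FE; read as Latin-1 they display as "ÿþ".
bom_as_latin1 = raw[:2].decode("latin-1")

# Each ASCII character is followed by a 0x00 byte: the "embedded nulls".
null_count = raw[2:].count(b"\x00")

print(raw[:8])
print(bom_as_latin1, null_count)
```

A reader that expects a single-byte or UTF-8 encoding sees both the stray leading characters and one null byte per character, which matches the two errors R reports.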

Using the Import function in Excel and then saving as CSV results in a file that R accepts without any errors. Presumably the encoding gets changed or smoothed out along the way. But I’m not ready to add that as an extra layer to the workflow and would be happiest if a CSV produced by the API could be fed straight to R.
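
The re-encoding step that the Excel round trip seems to perform can be sketched outside Excel. The following is a hedged Python illustration, assuming the download is UTF-16 with a byte order mark; the function name `utf16_to_utf8` and the sample row are hypothetical, not real report data.

```python
import csv
import io


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (with BOM) as UTF-8 without a BOM."""
    # The "utf-16" codec reads the byte order mark to pick the
    # endianness and drops it from the decoded text.
    text = data.decode("utf-16")
    return text.encode("utf-8")


# Stand-in for bytes downloaded from the API (made-up values).
downloaded = "\ufefflabel,nb_visits\r\n/index,42\r\n".encode("utf-16-le")

converted = utf16_to_utf8(downloaded)

# The converted bytes now parse as an ordinary CSV.
rows = list(csv.reader(io.StringIO(converted.decode("utf-8"))))
print(rows)
```

After conversion the data contains neither the leading “ÿþ” nor the null bytes, so a CSV reader that expects UTF-8 should be able to parse it.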

One possible suggestion, from the webmaster: allow text/csv as the content-type.

According to some, that is what it should be:

Thanks for any suggestions or help!

PS: I am exploring ways to correct for these difficulties in R. Haven’t yet found one.


(Matthieu Aubry) #2

Hi there,

It would be easier if you could post your comment and suggestion (or maybe even a pull request later :wink:) on our GitHub issue tracker at: Issues · matomo-org/piwik · GitHub


#3

Thanks, Matt. Can do, will do.

David

Edit:
Posted here: assign more accurate content-type to CSV output from Reporting API · Issue #8898 · matomo-org/piwik · GitHub


#4

Figured out a fix. Details here: assign more accurate content-type to CSV output from Reporting API · Issue #8898 · matomo-org/piwik · GitHub