Google Import Encoding Question(s)

Not quite sure how to ask this question, so forgive me if I fumble around a bit! I’ve been working with a new install of Piwik (1.4) and experimenting with importing my old data from Google Analytics with the google2piwik script. I don’t have shell access on my server (I’m using shared hosting), so I’ve been using a dummy set-up on a local Ubuntu box to do the actual import, with a view to exporting the database there and importing it to the db on my “real” server.

I’ve noticed a difference, though, between the db as it exists on my dummy server, and the export it generates. I’m specifically looking at the piwik_log_visit table, in the idvisitor and configid fields – when I export, the encoding seems to change…or something. Here’s a sample record:

Import db –
idvisitor: 486ee8589ad1bab0
configid: 2b4acd2dd238db93

Export db –
idvisitor: HnËXö—∫∞
configid: +JÕ-“8€ì

Also, on a somewhat related note, all the locationip’s in my Google import show up as 30, but export as 0. Is that to be expected?

(Now that I’m looking, those field values on a new install of Piwik on my “real” server are things like:
idvisitor: (°7hªß≥n
configid: †r+9Ö…^
locationip: `ÓA
etc.

So is this normal, or do I need to tweak some encodings before I run with my Google import?

Thanks!

idvisitor and config_id are 8 byte binary columns and contain hashes that will be somewhat meaningless outside of Piwik

location_ip is a varbinary column, typically 4 or 16 bytes (i.e., IPv4 or IPv6 addresses, respectively) in network address format (i.e., big-endian)

If you’re going to export these columns, you might want to use the hex() function.

Ah, thanks. I see. So (I assume?) Google is sending me utf-8 numbers or something, and they’re being hashed when the fields are actually “processed” via an export function?

I’m primarily using phpMyAdmin to wrangle my databases; is there a way to specify a hex export through the phpMyAdmin interface, or do I need to roll my own query?

Of course, if the “black diamond-question mark” display I see for those fields in phpMyAdmin is just a representation of the binary through my web browser, is everything a-ok as far as the actual data is concerned? Should I worry at all?

I haven’t looked at the google2piwik script, so I can’t answer your encoding question.

Ditto with PMA. If I had to roll my own query, I’d use MySQL’s “SELECT … INTO OUTFILE”.

[quote=vipsoft]
I haven’t looked at the google2piwik script, so I can’t answer your encoding question.

Ditto with PMA. If I had to roll my own query, I’d use MySQL’s “SELECT … INTO OUTFILE”.[/quote]

I’m having the some wrong enconding problem in a totally new installation of Piwik (without Google imports).
My idvisitor looks like this:

http://dl.dropbox.com/u/3009615/bugpwk.png

What is this about? Wrong config in MySQL or Piwik?

Idvisitor is an internal string of 8-bit octets. it’s not generally printablr, and it’s not a bug

Ok! Thanks!

But I’m getting the same this type of sequence in “config_id” and “location_ip”, still the same answer?

Yes.

Reports and API are there for a reason.