all the page titles with special characters (like umlauts äöü) are not correctly saved in piwik table “log_action”. (The tracked page is utf-8 and the piwik db is utf-8.)
For example:
piwikTracker.setDocumentTitle("Hello World! ä ö ü Ä Ö Ü");
=> Result in db: “Hallo World! ä ö ü Ä Ö Ü”
At the piwik frontend, everything is fine: The page is correctly displayed as “Hello World! ä ö ü Ä Ö Ü”.
Is this default behavior or an error at my piwik installation?
How can I fix it?
Your problem isn’t with Piwik or the database. The tool you’re using to inspect the database contents is displaying the data in ASCII instead of UTF-8.
Here’s the Wikipedia article that should explain what you’re seeing:
Thank you for your help. I’am using an UTF-8 console and so this isn’t the reason.
I do the following test which shows that the special characters not correctly saved in piwik table “log_action”. 1. I add the following to an utf-8 html page:
piwikTracker.setDocumentTitle("Ä");
2. Than I checked the data in table “log_action”:
select name, hex(name) from piwik_log_action where idaction=<id from the new entry>;
Result: name: ä, hex(name): C383C2A4
=> It’s wrong - the umlaut “Ä” is not saved correctly.
(UTF-8 hex(‘Ä’) is “C384”. If I add the ‘Ä’ with an table-insert, it’s saved as UTF-8: insert into piwik_log_action set name=‘Ä’; )
Could you please check this at your piwik db.
Thanks, Micha
Could it be your OSs language setting causing the problem in the dev and and prod environment?
Also see this article below there are a few points(if applicable) from the web server side you should check to how they are set as if they are different may be a source as to why your console is “mysteriously” changing formats somehow. Its not exactly your issue but may give you some ideas as to where to check some things
@matt: Okay, you checked the UI. Here’s everything fine. ‘Ä’ is displayed as ‘Ä’. The error is only in the DB. Could you please check the piwik table “log_action”.
select name, hex(name) from piwik_log_action where idaction=<id from the new entry>;
@lesjokolat: I checked the configs. UTF-8 everywhere. Thank you. The CMS-System use the same DB, same Webserver. Here’s everything fine - the charset error is only in the piwik tables. Is umlaut “Ä” saved as hex “C384” in your piwik table “log_action”?
My page titles show up fine in my page titles report but I wonder what is your CMS? When you say your CMS shares the same database does that mean it shares the same Mysql database that piwik does?
Hello,
you can see the charset error only in the DB. (Yes in page reports / UI everything is fine. ‘Ä’ is displayed as ‘Ä’. )
Could someone pleeease :)-D track a page with only an umlaut ‘Ä’ in the page title and then check the DB. You can check it with phpMyAdmin or SQL-Console:
SELECT name, hex(name) FROM log_action WHERE hex(name) = 'C383C2A4' OR hex(name) = 'C384' ORDER BY idaction DESC LIMIT 0,10;
@lesjokolat : Yes, my cms and piwik are using the same mysql-db. This is the default of typo3-extension “piwikintegration”. (It’s not the cause of the charset error.)
Is it possible the typo3 settings is the problem? I am guessing here and not sure if its entirely applicable to your setup but the issue of a sort of double encoding sounded similar to the issue you are experiencing.
The funny characters that seem to represent the letters with Umlauts in UTF-8 text are entirely normal. You get this when the tool used to inspect the UTF-8 encoded string assumes that it’s looking at ASCII instead of UTF-8.
To settle the matter once and for all, you need to inspect the text using a Hex editor that does not attempt to produce a graphic representation of the text. You will, in all likelihood, see that the letters with Umlaut actually occupy two octets, where the letters without Umlaut occupy only one octet. When the paired octets are displayed as a pair of ASCII letters instead of being combined to form a single character from the UTF-8 character set, you’ll see those funny characters.
The description of UTF-8 encoding that I pointed you at right near the beginning of this discussion is quite clear that UTF-8 encoding is variable length for each character, with the encoding method self-defining the length of each character. The method is also self-synchronizing, so that if you pick up the UTF-8 string in the middle of a variable-length character, you can easily find the correct end of that character.
In short: don’t assume the tool you’re using is telling the truth about what it sees. You need to look at the actual hex representation of the string.
mysql> SELECT name, hex(name) FROM piwik_log_action WHERE hex(name) = 'C383C2A4' OR hex(name) = 'C384' ORDER BY idaction DESC LIMIT 0,10;
+------+-----------+
| name | hex(name) |
+------+-----------+
| Ä | C384 |
+------+-----------+
1 row in set (0.00 sec)
You can even see the correct character in the tracking request:
I have a similar problem. When trying to import a mysql database with Piwik tables on it (using Sequl Pro for mac) I get a “File read error: An error occurred when reading the file, as it could not be read in the encoding you selected (Unicode (UTF-8)).”
The dump I’m trying to move around contains an install of Joomla & Piwik, both which are set as “utf-8” in the database. The Joomla tables move around fine. Some of the Piwik content that does manage to get imported is showing those ascii characters: ä ö ü Ä Ö Ü
I tired exporting the dump using mysqldump command line and phpmyadmin. The phpmyadmin dump can execute 33querys of unreadable ascii while the mysqldump none.
thanks a lot for testing. (tu)
I analyzed the piwik tracking request (with a network trace).
Here everything is fine: action_name=%C3%84 - as described by you.
…but the result in the DB (piwik_log_action) is still wrong: hex C383C2A4.
So I can say:
With the content management system everything is fine. Correct UTF-8 send to the client.
The tracking request is okay. Correct UTF-8 send to piwik.
=> The error occurred after that.
Do you have any idea how to find the reason? (piwik_log_action = utf8_general_ci)