Could it be your OSs language setting causing the problem in the dev and and prod environment?
Also see this article below there are a few points(if applicable) from the web server side you should check to how they are set as if they are different may be a source as to why your console is “mysteriously” changing formats somehow. Its not exactly your issue but may give you some ideas as to where to check some things
@matt: Okay, you checked the UI. Here’s everything fine. ‘Ä’ is displayed as ‘Ä’. The error is only in the DB. Could you please check the piwik table “log_action”.
select name, hex(name) from piwik_log_action where idaction=<id from the new entry>;
@lesjokolat: I checked the configs. UTF-8 everywhere. Thank you. The CMS-System use the same DB, same Webserver. Here’s everything fine - the charset error is only in the piwik tables. Is umlaut “Ä” saved as hex “C384” in your piwik table “log_action”?
Is it possible the typo3 settings is the problem? I am guessing here and not sure if its entirely applicable to your setup but the issue of a sort of double encoding sounded similar to the issue you are experiencing.
The funny characters that seem to represent the letters with Umlauts in UTF-8 text are entirely normal. You get this when the tool used to inspect the UTF-8 encoded string assumes that it’s looking at ASCII instead of UTF-8.
To settle the matter once and for all, you need to inspect the text using a Hex editor that does not attempt to produce a graphic representation of the text. You will, in all likelihood, see that the letters with Umlaut actually occupy two octets, where the letters without Umlaut occupy only one octet. When the paired octets are displayed as a pair of ASCII letters instead of being combined to form a single character from the UTF-8 character set, you’ll see those funny characters.
The description of UTF-8 encoding that I pointed you at right near the beginning of this discussion is quite clear that UTF-8 encoding is variable length for each character, with the encoding method self-defining the length of each character. The method is also self-synchronizing, so that if you pick up the UTF-8 string in the middle of a variable-length character, you can easily find the correct end of that character.
In short: don’t assume the tool you’re using is telling the truth about what it sees. You need to look at the actual hex representation of the string.
mysql> SELECT name, hex(name) FROM piwik_log_action WHERE hex(name) = 'C383C2A4' OR hex(name) = 'C384' ORDER BY idaction DESC LIMIT 0,10;
| name | hex(name) |
| Ä | C384 |
1 row in set (0.00 sec)
You can even see the correct character in the tracking request:
I have a similar problem. When trying to import a mysql database with Piwik tables on it (using Sequl Pro for mac) I get a “File read error: An error occurred when reading the file, as it could not be read in the encoding you selected (Unicode (UTF-8)).”
The dump I’m trying to move around contains an install of Joomla & Piwik, both which are set as “utf-8” in the database. The Joomla tables move around fine. Some of the Piwik content that does manage to get imported is showing those ascii characters: Ã¤ Ã¶ Ã¼ Ã„ Ã– Ãœ
I tired exporting the dump using mysqldump command line and phpmyadmin. The phpmyadmin dump can execute 33querys of unreadable ascii while the mysqldump none.
thanks a lot for testing. (tu)
I analyzed the piwik tracking request (with a network trace).
Here everything is fine: action_name=%C3%84 - as described by you.
…but the result in the DB (piwik_log_action) is still wrong: hex C383C2A4.
So I can say:
With the content management system everything is fine. Correct UTF-8 send to the client.
The tracking request is okay. Correct UTF-8 send to piwik.
=> The error occurred after that.
Do you have any idea how to find the reason? (piwik_log_action = utf8_general_ci)