Garbled data

From PmaWiki
(Redirected from garbled data)
Jump to: navigation, search

Contents

If phpMyAdmin displays your data coming from your application showing "funny" characters, e.g., "ä" instead of "ä" you suffer from a bad MySQL and/or application communication setup in the first place. (see http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html look for "client_connection" and/or "set names"), even if your application displays the chars correct - this is not a phpMyAdmin problem, believe us, it is your application setup.

[edit] Problem 1: double encoding

in short: data is marked as another charset as it is in fact encoded, or data delivered to the MySQL server is encoded with another charset than the MySQL connection is set to (on PHP with lib_mysql by default to latin1).

Here a short example:

Instead of

012345 (string index)
Närhet (displayed "ISO" representation)

which should be stored as utf8 encoded string

0  [1´ä´2 ] 3   4   5   6  (idx)
N  [Ã   ¤ ] r   h   e   t  (ISO)
4e  c3  a4  72  68  65  74 (HEX)

the data arrives already as "Närhet" at mysqld and is stored as "double utf8 encoded" string (1st conversion done by your application, 2nd by mysql):

0  [1   2 ][3   4 ] 5   6   7   8  (idx)
N   Ã   ƒ   Â   ¤   r   h   e   t  (ISO)
4e  c3  83  c3  a4  72  68  65  74 (HEX)

Facit: The data is already stored "garbled" in the database.


[edit] Problem 2: wrong charset

Template:todo

f.e. an UTF-8 encoded string is stored in a column which is marked as latin

012345 (string index)
Närhet (displayed "ISO" representation)

which should be stored as utf8 encoded string

0  [1´ä´2 ] 3   4   5   6  (idx)
N  [Ã   ¤ ] r   h   e   t  (ISO)
4e  c3  a4  72  68  65  74 (HEX)

but is stored as latin

0   1   2   3   4   5   6  (idx)
N   Ã   ¤   r   h   e   t  (ISO)
4e  c3  a4  72  68  65  74 (HEX)

resulting in Närhet instead of Närhet


[edit] Preventing

You need to tell MySQL what charset you are using in your scripts. In PHP the mysql(i) extension uses latin1 as default, if your scripts send UTF-8 you will get in trouble.

always execute

SET NAMES 'UTF8';

as first statement after you connected to your MySQL server.

This tells MySQL only how the data which is sent through this connection is encoded, it does not tell how to store the data, MySQL correctly converts the data to targeted table/column charset before inserting.

And it tells MySQL how the data sent back to your script should be encoded as.

In best, use UTF-8 everywhere:

  • save ALL files as UTF-8
  • tell the browser:
    • through HTTP headers:
header('Content-Type: text/html; charset=UTF-8', true);
    • OR/AND inside the HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

[edit] Repairing

in short: tell MySQL the correct charset of the data, and convert it to the charset wanted.

[edit] inside MySQL

Doing this inside MySQL requires the following steps for every affected column

  1. Set the charset of the affected columns to BINARY, this removes any charset information from the column without changing (re-coding) the content
  2. Set the charset of the affected columns to the charset the content is encoded, this sets the new charset for the column, again without chaning (re-coding) the content
  3. [Optional] Set the charset of the concerning columns to the designated charset, this re-codes the content to the new charset

f.e. if you have stored UTF-8 encoded text in an latin1 column

  1. ALTER TABLE t MODIFY col1 BINARY(50); -- remove charset info
  2. ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8; -- set correct charset info
  3. [Optional] ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET latin1; -- convert content

If you do the last step, do not forget to tell your app using SET NAMES, or you will end up with the same problem again.

Of course this can also be done using the phpMyAdmin feature to edit column properties.

http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html

[edit] outside MySQL

To keep the content of the old garbled "double encoded" data you have to "ungarble" it for example by exporting the data to a file with "charset of file = iso-8859-1" (you need to set $cfg['AllowAnywhereRecoding'] = true; for that in your config.inc.php, that's why you can't test it on the demo server).

Then MAKE A BACKUP/COPY of your db and change the line

/*!40101 SET NAMES latin1 */; 

in the export file to

/*!40101 SET NAMES utf8 */;

and import it back into the database (selecting "charset of file = utf8", which is the default setting).

Personal tools