Convert MySQL database from one encoding/collation into another
8th February 2008
Most frequent use: convert database from latin1_swedish to utf8_general_ci.
Original script found at: MySQL and UTF-8.
Update: the original script had an error, it would generate queries likes this one (note the bold part):
ALTER TABLE `links` CHANGE `link_rel` `link_rel` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL;
This is clearly wrong syntax (and logic). I fixed this by making comparison to NULL strict (with three equal signs instead of two):
// Does the field default to null, a string, or nothing?
if ($row['Default'] === NULL)
Update 2: based on comment by banesto, I modified the script; now it does not require specifying the from_collation, it's sufficient to specify to_collation (which will be used for all the fields and tables). The modified code is:
if ($row['Collation'] == '' || $row['Collation'] == $convert_to)
continue;
Here's the script itself: (to copy-paste: first click the "Plain text" header)
-
<?php
-
// original script (v1.0) by/from: http://www.phpwact.org/php/i18n/utf-8/mysql
-
// improved/modified (v1.03) by Bogdan http://bogdan.org.ua/
-
-
// this script will output all queries needed to change all fields/tables to a different collation
-
// it is HIGHLY suggested you take a MySQL dump/backup prior to running any of the generated queries
-
-
// this code is provided AS IS and without any warranty
-
-
-
-
// collation you want to change to:
-
$convert_to = 'utf8_general_ci';
-
-
// character set of new collation:
-
$character_set= 'utf8';
-
-
// DB login information - *modify before use*
-
$username = 'user';
-
$password = 'pass';
-
$database = 'database_name';
-
$host = 'localhost';
-
-
//-- usually, there is nothing to modify below this line --//
-
-
// show TABLE alteration queries?
-
$show_alter_table = true;
-
// show FIELD alteration queries?
-
$show_alter_field = true;
-
-
-
-
print '<pre>';
-
-
// Alter table collation
-
// ALTER TABLE `account` DEFAULT CHARACTER SET utf8
-
if ($show_alter_table)
-
-
-
-
if ( $row['Collation'] == '' || $row['Collation'] == $convert_to )
-
continue;
-
-
// Is the field allowed to be null?
-
if ( $row['Null'] == 'YES' )
-
$nullable = ' NULL ';
-
else
-
$nullable = ' NOT NULL';
-
-
// Does the field default to null, a string, or nothing?
-
if ( $row['Default'] === NULL )
-
$default = " DEFAULT NULL";
-
elseif ( $row['Default'] != '' )
-
else
-
$default = '';
-
-
// Alter field collation:
-
// ALTER TABLE `tab` CHANGE `fiel` `fiel` CHAR( 5 ) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
-
if ($show_alter_field) {
-
echo "ALTER TABLE `$table` CHANGE `$field` `$field` $row[Type] CHARACTER SET $character_set COLLATE $convert_to $nullable $default; \r\n";
-
}
-
}
-
}
-
?>












June 22nd, 2008 at 1:47
BIG thanks for this script. i updated it a bit and it was just priceless. saved several hours of my life
June 22nd, 2008 at 21:27
there are several things:
1) before setting new character set all fields have to be changed to binary equivalents
2) it's not so good to depend on "convert_from", because db can contain many collations and script will take effect only in one of them. i think it's more convenient to check if ($row['Collation']!='')
June 24th, 2008 at 9:26
Banesto,
1) why convert to binary?
2) do you think it's a good optional solution to just provide the configuration for "target" character set, and let the script auto-detect the original set? that sounds good to me...
June 26th, 2008 at 21:14
1) because if i leave original type all data gets corrupted
2) because there's usually need to get one collation for whole db and not to separate them. and it requires me to know what is the original collation in db is right now.
June 27th, 2008 at 10:35
Banesto,
I changed the script a little bit by removing the source_collation and adding the check of row[collation] as you suggested.
As for the conversion to binary types - I didn't do that, in part because I've no idea on what are the best binary equivalents to all the column types in MySQL, and in part due to the lack of desire to dig into that
July 20th, 2008 at 13:58
your script above doesn't have your first fix (strict NULL comparison). otherwise, thanks!
July 20th, 2008 at 15:16
Matt,
thanks for reporting that, it is now fixed.
August 21st, 2008 at 12:35
Just what I need, but unfortunately it doesn't do anything for me. When I first run the file I get the message to remove the die() line, which I did, but now it does nothing, no output, zilch, what have I done wrong ?
August 21st, 2008 at 12:44
ok got it to work, I didn;t have the mysql driver uneable din php.ini
now when i run the generated script I get
Error
SQL query:
ALTER TABLE `adminusers` CHANGE `id` `id` int( 11 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL ;
MySQL said: Documentation
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL' at line 1
August 23rd, 2008 at 12:36
Russ,
I thought I fixed that. Quick fix would be to make a global search-and-replace with the NOT NULL DEFAULT NULL string (replacing it with just NOT NULL). This should work, if you do the replace correctly
(I mean replace to be done in the generated query.)
September 20th, 2008 at 20:18
So, I made a php file and put it in my root folder then loaded it in my browser. It output a page like:
"ALTER TABLE `access` DEFAULT CHARACTER SET utf8;
ALTER TABLE `access` CHANGE `mask` `mask` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; "
etc. It shows this for each table and field.
But when I actually look at my database nothing has been changed. It's the same encoding/collation as it was to begin with.
What am I doing wrong?
September 20th, 2008 at 20:42
Tim,
the script itself doesn't modify the database - it only produces all the necessary queries.
You have to (manually) copy-paste all the generated queries into the "SQL" text field of your favourite MySQL administration/management software, and "Run" (or how the execute command is named in your software). E.g. in PhpMyAdmin you'd go to the database structure page, then choose the SQL tab, paste all the queries, and run them.
September 20th, 2008 at 23:08
Thanks,
It seems to be working. I get errors like this:
"SQL query:
ALTER TABLE `comments` CHANGE `thread` `thread` varchar( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL ;
MySQL said: Documentation
#1067 - Invalid default value for 'thread' "
then I change that one manually and rerun the remainder of the queries that start after the erroneous line and repeat.
September 22nd, 2008 at 21:21
hello
my problem is i have taken backup from mysql>data folder -complete folder of a database
and now copy it to different pc
means
copy from 1PC >Mysql>Data > and copy to 2ndPC >mysql>data
now when i am browsing it from phpmyadmin , its shwoing the database and table but when i am browsing table its saying table doesnt exist.
Collation type show in use ?
Do you have any idea , which can help me ?
Raj
September 22nd, 2008 at 22:17
Raj,
this isn't MySQL support forum, and your issue doesn't seem to be collation-related.
However, I would recommend that you use specific tools for MySQL DB export/import, and not just copy data folders.
If you do copy files - check that MySQL versions are the same, configuration is also the same, and try running the MySQL's "repair" query on all the "copied" files. Might help, but still not a good way.
September 24th, 2008 at 23:10
This is probably a big ol' newbie problem, but the script doesn't work for me: I'm getting "access denied" errors.
Specifically, I cut and pasted the script into a file, backed up the database and removed that line, and then uploaded the script to the top level of my Drupal install and went to that URL in Firefox, where I got the following errors:
Warning: mysql_connect() [function.mysql-connect]: Access denied for user 'user'@'localhost' (using password: YES) in /home/manatee/public_html/acip/collation.php on line 21
Warning: mysql_select_db() [function.mysql-select-db]: Access denied for user 'manatee'@'localhost' (using password: NO) in /home/manatee/public_html/acip/collation.php on line 22
Warning: mysql_select_db() [function.mysql-select-db]: A link to the server could not be established in /home/manatee/public_html/acip/collation.php on line 22
Warning: mysql_query() [function.mysql-query]: Access denied for user 'manatee'@'localhost' (using password: NO) in /home/manatee/public_html/acip/collation.php on line 23
Warning: mysql_query() [function.mysql-query]: A link to the server could not be established in /home/manatee/public_html/acip/collation.php on line 23
Access denied for user 'manatee'@'localhost' (using password: NO)
Am I missing something obvious?
Thanks,
Andrea
September 25th, 2008 at 16:40
Andrea,
please triple-check these configuration values:
October 7th, 2008 at 15:35
Hi Bogdan,
nice script! Looks like a life-saver.
If you should decide to dive into the whole data-conversion aspect of this, here is a little cheat sheet of how I have converted data before.
char, varchar and text -> blob
mediumtext -> mediumblob
longtext -> longblob
(I have not used tinyblob but it might be ok for char and varchar)
There are two ways that the existing data may need conversion. This is something that the user must know and decide. I am not aware of any way to detect it, except by the errors if you do it wrong
You would think that an existing latin1 database would have latin1 data in it. This is what scenario shows.
Common though is that you can have php set up to use utf8 but the database remains in latin1. Any existing data will then me utf8 squeezed into a latin1 table. This needs a slightly different conversion as in the second scenario.
# case where data was enterd correctly as latin1 and later needs conversion to utf8
SET NAMES latin1;
drop table people;
CREATE TABLE people (
id int(11) unsigned NOT NULL auto_increment,
name varchar(64) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO people VALUES
(1,'user_a'),
(2,'user_å'),
(3,'user_ä');
select * from people;
SET NAMES utf8;
alter table people engine=MyISAM default character set utf8;
ALTER TABLE people CHANGE name name BLOB;
ALTER TABLE people CHANGE name name varchar(64) CHARACTER SET utf8;
select * from people;
# case where utf8 data was enterd into a table with latin1 and later needs conversion to utf8
SET NAMES latin1;
drop table people;
CREATE TABLE people (
id int(11) unsigned NOT NULL auto_increment,
name varchar(64) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SET NAMES utf8;
INSERT INTO people VALUES
(1,'user_a'),
(2,'user_å'),
(3,'user_ä');
select * from people;
alter table people engine=MyISAM default character set utf8;
ALTER TABLE people CHANGE name name BLOB;
ALTER TABLE people CHANGE name name varchar(64) CHARACTER SET latin1;
ALTER TABLE people CHANGE name name varchar(64) CHARACTER SET utf8;
select * from people;
I'll get back to you if I manage to implement something like this in your script.
October 7th, 2008 at 16:11
Hi again,
I got a bit carried away and did some tweaks to your script to support the two data conversions described above.
There may be cases where it does not work but it works when I have tested it on a bunch of conversions.
in the configuration at the top:
// conversion of data for either normal data or "messed up" data
// either leave empty or choose convert or rescue
$data_conversion = '';
and then at the bottom in the field conversion part.
// Alter field collation:
// ALTER TABLE `tab` CHANGE `fiel` `fiel` CHAR( 5 ) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
if ($show_alter_field) {
$field = mysql_real_escape_string($row['Field']);
// charset conversion table for data conversions
$types = array(
'default'=>'blob',
'mediumtext' => 'mediumblob',
'longtext' => 'longblob'
);
if ( !empty($data_conversion) && strpos($row['Collation'], $character_set) !== 0 ) {
if ( array_key_exists($row['Type'],$types) ) {
$binary = $types[$row['Type']];
} else {
$binary = $types['default'];
}
echo "ALTER TABLE `$table` CHANGE `$field` `$field` $binary $nullable $default; \r\n";
if ( $data_conversion == 'rescue' ) {
$old_charset = substr($row['Collation'],0,strpos($row['Collation'],'_'));
echo "ALTER TABLE `$table` CHANGE `$field` `$field` $row[Type] CHARACTER SET $old_charset $nullable $default; \r\n";
}
}
echo "ALTER TABLE `$table` CHANGE `$field` `$field` $row[Type] CHARACTER SET $character_set COLLATE $convert_to $nullable $default; \r\n";
}
October 7th, 2008 at 16:24
I found another bug. Your script will actually remove the "auto_increment" attribute.
I didn't know how to read this attribute from the current row to put it back, so I worked round it by modding the script to ignore numeric fields as these do not have a collation anyway.
October 7th, 2008 at 16:51
Dear Martin and Russ,
thank you for valuable feedback!
With your permission, I'll modify the script to take into account your code and suggestions.
(I'm at work right now, so most probably will update the script code late tonight.)
Update: still not done, but now in the process of updating code.
October 10th, 2008 at 21:23
Martin,
why would you need these conversions?
char, varchar and text -> blob
mediumtext -> mediumblob
longtext -> longblob
blobs are for binary data, so why put the text into binary fields?
October 10th, 2008 at 23:58
When you change the encoding, any existing data will be missinterpreted whenever the charsets differ. Converting the field via a binary type will allow you to trick MySQL into reinterpreting the "meaning" of the data.
This will not affect A-Z but other (accented) characters.
October 11th, 2008 at 0:39
Sorry. My last post was a bit crippled from being written on my phone in the car.
There are probably many people that can explain the details of why you do it this way. But what I know is that if you can to "convert" existing data from one Character set to another in MySQL, you do it by making the field a binary (and thus tricking MySQL to reinterpret the raw data stored when you bring it back into a text field in another character set).
I have implemented two such conversions for two common situations. But you can play around with this quite a bit if you have some really messed up data.
November 24th, 2008 at 4:20
pardon the newbie question....
in the line below, do I enter each table individually or simply put the database name in the place of 'table'?
$database = 'table';
thanks
November 24th, 2008 at 12:02
Ns,
I've corrected the text - of course, it must be database name.
November 27th, 2008 at 7:11
Very nice script and the output is generated but when i run the query (Joomla database mysql 5.x)
i get on every table
ALTER TABLE `banners` CHANGE `description` `description` varchar( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL ;
MySQL said: Documentation
#1067 - Invalid default value for 'description' "
the original installed field values here are
field: description
type: text
collation: utf8_unicode_ci
attributes: (empty)
null : No
default: (empty)
extra: (empty)
This is basically fact on all tables and it runs into this error on same fields and stops. Any ideas would be welcome?
Chad
November 27th, 2008 at 11:05
Chad,
it is the same old problem which is for some reason happening all the time.
I've just tested the script locally - and for me it works perfectly.
I can only recommend that after you get all the generated queries, just run a global search-and-replace, replacing each and every "NOT NULL DEFAULT NULL" with "NOT NULL"
If you would like the script to do that for you (and also contribute to let others avoid this error), please do the following:
- for the first table where you get the error, run the query
(where tablename is the name of that troublesome table),
and post the result here. That might help me identify the reason why it is not working for you but does work for me.