Database Administrators Asked by geoaxis on August 14, 2020
I have a database which now needs to support 4 byte characters (Chinese). Luckily I already have MySQL 5.5 in production.
So I would just like to change all collations that are utf8_bin to utf8mb4_bin.
I believe there is no performance loss/gain with this change other than a bit of storage overhead.
If you, like me, do not trust automation, this is how I have handled the problem.
First, stop digging!
Start by altering the default charset of new tables, changing the DB definition (like in all the other answers):
ALTER DATABASE database_name
CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
Then generate SQL to change the default charset for new columns of all existing tables:
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,"` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_swedish_ci;") as _sql
FROM information_schema.TABLES
WHERE table_schema like "database_name" and TABLE_TYPE="BASE TABLE"
GROUP BY table_schema, table_name ;
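If you prefer scripting over copy-pasting generated statements, the same generator idea can be sketched in Python. This is an illustrative sketch, not part of the original answer: the schema/table pairs below are hypothetical, and in practice you would fetch them from information_schema.TABLES with a MySQL client library; substitute your preferred collation for utf8mb4_unicode_ci.

```python
# Hypothetical schema/table pairs; in practice, fetch them from
# information_schema.TABLES with a client such as PyMySQL.
tables = [("database_name", "users"), ("database_name", "posts")]

# Build one ALTER TABLE ... DEFAULT CHARACTER SET statement per table,
# mirroring the concat() in the SQL generator above.
statements = [
    f"ALTER TABLE `{schema}`.`{table}` "
    f"DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    for schema, table in tables
]

for stmt in statements:
    print(stmt)
```

The output can be reviewed by hand before being fed back to the server, which fits the "do not trust automation" approach of this answer.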
Now we can handle the "legacy":
List character datatypes you are using:
select distinct data_type from information_schema.columns where table_schema = "database_name" and CHARACTER_SET_NAME is not null;
For me that list was "varchar" and "text"
List character sets in use:
select distinct character_set_name from information_schema.columns where table_schema = "database_name";
This gives me "utf8", "latin1", and "utf8mb4", which is one reason I do not trust automation: the latin1 columns risk containing dirty data.
Now you can make a list of all columns you need to update with:
select table_name, column_name, data_type, character_set_name, collation_name
from information_schema.columns
where table_schema = "database_name" and CHARACTER_SET_NAME is not null AND CHARACTER_SET_NAME <> "utf8mb4"
group by table_name, column_name, data_type, character_set_name, collation_name;
Tables containing only utf8 or utf8mb4 could be converted with "CONVERT TO CHARACTER SET" as Mathias and MrJingles describe in their answers, but then you risk MySQL changing the types for you, so you may be better off running "CHANGE COLUMN" instead, since that gives you control of exactly what happens.
If you have non-utf8 columns these questions may give inspiration about checking the columns data: https://stackoverflow.com/q/401771/671282 https://stackoverflow.com/q/9304485/671282
Since you probably know what to expect in most of the columns, something like this will probably handle most of them, after you modify the allowed non-ASCII characters to suit your needs:
SELECT DISTINCT column_name FROM table_name WHERE column_name NOT REGEXP '^([A-Za-z0-9åäöÅÄÖ&.,_ -])*$';
When the above did not fit, I used the one below, which has a bit "fuzzier" matching:
SELECT distinct CONVERT(CONVERT(column_name USING BINARY) USING latin1) AS latin1, CONVERT(CONVERT(column_name USING BINARY) USING utf8) AS utf8 FROM table_name WHERE CONVERT(column_name USING BINARY) RLIKE CONCAT('[', UNHEX('C0'), '-', UNHEX('F4'), '][',UNHEX('80'),'-',UNHEX('FF'),']') limit 5;
This query matches any two bytes that could start a utf8 character, thus allowing you to inspect those records; it may give you a lot of false positives. The utf8 conversion returns NULL if there is any character it cannot convert, so for a large field there is a good chance of it not being useful.
Answered by Samuel Åslund on August 14, 2020
Run the following SQL statements to get the result statements for updating character set and collation:
SET @database = "your_database_name";
SET @charset = "utf8mb4";
SET @collate = "utf8mb4_0900_ai_ci";
SELECT "SET foreign_key_checks = 0;"
UNION ALL
SELECT concat(
"ALTER DATABASE `",
`SCHEMA_NAME`,
"` CHARACTER SET = ",
@charset,
" COLLATE = ",
@collate,
";"
) AS `sql`
FROM `information_schema`.`SCHEMATA`
WHERE `SCHEMA_NAME` = @database
AND (
`DEFAULT_CHARACTER_SET_NAME` <> @charset
OR `DEFAULT_COLLATION_NAME` <> @collate
)
UNION ALL
SELECT concat(
"ALTER TABLE `",
`TABLE_SCHEMA`,
"`.`",
`TABLE_NAME`,
"` CONVERT TO CHARACTER SET ",
@charset,
" COLLATE ",
@collate,
";"
) AS `sql`
FROM `information_schema`.`TABLES`
WHERE `TABLE_SCHEMA` = @database
AND `TABLE_TYPE` = "BASE TABLE"
AND `TABLE_COLLATION` <> @collate
UNION ALL
SELECT concat(
"ALTER TABLE `",
c.`TABLE_SCHEMA`,
"`.`",
c.`TABLE_NAME`,
"` CHANGE `",
c.`COLUMN_NAME`,
"` `",
c.`COLUMN_NAME`,
"` ",
c.`COLUMN_TYPE`,
" CHARACTER SET ",
@charset,
" COLLATE ",
@collate,
if(c.`IS_NULLABLE`="YES", " NULL", " NOT NULL"),
";"
) AS `sql`
FROM `information_schema`.`COLUMNS` c,
`information_schema`.`TABLES` t,
`information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` a
WHERE c.`TABLE_SCHEMA` = t.`TABLE_SCHEMA`
AND c.`TABLE_NAME` = t.`TABLE_NAME`
AND t.`TABLE_COLLATION` = a.`COLLATION_NAME`
AND c.`TABLE_SCHEMA` = @database
AND c.`DATA_TYPE` IN (
'varchar',
'char',
'text',
'tinytext',
'mediumtext',
'longtext'
)
AND (
c.`CHARACTER_SET_NAME` <> a.`CHARACTER_SET_NAME`
OR c.`COLLATION_NAME` <> t.`TABLE_COLLATION`
)
AND t.`TABLE_TYPE` = "BASE TABLE"
UNION ALL
SELECT "SET foreign_key_checks = 1;";
Notice: utf8mb4_0900_ai_ci may not be friendly to your preferred language. For example, it considers half-width parentheses () the same as full-width parentheses （）, which causes trouble in certain cases. Also note that the utf8mb4_0900_* collations are available only in MySQL 8.0 and later. Run
SHOW COLLATION WHERE CHARSET = 'utf8mb4' AND COLLATION LIKE 'utf8mb4%0900%';
and choose a suitable collation for your needs.
Answered by Rockallite on August 14, 2020
I made a script which does this more or less automatically:
<?php
/**
* Requires php >= 5.5
*
* Use this script to convert utf-8 data in utf-8 mysql tables stored via latin1 connection
* This is a PHP port from: https://gist.github.com/njvack/6113127
*
* BACKUP YOUR DATABASE BEFORE YOU RUN THIS SCRIPT!
*
* Once the script ran over your databases, change your database connection charset to utf8:
*
* $dsn = 'mysql:host=localhost;port=3306;charset=utf8';
*
* DON'T RUN THIS SCRIPT MORE THAN ONCE!
*
* @author hollodotme
*
* @author derclops since 2019-07-01
*
* I have taken the liberty to adapt this script to also do the following:
*
* - convert the database to utf8mb4
* - convert all tables to utf8mb4
* - actually then also convert the data to utf8mb4
*
*/
header('Content-Type: text/plain; charset=utf-8');
$dsn = 'mysql:host=localhost;port=3306;charset=utf8';
$user = 'root';
$password = 'root';
$options = [
PDO::ATTR_CURSOR => PDO::CURSOR_FWDONLY,
PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET latin1",
];
$dbManager = new PDO( $dsn, $user, $password, $options );
$databasesToConvert = [ 'database1',/** database3, ... */ ];
$typesToConvert = [ 'char', 'varchar', 'tinytext', 'mediumtext', 'text', 'longtext' ];
foreach ( $databasesToConvert as $database )
{
    echo $database, ":\n";
    echo str_repeat( '=', strlen( $database ) + 1 ), "\n";
    $dbManager->exec( "USE `{$database}`" );
    echo "converting database to correct locale too ...\n";
    $dbManager->exec( "ALTER DATABASE `{$database}` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci" );
    $tablesStatement = $dbManager->query( "SHOW TABLES" );
    while ( ($table = $tablesStatement->fetchColumn()) )
    {
        echo "Table: {$table}:\n";
        echo str_repeat( '-', strlen( $table ) + 8 ), "\n";
        $columnsToConvert = [ ];
        $columnsStatement = $dbManager->query( "DESCRIBE `{$table}`" );
        while ( ($tableInfo = $columnsStatement->fetch( PDO::FETCH_ASSOC )) )
        {
            $column = $tableInfo['Field'];
            echo ' * ' . $column . ': ' . $tableInfo['Type'];
            // strip the length specifier, e.g. "varchar(255)" => "varchar"
            $type = preg_replace( '#\(\d+\)#', '', $tableInfo['Type'] );
            if ( in_array( $type, $typesToConvert ) )
            {
                echo " => must be converted\n";
                $columnsToConvert[] = $column;
            }
            else
            {
                echo " => not relevant\n";
            }
        }
        // convert the table default charset too!
        $convert = "ALTER TABLE `{$table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
        echo "\n", $convert, "\n";
        $dbManager->exec( $convert );
        $databaseErrors = $dbManager->errorInfo();
        if ( !empty( $databaseErrors[1] ) )
        {
            echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURRED " . print_r( $databaseErrors, true ) . " \n";
            exit;
        }
        if ( !empty( $columnsToConvert ) )
        {
            $converts = array_map(
                function ( $column )
                {
                    //return "`{$column}` = IFNULL(CONVERT(CAST(CONVERT(`{$column}` USING latin1) AS binary) USING utf8mb4),`{$column}`)";
                    return "`{$column}` = CONVERT(BINARY(CONVERT(`{$column}` USING latin1)) USING utf8mb4)";
                },
                $columnsToConvert
            );
            $query = "UPDATE IGNORE `{$table}` SET " . join( ', ', $converts );
            // alternative for a single row:
            // UPDATE feedback SET reply = CONVERT(BINARY(CONVERT(reply USING latin1)) USING utf8mb4) WHERE feedback_id = 15015;
            echo "\n", $query, "\n";
            $dbManager->exec( $query );
            $databaseErrors = $dbManager->errorInfo();
            if ( !empty( $databaseErrors[1] ) )
            {
                echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURRED " . print_r( $databaseErrors, true ) . " \n";
                exit;
            }
        }
        echo "\n--\n";
    }
    echo "\n";
}
Answered by clops on August 14, 2020
For people who might have this problem, the best solution is to first modify the columns to the matching binary type (CHAR → BINARY, VARCHAR → VARBINARY, TEXT → BLOB, and so on), and after that modify the columns back to their former type with your desired charset.
E.g.:
ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] LONGBLOB;
ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] VARCHAR(140) CHARACTER SET utf8mb4;
I tried in several latin1 tables and it kept all the diacritics.
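A quick way to see why this binary round trip preserves the data: the bytes on disk never change, only the charset label MySQL attaches to them. The Python sketch below (an illustration added here, not part of the original answer) simulates the common case this trick rescues, namely UTF-8 bytes that were stored in a column declared latin1:

```python
# The application wrote UTF-8 bytes into a column declared latin1.
original = "Résumé"
raw = original.encode("utf-8")       # the bytes actually on disk
mojibake = raw.decode("latin1")      # how a latin1 column presents them

# The BLOB round trip relabels the raw bytes instead of transcoding
# the mojibake text, so the original string comes back intact:
recovered = mojibake.encode("latin1").decode("utf-8")

print(mojibake)   # RÃ©sumÃ©
print(recovered)  # Résumé
```

Note the caveat this implies: the trick works when the stored bytes are already valid UTF-8 that was merely mislabeled; genuinely latin1-encoded bytes would instead need a charset conversion, not a relabel.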
You can extract this query for all columns doing this:
SELECT
CONCAT('ALTER TABLE ', TABLE_SCHEMA,'.', TABLE_NAME,' MODIFY ', COLUMN_NAME,' VARBINARY(', CHARACTER_MAXIMUM_LENGTH, ');'),
CONCAT('ALTER TABLE ', TABLE_SCHEMA,'.', TABLE_NAME,' MODIFY ', COLUMN_NAME,' ', COLUMN_TYPE,' CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;')
FROM information_schema.columns
WHERE TABLE_SCHEMA IN ('[TABLE_SCHEMA]')
AND COLUMN_TYPE LIKE 'varchar%'
AND (COLLATION_NAME IS NOT NULL AND COLLATION_NAME NOT LIKE 'utf%');
Answered by MalachiteBR on August 14, 2020
I used the following shell script. It takes database name as a parameter and converts all tables to another charset and collation (given by another parameters or default value defined in the script).
#!/bin/bash
# mycollate.sh <database> [<charset> <collation>]
# changes MySQL/MariaDB charset and collation for one database - all tables and
# all columns in all tables
DB="$1"
CHARSET="$2"
COLL="$3"
[ -n "$DB" ] || exit 1
[ -n "$CHARSET" ] || CHARSET="utf8mb4"
[ -n "$COLL" ] || COLL="utf8mb4_general_ci"
echo $DB
echo "ALTER DATABASE \`$DB\` CHARACTER SET $CHARSET COLLATE $COLL;" | mysql
echo "USE \`$DB\`; SHOW TABLES;" | mysql -s | (
    while read TABLE; do
        echo $DB.$TABLE
        echo "ALTER TABLE \`$TABLE\` CONVERT TO CHARACTER SET $CHARSET COLLATE $COLL;" | mysql $DB
done
)
Answered by Petr Stastny on August 14, 2020
I wrote this guide: http://hanoian.com/content/index.php/24-automate-the-converting-a-mysql-database-character-set-to-utf8mb4
From my work, I saw that altering the database and the tables is not enough. I had to go into each table and alter each of the text/mediumtext/varchar columns too.
Luckily, I was able to write a script to detect the metadata of MySQL databases, so it could loop through the tables and columns and alter them automatically.
Long index for MySQL 5.6:
There is one thing you must have DBA/SUPER USER privilege to do: setting these database parameters:
innodb_large_prefix: ON
innodb_file_format: Barracuda
innodb_file_format_max: Barracuda
In the answers for this question, there is instruction how to set those parameters above: https://stackoverflow.com/questions/35847015/mysql-change-innodb-large-prefix
Of course, in my article, there are instructions to do that too.
For MySQL version 5.7 or newer, the innodb_large_prefix is ON by default, and the innodb_file_format is also Barracuda by default.
Answered by Châu Hồng Lĩnh on August 14, 2020
Ran into this situation; here's the approach I used to convert my database:
First, you need to edit my.cnf to make the default database connection (between applications and MySQL) utf8mb4_unicode_ci compliant. Without this, characters like emoji submitted by your apps won't make it into your tables with the right bytes/encoding (unless your application's DB connection params specify a utf8mb4 connection).
Instructions given here.
Execute the following SQL (there is no need to generate SQL for individual columns; the ALTER TABLE ... CONVERT TO statements will handle that).
Before you execute the code below, replace "DbName" with your actual DB name.
USE information_schema;
SELECT concat("ALTER DATABASE `",table_schema,
"` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql
FROM `TABLES`
WHERE table_schema like "DbName"
GROUP BY table_schema;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,
"` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql
FROM `TABLES`
WHERE table_schema like "DbName"
GROUP BY table_schema, table_name;
Collect and save output of above SQL in a dot sql file and execute it.
If you get an error like
#1071 - Specified key was too long; max key length is 1000 bytes.
along with the problematic table name, it means the index key on some column of that table (which was supposed to be converted to utf8mb4) would become too large: at 4 bytes per character, an indexed VARCHAR column must be <= 250 characters to keep its index key within 1000 bytes. Check the columns on which you have indexes, and if one of them is a VARCHAR longer than 250 (most likely 255), then
Step 1: check data in that column to make sure that max string size in that column is <= 250.
Example query:
select `id`,`username`, `email`,
length(`username`) as l1,
char_length(`username`) as l2,
length(`email`) as l3,
char_length(`email`) as l4
from jos_users
order by l4 Desc;
Step 2: if the maximum char_length of the indexed column's data is <= 250, change the column length to 250. If that is not possible, remove the index on that column.
Step 3: then run the alter table query for that table again and table should now be converted into utf8mb4 successfully.
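The 250-character bound in the steps above comes from simple arithmetic: divide the index key length limit by the maximum bytes per character of the target charset. A small sketch (added for illustration; the byte limits are MySQL's documented defaults, 767 bytes for InnoDB without innodb_large_prefix and 1000 bytes for MyISAM):

```python
def max_indexable_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Longest column length whose index key fits within the byte limit."""
    return key_limit_bytes // bytes_per_char

print(max_indexable_chars(767, 3))   # utf8 (3 bytes/char), InnoDB:    255
print(max_indexable_chars(767, 4))   # utf8mb4 (4 bytes/char), InnoDB: 191
print(max_indexable_chars(1000, 4))  # utf8mb4, MyISAM:                250
```

This is also why other answers in this thread suggest VARCHAR(191) for indexed utf8mb4 columns on InnoDB.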
Cheers!
Answered by Nav44 on August 14, 2020
I have a solution that will convert databases and tables by running a few commands. It also converts all columns of type varchar, text, tinytext, mediumtext, longtext, and char. You should also back up your database in case something breaks.
Copy the following code into a file called preAlterTables.sql:
use information_schema;
SELECT concat("ALTER DATABASE `",table_schema,"` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql
FROM `TABLES` where table_schema like "yourDbName" and TABLE_TYPE='BASE TABLE' group by table_schema;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,"` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql
FROM `TABLES` where table_schema like "yourDbName" and TABLE_TYPE='BASE TABLE' group by table_schema, table_name;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type,"(",character_maximum_length,") CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql
FROM `COLUMNS` where table_schema like "yourDbName" and data_type in ('varchar','char');
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type," CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql
FROM `COLUMNS` where table_schema like "yourDbName" and data_type in ('text','tinytext','mediumtext','longtext');
Replace all occurrences of "yourDbName" with the database you want to convert. Then run:
mysql -uroot < preAlterTables.sql | egrep '^ALTER' > alterTables.sql
This will generate a new file alterTables.sql, with all the queries you need to convert the database. Run the following command to start the conversion:
mysql -uroot < alterTables.sql
You can also adapt this to run through multiple databases by changing the condition on table_schema. For example, table_schema like "wiki_%" will convert all databases with the name prefix wiki_. To convert all databases, replace the condition with table_type!='SYSTEM VIEW'.
An issue that might arise: I had some varchar(255) columns in MySQL keys. This causes an error:
ERROR 1071 (42000) at line 2229: Specified key was too long; max key length is 767 bytes
If that happens you can simply change the column to be smaller, like varchar(150), and rerun the command.
Please note: this answer converts the database to utf8mb4_unicode_ci instead of the utf8mb4_bin asked for in the question. But you can simply replace this.
Answered by MrJingles87 on August 14, 2020
From my guide How to support full Unicode in MySQL databases, here are the queries you can run to update the charset and collation of a database, a table, or a column:
For each database:
ALTER DATABASE
database_name
CHARACTER SET = utf8mb4
COLLATE = utf8mb4_unicode_ci;
For each table:
ALTER TABLE
table_name
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
For each column:
ALTER TABLE
table_name
CHANGE column_name column_name
VARCHAR(191)
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
(Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a VARCHAR column.)
Note, however, that you cannot fully automate the conversion from utf8 to utf8mb4. As described in step 4 of the abovementioned guide, you’ll need to check the maximum length of columns and index keys, as the number you specify has a different meaning when utf8mb4 is used instead of utf8.
Section 10.1.11 of the MySQL 5.5 Reference Manual has some more information on this.
Answered by Mathias Bynens on August 14, 2020
I would write a script (in Perl, or whatever) to use information_schema (TABLES and COLUMNS) to walk through all the tables, and do MODIFY COLUMN on every CHAR/VARCHAR/TEXT field. I would collect all the MODIFYs into a single ALTER for each table; this will be more efficient.
I think (but am not sure) that Raihan's suggestion only changes the default for the table.
Answered by Rick James on August 14, 2020