Skip to content

Latest commit

 

History

History
122 lines (118 loc) · 5.46 KB

encodings.md

File metadata and controls

122 lines (118 loc) · 5.46 KB

About character encodings

Before using MariaDB in CS340, we should probably check which character encoding our database server is configured to use, so that we can specify the correct encoding when we connect to the database server programmatically.

There are two ways to do this. To do this using the mysql command-line tool (which is convenient if you are outside of OSU since you don't have to use a VPN), you would ssh into acess.engr.oregonstate.edu

and then (assuming your .my.cnf file is already set up as in Assignment 1) you would run this command:

mysql -e "SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%'"

which produces:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb3                    |
| character_set_connection | utf8mb3                    |
| character_set_database   | utf8mb3                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb3                    |
| character_set_server     | utf8mb3                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb3_general_ci |
| collation_database   | utf8mb3_general_ci |
| collation_server     | utf8mb3_general_ci |
+----------------------+--------------------+

so we are using utf8mb3 for connectiong; however, MySQL (including the mysql command-line client tool) uses utf8 as an alias for this character encoding; see Section 10.9.2 of the MySQL 8.0 Reference Manual, so we should refer to this character encoding as utf8 when we connect to the database. We could also have checked this using phpMyAdmin, by navigating your web browser to classmysql.engr.oregonstate.edu, authenticating, then clicking on the "Variables" tab, and then entering character_set in the text box next to the prompt "Containing the word: ". You can see that we are connecting using utfmb4_unicode_ci. You can change it by clicking on the link "Server: classmysql.engr.oregonstate.edu" at the top of the screen, and changing "Server connection collation". You'll need to log out and back in. Click the right arrow in the upper-left corner to display the left-hand-side tabbed pane, and then click the little icon of the exit door with the tiny left-facing green arrow. Then log back in.

Now, utf8mb3 is not ideal because it only supports the Unicode Basic Multilingual Plane (BMP); it cannot encode the full set of Chinese characters or emojis.

Let's fix that for our existing database. First, add this line to your ~/.my.cnf:

default-character-set=utf8mb4

Next, you'll need to modify the mysql client to always specify utf8mb4 as the character-set-server variable; normally this would be done in the global MariaDB config file or in SQL by setting

set global character-set-server='utf8mb4';

but you don't have permission to do that. So, as a workaround, you can set a shell alias to do it each time you connect, by adding these two lines to your ~/.cshrc file:

set backslash_quote
alias mysql mysql "--init-command=\"set character_set_server='utf8mb4';\""

or, if you are using the bash shell, adding these two lines to your ~/.bashrc file:

alias mysql="mysql --init-command=\"set character_set_server='utf8mb4';\""

and then logging out and then back in. Next, be sure to do this exactly as shown:

mysql

Then within mysql, do this:

drop database cs340_ramseyst;
create database cs340_ramseyst;
source ./bsg_db.sql

Now, let's look at the encoding again:

show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Note that character_set_system is a read-only variable, so there is nothing that can be done about that, short of recompiling MariaDB; I think that just means that you can't have an emoji in a table name, or that kind of thing. To verify that it worked, do this:

mysql -e "select default_character_set_name from information_schema.SCHEMATA where schema_name='cs340_ramseyst';"
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8mb4                    |
+----------------------------+

Now go to classmysql.engr.oregonstate.edu, log into phpMyAdmin, and click on the "Databases" tab. The character encoding for cs340_ramseyst should show up as utf8mb4_general_ci.