Administration
Multibyte characterset and length semantics
User Rating: / 1
- Details
-
Parent Category: Articles
-
Created on Thursday, 18 March 2010 20:29
-
Last Updated on Monday, 12 March 2012 13:54
-
Published on Thursday, 18 March 2010 20:29
-
Written by Guy Lambregts
-
Hits: 9561
Multibyte characterset and length semantics
These days we can with recent Oracle versions more -and easy- online reorganize our databases. Before it was very important to think about transaction slots (ITL), freelists, extents at the moment of segment creation. With recent Oracle versions we see Oracle is handling it all -or most of it- for us.
There is however still one very important "thing" to think about BEFORE the database creation : the database characterset. Living in Western Europe I have had very few issues with charactersets. Most of the time I had to deal with singlebyte charactersets like WE8ISO8859P15, WE8ISO8859P1 however these days I get involved in projects where we have to handle UTF8 multibyte characters.
Single-byte character sets are character sets with names of the form xxx7yyyyyy and xxx8yyyyyy.
Each character code of a single-byte character set occupies exactly one byte. Multibyte character sets are all other character sets (like UTF8). Some character codes of a multibyte character set occupy more than one byte.
Let' s check the database characterset
SQL> select parameter,value from nls_database_parameters where parameter like '%CHARACTERSET%';
NLS_CHARACTERSET
AL32UTF8
NLS_NCHAR_CHARACTERSET
AL16UTF16
Considerations about multibyte charactersets
In the above example we have to deal with AL32UTF8, which is a unicode variable width multibyte characterset. AL32UTF8 follows the unicode standards, check them out here http://www.unicode.org
More in detail
Version unicode 3.0 in RDBMS version 9.0
Version unicode 3.1 in RDBMS version 9.2
Version unicode 3.2 in RDBMS version 10.1
Version unicode 4.01 in RDBMS version 10.2
It' s time for some sql in a database of which the characterset is multibyte
SQL> create table t_1 (col1 varchar2(1));
Table created.
SQL> create table t_2 (col1 varchar2(1 CHAR));
Table created.
SQL> insert into t_1 values ('é');
insert into t_1 values ('é')
ERROR at line 1:
ORA-12899: value too large for column "MY_UTF"."T_1"."COL1" (actual: 3,maximum: 1)
SQL> insert into t_2 values ('é');
1 row created.
SQL> commit;
Commit complete.
Why didn' t we succeed to insert the character é in table t_1 ? The answer is that the character é is a multibyte character in al32utf8. Here I show you it is 3 bytes.
SQL> select dump(col1,4) from t_2;
Typ=1 Len=3: 239,191,189
Let' s query user_tab_columns in order to retrieve the difference between t_1.col1 and t_2.col1. T_1.col1 has been created with a fixed byte length whereas t_2.col1 has been created with a fixed char length.
SQL> select table_name,column_name,data_length,char_length,char_used from user_tab_columns where table_name in ('T_1','T_2');
T_1 COL1 1 1 B
T_2 COL1 4 1 C
Oracle shouldn' t be Oracle if there wouldn' t be a kind of parameter in order to bypass some issues about this. Here we speak about nls_length_semantics which is a database parameter, which we can alter at the instance level, which we can alter at the session level.
SQL> select parameter,value from nls_database_parameters where parameter='NLS_LENGTH_SEMANTICS';
NLS_LENGTH_SEMANTICS
BYTE
SQL> select parameter,value from nls_instance_parameters where parameter='NLS_LENGTH_SEMANTICS';
NLS_LENGTH_SEMANTICS
BYTE
SQL> select parameter,value from nls_session_parameters where parameter='NLS_LENGTH_SEMANTICS';
NLS_LENGTH_SEMANTICS
BYTE
SQL> alter session set nls_length_semantics=CHAR;
Session altered.
SQL> select parameter,value from nls_session_parameters where parameter='NLS_LENGTH_SEMANTICS';
NLS_LENGTH_SEMANTICS
CHAR
Let' s check out the difference
SQL> create table t_1bis (col1 varchar2(1));
Table created.
SQL> insert into t_1bis values ('é');
1 row created.
SQL> commit;
Commit complete.
SQL> select table_name,column_name,data_length,char_length,char_used from user_tab_columns where table_name in ('T_1','T_2','T_1BIS');
T_1 COL1 1 1 B
T_1BIS COL1 4 1 C
T_2 COL1 4 1 C
This one I use in order to have the fixed char length for the current and the next instances (assuming I work with an spfile)
SQL> alter system set nls_length_semantics=char scope=both;
Check out Metalink Note' s
Doc ID:Note:260893.1
Subject: Unicode character sets in the Oracle database
Doc ID:Note:144808.1
Subject: Examples and limits of BYTE and CHAR semantics usage
Doc ID: Note:119164.1
Subject: Changing Database Character Set - Valid Superset Definitions