Administration

Multibyte characterset and length semantics

User Rating:  / 1
PoorBest 
Parent Category: Articles
Created on Thursday, 18 March 2010 20:29
Last Updated on Monday, 12 March 2012 13:54
Published on Thursday, 18 March 2010 20:29
Written by Guy Lambregts
Hits: 9523

Multibyte characterset and length semantics

 

These days we can with recent Oracle versions more -and easy- online reorganize our databases. Before it was very important to think about transaction slots (ITL), freelists, extents at the moment of segment creation. With recent Oracle versions we see Oracle is handling it all -or most of it- for us.

There is however still one very important "thing" to think about BEFORE the database creation : the database characterset. Living in Western Europe I have had very few issues with charactersets. Most of the time I had to deal with singlebyte charactersets like WE8ISO8859P15, WE8ISO8859P1 however these days I get involved in projects where we have to handle UTF8 multibyte characters

Single-byte character sets are character sets with names of the form xxx7yyyyyy and xxx8yyyyyy. 
Each character code of a single-byte character set occupies exactly one byte. Multibyte character sets are all other character sets (like UTF8). Some character codes of a multibyte character set occupy more than one byte.

Let' s check the database characterset

 

SQL> select parameter,value from nls_database_parameters where parameter like '%CHARACTERSET%';

NLS_CHARACTERSET
AL32UTF8

NLS_NCHAR_CHARACTERSET
AL16UTF16

 

Considerations about multibyte charactersets 

In the above example we have to deal with AL32UTF8, which is a unicode variable width multibyte characterset. AL32UTF8 follows the unicode standards, check them out here http://www.unicode.org

More in detail

Version unicode 3.0     in RDBMS version     9.0
Version unicode 3.1     in RDBMS version     9.2
Version unicode 3.2     in RDBMS version    10.1
Version unicode 4.01   in RDBMS version    10.2

It' s time for some sql in a database of which the characterset is multibyte

 

SQL> create table t_1 (col1 varchar2(1));

Table created.

SQL> create table t_2 (col1 varchar2(1 CHAR));
Table created.

SQL> insert into t_1 values ('é');
insert into t_1 values ('é')

ERROR at line 1:
ORA-12899: value too large for column "MY_UTF"."T_1"."COL1" (actual: 3,maximum: 1)

SQL> insert into t_2 values ('é');
1 row created.

SQL> commit;
Commit complete.

 

Why didn' t we succeed to insert the character é in table t_1 ? The answer is that the character é is a multibyte character in al32utf8. Here I show you it is 3 bytes.

 

SQL> select dump(col1,4) from t_2;

Typ=1 Len=3: 239,191,189

 

Let' s query user_tab_columns in order to retrieve the difference between t_1.col1 and t_2.col1. T_1.col1 has been created with a fixed byte length whereas t_2.col1 has been created with a fixed char length.

 

SQL> select table_name,column_name,data_length,char_length,char_used from user_tab_columns where table_name in ('T_1','T_2');
T_1    COL1    1 1 B


T_2    COL1    4 1
C

 

Oracle shouldn' t be Oracle if there wouldn' t be a kind of parameter in order to bypass some issues about this. Here we speak about nls_length_semantics which is a database parameter, which we can alter at the instance level, which we can alter at the session level.

 

SQL> select parameter,value from nls_database_parameters where parameter='NLS_LENGTH_SEMANTICS';

NLS_LENGTH_SEMANTICS
BYTE

SQL> select parameter,value from nls_instance_parameters where parameter='NLS_LENGTH_SEMANTICS';

NLS_LENGTH_SEMANTICS
BYTE

SQL> select parameter,value from nls_session_parameters where parameter='NLS_LENGTH_SEMANTICS';

NLS_LENGTH_SEMANTICS
BYTE

SQL> alter session set nls_length_semantics=CHAR;

Session altered.

SQL> select parameter,value from nls_session_parameters where parameter='NLS_LENGTH_SEMANTICS';

NLS_LENGTH_SEMANTICS
CHAR

 

Let' s check out the difference

 

SQL> create table t_1bis (col1 varchar2(1));

Table created.

SQL> insert into t_1bis values ('é');
1 row created.

SQL> commit;
Commit complete.


SQL> select table_name,column_name,data_length,char_length,char_used from user_tab_columns where table_name in ('T_1','T_2','T_1BIS');
T_1             COL1    1 1
B

T_1BIS        COL1    4 1
C

T_2             COL1    4 1
C

 

This one I use in order to have the fixed char length for the current and the next instances (assuming I work with an spfile)

 

SQL> alter system set nls_length_semantics=char scope=both;

 


Check out Metalink Note' s

Doc ID:Note:260893.1
Subject: Unicode character sets in the Oracle database

Doc ID:Note:144808.1
Subject: Examples and limits of BYTE and CHAR semantics usage

Doc ID: Note:119164.1 
Subject: Changing Database Character Set - Valid Superset Definitions