Administration

How to change the database characterset ?

User Rating:  / 0
PoorBest 
Parent Category: Articles
Created on Friday, 22 April 2011 16:08
Last Updated on Thursday, 10 September 2015 13:09
Published on Friday, 22 April 2011 16:08
Written by Guy Lambregts
Hits: 15970

How to change the database characterset from single byte to multibyte using CSALTER and export/import utilities

 

Added on September 10 2015 : Oracle has introduced with Oracle 12C the Database Migration Assistant for Unicode ( DMUA )
Hence the below example is for RDBMS < 12C
 


In this example I show how I migrated from a single byte WE8MSWIN1252 to a unicode multibyte characterset AL32UTF8.

 

It is beyond the scope of this article to describe all different scenario' s, there are so many database charactersets, the national characterset comes into play.
A full backup of your database prior to the characterset change is always mandatory.


Since release i don' t know oracle has introduced the characterset scan utility. Easy to use assuming the CSMIG schema is installed.


csscan \"SYS/SECRET as sysdba\" FULL=Y

Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Fri Apr 22 04:57:10 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

CSS-00107: Character set migration utility schema not installed


How to install the CSMIG schema ?



C:\Documents and Settings\E132707a>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Apr 22

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4
With the Partitioning, OLAP, Data Mining and Real Appli

SQL> start ?/rdbms/admin/csminst.sql;


Once the CSMIG installed, we run csscan again


csscan \"SYS/SECRET as sysdba\" FULL=Y


Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Fri Apr 22 04:59:23 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Current database character set is WE8MSWIN1252.


Enter new database character set name: > AL32UTF8 >>> user input required

Enter array fetch buffer size: 1024000 > >>> user input required

Enter number of scan processes to utilize(1..32): 1 > 4 >>> user input required


...........................

. process 1 scanning WMSYS.WM$RIC_TRIGGERS_TABLE[AAAMlVAAJAAAB1kAAA]
. process 4 scanning SYS.WRH$_TABLESPACE_STAT[AAAJBZAAJAAABV8AAA]
. process 3 scanning SYSTEM.LOGMNR_ATTRCOL$[AAAIdcAAJAAAAjaAAA]
. process 2 scanning SYS.WRH$_LATCH_MISSES_SUMMARY[AAAJAyAAJAAABElAAA]
. process 1 scanning SYS.WRH$_DB_CACHE_ADVICE[AAAJA2AAJAAABFRAAA]
. process 4 scanning SYS.WRH$_DLM_MISC[AAAJBOAAJAAABQ9AAA]
. process 3 scanning SYS.WRH$_WAITSTAT[AAAJAhAAJAAABASAAA]
. process 4 scanning SYSTEM.LOGMNRC_GSII[AAAIdNAAJAAAAg1AAA]
. process 2 scanning SYSTEM.LOGSTDBY$APPLY_PROGRESS[AAAJFSAAJAAABz2AAA]

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.


There is an output file scan.txt which can be read with a text editor
You may want to read Oracle' s Metalink "Csscan output explained" [ID 444701.1] for interpertation of output.


Database Scan Summary Report

Time Started  : 2011-04-22 05:00:17
Time Completed: 2011-04-22 05:01:46

Process ID         Time Started       Time Completed
---------- -------------------- --------------------
1  2011-04-22 05:00:52  2011-04-22 05:01:45
2  2011-04-22 05:00:52  2011-04-22 05:01:45
3  2011-04-22 05:00:53  2011-04-22 05:01:45
4  2011-04-22 05:00:54  2011-04-22 05:01:45
---------- -------------------- --------------------

[Database Size]

Tablespace                           Used            Free           Total       Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM                            543.36M           7.73M         551.10M           8.52M
RBS                               456.63M         355.07M         811.70M            .00K
TEMP                                 .00K            .00K            .00K            .00K
COCONUT_DATA                    1,241.00M         759.00M       2,000.00M          35.00K
COCONUT_INDEX                     275.04M       1,724.96M       2,000.00M            .00K
DRSYS                               5.24M          14.76M          20.00M            .00K
TOOLS                             160.48M         351.53M         512.00M            .00K
USERS                             102.00K         108.93M         109.03M            .00K
XDB                                49.04M         990.00K          50.00M            .00K
SYSAUX                             51.86M         448.14M         500.00M          12.00K
------------------------- --------------- --------------- --------------- ---------------
Total                           2,782.76M       3,771.09M       6,553.85M           8.56M

The size of the largest CLOB is 1625114 bytes

[Database Scan Parameters]

Parameter                      Value                                           
------------------------------ ------------------------------------------------
CSSCAN Version                 v2.1                                            
Instance Name                  PLATINUM                                           
Database Version               10.2.0.4.0                                      
Scan type                      Full database                                   
Scan CHAR data?                YES                                             
Database character set         WE8MSWIN1252                                    
FROMCHAR                       WE8MSWIN1252                                    
TOCHAR                         AL32UTF8                                        
Scan NCHAR data?               NO                                              
Array fetch buffer size        1024000                                         
Number of processes            4                                               
Capture convertible data?      NO                                              
------------------------------ ------------------------------------------------

[Scan Summary]

All character type data in the data dictionary are convertible to the new character set
All character type application data are convertible to the new character set

[Data Dictionary Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     1,650,220                0                0                0
CHAR                             1,111                0                0                0
LONG                           140,383                0                0                0
CLOB                            19,084              486                0                0
VARRAY                          16,956                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        1,827,754              486                0                0
Total in percentage             99.973%           0.027%           0.000%           0.000%

The data dictionary can be safely migrated using the CSALTER script

[Application Data Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                    11,174,558                1                0                0
CHAR                                 0                0                0                0
LONG                                 6                0                0                0
CLOB                                 0              172                0                0
VARRAY                               0                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                       11,174,564              173                0                0
Total in percentage             99.998%           0.002%           0.000%           0.000%

[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE                                              Convertible       Truncation            Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
PLATINUM.RCFORMULA                                              172                0                0
PLATINUM.SFMOD                                                    1                0                0
MDSYS.SDO_COORD_OP_PARAM_VALS                                   200                0                0
MDSYS.SDO_GEOR_XMLSCHEMA_TABLE                                    1                0                0
MDSYS.SDO_STYLES_TABLE                                           78                0                0
MDSYS.SDO_XML_SCHEMAS                                             3                0                0
SYS.METASTYLESHEET                                               80                0                0
SYS.RULE$                                                         1                0                0
SYS.SCHEDULER$_EVENT_LOG                                          6                0                0
SYS.WRI$_DBU_FEATURE_METADATA                                    98                0                0
SYS.WRI$_DBU_HWM_METADATA                                        19                0                0
-------------------------------------------------- ---------------- ---------------- ----------------

[Distribution of Convertible, Truncated and Lossy Data by Column]

USER.TABLE|COLUMN                                       Convertible       Truncation            Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
PLATINUM.RCFORMULA|RCFORMULA_SCRIPT                              86                0                0
PLATINUM.RCFORMULA|RCFORMULA_TEXT                                86                0                0
PLATINUM.SFMOD|SFMOD_DESC                                         1                0                0
MDSYS.SDO_COORD_OP_PARAM_VALS|PARAM_VALUE_FILE                  200                0                0
MDSYS.SDO_GEOR_XMLSCHEMA_TABLE|XMLSCHEMA                          1                0                0
MDSYS.SDO_STYLES_TABLE|DEFINITION                                78                0                0
MDSYS.SDO_XML_SCHEMAS|XMLSCHEMA                                   3                0                0
SYS.METASTYLESHEET|STYLESHEET                                    80                0                0
SYS.RULE$|CONDITION                                               1                0                0
SYS.SCHEDULER$_EVENT_LOG|ADDITIONAL_INFO                          6                0                0
SYS.WRI$_DBU_FEATURE_METADATA|INST_CHK_LOGIC                     12                0                0
SYS.WRI$_DBU_FEATURE_METADATA|USG_DET_LOGIC                      86                0                0
SYS.WRI$_DBU_HWM_METADATA|LOGIC                                  19                0                0
-------------------------------------------------- ---------------- ---------------- ----------------

[Indexes to be Rebuilt]

USER.INDEX on USER.TABLE(COLUMN)                                                         
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------


According the output I should be able to convert the database characterset using the csalter package. We need to be in restricted mode to use it.
( it is rather obvious that changing the database characterset requires a maintenance window and a single user context )



SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup restrict;
ORACLE instance started.

Total System Global Area  524288000 bytes
Fixed Size                  1297676 bytes
Variable Size             243270388 bytes
Database Buffers          276824064 bytes
Redo Buffers                2895872 bytes
Database mounted.
Database opened.
SQL> start ?/rdbms/admin/csalter.plb

0 rows created.


Function created.


Function created.


Procedure created.

This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old   6:     if (UPPER('&conf') <> 'Y') then
new   6:     if (UPPER('Y') <> 'Y') then
Checking data validility...
Unrecognized convertible date found in scanner result

PL/SQL procedure successfully completed.

Checking or Converting phrase did not finish successfully
No database (national) character set will be altered
CSALTER finished unsuccessfully.

PL/SQL procedure successfully completed.


0 rows deleted.


Function dropped.


Function dropped.


Procedure dropped.

SQL> alter system disable restricted session;





Despite the fact that according the CSSCAN output I can use csalter. the error "Unrecognized convertible date found in scanner result"
When I ran the csscan i had some issues at the level of the XDB schema. So I have to use expdp/impdp. note there are bugs when you use datapump for characterset conversion, which are solved with 10.2.0.4 ( and I use 10.2.0.4 )

Which directories have been configured so far ?

 


SQL> select * from dba_directories;


SYS                            DATA_PUMP_DIR
D:\oracle\product\10.2.0\db_1\rdbms\log\

SYS                            EXPORT
E:\oracle\export\PLATINUM

SYS                            ORACLE_OCM_CONFIG_DIR
D:\oracle\product\10.2.0\db_1\ccr\state

SYS                            ADMIN_DIR
D:\oracle\product\10.2.0\db_1/md/admin

SYS                            WORK_DIR
D:\oracle\product\10.2.0\db_1/work


SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options



I export the source database with Data Pump, note that

1. If you use datapump you do not need to set the environment variable NLS_LANG

2. if you use traditional export / import you NEED to set the NLS_LANG enviornment variable to the characterset of the source database.

 

See also MOS : NLS considerations in Import/Export - Frequently Asked Questions [ID 227332.1]

 


-- REM SET NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

expdp dba_user directory=EXPORT dumpfile=expdp_PLATINUM_FULL.dmp logfile=expdp_PLATINUM_FULL.log full=Y


I create a new database with AL32UTF8. The creation of a new database is beyond the scope of this document and I import the previously taken dump.
note that

1. If you use datapump you do not need to set the environment variable NLS_LANG

2. if you use traditional export / import you NEED to set the NLS_LANG enviornment variable to the characterset of the source database.



-- REM SET NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

impdp dba_user directory=EXPORT dumpfile=expdp_PLATINUM_FULL.dmp logfile=impdp_PLATINUM_schemas.log parfile=impdp_schemas.par



It can happen that the CSSCAN output lists you some records in the "truncation" column. this means that as a result of the characterset change not all the data can be stored in the varchar fields
Why is that ? Some characters - example latin characters like é,è,ê - require 1 byte in a single characterset database whereas they require more bytes in a multibyte characterset database.


See Multibyte characterset and length semantics



The solution for this is

1. import initially with CONTENT=METADATA_ONLY
2. disable triggers, foreign keys and virtual private database policies
3. change the table definition or change the NLS_LENGTH_SEMANTICS parameter ( see Multibyte characterset and length semantics )
4. import with CONTENT=DATA_ONLY
5. enable triggers, foreign keys and virtual private database policies

It does not take a hero to change / migrate the characterset.
Just concentrate on the NLS_LANG environment variable if you would still use traditional export and import utilities and take a full backup of your database PRIOR to the characterset change.