Wednesday 17 July 2013

NLS_CHARACTERSET for BOTH ARABIC and FRENCH



In this Document
Purpose
Scope
Details
 1) General remarks on going to AL32UTF8
 1.a) Prerequisites:
 1.b) When changing an Oracle Applications Database:
 1.c) When to use (full) export / import and when to use Alter Database Character Set / Csalter?
 1.d) When using Expdp/Impdp (DataPump)
 1.e) Using Alter Database Character Set on 9i
 1.f) What about Physical / Logical Standby databases?
 1.g) How to test the steps in this note upfront?
 2) Check the source database for unneeded or problematic objects:
 2.a) Invalid objects.
 2.b) Orphaned Datapump master tables (10g and up)
 2.c) Unneeded sample schema/users.
 2.d) Leftover Temporary tables using CHAR semantics.
 2.e) Make sure your database is in good health.
 3) Check the Source database for "Lossy" (invalid code points in the current source character set).
 3.a) How to "rescue" lossy data?
 3.b) Can I solve "lossy" user/application data in an other way?
 4) Check for "Convertible" and "Truncation" data when going to AL32UTF8
 5) Dealing with "Truncation" data.
 5a) Shorten the data before export
 5b) Adapt the columns to fit the expansion of the data
 6) Dealing with "Convertible" data.
 6.a) "Convertible" Application Data:
 6.b) "Convertible" data in Data Dictionary objects:
 7) Before using Csalter / Alter Database Character Set check the database for:
 7.a) Partitions using CHAR semantics:
 7.b) Function, Domain or Join indexes on CHAR semantics columns.
 7.b.1) Functional or domain indexes on columns using CHAR semantics
 7.b.2) Join indexes on columns using CHAR semantics
 7.c) SYSTIMESTAMP in the DEFAULT value clause for tables using CHAR semantics.
 7.d) Clusters using CHAR semantics.
 7.e) Unused columns using CHAR semantics
 7.f) Check that you have enough room to run Csalter or to import the "Convertible" data again afterwards.
 7.g) Objects in the recyclebin (10g and up)
 7.h) Check if the compatible parameter is set to your base version
 8) After any "Lossy" is solved, "Truncation" data is planned to be addressed, and/or "Convertible" data is exported/truncated/addressed, and point 7) is OK, run Csscan again as a final check.
 8.a) For 8i/9i the Csscan output needs to be "Changeless" for all CHAR, VARCHAR2, CLOB and LONG data (Data Dictionary and User/Application data).
 8.b) For 10g and up the Csscan output needs to be
 9) Summary of steps needed to use Alter Database Character Set / Csalter:
 9.a) For 9i and lower:
 9.b) For 10g and up:
 10) Running Csalter ( 10g and 11g) or Alter Database Character Set (8i and 9i)
 10.a) The steps for 8i and 9i ONLY
 10.b) The steps for version 10g  and 11g
 10.c) Check the Csalter/alter database output and the alert.log for errors, some Csalter messages do NOT have an ORA- number.
 11) Reload the data pump packages after a change to AL32UTF8 in 10g and up.
 12) Import the exported data.
 12.a) When using Csalter/Alter database to go to AL32UTF8 and there was "Truncation" data in the csscan done in point 4:
 12.b) When using (Full) export/import to go to a new/other AL32UTF8 database and there was "Truncation" data in the csscan done in point 4:
 12.c) When using Csalter/Alter database to go to AL32UTF8 and there was NO "Truncation" data, only "Convertible" and "Changeless" in the csscan done in point 4:
 12.d) When using (full) export/import to go to a new/other AL32UTF8 database and there was NO "Truncation" data, only "Convertible" and "Changeless" in the csscan done in point 4:
 13) Check your data and final things:
References


Applies to:

Oracle Database - Enterprise Edition - Version 8.0.3.0 to 11.2.0.4 [Release 8.0.3 to 11.2]
Oracle Database - Standard Edition - Version 8.0.3.0 to 11.2.0.4 [Release 8.0.3 to 11.2]
Information in this document applies to any platform.

Purpose

To provide a guide to change the NLS_CHARACTERSET to AL32UTF8 or UTF8.
This note will only deal with the database (server side) change itself.
For further implications on clients and application level when going to AL32UTF8 please see Note 788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications.
It's strongly recommended to:
  • do a complete "testdrive" of the WHOLE change upfront
  • read Note 788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications first and to make sure your application and clients are checked and ready for the change on database level.
This note was originally written for going FROM WE8ISO8859P1, WE8ISO8859P15 or WE8MSWIN1252 TO AL32UTF8 or UTF8.
It can however be used to go from any NLS_CHARACTERSET to AL32UTF8 / UTF8 (which also means it can be used to go from UTF8 to AL32UTF8, or the inverse).

The note is written using AL32UTF8; to use this note to go to another characterset (for example UTF8), simply replace "AL32UTF8" with "UTF8" in the CSSCAN TOCHAR and, for 9i and lower, in the alter database character set command.

This "flow" can also be used to go from any single byte characterset (like US7ASCII, WE8DEC) to any other Multi byte characterset (ZHS16GBK, ZHT16MSWIN950, ZHT16HKSCS, ZHT16HKSCS31,KO16MSWIN949, JA16SJIS ...), simply substitute AL32UTF8 with the xx16xxxx target characterset. But in that case going to AL32UTF8 would be simply be a far better idea. Note 333489.1Choosing a database character set means choosing Unicode.
In 8i (8.1.7 and lower) you cannot use AL32UTF8 seen this is not known, use UTF8 instead.
In 10g and 11g you NEED to use Csalter. Do NOT use "Alter database character set", using "Alter database character set" is NOT supported in 10g and 11g.
From Oracle 12c onwards, the DMU will be the only tool available to migrate to Unicode.

Scope

Any DBA changing the current NLS_CHARACTERSET to AL32UTF8 / UTF8 or another multibyte characterset.
In this note AL32UTF8 will be used, but it is equally applicable to UTF8 (for example in 8i, where AL32UTF8 is not available) and to other multibyte charactersets.

The current NLS_CHARACTERSET is seen in NLS_DATABASE_PARAMETERS.
Sqlplus / as sysdba
SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter='NLS_CHARACTERSET'
/
The NLS_CHARACTERSET defines the characterset of the CHAR, VARCHAR2, LONG and CLOB datatypes.

Details

1) General remarks on going to AL32UTF8

This note documents the steps to safely change to Unicode by the use of
* for 8i and 9i - Csscan and "Alter Database" combined with a partial Export / Import
* for 10g and 11.2 - Csscan and Csalter combined with a partial Export / Import
* for 8i up to 11.2 - Csscan with full Export / Database re-creation / Import.
For migration to AL32UTF8 (and the deprecated UTF8), there is a new tool available called the Database Migration Assistant for Unicode (DMU). The DMU is a unique next-generation migration tool providing an end-to-end solution for migrating your databases from legacy encodings to Unicode. DMU's intuitive user-interface greatly simplifies the migration process and lessens the need for character set migration expertise by guiding the DBA through the entire migration process as well as automating many of the migration tasks.
The DMU tool is supported against Oracle 11.2.0.3 and higher and selected older versions and platform combinations.
For more information please see Note 1272374.1 The Database Migration Assistant for Unicode (DMU) Tool and the DMU pages on OTN.
From Oracle 12c onwards,  DMU will be the only tool available to migrate to Unicode.

1.a) Prerequisites:

In this note the Csscan tool is used. Please install this first
Note 458122.1 Installing and configuring CSSCAN in 8i and 9i
Note 745809.1 Installing and configuring CSSCAN in 10g and 11g
To have an overview of the output and what it means please see Note 444701.1 Csscan output explained

1.b) When changing an Oracle Applications Database:

Please see the following note for an Oracle Applications database: Note 124721.1 Migrating an Applications Installation to a New Character Set.
This is the only way supported by Oracle applications. If you have any doubt log an Oracle Applications SR for assistance.

1.c) When to use (full) export / import and when to use Alter Database Character Set / Csalter?

With "Full exp/imp" is meant to take an export and import the complete dataset into a NEW AL32UTF8 database (this may be an other Oracle version or OS platform).
Full exp/imp can be used at any time. To avoid data loss please do check your source database before doing the export with Csscan (= follow this note until point 6 and then go to step 12).

Using Alter Database Character Set / Csalter to change the NLS_CHARACTERSET of an existing database instead of a full export/import has a (down)time advantage when the amount of "Convertible" user data is low compared to the amount of "Changeless" data (since the "Changeless" data needs no action) and/or when recreating the database would take a lot of time, but it is in general somewhat more complex.
The end result of both ways of doing the change is however the same.

If only certain schemas or tables need to be moved to another or a new AL32UTF8 database (which may be another Oracle version or OS platform) then, to avoid data loss, please do check your source database with Csscan (= follow this note until point 6 and then go to step 12).
One can then adapt the csscan syntax in step 4 to only scan the schemas or table(s) that are going to be exported.
Examples of other than FULL=Y csscan syntax can be found in Note 1297961.1 ORA-01401 / ORA-12899 While Importing Data In An AL32UTF8 / UTF8 (Unicode) Or Other Multibyte NLS_CHARACTERSET Database.

We do however suggest to do in any case the FULL=Y scan in point 3 of this note, since this is more of a "health check"; if any correction is needed it must in almost all cases be done on database level.

1.d) When using Expdp/Impdp (DataPump)

Do NOT use Expdp/Impdp when going to (AL32)UTF8 or another multibyte characterset on ALL 10g versions lower than 10.2.0.4 (including 10.1.0.5). 11.1.0.6 is also affected.
It will provoke data corruption unless Patch 5874989 is applied on the Impdp side. Expdp is not affected, hence the data in the dump file is correct. The "old" exp/imp tools are also not affected.
This problem is fixed in the 10.2.0.4 and 11.1.0.7 patch sets, and in 11.2.0.1 and up.
For windows the fix is included in
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later, see Note 276548.1 .
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later, see Note 342443.1 .

1.e) Using Alter Database Character Set on 9i

For 9i systems please make sure you are at least on Patchset 9.2.0.4, see Note 250802.1 Changing character set takes a very long time and uses lots of rollback space

1.f) What about Physical / Logical Standby databases?

If your system has a Logical Standby database, this Logical standby needs to be re-created from the "main" database after the characterset conversion; there is no supported way to alter the standby together with the main database.
Because a Logical standby database needs to have the same NLS_CHARACTERSET as the source database, it is also not possible to, for example, change the NLS_CHARACTERSET of the logical standby (apply) database first and then restart the apply of the redo from the main database to it.
For Physical Standbys lower than 11.1.0.7 the Physical Standby also needs to be re-created.
From 11.1.0.7 onwards this is not needed any more for Physical Standbys, please see Note 1124165.1 Changing Primary Database Character Set without Recreating Data Guard Physical Standbys

1.g) How to test the steps in this note upfront?

To properly test the *database* changes (= the steps in this note) you will need to do the steps:
  • when using Csalter/alter database on a physical COPY of the database you want to convert.
  • when using export/import to go to a new AL32UTF8 database: on either a physical COPY of the database you want to export, or on a test database that has the SAME NLS_CHARACTERSET as the current one into which you export/import the whole dataset you are planning to move to the new AL32UTF8 database.

2) Check the source database for unneeded or problematic objects:

Tip (10g and up): first remove all objects in the recyclebin
Sqlplus / as sysdba
SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE FROM dba_recyclebin ORDER BY 1,2
/
If there are objects in the recyclebin then perform
Sqlplus / as sysdba
PURGE DBA_RECYCLEBIN
/
This will remove unneeded objects and avoid confusion in the following steps. It might be a good idea to simply disable the recyclebin until the change is done by setting the recyclebin='OFF' parameter.
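A minimal sketch of disabling it, assuming an spfile is in use (in 11g a system-wide change only takes effect for new sessions, so setting it in the spfile and restarting is the safe route):
Sqlplus / as sysdba
ALTER SYSTEM SET recyclebin='OFF' SCOPE=SPFILE;
-- restart the instance, then verify:
sho parameter recyclebin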

2.a) Invalid objects.

Sqlplus / as sysdba
SELECT owner,
object_name,
object_type,
status
FROM dba_objects
WHERE status ='INVALID'
/
If there are any invalid objects run utlrp.sql
SQL> @?/rdbms/admin/utlrp.sql
If there are any left after running utlrp.sql then please manually resolve / drop the invalid objects.

2.b) Orphaned Datapump master tables (10g and up)

Sqlplus / as sysdba
SELECT o.status,
o.object_id,
o.object_type,
o.owner
||'.'
||object_name "OWNER.OBJECT"
FROM dba_objects o,
dba_datapump_jobs j
WHERE o.owner =j.owner_name
AND o.object_name=j.job_name
AND j.job_name NOT LIKE 'BIN$%'
ORDER BY 4,2
/
See Note 336014.1 How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ?

2.c) Unneeded sample schema/users.

The 'HR', 'OE', 'SH', 'PM', 'IX', 'BI' and 'SCOTT' users are sample schema users. There is no point in having these sample schemas in a production system. If a sample schema exists we suggest removing it.
This note is useful to identify users in your database: Note 160861.1 Oracle Created Database Users: Password, Usage and Files.
Another user that might be removed is SQLTXPLAIN from Note 215187.1.
If you do not use APEX (Oracle Application Express) / HTMLDB, the FLOWS_xxxx and APEX_xxxx users can also be dropped, see Note 558340.1 How to Uninstall Oracle HTML DB / Application Express from the Database?
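A hedged sketch of removing a sample schema (verify first that nothing in your environment depends on it; any of the sample users listed above can be substituted):
Sqlplus / as sysdba
DROP USER SCOTT CASCADE;
DROP USER HR CASCADE;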

2.d) Leftover Temporary tables using CHAR semantics.

Sqlplus / as sysdba
SELECT C.owner
||'.'
|| C.table_name
||'.'
|| C.column_name
||' ('
|| C.data_type
||' '
|| C.char_length
||' CHAR)'
FROM all_tab_columns C
WHERE C.char_used = 'C'
AND C.table_name IN
(SELECT table_name FROM dba_tables WHERE TEMPORARY='Y'
)
AND C.data_type IN ('VARCHAR2', 'CHAR')
ORDER BY 1
/
These tables MAY (!) raise the following error during Alter Database Character Set or Csalter:
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-14450: attempt to access a transactional temp table already in use.

Temporary tables should be recreated by the application when needed, so if you have tables listed by the above select it is a good idea to confirm the application will recreate them if needed and to drop them now (or, if the db is still in use, do this just before the final Csscan run in point 8 of this note).
If the reported tables are SYS.ORA_TEMP_X_DS_XXXX (like SYS.ORA_TEMP_1_DS_27681 or SYS.ORA_TEMP_1_DS_27686) they are leftovers of DBMS_STATS (Note 4157602.8), so they can be dropped without problems at any time.
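For example, dropping such a leftover DBMS_STATS table (the table name below is taken from the examples above and is purely illustrative):
Sqlplus / as sysdba
DROP TABLE SYS.ORA_TEMP_1_DS_27681 PURGE;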

2.e) Make sure your database is in good health.

It might be a good idea to run Note 456468.1 Identify Data Dictionary Inconsistency and the Note 136697.1 "hcheck8i.sql" script to check for known problems in Oracle8i, Oracle9i and Oracle10g,
or for Oracle 11g Note 466920.1 11g New Feature Health Monitor.

3) Check the Source database for "Lossy" (invalid code points in the current source character set).

Run Csscan with the following syntax:
$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2
* Always run Csscan connecting with a 'sysdba' connection/user, do not use "system" or "csmig" user.
* The <current NLS_CHARACTERSET> is seen in NLS_DATABASE_PARAMETERS.
conn / AS sysdba
SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter='NLS_CHARACTERSET'
/
* The TOCHAR=<current NLS_CHARACTERSET> is not a typo; the idea is to check the CURRENT characterset for codes that are not defined in this NLS_CHARACTERSET before changing the NLS_CHARACTERSET.
* The PROCESS= parameter influences the load on your system: the higher it is (6 or 8 for example) the faster Csscan will be done; the lower it is, the less impact it will have on your system. Adapt if needed.
* The csscan SUPPRESS parameter limits the size of the .err file by limiting the amount of information logged per table. Using SUPPRESS=1000 will log at most 1000 rows for each table in the .err file. It will not affect the information in the .txt file, but it WILL affect the data logged in the .err file. This is mainly useful for the first scan of big databases: if you have no idea how much "Convertible" or "Lossy" data there is, this avoids the .err file growing to hundreds of MB and it also limits the space used by the csscan tables under the Csmig schema.
* Do not specify the TONCHAR or FROMNCHAR csscan parameters; those are for changing the NLS_NCHAR_CHARACTERSET. They are not needed and should not be specified.
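For illustration, for a hypothetical WE8MSWIN1252 database reachable through the TNS alias ORCL the command would look like this (password and alias are placeholders):
$ csscan \"sys/<syspassword>@ORCL as sysdba\" FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2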
 Once Csscan is actually scanning the tables v$session_longops can be used to see the progress of scans of big tables:
Sqlplus / as sysdba
SET pages 1000
SELECT target,
TO_CHAR(start_time,'HH24:MI:SS - DD-MM-YY'),
time_remaining,
sofar,
totalwork,
sid,
serial#,
opname
FROM v$session_longops
WHERE sid IN
(SELECT sid FROM v$session WHERE upper(program) LIKE 'CSSCAN%'
)
AND sofar < totalwork
ORDER BY start_time
/
Csscan will create 3 files:
  • dbcheck.out a log of the output of csscan
  • dbcheck.txt a Database Scan Summary Report
  • dbcheck.err contains the rowid's of the Lossy rows reported in dbcheck.txt (if any).
This Csscan run checks that all data is stored correctly in the current characterset. Because the TOCHAR and FROMCHAR charactersets are the same, there cannot be any "Convertible" or "Truncation" data reported in dbcheck.txt.

If all the data in the database is stored correctly at the moment then there is only "Changeless" data reported in dbcheck.txt. If this is the case please go to point 4).

If there is any "Lossy" data then those rows contain code points that are not currently defined correctly and they should be cleared up before you can continue.
If this "Lossy" is not checked/corrected then this "Lossy" data WILL BE LOST.
Please see the following note more information about "Lossy" data Note 444701.1 Csscan output explained.
One exception is "binary" or "encrypted" data stored in CHAR, VARCHAR2 or CLOB; this is not supported. The ONLY supported datatypes to store "raw" binary data (like PDF, doc, docx, jpeg, png, etc. files) or encrypted data like hashed/encrypted passwords are LONG RAW or BLOB.
If you want to store binary data (like PDF, doc, docx, jpeg, png, etc. files) or encrypted data like hashed/encrypted passwords in a CHAR, VARCHAR2, LONG or CLOB datatype then it must be converted to a "characterset safe" representation like base64 before doing any change of the NLS_CHARACTERSET.
See Note 1297507.1 Problems with (Importing) Encrypted Data After Character Set Change Using Other NLS_CHARACTERSET Database or Upgrading the (client) Oracle Version
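A minimal sketch of such a base64 conversion for short VARCHAR2 values, using the standard UTL_RAW and UTL_ENCODE packages (table and column names are hypothetical; base64 expands the data by roughly one third, and for values larger than what a SQL RAW allows a PL/SQL loop over chunks is needed):
Sqlplus / as sysdba
-- encode the raw bytes to base64 BEFORE the characterset change
UPDATE app.files
SET payload = UTL_RAW.CAST_TO_VARCHAR2(UTL_ENCODE.BASE64_ENCODE(UTL_RAW.CAST_TO_RAW(payload)));
COMMIT;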

3.a) How to "rescue" lossy data?

The first thing to do is to try to find out what the actual encoding of the "Lossy" data is; this is extensively documented in Note 444701.1 Csscan output explained, sections "C.2) The source database has data (codes) that are NOT defined in the source characterset." and "C.3) The limit of what Csscan can detect when dealing with lossy, an example:"

The most common situation is an US7ASCII/WE8ISO8859P1 database with "Lossy"; in this case changing your US7ASCII/WE8ISO8859P1 SOURCE database to WE8MSWIN1252 using Alter Database Character Set / Csalter will most likely solve your lossy data. The reason is explained in Note 252352.1 Euro Symbol Turns up as Upside-Down Questionmark. The flow to do this is found in Note 555823.1 Changing US7ASCII or WE8ISO8859P1 to WE8MSWIN1252.
Note that using Csscan alone is not enough; you will need to check your whole environment to deduce the real encoding of the data.
Do not blindly assume ALL your data is WE8MSWIN1252, and this does not mean ALL lossy data can be "solved" by going to WE8MSWIN1252.


It cannot be repeated enough that :
  • If LOSSY data needs to be saved/corrected then it is NEEDED to change the NLS_CHARACTERSET FIRST to the "real" characterset of the LOSSY data in the source database BEFORE going to AL32UTF8. If your WE8ISO8859P1 database has for example Hebrew stored, you most likely NEED to go to IW8MSWIN1255 before going to AL32UTF8, since WE8ISO8859P1 simply does not define Hebrew.
  • In some cases it is NOT possible to "correct" the "Lossy" data by changing the NLS_CHARACTERSET for all rows; then the best solution is to update those rows with something meaningful using Oracle SqlDeveloper. Download the latest version from http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/ . Do NOT use Toad, sqlplus, etc.
  • Do NOT use exp/imp to "correct" lossy, for example setting the NLS_LANG to WE8MSWIN1252 while exporting "Lossy" data from a WE8ISO8859P1 database will NOT solve the lossy, that data WILL be lost.
When preparing a test environment to debug this you can use 2 things:
  • a restored physical copy (backup) of the database
  • an export/import of the dataset into a database with the same NLS_CHARACTERSET as the current source database. This can also be used if the current database cannot be "corrected" (for example because it is still used by other applications); you can then exp/imp the dataset to a "temporary" database that uses the same NLS_CHARACTERSET as the current source database, correct this database, and then go from that database to an AL32UTF8 database using exp/imp.
note: If you use this note to go from AL32UTF8 to UTF8 (or the inverse) and you have lossy data, then log an SR and ask for a review by the "Advanced Resolution Team".
This select will give all the lossy objects found in the last Csscan run:
Note that when using the csscan SUPPRESS parameter this select may give incomplete results (not all tables).
Sqlplus / as sysdba
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' ' LossyColumns
FROM csmig.csmv$errors z
WHERE z.error_type ='DATA_LOSS'
ORDER BY LossyColumns
/
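The rowids of the affected rows are listed in the .err file; a hedged sketch of replacing one lossy value with a plain US7ASCII placeholder (table, column and rowid below are hypothetical, and for corrections containing non-ASCII characters the note's advice to use SqlDeveloper applies):
Sqlplus / as sysdba
UPDATE scott.emp
SET ename = 'UNKNOWN'
WHERE ROWID = 'AAAR5sAAEAAAACXAAA';
COMMIT;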
Lossy in Data Dictionary objects

When using Csalter/Alter Database Character Set:


Most "Lossy" in the Data Dictionary objects will be corrected by correcting the database as a whole, if the only "lossy" is found in Data Dictionary objects then follow the tips for "Convertible" Data Dictionary data .
For example one common thing seen is "Lossy" found only in SYS.SOURCE$,  most of the time this means some package source code contain illegal codes/bad data.
You can use the selects found in Note 291858.1 "SYS.SOURCE$ marked as having Convertible or Lossy data in Csscan output" to find what objects are affected.
Note that you CANNOT "fix" tables like SYS.SOURCE$ itself; you need to recreate the objects whose text is stored in SYS.SOURCE$.

Do NOT truncate or export Data Dictionary objects themselves unless this is said to be possible in Note 258904.1.
When using full export/import into a new AL32UTF8 database:

When using export/import to a new database then "Lossy" in Data Dictionary objects is only relevant when it concerns "Application data".
The thing to check for is to see if there is no "lossy" in tables like SYS.SOURCE$ (package source code) / SYS.COM$ (comments on objects) / SYS.VIEW$ (view definitions) / SYS.COL$ (column names) or SYS.TRIGGER$ (triggers).
The reason is simply that these Data Dictionary objects contain information about user objects or PL/SQL code. If you have "convertible" data there, that is not an issue.
For most conversions , if there is "lossy" it will be in SYS.SOURCE$.

3.b) Can I solve "lossy" user/application data in an other way?

Yes, it is possible to:
  • remove the "lossy" rows or update these "lossy" rows with "meaningful" data using US7ASCII (A-Z, a-z, 0-9) characters using Oracle SqlDeveloper. Download the latest version from http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/ . Do NOT use Toad, sqlplus, etc.
  • treat the "lossy" user/application data as "convertible" (= it needs to be exported and deleted/re-imported after csalter). If you exp/expdp the "lossy" data and imp/impdp it into an AL32UTF8 database (which can be the current database after changing it to AL32UTF8) then the "lossy" characters will simply become the Unicode replacement character �. For example a string like "abcµef" stored in a US7ASCII db (US7ASCII does not define µ, so the µ is "lossy") will become "abc�ef". Note that "Lossy" data will not give an error during exp/imp, but rows with a lot of lossy data will often also be "truncation" (and that needs to be addressed, see below).

4) Check for "Convertible" and "Truncation" data when going to AL32UTF8

If there was no "Lossy" in step 3 or you corrected the NLS_CHARACTERSET and started over then run csscan with the following syntax:
$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=TOUTF8 CAPTURE=Y ARRAY=1000000 PROCESS=2
Note the usage of CAPTURE=Y: using CAPTURE=N (the default) will speed up the csscan process itself and use far less space in the CSMIG tablespace, but the select in point 6.a) will then NOT list the "Convertible" user data.
The .txt file will always list the convertible user data regardless of the setting of CAPTURE.
For very big databases it might be an idea to run csscan first using CAPTURE=N as first scan to have an idea about the dataset and then, if needed, do table or user level scans to know what exact rows are convertible.
If there is a lot of "Convertible" application data in your database then the creation of the .err file will take a considerable amount of time when using CAPTURE=Y.
When creating the .err file csscan will fetch the first 30 bytes of each logged row together with the rowid and the reason it was logged.
Csscan will do this for
* "Truncation" and "Lossy" rows and "Convertible data dictionary" data if CAPTURE=N
* ALL "Convertible" (application data also) , "Truncation" and "Lossy" rows if CAPTURE=Y
Csscan will create 3 files:
  • toutf8.out a log of the output of csscan
  • toutf8.txt the Database Scan Summary Report
  • toutf8.err contains the rowid's of the Convertible and Lossy rows reported in toutf8.txt
When going to UTF8 or AL32UTF8 there should normally be NO entries under "Lossy" in toutf8.txt, because they should have been filtered out in step 3). If there is "Lossy" data then make sure you accept that this data will become the replacement character � and treat it as "convertible".
note: if you use this note not to go to UTF8 or AL32UTF8 but to any other multibyte characterset (ZHS16GBK, ZHT16MSWIN950, ZHT16HKSCS, ZHT16HKSCS31, KO16MSWIN949, JA16SJIS ...), then there CAN be "Lossy" data. This is data not known in the new characterset; it needs to be removed before going further, or exp/imp needs to be used for all tables that have "lossy" data - exp/imp will then remove this lossy data.
If there are in the txt file:
  • Entries under "Truncation" then go to step 5)
  • Entries for "Convertible" and "Changeless" but no "Truncation" then goto step 6).
  • If you have NO entries under "Convertible", "Truncation" or "Lossy" (this is not often seen) and all data is reported as "Changeless" then proceed to step 7).
To have an overview of the output and what it means please see Note 444701.1 Csscan output explained
Note: When using export/import and there is only need to export and import certain schema's into a new/other AL32UTF8 database it is also possible to use csscan only on the actual schema user or tables who will be exported. This will reduce the needed time for doing the csscan run by only checking the actual affected dataset.
Syntax examples are found in note 1297961.1 ORA-01401 / ORA-12899 While Importing Data In An AL32UTF8 / UTF8 (Unicode) Or Other Multibyte NLS_CHARACTERSET Database.

To avoid confusion: when using Csalter or alter database you NEED to use the FULL=Y syntax here in point 4)

5) Dealing with "Truncation" data.

As explained in Note 788156.1, characters may use more BYTES in AL32UTF8 than in the source characterset. "Truncation" data means the row will not fit in the current column definition once converted to AL32UTF8.

"Truncation" data is always also "Convertible" data, which means that whatever way you do the change, these rows have to be exported before the character set is changed and re-imported after the character set has changed. If you proceed with that without dealing with the truncation issue then the import will fail on these columns with "ORA-01401: inserted value too large for column" (or from 10g onwards "ORA-12899: value too large for column..." ) because the size of the data exceeds the maximum size of the column.
Note: sometimes there is also a list of tables at the end of toutf8.txt under the header [Truncation Due To Character Semantics]; this is not the same as the truncation this section discusses.
When using Csalter/Alter database and/or (full) export/import the objects under the [Truncation Due To Character Semantics] header can simply be ignored (to be clear: only under this header).
Truncation issues will always require some work; there are 2 main options:

5a) Shorten the data before export

A script that gives an idea on how to do this, based on the csscan result of the LAST run (Csscan will only keep ONE result set ) , is found in Note 1231922.1 Script to Solve Truncation in Csscan output by Shortening Data Before Conversion
Note of course that this will provoke data loss, so this is not often a good alternative.

5b) Adapt the columns to fit the expansion of the data

For some columns it might be needed to change datatype:

* for CHAR data that becomes more than 2000 bytes after conversion to AL32UTF8 you need to:
change the CHAR column to VARCHAR2 before export on the source side
OR
pre create the CHAR column as VARCHAR2 on the import (AL32UTF8) side.

* for VARCHAR2 data that becomes more than 4000 bytes after conversion to AL32UTF8 you need to:
change the VARCHAR2 column to CLOB before export on the source side
OR
pre create the VARCHAR2 column as CLOB on the import (AL32UTF8) side. Pre-creating a CLOB will only work when using DataPump; the "old" imp/exp tools will fail with IMP-00058: ORACLE error 1461 encountered, ORA-01461: can bind a LONG value only for insert into a LONG column.
Note: the need for a datatype change is not often seen, but it is possible. If the expansion in BYTES is bigger than the maximum data length of the datatype, which is 2000 bytes for CHAR and 4000 bytes for VARCHAR2, then using CHAR semantics will not help.
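A direct ALTER from VARCHAR2 to CLOB is not possible, so on the source side the usual approach is add-copy-drop-rename; a sketch with hypothetical names (RENAME COLUMN requires 9iR2 or later):
Sqlplus / as sysdba
ALTER TABLE scott.docs ADD (txt_new CLOB);
UPDATE scott.docs SET txt_new = txt;
COMMIT;
ALTER TABLE scott.docs DROP COLUMN txt;
ALTER TABLE scott.docs RENAME COLUMN txt_new TO txt;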
For most columns changing the column "size" will be enough; you again have 2 choices:

*- Adapt the column size in BYTES for data that becomes NOT more than 2000 bytes (for CHAR) / 4000 bytes (for VARCHAR2) after conversion to AL32UTF8.
you can:
- change the column(s) size in BYTES to the minimum needed size before export on the source side
OR
- pre-create the table with enlarged column(s) size in BYTEs on the import (AL32UTF8) side


*- Change the columns to CHAR semantics for data that becomes NOT more than 2000 bytes (for CHAR) / 4000 bytes (for VARCHAR2) after conversion to AL32UTF8.
you can:
- change the column(s) to CHAR semantics before export on the source side
OR
- pre-create the table using CHAR semantics on the import (AL32UTF8) side

How to use CHAR semantics is discussed in note 144808.1 Examples and limits of BYTE and CHAR semantics usage (NLS_LENGTH_SEMANTICS)
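A sketch of both variants on a hypothetical VARCHAR2(10 BYTE) column:
Sqlplus / as sysdba
-- enlarge the size in bytes:
ALTER TABLE scott.emp MODIFY (ename VARCHAR2(40 BYTE));
-- or switch to CHAR semantics:
ALTER TABLE scott.emp MODIFY (ename VARCHAR2(10 CHAR));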

You will need to adapt at least the reported "Truncation" columns; in general, however, it is better to change all columns to CHAR semantics to avoid confusion later on.
As linked in note 144808.1 , you can use a script that updates all columns from all tables: Note 313175.1 Changing columns to CHAR length semantics.

Typically :
  • when using Csalter/Alter database the columns are changed to CHAR semantics (or CLOB) or enlarged in BYTES after going to AL32UTF8 (= Running Csalter or Alter Database) but before importing the exported "Convertible/Truncation" data again.
  • when using full export import the tables are pre-created in the new database using CHAR semantics , CLOB or larger columns sizes in BYTES before importing the data.
When using full export/import the tables can be created in the new database upfront using for example the FULL=Y and CONTENT=METADATA_ONLY (and optionally SQLFILE) impdp options, or by using dbms_metadata (see for example Note 556823.1).
The import utility uses the exact semantics defined on the source table definition from the export dump file, not from the source or target database init.ora settings. If the source tables are defined with BYTE semantics then they will be created with BYTE semantics , even when importing into a database that has NLS_LENGTH_SEMANTICS set to CHAR in the init.ora/pfile/spfile.
In other words, setting NLS_LENGTH_SEMANTICS=CHAR in your init.ora will NOT change any table to CHAR semantics during import if those tables (!!!) are using BYTE on the source side.
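A hedged sketch of the DataPump metadata-only approach (directory object and file names are placeholders): either pre-create the objects on the target directly, or write the DDL to a script so the column definitions can be adapted before running it:
$ impdp system/<password> FULL=Y CONTENT=METADATA_ONLY DIRECTORY=dp_dir DUMPFILE=full.dmp LOGFILE=meta_imp.log
$ impdp system/<password> FULL=Y DIRECTORY=dp_dir DUMPFILE=full.dmp SQLFILE=ddl.sql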
To know how much the data expands you can:
* Or use this procedure:
Note that when using the csscan SUPPRESS parameter this procedure may give incomplete results (not all tables or not the correct minimal needed data size).
Sqlplus / as sysdba
SET serveroutput ON size 200000
DECLARE
newmaxsz NUMBER;
BEGIN
FOR rec IN
( SELECT DISTINCT u.owner_name,
u.table_name,
u.column_name ,
u.column_type,
u.owner_id,
u.table_id,
u.column_id,
u.column_intid
FROM csmv$errors u
WHERE u.error_type='EXCEED_SIZE'
ORDER BY u.owner_name,
u.table_name,
u.column_name
)
LOOP
SELECT MAX(cnvsize)
INTO newmaxsz
FROM csm$errors
WHERE usr# = rec.owner_id
AND obj# = rec.table_id
AND col# = rec.column_id
AND intcol# = rec.column_intid;
DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name||' ('|| rec.column_name ||') - '|| rec.column_type ||' - '|| newmaxsz || ' Bytes');
END LOOP;
END;
/
This will give the minimal amount of BYTES the column needs to be to accommodate the expansion.

* Or check the Csscan output. You can find that in the .err file as "Max Post Conversion Data Size". For example, check in the .txt file which table has "Truncation"; let's assume you have a row there that says:
-- snip from toutf8.txt
[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE Convertible Truncation Lossy
--------------------- ---------------- ---------------- ----------------
...
SCOTT.TESTUTF8 69 6 0
...
then look in the toutf8.err file for "TESTUTF8" until the "Max Post Conversion Data Size" is bigger than the column size for that table.
-- snip from toutf8.err
User : SCOTT
Table : TESTUTF8
Column: ITEM_NAME
Type : VARCHAR2(80)
Number of Exceptions : 6
Max Post Conversion Data Size: 81
The max size after going to AL32UTF8 will be 81 bytes for this column.
Csalter/Alter Database Character Set has problems with functional indexes on, and partitions using, CHAR-based columns; see point 7). If you have functional indexes / partitions on such columns you can only change those columns to CHAR semantics after the change to AL32UTF8. Any other table columns can be changed to CHAR semantics before going to AL32UTF8 if required.
Truncation in Data Dictionary objects is rare and will be solved by using the steps for "Convertible" Data Dictionary data.
While it's technically only needed to take action on the "Truncation" rows reported by Csscan it's still a good idea to consider using CHAR Semantics for every "Application" column or using explicit CHAR semantics declaration for variables in application code in an AL32UTF8 database.

6) Dealing with "Convertible" data.

Here you choose depending on what is required / what you want to do:
When you want to use Csalter/Alter Database Character Set and change the current database, go to point 6.a).
When you want to use (FULL) export/import into a new or other AL32UTF8 db, take the (full) export now and go to point 12).
Once any "Lossy" or "Truncation" is dealt with, (full) exp/imp into a new AL32UTF8 database can be used. This will take care of any "convertible" data.
When using export/import using the "old" Exp/Imp tools the NLS_LANG setting is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET>.
Expdp/Impdp (DataPump) does not use the NLS_LANG for data conversion. See Note 227332.1 NLS considerations in Import/Export - Frequently Asked Questions.
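For example, exporting from a hypothetical WE8MSWIN1252 source database with the classic exp tool:
$ export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
$ exp system/<password> FULL=Y FILE=full.dmp LOG=full_exp.log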
The rest of this note until step 12) will deal only with using Csalter/Alter Database Character Set combined with partial export/import.

6.a) "Convertible" Application Data:

When using Csalter/Alter Database Character Set, all User/Application "Convertible" data needs to be exported and truncated/deleted. This means ALL data that is listed as "Convertible" in the [Application Data Conversion Summary] NEEDS to be exported and truncated.
Again, this is often misunderstood:

* if the "Convertible" data under [Application Data Conversion Summary] is NOT exported then Csalter will NOT run and fail with Unrecognized convertible date found in scanner result .
* if the "Convertible" data under [Application Data Conversion Summary] is NOT exported then when Using "Alter Database Character Set" this data will be CORRUPTED.

It is MANDATORY to export and truncate/delete ALL "Convertible" User / Application Data.
10g and up: This will give a list of all "Application Data" tables that have "Convertible" (or also "Truncation") data and need to be exported and truncated/deleted before Csalter can be used:

The select below will only give all "Convertible" columns if CAPTURE=Y was used in point 4).
If you used CAPTURE=N (the default) then the result set of the select below will NOT list the "Convertible" data, only the "Truncation" data (if it exists).
The .txt file will always list the convertible user data regardless of the setting of CAPTURE.
Sqlplus / as sysdba
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' - '
|| z.error_type
|| ' ' UserColumns
FROM csmig.csmv$errors z
WHERE z.owner_name NOT IN
(SELECT DISTINCT username FROM csmig.csm$dictusers
)
ORDER BY UserColumns
/

To check for constraint definitions on the tables before exporting and truncating them Note 1019930.6 Script: To report Table Constraints or Note 779129.1 HowTo show all recursive constraints leading to a table can be used.
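A hedged sketch of the export-and-truncate cycle for one such table (names are hypothetical; handle any referencing constraints first, as per the notes above):
$ export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
$ exp system/<password> TABLES=scott.testutf8 FILE=conv.dmp LOG=conv_exp.log
Sqlplus / as sysdba
TRUNCATE TABLE scott.testutf8;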

There might however also be "Oracle Provided" objects in this list (Note 160861.1 Oracle Created Database Users: Password, Usage and Files).

The most common are the sample schemas 'HR', 'OE', 'SH', 'PM', 'IX', 'BI' and 'SCOTT' and/or APEX (Oracle Application Express) / HTMLDB 'FLOWS_xxxx' and 'APEX_xxxx' objects. The sample schemas can simply be dropped; if APEX/HTMLDB is not used it can also be removed using Note 558340.1 How to Uninstall Oracle HTML DB / Application Express from the Database? (as said in point 2.c of this note).

Also common are objects for the users  'SYSMAN' and/or 'PERFSTAT'.

These users are NOT considered by Csalter to be "Data Dictionary objects", hence convertible CLOB data also needs to be addressed for these columns, just like any "normal" application data. From a database point of view there is no problem exporting/truncating the tables, doing the characterset change and importing this "Oracle Provided" user data, as long as only 'Convertible' (no 'Truncation' and/or 'Lossy') is listed. Of course make sure that, for example, when exporting 'SYSMAN' user data the Database Control is stopped and stays stopped until the change is done.
SYSMAN convertible data can also be solved by removing the dbcontrol registration, see Note 278100.1 How To Drop, Create And Recreate DB Control In A 10g Database (this can also be used in an 11g db).
When using export/import with the "old" Exp/Imp tools the NLS_LANG setting is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET>.
Expdp/Impdp (DataPump) does not use the NLS_LANG for data conversion. See Note 227332.1 NLS considerations in Import/Export - Frequently Asked Questions.

6.b) "Convertible" data in Data Dictionary objects:

The main challenge when using Csalter/Alter Database Character Set is, besides exporting and truncating all "Convertible" User/Application data, most of the time the "Convertible" data in Data Dictionary objects.

* For 8i/9i ALL "Convertible" data in the Data Dictionary objects listed under the "[Data Dictionary Conversion Summary]" header in the toutf8.txt needs to be addressed.

* For 10g and up you do not need to take action on "convertible" Data Dictionary CLOB data. Convertible CLOB in Data Dictionary objects is handled by Csalter.
Convertible CHAR, VARCHAR2 and LONG listed under the "[Data Dictionary Conversion Summary]" header in the toutf8.txt however do need action.
10g and up: If the next select gives NO rows there is no action to take on Data Dictionary objects, even if there is "Convertible" CLOB listed under the "[Data Dictionary Conversion Summary]" header in the toutf8.txt
If it does give rows, see Note 258904.1 Solving Convertible or Lossy data in Data Dictionary objects reported by Csscan when changing the NLS_CHARACTERSET
Sqlplus / as sysdba
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' - '
|| z.error_type
|| ' ' NotHandledDataDictColumns
FROM csmig.csmv$errors z
WHERE z.owner_name IN
(SELECT DISTINCT username FROM csmig.csm$dictusers
) minus
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' - '
|| z.error_type
|| ' ' DataDictConvCLob
FROM csmig.csmv$errors z
WHERE z.error_type ='CONVERTIBLE'
AND z.column_type = 'CLOB'
AND z.owner_name IN
(SELECT DISTINCT username FROM csmig.csm$dictusers
)
ORDER BY NotHandledDataDictColumns
/

Note 258904.1 Solving Convertible or Lossy data in Data Dictionary objects reported by Csscan when changing the NLS_CHARACTERSET
has the above (and other) selects, which give a better overview than the Csscan *.txt file output of which objects need action, and explains how to solve commonly seen "Convertible" data in Data Dictionary columns.
If there are Data Dictionary columns in your Csscan output that are not listed in Note 258904.1, please log an SR if you need help.
If you do log an SR, please DO provide the output of the above select.
Do NOT truncate or export Data Dictionary objects themselves unless this is said to be possible in Note 258904.1.
This note may be useful to identify Oracle created users in your database: Note 160861.1 Oracle Created Database Users: Password, Usage and Files. This note may also be useful: Note 472937.1 Information On Installed Database Components and Schemas.
To remove "Convertible" out of an Intermedia / Oracle Text Index (after it has been removed from the table) please see Note 176135.1

7) Before using Csalter / Alter Database Character Set check the database for:

7.a) Partitions using CHAR semantics:

Sqlplus / as sysdba
SELECT c.owner,
c.table_name,
c.column_name,
c.data_type,
c.char_length
FROM all_tab_columns c,
all_tables t
WHERE c.owner = t.owner
AND c.table_name = t.table_name
AND c.char_used = 'C'
AND t.partitioned ='YES'
AND c.table_name NOT IN
(SELECT table_name FROM all_external_tables
)
AND c.data_type IN ('VARCHAR2', 'CHAR')
ORDER BY 1,2
/
If this select gives rows back then the change to AL32UTF8 will fail with "ORA-14265: data type or length of a table subpartitioning column may not be changed" or "ORA-14060: data type or length of a table partitioning column may not be changed" if those columns using CHAR semantics are used as partitioning key or subpartitioning key, because the change to AL32UTF8 adapts the actual byte length of CHAR semantics columns (data_length in all_tab_columns).
If there are columns using CHAR semantics used as partitioning key or subpartitioning key, the partitioned tables need to be exported, dropped and, after the change, imported. Another solution is to temporarily go back to BYTE semantics for these columns before the change to AL32UTF8 and to return to CHAR semantics afterwards.
Note 330964.1 provides more background on this issue.

7.b) Function, Domain or Join indexes on CHAR semantics columns.

7.b.1) Functional or domain indexes on columns using CHAR semantics
Sqlplus / as sysdba
SELECT owner,
index_name,
table_owner,
table_name,
status,
index_type
FROM dba_indexes
WHERE index_type IN ('DOMAIN','FUNCTION-BASED NORMAL','FUNCTION-BASED NORMAL/REV','FUNCTION-BASED BITMAP')
AND table_name IN
(SELECT UNIQUE (x.table_name) FROM dba_tab_columns x WHERE x.char_used ='C' AND x.data_type IN ('CHAR','VARCHAR2'))
ORDER BY 1,2
/
If this gives rows back then the change to AL32UTF8 will fail with "ORA-30556: functional index is defined on the column to be modified" or with "ORA-02262: ORA-904 occurs while type-checking column default value expression" . If there are functional or domain indexes on columns using CHAR semantics the index need to be dropped and recreated after the change.
Note that a disable will not be enough.The DDL of all those indexes can be found using:
Sqlplus / as sysdba
SET LONG 2000000
SET pagesize 0
EXECUTE dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',FALSE);
SELECT dbms_metadata.get_ddl('INDEX',index_name,owner)
FROM dba_indexes
WHERE index_type IN ('DOMAIN','FUNCTION-BASED NORMAL','FUNCTION-BASED NORMAL/REV','FUNCTION-BASED BITMAP')
AND table_name IN
(SELECT UNIQUE (x.table_name) FROM dba_tab_columns x WHERE x.char_used ='C' AND x.data_type IN ('CHAR','VARCHAR2'));
EXECUTE dbms_metadata.set_transform_param(dbms_metadata.session_transform,'DEFAULT')
7.b.2) Join indexes on columns using CHAR semantics
Sqlplus / as sysdba
SELECT owner,
index_name,
table_owner,
table_name,
status,
index_type
FROM dba_indexes
WHERE table_name IN
(SELECT UNIQUE (object_name) FROM dba_objects WHERE object_id IN
(SELECT UNIQUE obj# FROM sys.col$ WHERE property='8454440' ))
AND table_name IN
(SELECT UNIQUE (table_name) FROM dba_tab_columns WHERE char_used ='C' AND data_type IN ('CHAR','VARCHAR2'))
ORDER BY 1,2
/
If this gives rows back then the change to AL32UTF8 will fail with "ORA-54028: cannot change the HIDDEN/VISIBLE property of a virtual column" .
If there are join indexes and those columns are using CHAR semantics the index need to be dropped and recreated after the change.
Note that a disable will not be enough.
The DDL of all those indexes can be found using:
Sqlplus / as sysdba
SET LONG 2000000
SET pagesize 0
EXECUTE dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',FALSE);
SELECT dbms_metadata.get_ddl('INDEX',index_name,owner)
FROM dba_indexes
WHERE table_name IN
(SELECT UNIQUE (object_name) FROM dba_objects WHERE object_id IN (SELECT UNIQUE obj# FROM sys.col$ WHERE property='8454440' ))
AND table_name IN
(SELECT UNIQUE (x.table_name) FROM dba_tab_columns x WHERE x.char_used ='C' AND x.data_type IN ('CHAR','VARCHAR2'));
EXECUTE dbms_metadata.set_transform_param(dbms_metadata.session_transform,'DEFAULT')
There is a bug in 11g releases before 11.2.0.2 (11.2.0.1, 11.1.0.7 and 11.1.0.6) that causes the "drop index" command not to drop the hidden columns of bitmap join indexes, resulting in ORA-54028 during Csalter even after the indexes are dropped.
The workaround is to make a copy of the data via online redefinition and to drop the old tables. Dropping the old tables will drop all columns, including the orphaned ones.

7.c) SYSTIMESTAMP in the DEFAULT value clause for tables using CHAR semantics.

Sqlplus / as sysdba
SET serveroutput ON
BEGIN
FOR rec IN
(SELECT owner,
table_name,
column_name,
data_default
FROM dba_tab_columns
WHERE char_used='C'
)
loop
IF upper(rec.data_default) LIKE '%TIMESTAMP%' THEN
dbms_output.put_line(rec.owner ||'.'|| rec.table_name ||'.'|| rec. column_name);
END IF;
END loop;
END;
/
Such defaults will give "ORA-00604: error occurred at recursive SQL level %s, ORA-01866: the datetime class is invalid" during the change to AL32UTF8.
The workaround is to temporarily change the affected tables to use a DEFAULT NULL clause, e.g.: ALTER TABLE tab MODIFY ( col ... DEFAULT NULL NOT NULL );
After the character set change the default clause can be restored.
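For illustration, on a hypothetical table (capture the original default first so it can be restored):
Sqlplus / as sysdba
-- before the characterset change:
ALTER TABLE app.audit_log MODIFY (created_txt DEFAULT NULL);
-- after the characterset change, restore the original default:
ALTER TABLE app.audit_log MODIFY (created_txt DEFAULT TO_CHAR(SYSTIMESTAMP));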

7.d) Clusters using CHAR semantics.

Sqlplus / as sysdba
SELECT owner,
object_name
FROM all_objects
WHERE object_type = 'CLUSTER'
AND object_name IN
(SELECT UNIQUE (table_name) FROM dba_tab_columns WHERE char_used ='C'
)
ORDER BY 1,2
/
If this gives rows back then the change will fail with "ORA-01447: ALTER TABLE does not operate on clustered columns". Those clusters need to be dropped and recreated after the change.

7.e) Unused columns using CHAR semantics

Sqlplus / as sysdba
SELECT owner,
table_name
FROM dba_unused_col_tabs
WHERE table_name IN
(SELECT UNIQUE (table_name) FROM dba_tab_columns WHERE char_used ='C'
)
ORDER BY 1,2
/
Unused columns using CHAR semantics will give an "ORA-00604: error occurred at recursive SQL level 1" with an "ORA-00904: "SYS_C00002_09031813:50:03$": invalid identifier". Note that the "SYS_C00002_09031813:50:03$" identifier will differ for each column. These unused columns need to be dropped:
ALTER TABLE table_name DROP UNUSED COLUMNS
/

7.f) Check that you have enough room to run Csalter or to import the "Convertible" data again afterwards.

In 10g and up verify, at least in toutf8.txt/toutf8fin.txt, the "Expansion" column found under [Database Size] and check that you have at least 2 times the listed expansion free in the SYSTEM tablespace.
This is the size Csalter needs to update the Data Dictionary CLOBs; otherwise you will see errors like "ORA-01691: unable to extend lob segment SYS.SYS_LOB0000058943C00039$$ by 1598 in tablespace SYSTEM" during Csalter.
In general (for any version) it is a good idea to check the "Expansion" column and see that there is enough space in each listed tablespace.
The Expansion column gives an estimate of how much more space you need in that tablespace when going to the new characterset.
The Tablespace Expansion for tablespace X is calculated as the grand total of the differences between the byte length of a string converted to the target character set and the original byte length of this string over all strings scanned in tables in X.
The distribution of values in blocks, PCTFREE, free extents, etc., are not taken into account.
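A quick way to compare the reported expansion against the free space per tablespace (a sketch):
Sqlplus / as sysdba
SELECT tablespace_name,
ROUND(SUM(bytes)/1024/1024) free_mb
FROM dba_free_space
GROUP BY tablespace_name
ORDER BY 1
/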

7.g) Objects in the recyclebin (10g and up)

If the recyclebin was not disabled make sure there are no objects in the recyclebin
Sqlplus / as sysdba
SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE FROM dba_recyclebin ORDER BY 1,2
/
If there are objects in the recyclebin then perform
Sqlplus / as sysdba
PURGE DBA_RECYCLEBIN
/
This will remove recyclebin objects, otherwise during CSALTER an ORA-38301 may be seen.

7.h) Check if the compatible parameter is set to your base version

Sqlplus / as sysdba
sho parameter compatible
Do not try to migrate, for example, a 10g database with compatible=9.2.

8) After any "Lossy" is solved, "Truncation" data is planned to be addressed and/or "Convertible" exported / truncated / addressed and point 7) is ok run Csscan again as final check.

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=TOUTF8FIN CAPTURE=Y ARRAY=1000000 PROCESS=2

8.a) For 8i/9i the Csscan output needs to be "Changeless" for all CHAR, VARCHAR2, CLOB and LONG data (Data Dictionary and User/Application data).

In order to use "Alter Database Character Set" you need to see in the toutf8fin.txt file under [Scan Summary] this message::
All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
If this is NOT seen in the toutf8fin.txt then something was missed or not all steps in this note were followed. Ignoring this WILL lead to data corruption.
If this is seen, then continue with step 9).

8.b) For 10g and up the Csscan output needs to be

* "Changeless" for all CHAR VARCHAR2, and LONG data (Data Dictionary and User/Application data )
* "Changeless" for all User/Application data CLOB
* "Changeless" and/or "Convertible" for all Data Dictionary CLOB

And in order to run Csalter you need to see in the toutf8fin.txt file under [Scan Summary] these 2 messages:
All character type application data remain the same in the new character set
and under [Data Dictionary Conversion Summary] this message:
The data dictionary can be safely migrated using the CSALTER script
You need to see BOTH, the Csalter message alone is NOT enough.
Note the difference between "All character type application data remain the same in the new character set" and "All character type application data are convertible to the new character set".
If you see "All character type application data are convertible to the new character set" then Csalter will NOT work.

If you run Csalter without these conditions met then you will see messages like " Unrecognized convertible data found in scanner result " or "Exceptional data found in scanner result" and " Checking or Converting phrase did not finish successfully No database (national) character set will be altered " in the Csalter output.
Csalter will NOT work and will say "CSALTER finished unsuccessfully."


Before you can run Csalter you need
* to have the above 2 (BOTH!) messages in the .txt file.
* to have completed that FULL=Y run in the 7 days prior to running Csalter; you can only run Csalter in the 7 days following the "clean" FULL=Y scan.
* to be sure the session running Csalter is the ONLY session connected to the database, otherwise Csalter will give this warning 'Sorry only one session is allowed to run this script'.

9) Summary of steps needed to use Alter Database Character Set / Csalter:

9.a) For 9i and lower:

9.a.1) Export all the "Convertible" User/Application Data (make sure that the characterset part of the NLS_LANG is set to the current database characterset during the export session)
9.a.2) If you have "convertible" data in Data Dictionary (SYS) objects like SYS.METASTYLESHEET, SYS.RULE$ or SYS.JOB$ then follow Note 258904.1 Convertible data in Data Dictionary: Workarounds when changing character set for those objects.
9.a.3) Truncate the exported "Convertible" User/Application Data .
9.a.4) Run csscan again with the syntax of point 8) to verify you only have "changeless" Data Dictionary and User/Application Data left
9.a.5) In order to use "Alter Database Character Set" you now need to see these messages in the toutf8fin.txt file under [Scan Summary]:
All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set

If these are NOT seen in the toutf8fin.txt then something was missed or not all steps in this note were followed; ignoring this WILL lead to data corruption.
If they are seen, proceed to step 10.a); otherwise repeat the previous steps for the rows you missed.
9.a.6) Adapt any columns if needed to avoid "Truncation"
9.a.7) Import the exported data again.

9.b) For 10g and up:

9.b.1) Export all the "Convertible" User/Application Data (make sure that the character set part of the NLS_LANG is set to the current database character set during the export session)
9.b.2) Fix any non-CLOB Data Dictionary "Convertible" data using Note 258904.1.
All "9i only" fixes in Note 258904.1 Convertible data in Data Dictionary: Workarounds when changing character set should NOT be done in 10g and up - those concern SYS CLOB objects, which are handled by Csalter.
9.b.3) Truncate the exported "Convertible" User/Application Data tables.
9.b.4) Run csscan with the syntax of point 8) to verify you only have "convertible" CLOB in the Data Dictionary and all other data is "Changeless".
When using Csscan in 10g and up, toutf8.txt or toutf8fin.txt needs to contain the following before proceeding to step 10):
The data dictionary can be safely migrated using the CSALTER script
and
All character type application data remain the same in the new character set
If this is NOT seen in the toutf8fin.txt then Csalter will NOT work; something was missed or not all steps in this note were followed.
9.b.5) If this is now correct then proceed to step 10.b); otherwise repeat the previous steps for the rows you missed.
9.b.6) Adapt any columns if needed to avoid "Truncation"
9.b.7) Import the exported data again.

10) Running Csalter ( 10g and 11g) or Alter Database Character Set (8i and 9i)

Please perform a backup of the database. Check the backup. Double-check the backup.

10.a) The steps for 8i and 9i ONLY

Do not do steps 10.a.1) and 10.a.2) on a 10g or 11g or higher system. For 10g and up go to step 10.b).
Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time, and that is the sqlplus session where you do the change.
This select
SELECT sid, serial#, username, status,  osuser, machine, process, program FROM v$session WHERE username IS NOT NULL;
should only give ONE row - your sqlplus connection.

10.a.1) Make sure the PARALLEL_SERVER (8i) and CLUSTER_DATABASE parameters are set to FALSE or not set at all. When using RAC you need to start the database as a single instance with CLUSTER_DATABASE = FALSE.
Sqlplus / as sysdba
sho parameter CLUSTER_DATABASE
sho parameter PARALLEL_SERVER

10.a.2) Execute the following commands in sqlplus connected as "/ AS SYSDBA":
Sqlplus / as sysdba
SPOOL Nswitch.log
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SHUTDOWN IMMEDIATE;
-- You NEED to restart the database before doing ANY operation on this database
-- exit this session now do not use the session that did alter database for other operations
EXIT
-- reconnect to the database and start the database
Sqlplus / as sysdba
STARTUP;
-- in 8i you need to do the shutdown/startup twice
SHUTDOWN;
STARTUP;
An alter database character set typically takes only a few minutes or less; the time depends on the number of columns in the database, not on the amount of data. Without the INTERNAL_USE clause you get an ORA-12712: new character set must be a superset of old character set.
10.a.3) Restore the PARALLEL_SERVER (8i) and CLUSTER_DATABASE parameter if necessary and start the database. For RAC start the other instances.
WARNING WARNING WARNING

NEVER use "INTERNAL_USE" unless you have followed the guidelines in this note STEP BY STEP and you have a good idea of what you are doing.

NEVER use "INTERNAL_USE" to "fix" display problems; instead follow Note 179133.1 The correct NLS_LANG in a Windows Environment or Note 264157.1 The correct NLS_LANG setting in Unix Environments.

If you use the INTERNAL_USE clause on a database where there is data listed as convertible without exporting that data, then that data will be corrupted by changing the database character set!
Go to point 10.c); do not do point 10.b).

10.b) The steps for version 10g  and 11g

Csalter.plb needs to be used within 7 days after the Csscan run, otherwise you will get a 'The CSSCAN result has expired' message.
Shutdown the listener and any application that connects locally to the database.

There should be only ONE connection to the database during the WHOLE time, and that is the sqlplus session where the change is done.
This select
SELECT sid, serial#, username, status,  osuser, machine, process, program FROM v$session WHERE username IS NOT NULL;
should only give ONE row - your sqlplus connection. If more than one session is connected Csalter will fail and report "Sorry only one session is allowed to run this script".
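If the select returns extra rows, those sessions can be ended before proceeding; a sketch (substitute the SID and SERIAL# values returned by the query above):
ALTER SYSTEM KILL SESSION '<sid>,<serial#>' IMMEDIATE;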
If you are using RAC you will need to start the database in single instance with CLUSTER_DATABASE = FALSE

Run in sqlplus connected as "/ AS SYSDBA":
Sqlplus / as sysdba
-- Make sure the CLUSTER_DATABASE parameter is set
-- to false or it is not set at all.
-- If you are using RAC you will need to start the database in single instance
-- with CLUSTER_DATABASE = FALSE
sho parameter CLUSTER_DATABASE
-- if you are using spfile, note the current values of:
sho parameter job_queue_processes
sho parameter aq_tm_processes
-- (this is Bug 6005344, fixed in 11g)
-- then do

shutdown
startup restrict
SPOOL Nswitch.log
PURGE DBA_RECYCLEBIN
/

-- do this alter system or you might run into "ORA-22839: Direct updates on SYS_NC columns are disallowed"
-- This is only needed in 11.1.0.6, fixed in 11.1.0.7, not applicable to 10.2 or lower
-- ALTER SYSTEM SET EVENTS '22838 TRACE NAME CONTEXT LEVEL 1,FOREVER';

-- then run Csalter.plb:

 @?/rdbms/admin/csalter.plb

-- Csalter will ask for confirmation - do not copy/paste all the actions at once
-- sample Csalter output:

-- 3 rows created.
...
-- This script will update the content of the Oracle Data Dictionary.
-- Please ensure you have a full backup before initiating this procedure.
-- Would you like to proceed (Y/N)?y
-- old 6: if (UPPER('&conf') <> 'Y') then
-- new 6: if (UPPER('y') <> 'Y') then
-- Checking data validility...
-- begin converting system objects

-- PL/SQL procedure successfully completed.

-- Alter the database character set...
-- CSALTER operation completed, please restart database

-- PL/SQL procedure successfully completed.
...
-- Procedure dropped.

-- if you are using spfile then you also need to restore the original values:

-- ALTER SYSTEM SET job_queue_processes=<original value> SCOPE=BOTH;
-- ALTER SYSTEM SET aq_tm_processes=<original value> SCOPE=BOTH;
SHUTDOWN IMMEDIATE;
-- You NEED to restart the database before doing ANY operation on this database
-- exit this session now; do not use the session where Csalter was run for other operations.
EXIT
-- reconnect to the database and start the database
Sqlplus / as sysdba
STARTUP;
 
and the database will be AL32UTF8.
Note: in 10.1 Csalter asks for "Enter value for 1: ".
-- Would you like to proceed ?(Y/N)?Y
-- old 5: if (UPPER('&conf') <> 'Y') then
-- new 5: if (UPPER('Y') <> 'Y') then
-- Enter value for 1:
-> simply hit enter.
Go to 10.c).

10.c) Check the Csalter/alter database output and the alert.log for errors, some Csalter messages do NOT have an ORA- number.

10g and up: if you see messages in the Csalter output like "Unrecognized convertible data found in scanner result" or "Checking or Converting phrase did not finish successfully No database (national) character set will be altered" then your LAST csscan was not "clean" and you missed some steps in this note.

All versions : If during Csalter/alter database you see errors after the
Alter the database character set...
message in the alert.log or during alter database command (8i or 9i) / Csalter like for example:
ORA-41400: Bind character set (31) does not match database character set (873)
Note: the bind character set may differ (a number other than 31):
or another error like
ORA-00604: error occurred at recursive SQL level 1
or
ORA-604 signalled during: ALTER DATABASE CHARACTER SET AL32UTF8...
then the NLS_CHARACTERSET is NOT (fully) changed after running Csalter / alter database.

Most common causes are:
* another process connected to the database AFTER Csalter / alter database was started, so Csalter / alter database did not see the second connection at startup. Do NOT connect with ANY session during Csalter / alter database.
* the checks in step 7 of this note were NOT done.
* some other problem occurred.

If errors are seen then the NLS_CHARACTERSET is NOT changed successfully.
Check your alert.log for the exact error that was seen, remove the cause and run Csalter or alter database (9i) again.

If nothing obvious is found to be wrong please log a SR and provide the last csscan files, the FULL output of Csalter or alter database, the alert.log and any trace files generated.

You can do an additional check after the change of the NLS_CHARACTERSET:
Sqlplus / as sysdba
select distinct(nls_charset_name(charsetid)) CHARACTERSET,
decode(type#, 1, decode(charsetform, 1, 'VARCHAR2', 2, 'NVARCHAR2','UNKNOWN'),
9, decode(charsetform, 1, 'VARCHAR', 2, 'NCHAR VARYING', 'UNKNOWN'),
96, decode(charsetform, 1, 'CHAR', 2, 'NCHAR', 'UNKNOWN'),
8, decode(charsetform, 1, 'LONG', 'UNKNOWN'),
112, decode(charsetform, 1, 'CLOB', 2, 'NCLOB', 'UNKNOWN')) TYPES_USED_IN
from sys.col$ where charsetform in (1,2) and type# in (1, 8, 9, 96, 112)
order by CHARACTERSET, TYPES_USED_IN
/
This should give 7 rows, one for each datatype, and 2 character sets: the NLS_CHARACTERSET and the NLS_NCHAR_CHARACTERSET.
-- example of correct output for a AL32UTF8 NLS_CHARACTERSET and AL16UTF16 NLS_NCHAR_CHARACTERSET database:

CHARACTERSET      TYPES_USED_IN
----------------- -------------
AL32UTF8          CHAR
AL32UTF8          CLOB
AL32UTF8          LONG
AL32UTF8          VARCHAR2
AL16UTF16         NCHAR
AL16UTF16         NCLOB
AL16UTF16         NVARCHAR2
If this select returns more than 7 rows (for example 2 different character sets for VARCHAR2) then there is something wrong and you will have errors in the alert.log during the conversion.

11) Reload the data pump packages after a change to AL32UTF8 in 10g and up.

For 10g and up the Datapump packages need to be reloaded after a conversion to AL32UTF8; see Note 430221.1 How To Reload Datapump Utility EXPDP/IMPDP.
In some cases exp (the original export tool) fails in 10g after changing to AL32UTF8; please see Note 339938.1 Full Export From 10.2.0.1 Aborts With EXP-56 ORA-932 (Inconsistent Datatypes) EXP-0.
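As an illustration only - the exact script sequence is version dependent and Note 430221.1 is the authority - a Datapump reload on 10.2 looks roughly like this sketch:
Sqlplus / as sysdba
-- @?/rdbms/admin/catdph.sql
-- @?/rdbms/admin/catdpb.sql
-- @?/rdbms/admin/utlrp.sql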

12) Import the exported data .

Note that if the Csscan done in point 4) found ONLY "Changeless" and NO "Convertible" data (this is not often seen) then there is no data to import when using Csalter/Alter database.

12.a) When using Csalter/Alter database to go to AL32UTF8 and there was "Truncation" data in the csscan done in point 4:

"Truncation" data is always ALSO "Convertible"; it is "Convertible" data that needs action before you can import it again. If there was "Truncation" then typically this is handled by pre-creating the tables using CHAR semantics or enlarged column sizes in bytes after changing the database to AL32UTF8 using Csalter/Alter database and before starting the import.
Note that simply setting NLS_LENGTH_SEMANTICS=CHAR in the init.ora will NOT work to go to CHAR semantics.
Once the measures for solving the "Truncation" are in place you can import the "Truncation/Convertible" data.

Set the init.ora parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note 779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when importing.
Use the IGNORE=Y parameter for Imp or the TABLE_EXISTS_ACTION=TRUNCATE option for Impdp to import the data into the pre-created tables - see the sketch below.
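A minimal sketch of the pre-create approach; the table SCOTT.CONV_TAB and its single column are assumptions for the example only:
Sqlplus / as sysdba
-- pre-create the table with CHAR semantics so the converted data still fits:
CREATE TABLE scott.conv_tab (descr VARCHAR2(100 CHAR));
-- then import into the pre-created table:
-- imp system/<password> FILE=conv_tab.dmp TABLES=scott.conv_tab IGNORE=Y
-- or with Datapump:
-- impdp system/<password> DIRECTORY=data_pump_dir DUMPFILE=conv_tab.dmp TABLES=scott.conv_tab TABLE_EXISTS_ACTION=TRUNCATE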

Import the exported data back into the - now AL32UTF8 - database. When using the "old" Exp/Imp tools the character set part of the NLS_LANG is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.
Once the data is imported go to step 13.

12.b) When using (Full) export/import to go to a new/other AL32UTF8 database and there was "Truncation" data in the csscan done in point 4:

"Truncation" data is always ALSO "Convertible"; it is "Convertible" data that needs action before you can import it again. If there was "Truncation" then typically this is handled by pre-creating the tables using CHAR semantics or enlarged column sizes in bytes after creating the new AL32UTF8 database and before starting the import.
Note that simply setting NLS_LENGTH_SEMANTICS=CHAR in the init.ora will NOT work to go to CHAR semantics.
Once the measures for solving the "Truncation" are in place you can import the "Truncation/Convertible" data - see the sketch below.
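As an alternative to CHAR semantics the affected columns can be enlarged in bytes; a sketch for a hypothetical column that was VARCHAR2(100) in the source, allowing up to 3 bytes per character:
Sqlplus / as sysdba
ALTER TABLE scott.conv_tab MODIFY (descr VARCHAR2(300));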

Create a new AL32UTF8 database if needed and set the init.ora parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note 779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when importing
Use the IGNORE=Y parameter for imp or the TABLE_EXISTS_ACTION=TRUNCATE option for Impdp to import the data into the pre-created tables.

Import the exported data into the new AL32UTF8 database. When using the "old" Exp/Imp tools the character set part of the NLS_LANG is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.
Once the data is imported go to step 13.

12.c) When using Csalter/Alter database to go to AL32UTF8 and there was NO "Truncation" data, only "Convertible" and "Changeless" in the csscan done in point 4:

Set the init.ora parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note 779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when importing

Import the exported data back into the - now AL32UTF8 - database. When using the "old" Exp/Imp tools the character set part of the NLS_LANG is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.
Once the data is imported go to step 13.

12.d) When using (full) export/import to go to a new/other AL32UTF8 database and there was NO "Truncation" data, only "Convertible" and "Changeless" in the csscan done in point 4:

Create a new AL32UTF8 database if needed and set the init.ora parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note 779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when importing

Import the exported data into the new AL32UTF8 database. When using the "old" Exp/Imp tools the character set part of the NLS_LANG is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.
Once the data is imported go to step 13.

13) Check your data and final things:

Use a correctly configured client or Oracle SQL Developer / iSqlplus and verify your data. See Note 788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications.
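A quick sanity check of the new character set (a sketch):
Sqlplus / as sysdba
SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
-- this should now return AL32UTF8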

For RAC restore the CLUSTER_DATABASE parameter, remove the BLANK_TRIMMING=TRUE parameter if needed and restart the instance(s).

The Csmig user can also be dropped.
Sqlplus / as sysdba
SQL> drop user csmig cascade;
If you did not use CHAR semantics for all CHAR and VARCHAR2 columns and PL/SQL variables, it might be a good idea to consider this now - see the sketch below.
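For example (a sketch; SCOTT.EMP and its ENAME column are just the familiar sample objects), an existing column can be switched to CHAR semantics with:
Sqlplus / as sysdba
-- same length, now measured in characters instead of bytes:
ALTER TABLE scott.emp MODIFY (ename VARCHAR2(10 CHAR));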
Check for invalid objects as a result of the exp/imp operations.
Sqlplus / as sysdba
SELECT owner,
object_name,
object_type,
status
FROM dba_objects
WHERE status ='INVALID'
/
If there are any invalid objects run utlrp.sql:
SQL> @?/rdbms/admin/utlrp.sql
It is also highly recommended to gather new optimizer statistics for your database:
Sqlplus / as sysdba
-- where <n> is the number of CPU cores
EXEC DBMS_STATS.GATHER_DATABASE_STATS(DEGREE => <n>);
