Mapping Tables for Neoview Character Sets
For information about an algorithm you can use to map unlinked GB18030 characters in the range 0x90308130
through 0xE3329A35 to their Unicode values, see “Algorithm for Mapping Unlinked GB18030 Characters to Unicode
Values”.
Algorithm for Mapping Unlinked GB18030 Characters to Unicode Values
For the 1,048,576 four-byte GB18030 characters in the range 0x90308130 through 0xE3329A35, which map
one-to-one onto the Unicode code points U+10000 through U+10FFFF, the algorithm is as follows:
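Let GBC be a GB18030 character in the range 0x90308130 through 0xE3329A35, treated as a 32-bit integer
whose most significant byte is the first byte of the four-byte sequence.
The UCS4 value (also known as the UTF32 value) is then given by the formula: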
UCS4val = 0x10000 + ( GBC % 0x10 ) +
( ( ( ( GBC & 0x0000FF00 ) >> 8 ) - 0x81 ) * 10 ) +
( ( ( ( GBC & 0x00FF0000 ) >> 16 ) - 0x30 ) * 1260 ) +
( ( ( ( GBC & 0xFF000000 ) >> 24 ) - 0x90 ) * 12600 )
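The following Python sketch applies the UCS4 formula above. It assumes the four GB18030 bytes have already been packed into one integer with the first byte in the most significant position. The function name `gb18030_to_ucs4` is illustrative only and is not part of any Neoview interface.

```python
def gb18030_to_ucs4(gbc: int) -> int:
    """Return the UCS-4 code point for a four-byte GB18030 code.

    Hypothetical helper: gbc holds the four bytes packed big-endian,
    first byte in bits 24..31, and must lie in 0x90308130..0xE3329A35.
    """
    if not 0x90308130 <= gbc <= 0xE3329A35:
        raise ValueError(f"0x{gbc:08X} is outside 0x90308130..0xE3329A35")

    b1 = (gbc & 0xFF000000) >> 24  # first byte, 0x90..0xE3
    b2 = (gbc & 0x00FF0000) >> 16  # second byte, 0x30..0x39
    b3 = (gbc & 0x0000FF00) >> 8   # third byte, 0x81..0xFE
    b4 = gbc & 0x000000FF          # fourth byte, 0x30..0x39

    # Reject packed values inside the numeric range that are not valid
    # GB18030 four-byte sequences (for example a fourth byte of 0x3A).
    if not (0x30 <= b2 <= 0x39 and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"0x{gbc:08X} is not a valid four-byte GB18030 code")

    # Same terms as the formula: GBC % 0x10 equals b4 - 0x30 for b4 in 0x30..0x39.
    return (0x10000
            + (gbc % 0x10)
            + (b3 - 0x81) * 10
            + (b2 - 0x30) * 1260
            + (b1 - 0x90) * 12600)


print(hex(gb18030_to_ucs4(0x90308130)))  # 0x10000
print(hex(gb18030_to_ucs4(0x95328236)))  # 0x20000
print(hex(gb18030_to_ucs4(0xE3329A35)))  # 0x10ffff
```

The two endpoints of the range land on U+10000 and U+10FFFF, which confirms that the range covers the whole supplementary area. One way to spot-check other values is to compare them with Python's built-in `gb18030` codec, for example `bytes.fromhex("95328236").decode("gb18030")`.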
The UTF16 value is a surrogate pair made up of two 16-bit words, calculated as follows:
The first 16-bit word (the high surrogate) equals:
0xD800 + ( ( UCS4val - 0x10000 ) / 1024 )
The second 16-bit word (the low surrogate) equals:
0xDC00 + ( ( UCS4val - 0x10000 ) % 1024 )
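A short Python sketch of the surrogate-pair computation above follows. The name `ucs4_to_utf16_pair` is hypothetical, and the `/` in the formula is taken to mean integer division, which Python writes as `//`.

```python
def ucs4_to_utf16_pair(ucs4val: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into two UTF-16 words (hypothetical helper)."""
    if not 0x10000 <= ucs4val <= 0x10FFFF:
        raise ValueError(f"U+{ucs4val:X} does not need a surrogate pair")
    offset = ucs4val - 0x10000        # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + offset // 1024    # first word carries the top 10 bits
    low = 0xDC00 + offset % 1024      # second word carries the bottom 10 bits
    return high, low


high, low = ucs4_to_utf16_pair(0x20000)
print(f"{high:04X} {low:04X}")  # D840 DC00
```

Python's own encoder produces the same words, which gives a quick check: `chr(0x20000).encode("utf-16-be")` yields the bytes D8 40 DC 00.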
The UTF8 value can be calculated as follows:
The first byte of the 4-byte UTF8 value equals:
0xF0 + ( ( UCS4val >> 18 ) % 8 )
The second byte of the 4-byte UTF8 value equals:
0x80 + ( ( UCS4val >> 12 ) % 64 )
The third byte of the 4-byte UTF8 value equals:
0x80 + ( ( UCS4val >> 6 ) % 64 )
The fourth byte of the 4-byte UTF8 value equals:
0x80 + ( ( UCS4val >> 0 ) % 64 )
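The four UTF8 byte formulas translate directly into the Python sketch below. The name `ucs4_to_utf8` is hypothetical. The sketch only accepts U+10000 through U+10FFFF, the range this algorithm produces, because those are exactly the code points that UTF-8 encodes in four bytes.

```python
def ucs4_to_utf8(ucs4val: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as four UTF-8 bytes (hypothetical helper)."""
    if not 0x10000 <= ucs4val <= 0x10FFFF:
        raise ValueError(f"U+{ucs4val:X} is not encoded in four UTF-8 bytes")
    return bytes([
        0xF0 + ((ucs4val >> 18) % 8),   # lead byte 11110xxx: top 3 bits
        0x80 + ((ucs4val >> 12) % 64),  # continuation byte 10xxxxxx: next 6 bits
        0x80 + ((ucs4val >> 6) % 64),   # continuation byte 10xxxxxx: next 6 bits
        0x80 + ((ucs4val >> 0) % 64),   # continuation byte 10xxxxxx: low 6 bits
    ])


print(ucs4_to_utf8(0x20000).hex(" ").upper())  # F0 A0 80 80
```

Using `% 8` and `% 64` here is equivalent to masking with `& 0x07` and `& 0x3F`, so the result matches what a standard UTF-8 encoder produces for the same code point.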
Where:
X / Y means X divided by Y using integer division (the remainder is discarded)
X % Y means X modulo Y
X << Y means X left-shifted by Y bits
X >> Y means X right-shifted by Y bits
X & Y means X bit-wise ANDed with Y
Links to the Character Set Mapping Tables
Table 1 contains links to the main mapping table for each of the seven East Asian character sets supported by this