Teradata RDBMS for UNIX SQL Reference - NCR, 1997. - 913 p.
1st Byte (41-7F)   Contents
41                 IBM Extended non-Kanji set (134 ch)
42                 Alphanumeric (94 ch)
43                 Basic non-Kanji sep Zen-Katakana (99 ch)
44                 Basic non-Kanji sep .... Hiragana (153 ch)
45-55              IBM Basic Kanji set (3,226 ch), arranged based on frequency of use
56-68              IBM Extended Kanji set (3,483 ch), arranged according to Kanji ideographical order
69-7F              User Area (4,370 ch)
80-FE              Marker reserved area

Table H-2 Selected Characters for EBCDIC Kanji

Double byte Space   Double byte Underscore   Double byte Percent   SO     SI
0x4040              0x426D                   0x426C                0x0E   0x0F
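
The first-byte layout above can be read as a simple range lookup. The following is a minimal sketch, not part of the manual: the function name `ebcdic_kanji_region` and the table name `EBCDIC_KANJI_REGIONS` are hypothetical, and the region labels are abbreviated from the layout.

```python
# First-byte regions for double-byte EBCDIC Kanji, transcribed from the layout above.
# Each entry is (low, high, label) with inclusive bounds. Names are illustrative only.
EBCDIC_KANJI_REGIONS = [
    (0x41, 0x41, "IBM Extended non-Kanji set"),
    (0x42, 0x42, "Alphanumeric"),
    (0x43, 0x43, "Basic non-Kanji, Zen-Katakana"),
    (0x44, 0x44, "Basic non-Kanji, Hiragana"),
    (0x45, 0x55, "IBM Basic Kanji set"),
    (0x56, 0x68, "IBM Extended Kanji set"),
    (0x69, 0x7F, "User Area"),
    (0x80, 0xFE, "Marker reserved area"),
]


def ebcdic_kanji_region(first_byte):
    """Return the label of the region containing first_byte, or None if unlisted."""
    for low, high, label in EBCDIC_KANJI_REGIONS:
        if low <= first_byte <= high:
            return label
    return None


print(ebcdic_kanji_region(0x45))
print(ebcdic_kanji_region(0x40))
```

The double-byte space in Table H-2 (0x4040) has first byte 0x40, which the layout does not list, so this sketch returns `None` for it.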

H-12 Teradata RDBMS for UNIX SQL Reference
Japanese Character Sets

Extended UNIX Code (EUC)

For UNIX client systems, the Teradata RDBMS supports the Extended UNIX Code (EUC).

EUC is composed of one primary and three supplementary code sets. The primary code set, code set 0, is used for ASCII characters. The three supplementary code sets, code sets 1, 2, and 3, can be assigned to different character sets by the user. There is a system default assignment for these code sets.

The primary codeset is defined to be a single byte with the most significant (high-order) bit set to 0. The supplementary codesets can be multiple bytes, and the most significant bit of each is set to 1.

Code sets 2 and 3 have a preceding single-shift character, known as ss2 and ss3, respectively, where ss2 is 0x8E and ss3 is 0x8F. Differentiation between codesets is as follows:

If the most significant bit is 0, then the code set is one-byte ASCII.

If the most significant bit is 1, then the byte is checked for ss2 or ss3 to determine the code set. The length in bytes of characters from that code set is retrieved from an ANSI localization table governing character classification, and that number of bytes is read in.
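
The differentiation rules above can be sketched as a small byte-stream splitter. This is an illustration under stated assumptions, not the Teradata implementation: the names `codeset_of`, `split_euc`, and `LENGTHS` are hypothetical, and the character lengths are hard-coded to the Japanese values listed in Table H-3, whereas the text says a real system retrieves them from a localization table.

```python
SS2 = 0x8E  # single-shift character that introduces code set 2
SS3 = 0x8F  # single-shift character that introduces code set 3

# Assumed character lengths in bytes, including the single-shift byte.
# A real system would look these up in its localization table.
LENGTHS = {"cs0": 1, "cs1": 2, "cs2": 2, "cs3": 3}


def codeset_of(lead):
    """Classify the first byte of a character into a code set name."""
    if lead & 0x80 == 0:
        return "cs0"  # most significant bit is 0: one-byte ASCII
    if lead == SS2:
        return "cs2"
    if lead == SS3:
        return "cs3"
    return "cs1"      # most significant bit is 1 and no single shift


def split_euc(data):
    """Split an EUC byte string into (code set, character bytes) pairs."""
    chars = []
    i = 0
    while i < len(data):
        cs = codeset_of(data[i])
        n = LENGTHS[cs]
        if i + n > len(data):
            raise ValueError(f"truncated {cs} character at offset {i}")
        chars.append((cs, data[i:i + n]))
        i += n
    return chars


print(split_euc(b"A\xa1\xa1\x8e\xb1\x8f\xb0\xa1"))
```

The sketch only determines boundaries; it does not check that the trailing bytes fall in their valid ranges.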

Table H-3, Table H-4, and Figure H-9 show the EUC code sets for the Japanese Language Environment localizations and selected EUC characters.

Table H-3 EUC Code Set Localization

Codeset   EUC Representation (In Bits)   Japanese Language Environment Implementation
cs0       0xxxxxxx                       U.S. ASCII
cs1       1xxxxxxx 1xxxxxxx              JIS-x0208 (Kanji characters). The first 1xxxxxxx must not be ss2 or ss3. The valid range of the first byte is A1-FE and the valid range of the second byte is A1-FE. Those ranges are implied by the JIS-x0208 standard.
cs2       SS2 1xxxxxxx                   JIS-x0201 (half-size Katakana). The valid range of the second byte is A1-DF.
cs3       SS3 1xxxxxxx 1xxxxxxx          JIS-x0212. The valid range of the first byte is A1-FE and the valid range of the second byte is A1-FE. These ranges are implied by the JIS-x0212 standard.

Table H-4 Selected Characters for EUC Kanji

Double Byte Space   Double Byte Underscore   Double Byte Percent   Shift-Out   Shift-In
0xA1A1              0xA1B2                   0xA1F3                (NA)        (NA)
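
The byte ranges in Table H-3 can be expressed as a validity check on a single character, and the values in Table H-4 can be cross-checked against Python's standard `euc_jp` codec. This is a sketch for illustration: `valid_euc_char` is a hypothetical name, and the mapping of the three table entries to the Unicode characters U+3000, U+FF3F, and U+FF05 is an assumption made here, not something the manual states.

```python
def valid_euc_char(ch):
    """Check one EUC character (as bytes) against the ranges in Table H-3."""
    if len(ch) == 1:                      # cs0: U.S. ASCII
        return ch[0] < 0x80
    if len(ch) == 2 and ch[0] == 0x8E:    # cs2: SS2 + half-size Katakana
        return 0xA1 <= ch[1] <= 0xDF
    if len(ch) == 2:                      # cs1: JIS-x0208
        return all(0xA1 <= b <= 0xFE for b in ch)
    if len(ch) == 3 and ch[0] == 0x8F:    # cs3: SS3 + JIS-x0212
        return all(0xA1 <= b <= 0xFE for b in ch[1:])
    return False


# Assumed Unicode equivalents of the Table H-4 entries.
SELECTED = {
    "space": "\u3000",       # IDEOGRAPHIC SPACE
    "underscore": "\uff3f",  # FULLWIDTH LOW LINE
    "percent": "\uff05",     # FULLWIDTH PERCENT SIGN
}

for name, ch in SELECTED.items():
    encoded = ch.encode("euc_jp")
    print(name, "0x" + encoded.hex().upper(), valid_euc_char(encoded))
```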

Figure H-8 KanjiEUC Encoding for Kanji (axis label: 2nd byte)

Shift-JIS (DOS Kanji) Encoding

DOS/V is an implementation of a Japanese character set that uses the undefined columns of JIS-x0201; those bytes are the first bytes for 2-byte Kanji characters. This encoding is referred to as the Shift-JIS encoding.

The following tables show the Shift-JIS encoding according to character values, and selected Shift-JIS characters. Figure H-10 illustrates the encoding ranges.

Note: Even though Figure H-10 shows data at second byte 7F, there is none, as indicated in Table H-5.

Table H-5 Shift-JIS Encoding

Hex Representation of Shift-JIS   Shift-JIS Implementation
0x00-0x7E, 0xA1-0xDF              JIS-x0201
0x81-0x9F, 0xE0-0xFC              First byte of double-byte representation. Its mapping is as follows:
                                  1. 0x81-0x9F: Contains rows 1 to 62 from JIS-x0208.
                                  2. 0xE0-0xEF: Contains rows 63 to 94 from JIS-x0208.
                                  3. 0xF0-0xF9: Contains 1,880 Gaiji characters.
                                  4. 0xFA-0xFC: Contains IBM-defined characters.
0x40-0x7E, 0x80-0xFC              Second byte of double-byte representation.

Table H-6 Selected Characters for Shift-JIS Kanji

Double byte Space   Double byte Underscore   Double byte Percent   Shift-Out   Shift-In
0x8140              0x8151                   0x8193                (NA)        (NA)
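
The ranges in Table H-5 are enough to split a Shift-JIS byte stream into characters without any shift state. The following is a minimal sketch assuming those ranges; the function names (`is_single_byte`, `is_lead_byte`, `is_trail_byte`, `lead_byte_area`, `split_sjis`) are hypothetical and chosen for illustration.

```python
def is_single_byte(b):
    """JIS-x0201 single-byte range per Table H-5."""
    return 0x00 <= b <= 0x7E or 0xA1 <= b <= 0xDF


def is_lead_byte(b):
    """First byte of a double-byte character per Table H-5."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail_byte(b):
    """Second byte of a double-byte character; note that 0x7F is excluded."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def lead_byte_area(b):
    """Name the area that a lead byte selects, following the list in Table H-5."""
    if 0x81 <= b <= 0x9F:
        return "JIS-x0208 rows 1-62"
    if 0xE0 <= b <= 0xEF:
        return "JIS-x0208 rows 63-94"
    if 0xF0 <= b <= 0xF9:
        return "Gaiji"
    if 0xFA <= b <= 0xFC:
        return "IBM-defined"
    raise ValueError(f"0x{b:02X} is not a Shift-JIS lead byte")


def split_sjis(data):
    """Split a Shift-JIS byte string into one- and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_lead_byte(b):
            if i + 1 >= len(data) or not is_trail_byte(data[i + 1]):
                raise ValueError(f"bad or missing second byte at offset {i + 1}")
            chars.append(data[i:i + 2])
            i += 2
        elif is_single_byte(b):
            chars.append(data[i:i + 1])
            i += 1
        else:
            raise ValueError(f"undefined byte 0x{b:02X} at offset {i}")
    return chars


print(split_sjis(b"A\x81\x40\xb1"))
print(lead_byte_area(0x81))
```

Because the lead-byte ranges do not overlap the single-byte ranges, the first byte alone decides the character length, which is why no Shift-Out or Shift-In characters are needed (Table H-6 lists them as NA).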

Figure H-9 Shift-JIS Encoding for Kanji