NCR. Teradata RDBMS for UNIX SQL Reference. NCR, 1997. 913 p.


International and Japanese Character Support

Naming User-Defined Kanji Character Sets

A special naming convention allows a Japanese character site to identify the characteristics of a user-defined character set.

• Names of user-defined character sets must have a three-character suffix, beginning with _ (underscore), that defines the attributes of the set.

• Each character position after the underscore identifies one attribute; the character value defines the value of the attribute.

• The first position is a numeric value. The value of the first position should be 0. Other values are reserved for future use.

• The second position is an alphabetic value. The value in the second suffix position indicates the type of encoding being named, as follows:

Code  Description
A     Single byte character ASCII.
E     Single byte character EBCDIC.
I     IBM (SO/SI style) mixed character single byte characters/multibyte characters and graphic multibyte characters.
U     EUC mixed character single byte characters/multibyte characters and graphic multibyte characters.
S     Shift-JIS mixed character single byte characters/multibyte characters and graphic multibyte characters.

For example, the suffix _0I in the name KANJIEBCDIC5026_0I implies IBM mixed single byte characters/multibyte characters.

The suffix is part of the character set name and must be included in the SET SESSION CHARSET command.
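
The suffix rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of Teradata: the function name parse_charset_suffix and the short labels in ENCODING_TYPES are assumptions made for this example. It splits off the last three characters of a name and looks up the encoding code from the table above.

```python
# Encoding codes from the table above; the labels are shortened paraphrases.
ENCODING_TYPES = {
    "A": "single-byte ASCII",
    "E": "single-byte EBCDIC",
    "I": "IBM (SO/SI style) mixed single-byte/multibyte",
    "U": "EUC mixed single-byte/multibyte",
    "S": "Shift-JIS mixed single-byte/multibyte",
}


def parse_charset_suffix(name):
    """Return (base_name, encoding_code) for a user-defined character set name.

    Hypothetical helper: raises ValueError if the name does not end in a
    valid three-character suffix such as _0I.
    """
    if len(name) < 4 or name[-3] != "_":
        raise ValueError("name must end with a three-character suffix starting with '_'")
    version, code = name[-2], name[-1]
    if version != "0":
        raise ValueError("first suffix position should be 0; other values are reserved")
    if code not in ENCODING_TYPES:
        raise ValueError(f"unknown encoding code {code!r}")
    return name[:-3], code


base, code = parse_charset_suffix("KANJIEBCDIC5026_0I")
print(base, code, ENCODING_TYPES[code])
```

Because the suffix is part of the name, a tool built along these lines would still pass the full name, suffix included, to SET SESSION CHARSET.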

User-defined Japanese character sets involve the following considerations:

• Utilities and interfaces may not be prepared to handle character set names containing multibyte characters. To do so, a utility or interface needs to know the encoding of the lexical units (such as letters and ideographs), which depends on the particular character set. It is therefore recommended that character set names start with an unaccented uppercase Roman letter and consist only of unaccented uppercase Roman letters, digits, underscore, dollar sign, or number sign.

The lexical units of uppercase Latin letters and English punctuation characters are the same across several different character sets. Therefore, if the names of user-defined character sets comprise only these common units (plus the appropriate suffix), a utility will be able to deal with them.

• The VARGRAPHIC function is not available for user-defined character sets.

• The new character set may require a different collation.

• The following characters cannot be used reliably in data streams. This restriction is necessary to properly identify single byte characters among the various Kanji character sets.

  • 0x0E (internal representation of SO)

  • 0x0F (internal representation of SI)

  • 0x80 (internal representation for cs2 KanjiEUC data)

  • 0xFF (internal representation for cs3 KanjiEUC data)

  The 0x0E and 0x0F internal codes are best associated with character sets carrying the _0I suffix, and the 0x80 and 0xFF codes with those carrying the _0U suffix.

• Object Names. Allowed character ranges for KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS object names, and object name validation criteria are described in Chapter 4, “Teradata SQL Lexicon,” under “Character Ranges”.

• Passwords. The range of valid characters in object names applies also to passwords. Note that the password formatting feature does not apply to the MBC (multibyte) character set.
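
Two of the considerations above lend themselves to a simple pre-check: the recommended form of a character set name, and the four byte values that cannot be used reliably in data streams. The Python sketch below is a minimal illustration under stated assumptions: the helper names is_recommended_name and find_reserved_bytes are hypothetical, and the byte scan is naive in that it flags every occurrence of a listed value without interpreting the surrounding encoding.

```python
import re

# Recommended form: an unaccented uppercase Roman letter first, then only
# uppercase letters, digits, underscore, dollar sign, or number sign.
RECOMMENDED_NAME = re.compile(r"[A-Z][A-Z0-9_$#]*")

# Byte values listed above as unreliable in data streams.
RESERVED_BYTES = {
    0x0E: "SO",
    0x0F: "SI",
    0x80: "cs2 KanjiEUC",
    0xFF: "cs3 KanjiEUC",
}


def is_recommended_name(name):
    """True if the whole name follows the recommended character set naming form."""
    return RECOMMENDED_NAME.fullmatch(name) is not None


def find_reserved_bytes(data):
    """Return (offset, byte value) pairs for each listed byte found in data."""
    return [(i, b) for i, b in enumerate(data) if b in RESERVED_BYTES]


print(is_recommended_name("MYKANJI_0U"))
for offset, value in find_reserved_bytes(b"AB\x0eCD\x0f"):
    print(offset, hex(value), RESERVED_BYTES[value])
```

A site might run checks of this kind before defining a new character set or loading data, treating any hit as a prompt for review rather than as a definitive error.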

Columns in DBC.CharTranslations View

Introduction

Use the view named DBC.CharTranslations to load your character set hexadecimal codes.

DBC.CharTranslations Attributes

Attributes and use of each column in the DBC.CharTranslations view are:

• CharSetName (TranslateName) CHAR(30) NOT NULL

A name that uniquely identifies the character set for which these translation codes are being defined. The name should be meaningful to users, in that it should indicate the compatibility of the set with a particular type of host.

For example, a character set for the Swedish language on a DEC VAX host might be named SWED_ASCII, while a character set for the French language on an IBM VM host might be named FRENCH_EBCDIC.

For a Japanese character site, a name is a string that conforms to the following rules (see also Chapter 4, “Teradata SQL Lexicon,” under “Creating Names Under Kanji Character Sets”):

• The length can be from 1 to 30 bytes.

• The contents can include the following:

• Uppercase and lowercase simple Latin letters (A...Z, a...z) in single byte and multibyte forms.

• Digits (0-9) in single byte and multibyte forms. These cannot appear as the first character in the string.

• The special characters $ (dollar sign), # (number sign), and _ (underscore) in single byte and multibyte forms.
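
The name rules listed so far can be expressed as a short check. The Python sketch below is an assumption-laden illustration, not Teradata's validation logic: the helper names fullwidth and is_valid_name are hypothetical, Python's shift_jis codec stands in for the session character set when counting bytes, Unicode fullwidth forms stand in for the multibyte forms of letters, digits, and special characters, and only the rules listed above are covered.

```python
import string


def fullwidth(chars):
    """Map printable ASCII characters to their Unicode fullwidth forms."""
    return "".join(chr(ord(c) + 0xFEE0) for c in chars)


# Single byte and multibyte (fullwidth) forms of each permitted class.
LETTERS = string.ascii_letters + fullwidth(string.ascii_letters)
DIGITS = string.digits + fullwidth(string.digits)
SPECIALS = "$#_" + fullwidth("$#_")
ALLOWED = set(LETTERS + DIGITS + SPECIALS)


def is_valid_name(name, encoding="shift_jis"):
    """Check a name against the length and content rules listed above."""
    try:
        encoded = name.encode(encoding)
    except UnicodeEncodeError:
        return False
    if not 1 <= len(encoded) <= 30:  # length is counted in bytes
        return False
    if name[0] in DIGITS:  # digits cannot appear as the first character
        return False
    return all(c in ALLOWED for c in name)


for candidate in ("SWED_ASCII", "1ST_SET", "ＫＡＮＪＩ＿１"):
    print(candidate, is_valid_name(candidate))
```

Counting bytes rather than characters matters here: a name written entirely in multibyte forms reaches the 30-byte limit after 15 two-byte characters.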