Books
in black and white
Main menu
Share a book About us Home
Books
Biology Business Chemistry Computers Culture Economics Fiction Games Guide History Management Mathematical Medicine Mental Fitnes Physics Psychology Scince Sport Technics
Ads

The data worehouse ETL Toolkit - Kimpball R.

Kimpball R., Caserta J. The data worehouse ETL Toolkit - John Wiley & Sons, 2005. - 526 p.
SBN: 0-764-57923-1
Download (direct link): thedatawarehouse2005.pdf
Previous << 1 .. 48 49 50 51 52 53 < 54 > 55 56 57 58 59 60 .. 233 >> Next

¦ Transfer data to the warehouse system in its native EBCDIC format. This option is viable only if your ETL tools or process can process EBCDIC data. Several types of tools can perform this task.
¦ Use robust ETL tools that can process native EBCDIC, including accurately handling numeric data store in any mainframe-type numeric formats.
¦ Use a utility program that can reformat data into display format on the warehouse platform. If you receive EBCDIC data and are writing the ETL process without the aid of a specialized ETL tool, we strongly recommend purchasing a utility program that can perform the numeric format conversion and EBCDIC-to-ASCII translation duties. Some relatively inexpensive, commercially available programs handle this task quite well.
Working with Redefined Fields
Rather than wasting space—remember it used to be expensive—mainframe engineers devised REDEFINES, which allow mutually exclusive data elements to occupy the same physical space. Figure 3.8 contains an excerpt from a COBOL Copybook that helps illustrate the concept of REDEFINES in mainframe data files. The excerpt describes the data fields that represent an employee's wage information. Notice EMPLOYEE-TYPE, which is a one-byte code that indicates whether the employee is exempt or hourly. Also, notice that two separate series of fields carry the wage information for the employee. The field set used depends on whether the employee is exempt or hourly. Exempt employees' wages are represented in three fields (PAY-GRADE, SALARY, and PAY-PERIOD), which take up a total of eight
Extracting 85
05 EMPLOYEE-TYPE PIC X.
8 8 EXEMPT VALUE 'E'.
8 8 HOURLY VALUE 'H'.
05 WAGES.
10 EXEMPT-WAGES.
15 PAY-GRADE PIC X(2).
15 SALARY PIC 9(6)V99 COMP3
15 PAY-PERIOD PIC X.
88 BI-WEEKLY VALUE '1'.
88 MONTHLY VALUE '2'.
10 NON-EXEMPT-WAGES
REDEFINES EXEMPT-WAGES.
15 PAY-RATE PIC 9(4)V99.
15 JOB-CLASS PIC X(1).
15 FILLER PIC X.
Len Positions
1 71 I - I 71

8 72 - 79
8 72 - 79
2 72 - 73
5 74 - 78
1 79 - 79

8 72 79
6 72 - 77
1 78 - 78
1 79 - 79
Figure 3.8 REDEFINES clause in a COBOL copybook.
bytes. Hourly employees use a different set of fields that take up seven bytes (PAY-RATE and JOB-CLASS).
Since an employee is exempt or hourly, never both, only one of the two field sets is ever used at a time. The exempt wage fields occupy positions 72 through 79 in the file, and the hourly wage fields occupy positions 72 though 78. Furthermore, notice that the fields for exempt and hourly wages use different data types even though they occupy the same positions. When reading the employee record, the program must determine how to interpret these positions based on the value of EMPLOYEE-TYPE in position 71.
The same positions can have more than one REDEFINES associated with them, so rather than just two possible uses, the same positions can have two, three, or more possible uses. REDEFINES introduce one further complication that renders mere EBCDIC-to-ASCII character-set translation insufficient.
? When you encounter multiple REDEFINES in your sources, you should consider making each definition a separate pass of the extract logic over the source data if the subsequent processing is quite different (using Exempt versus Hourly as an example). This would allow you to build separate code lines for each extract rather than one complex job with numerous tests for the two conditions.
Multiple OCCURS
Mainframe and COBOL precede relational databases and Edward Codd's normalization rules. Prior to utilizing relational theory to design databases, repeating groups were handled with mainframe COBOL programs that use an OCCURS clause to define data fields that repeat within a data file. For
86 Chapter 3
Len Position
05 PERFORMANCE-RATING-AREA PIC X(100). 100 80 - 179
0 5 PERFORMANCE-RATINGS 100 80 - 179
REDEFINES PERFORMANCE-RATINGS-AREA.
07 PERFORMANCE-RATING OCCURS 5 TIMES. 20 80 - 99
15 PERF-RATING PIC X(3). 3 80 - 82
15 REVIEWER-ID PIC X(9). 9 83 - 91
15 REVIEW-DATE. 8 92 - 99
20 REVIEW-DATE-YYYY PIC 9(4). 4 92 - 95
20 REVIEW-DATE-MM PIC 9(2). 2 96 - 97
20 REVIEW-DATE-DD PIC 9(2). 2 98 - 99
Figure 3.9 COBOL copybook with OCCURS clause to define repeating groups within a data record.
example, in Figure 3.9 you can see an area of an employee record that stores information about performance ratings. The record is designed to keep track of up to five performance ratings. But rather than creating the needed fields five times—remember, this precedes relational theory so there won't be a separate performance rating table with a foreign key that points back to the employee—they are named only once within a special OCCURS field. The OCCURS clause indicates the number of times the fields within it repeat. Essentially, the OCCURS clause defines an array contained within the file. Thus, in the employee record, data for the first performance rating occupies positions 80 to 99, the second rating from 100 to 119, the third from 120 to 139, the fourth from 140 to 159, and the fifth—and last—from 160 to 179.
Previous << 1 .. 48 49 50 51 52 53 < 54 > 55 56 57 58 59 60 .. 233 >> Next