Books
in black and white
Main menu
Share a book About us Home
Books
Biology Business Chemistry Computers Culture Economics Fiction Games Guide History Management Mathematical Medicine Mental Fitnes Physics Psychology Scince Sport Technics
Ads

The data worehouse ETL Toolkit - Kimpball R.

Kimpball R., Caserta J. The data worehouse ETL Toolkit - John Wiley & Sons, 2005. - 526 p.
SBN: 0-764-57923-1
Download (direct link): thedatawarehouse2005.pdf
Previous << 1 .. 96 97 98 99 100 101 < 102 > 103 104 105 106 107 108 .. 233 >> Next

In a Type 3 SCD, instead of issuing a new row when a change takes place, a new column is created (if it does not already exist), and the old value is placed in this new field before the primary value is overwritten. For the example of the product category, we assume the main field is named Category. To implement the Type 3 SCD, we alter the dimension table to add the field Old Category. At the time of the change, we take the original value of Category and write it into the Old Category field; then we overwrite the Category field as if it were a Type 1 change. See Figure 5.18. No keys need to be changed in any dimension table or in any fact table. Like the Type 1 SCD, if aggregate tables have been built directly on the field undergoing
Prod Key Prod ID Prod (PK) (NK) Name Size Category Color
1127648 A 107B Denim 38 Men's Blue
pants wear
Add field; Transfer old value
I) ^
Prod Key Prod ID Prod Old
(PK) (NK) Name Size Category Category Color
1127648 A107B Denim 38 Leisure Men's Blue
pants wear wear
A
Overwrite with new value
Figure 5.18 Implementing the Type 3 SCD for a product-category description.
Delivering Dimension Tables 193
the Type 3 change, these aggregate tables need to be recomputed. This procedure is described in Chapter 6.
^ Type 3 changes often do not come down the normal data-flow pipeline. Rather, they are executive decisions communicated to the ETL team, often verbally. The product-category manager says, "Please move Brand X from Mens Sportswear to Leather Goods, but let me track Brand X optionally in the old category." The Type 3 administration is then kicked off by hand, and can even involve a schema change, if the changed attribute (in this case, brand) does not have an alternate field.
When a new record is added to a dimension that contains Type 3 fields, a business rule must be invoked to decide how to populate the old value field. The current value could be written into this field, or it could be NULL, depending on the business rule.
We often describe the Type 3 SCD as supporting an alternate reality. In our product-category example, the end user could choose between two versions of the mapping of products to categories.
The Type 3 SCD approach can be extended to many alternate realities by creating an arbitrary number of alternate fields based on the original attribute. Occasionally, such a design is justified when the end user community already has a clear vision of the various interpretations of reality. Perhaps the product categories are regularly reassigned but the users need the flexibility to interpret any span of time with any of the category interpretations. The real justification for this somewhat awkward design is that the user interface to this information falls out of every query tool with no programming, and the underlying SQL requires no unusual logic or extra joins. These advantages trump the objections to the design using positionally dependent attributes (the alternate fields).
Hybrid Slowly Changing Dimensions
The decision to respond to changes in dimension attributes with the three SCD types is made on a field-by-field basis. It is common to have a dimension containing both Type 1 and Type 2 fields. When a Type 1 field changes, the field is overwritten. When a Type 2 field changes, a new record is generated. In this case, the Type 1 change needs to be made to all copies of the record possessing the same natural key. In other words, if the ethnicity attribute of an employee profile is treated as a Type 1, if it is ever changed (perhaps to correct an original erroneous value), the ethnicity attribute must be overwritten in all the copies of that employee profile that may have been spawned by Type 2 changes.
194 Chapter 5
Sales Team Dimension

Q

S
Sales Team Key (PK)
Sales Team Number (NK)
Sales Team Name
Sales Physical Address (remains unchanged)
District (the district assignment valid between the following dates)
Begin Effective Date (the first date this record is valid)
End Effective Date (the last date this record is vaild)
Obsolete District1 (a selected obsolete definition)
Obsolete District2 (a different obsolete definition, etc.)
Current District (the most current district assignment; periodically overwritten) ... plus other unrelated sales team attributes
Figure 5.19 A hybrid SCD showing all three types.
It is possible to combine all the SCD types in a single dimension record. See Figure 5.19. In this example, the district assignment field for the sales team is a Type 2 attribute. Whenever the district assignment changes, a new record is created, and the beginning and effective dates are set appropriately. The set of yearly old district assignments are Type 3 fields, implementing many alternate realities. And finally, the current district assignment is a Type 1 field, and it is overwritten in all copies of the sales team dimension records whenever the current district is reassigned.
Previous << 1 .. 96 97 98 99 100 101 < 102 > 103 104 105 106 107 108 .. 233 >> Next