Download (direct link):
The challenge for the ETL team is building and maintaining the group entity table. In the health care example, as patient-treatment records are presented to the system, the ETL system has the choice of either making
Health Care Billing N Diagnosis Group / Diagnosis Group Bridge \ Diagnosis Dimension
Line Item Fact Dimension
/ \ /
Figure 5.20 Using a bridge table to represent multiple diagnoses.
Delivering Dimension Tables 197
Incoming diagnosis group list
lookup in existing diag groups
create new diag group add to existing diag gp table
use existing diag group key
insert diag gp key in fact table
Figure 5.21 Processing diagnosis groups in an outpatient setting.
each patient's set of diagnoses a unique diagnosis group or reusing diagnosis groups when an identical set of diagnoses reoccurs. There is no simple answer for this choice. In an outpatient setting, diagnosis groups would be simple, and many of the same ones would appear with different patients. In this case, reusing the diagnosis groups is probably the best choice. See Figure 5.21. But in a hospital environment, the diagnosis groups are far more complex and may even be explicitly time varying. In this case, the diagnosis groups should probably be unique for each patient and each hospitalization. See Figure 5.22 and the discussion of time-varying bridge tables that follows. The admission and discharge flags are convenient attributes that allow the diagnosis profiles at the time of admission and discharge to be easily isolated.
Administering the Weighting Factors
The diagnosis group tables illustrated in Figures 5.20 and 5.22 include weighting factors that explicitly prorate the additive fact (charge dollars) by each diagnosis. When a requesting query tool constrains on one or more
Diagnosis Group key (FK)
Diagnosis key (FK)
Begin Effective Date (FK)
End Effective Date (FK)
Admission Flag (T/F)
Discharge Flag (T/F)
Figure 5.22 A time-varying diagnosis group bridge table appropriate for a hospital setting.
198 Chapter 5
diagnoses, the tool can chose to multiply the weighting factor in the bridge table to the additive fact, thereby producing a correctly weighted report. A query without the weighting factor is referred to as an impact report. We see that the weighting factor is nothing more than an explicit allocation that must be provided in the ETL system. These allocations are either explicitly fetched from an outside source like all other data or can be simple computed fractions depending on the number of diagnoses in the diagnosis group. In the latter case, if there are three diagnoses in the group, the weighting factor is 1/3 = 0.333 for each diagnosis.
In many cases, a bridge table is desirable, but there is no rational basis for assigning weighting factors. This is perfectly acceptable. The user community in this case cannot expect to produce correctly weighted reports. These front-room issues are explored in some depth in the Data Warehouse Toolkit, Second Edition in the discussion of modeling complex events like car accidents.
Time-Varying Bridge Tables
If the multivalued dimension is a Type 2 SCD, the bridge table must also be time varying. See Figure 5.23 using the banking example. If the bridge table were not time varying, it would have to use the natural keys of the customer dimension and the account dimension. Such a bridge table would potentially misrepresent the relationship between the accounts and customers. It is not clear how to administer such a table with natural keys if customers are added to or deleted from an account. For these reasons, the bridge table must always contain surrogate keys. The bridge table in Figure 5.23 is quite sensitive to changes in the relationships between accounts and customers. New records for a given account with new begin-date stamps and end-date stamps must be added to the bridge table whenever:
™ The account record undergoes a Type 2 update
¦ Any constituent customer record undergoes a Type 2 update
ø A customer is added to or deleted from the account or
¦ The weighting factors are adjusted
Account key (FK)
Customer key (FK)
Begin effective date (FK)
End effective date (FK)
Primary Account Holder Flag (T/F)