
The Data Warehouse ETL Toolkit - Kimball R.

Kimball R., Caserta J. The Data Warehouse ETL Toolkit - John Wiley & Sons, 2005. - 526 p.
ISBN: 0-764-57923-1

        PICtures, using, 81-83
        redefined fields, working with, 84-85
        variable record lengths, 89-90
    parallel processing queries, 288-290
    process time, load time, estimating, 321-323
    Web log sources
        described, 97-98
        name value pairs, 100-101
        W3C common and extended formats, 98-100
    XML sources
        character sets, 94
        described, 93-94
        DTD, 95-96
        meta data, described, 94-95
        namespaces, 97
        XML Schema, 96-97
F
fact tables
    accumulating snapshot, 222-224
    aggregations, 241-247
    deleting facts, 230
    described, 209-212
    dimensional data, delivering to OLAP cubes, 247-253
    ETL data structure, 45-46
    factless, 232-233
    fundamental grains, 217-224
    graceful modifications, 235-236
    incremental loading, 228
    indexes, 224
    inserting facts, 228
    late arriving facts, 239-241
    loading data, 226-227
    logically deleting facts, 232
    multiple units of measure, 237-238
    negating facts, 229-230
    partitions, 224-226
    periodic snapshot, 220-222
    physically deleting facts, 230-232
    provider, cleaning and conforming, 152
    referential integrity, 212-214, 285
    revenue, collecting in multiple currencies, 238-239
    rollback log, outwitting, 226
    surrogate key pipeline, 214-217
    transaction grain, 218-220
    Type 1, augmenting with Type 2 history, 234-235
    updating and correcting facts, 228-229
    updating facts, 230
facts
    adding, 236
    cleaning and conforming, 151-152
    delivering to OLAP cubes, 250-252
failures, long-running processes, analyzing, 324-325
family of schemas, 245-246
filtering
    development, 269
    disabling, 293
    ETL throughput, increasing, 297
    flat file, 37
    throughput, improving, 294
final load, parallel processing, 291-292
financial-reporting issues, 5
fixed length flat files, extracting, 91-92
flat and snowflaked, dimension tables, 167-170
flat files
    ETL data structure, 35-38
    extracting
        delimited, processing, 93
        described, 90-91
        fixed length, processing, 91-92
    integrating data from, 261-262
    redirecting from staging database tables, 296-297
    space for long-running processes, 328-329
foreign key
    adding, 236
    constraints, when to eliminate, 281
    defined, 162
    fact table, 210
    snapshot fact tables, overwriting, 222-223
format
    date, 98
    dates in nondate fields, 72
    disk space, saving, 83
    obsolete and archaic formats, 347-348
    packed numeric format, 83, 271
    zoned numeric, 82, 266
forward engineering, 68
fragmented data, reorganizing, 281
front room, metadata, 356-359
FTP
    interrupted process, restarting, 261
    security issues, 345
full rollup, 299
fundamental grains, fact tables, 217-224
G
gawk Unix utility
    aggregated extracts, 274-276
    field extracts, 273
    subset, extracting, 271-272
globalization, 423
graceful modifications, fact tables, 235-236
grain
    dimension tables, 165-166
    fact tables, 46, 210
graphical user interfaces (GUI), 306, 307-308
greater/less than (<>) operators, 110
Group By clauses, 286
group entity, multivalued dimensions, 196-197
GUI (graphical user interfaces), 306, 307-308
H
half-bytes, 83
halt condition, 136
hand coding, ETL tool versus, 10-16
hard copy, long-term archiving and recovery, 348-349
hard disk failure, long-running processes and, 326-327
hard disk space, numeric data formats, 83
heterogeneous data sources, integrating, extracting, 73-77
hierarchy
    complex dimension, 168
    mapping tables, referential integrity, 286
    XML structures, 39
HINT keyword, 110
historic data, purging, operations, 330
horizontal task flow, 14, 324
hot extract, 20
hot spot, 336
HTTP status, 99
hub-and-spoke architecture, 434-436
hybrid slowly changing dimension tables, 193-194
I
identifiers, 68
IDMS, extracting data from, 90
impact analysis
    metadata, 380
    planning and design standards, 49
    reporting, 198
implied conversions, 61
IMS, extracting data from, 90
incremental loading, fact tables, 228
indexes
    fact tables, avoiding bottlenecks with, 224
    natural key as, 216
    processes, tuning, 341-343
    space required by long-running processes, 328
    throughput, increasing, 295-296
    usage report, 338
    WHERE clause columns, 109
initial and incremental loads, changed data, detecting, 109
INSERT statement
    safety of, 212
    screens, running, 138
    UPDATE versus, 184, 226-227
inserts
    fact tables, 228
    processes, tuning, 340-341
    speeding with database bulk loader utilities, 276-280
instrument dimension table, 190
integrated scheduling and support tool, 311-312
integrated source system, 127
integrity checking, nonrelational data sources, 44
internal versus external hires, 397
interval of relevance, date-time stamps, 191
intra-day execution, scheduling and support, 305
I/O contention, 296
IP address, user's ISP, 99
J
jobs
    ordinal position of, 282-286
    staging area, 34
joins, NULL value failures, 71-72
junk dimension, 176
K
key constituencies, 117-119
keys
    dimension tables, 162-165
    foreign
        constraints, when to eliminate, 281
        defined, 162
        dimension, adding, 236
        fact table, 210
        snapshot fact tables, overwriting, 222-223
    natural, 162-163
    primary
        calendar date dimension, 172
        fact table, 211
    surrogate
        creating, 164
        defined, 162