The Data Warehouse ETL Toolkit - Kimball R.

Kimball R., Caserta J. The Data Warehouse ETL Toolkit - John Wiley & Sons, 2005. - 526 p.
ISBN: 0-764-57923-1

S
SAP implementation, 103-105
Sarbanes-Oxley Act, 4-5
scalar functions, ordinal position of jobs, 287
SCD (slowly changing dimension) type, 58-59, 76, 183
scheduling
  automation, 14
  batches, 377
  command-line execution, 306
  custom application, 313
  described, 302
  integrated tool, 311-312
  intra-day execution, 305
  load dependencies, 314
  metadata, 314-315, 377
  nested batching, 307-309
  notification and paging, 306-307
  operating system, 312-313
  parameter management, 309-311
  real-time execution, 305
  reliability, availability, and manageability analysis, 302-303
  strategy, 303-304
  third-party scheduler, 312
  token aware function, 304-305
schema, family of, 245-246
scope, defining, 405
screens
  anomaly detection phase, 131-134
  column distribution reasonability, 146-147
  column length restriction, 143
  column nullity, 140-141
  column numeric and data ranges, 141-143
  column property enforcement, 134-135
  data and value rule enforcement, 135-136
  data and value rule reasonability, 147
  data-profiling checklist, 139-140
  data-quality, 114
  dealing with unexpected conditions, 138-139
  invalid column explicit values, 144
  known table row counts, 140
  overall process flow, 136-138
  structure enforcement, 135
  table row count reasonability, 144-146
  valid column explicit values, 143-144
scripting languages, development, 260
scripts, launching sequence, 309
seconds, time dimension, building, 173
security, 6-7, 15-16
sequential execution, 308-309
server
  contention, 335
  IP address, 99
  proven technology, acquiring, 12
SET operators, 110
shared memory, 334
short-term archiving and recovery, operations, 345-346
simplicity, planning and leadership, 390
skills, available, 9
slowly changing dimension (SCD) type, 58-59, 76, 183
snowflake, subdimensions, 284
SORT utility program, 264-266
sorting data
  DBMS or dedicated packages, deciding between, 37
  on mainframe systems, 264-266
  during preload, 263
  on Unix and Windows systems, 266-269
source data
  quality control, handling, 123, 124
  referencing, 38
source file, records, subset of, 269-270
source system
  analyzing, 67-71
  collecting and documenting, data discovery, 63
  disk failure, 326
  metadata, 353-354, 361-362
  structured, writing to flat files or relational tables, 18
  tracking, 63-66
SQL
  bridge table hierarchies, 202-204
  column length screening, 143
  current dimension attribute values, 185
  default unknown values, finding, 143-147
  explicit values, finding, 144
  fact tables, drilling across, 151
  interface, benefits of, 41
  MINUS query, 232
  null values, returning, 140-141
  numeric ranges outside normal, 142
  sorting data during preload, 263-264
  transformation within logical map, 61
  updating facts, 230
staff
  building and retaining a team, 398-400
  churn, advantages of tool suites versus hand-coding, 12
  internal versus external hires, 397
  outsourcing development, 400-401
  recruiters, working with, 396-397
  team members, selecting, 398
  team roles and responsibilities, 394-396
staging
  benefits of, 8, 30-31, 37
  processing in memory versus, 29-31
  tables, 30, 231-232, 294
staging area
  described, 17-18
  designing, 31-35
  disk failure, 326
stale data, 305
standard scenario, snapshot fact table, 222
standards
  enforcing, 388-389
  long-term archiving and recovery, 348-349
star schema, 125, 126
straddle constraints, 206
strategy, scheduling and support, operations, 303-304
streaming data flow, batch extracts from source data versus, 13-14
structure enforcement, cleaning and conforming, 135
subdimensions
  of dimension tables, 180-182
  referential integrity, 284
subject area, 20
subset, extracting, 269-270, 270-271, 273
surrogate key
  creating, 164
  defined, 162
  dimensional domains, confusing, 170
  mapping tables, 48
  pipeline, fact tables, 214-217
surviving
  cleaning and conforming, 158-159
  dimension, managing, 450-451
  rules, establishing, 75
SyncSort SORT utility program, 264-266
system inventory, metadata, 364-365
system-of-record, determining, 63, 66-67
T
table
  relationships between, 70
  row count, 140, 144-146
  usage report, 338
team
  database expertise, 387-388
  members, selecting, 398
  roles and responsibilities, 394-396
technical definitions, 50
technical metadata
  business rules, 366-367
  data definitions, 365-366
  data models, 365
  described, 363-364
  system inventory, 364-365
temp space, 286, 327-328
testing
  position of fixed length flat files, 92
  processes, 408-409
  project management, 409-410, 411
  QA, 408-409
  UAT, 409
  unit, 408
text string, replacing or substituting, 37-38
third normal form entity/relation models, ETL data structure, 42
third-party scheduler, 312
360 degree view of the customer, 7
throughput
  optimizing, 390-391
  processing time, 332
time
  of additions or modifications, 106-107
  data retrieval, 323
  development, 260-261
  dimension tables, 170-174
  stamping dimension records, 191
  W3C format, 98-99
timed extracts, detecting changed data, 108