Datasheet

Chapter 1: SSIS Solution Architecture

different extraction requirements (such as optimization, incremental or targeted extraction, change data

capture, or data staging), and also addresses common challenges from the various source types, 32 - bit or

64 - bit platform.

If someone ever tells you that his or her source data is perfect, he or she is lying. Don ’ t believe it! The fact

is that data sources are rarely pristine, and often require handling data anomalies, missing data,

typographical errors, and just plain bad data. Chapter 6 delves into the practical design steps to handle

data cleansing in SSIS using the fuzzy logic, text parsing, and scripting components for data cleansing

and text mining. Chapter 6 also demonstrates the use of the SSIS Data Profile Task to better understand

the source data that you are dealing with.

Data Warehouse ETL and Cube Processing

The “ Design ” section of this chapter already reviewed the overall ETL architecture of a data warehouse

and BI system. As a background to Chapter 7 (which discusses dimension tables) and Chapter 8 (which

discusses fact table ETL), databases designed for data warehousing are created in a structure called a

dimensional model , which involves two types of tables:

Dimension tables hold informational data or attributes that describe entities.

Fact tables capture metrics or numeric data that describe quantities, levels, sales, or other

statistics.

A data warehouse involves both dimension tables and fact tables. Therefore, your ETL processes need to

handle the transformation and loading of dimension tables and fact tables.

Chapter 7 focuses on loading data into data warehouse dimension tables, which requires handling

attribute changes, managing business keys, and dealing with surrogate keys. Chapter 7 dives into the

best practices for loading different dimension table types, and examines how to optimize the SSIS

process to scale to large and complex dimension tables.

Chapter 8 focuses on loading data warehouse fact tables, which often contain millions or billions of

rows. Loading data into a fact table requires designing an SSIS package that can scale and also handle

the various loading scenarios (such as table partitions, large record updates, and missing keys). Chapter

8 considers the design practices for loading the different types of fact tables.

A data warehouse is often accompanied by a set of cubes. A Microsoft cubing engine, which is a part of

SQL Server, is called SQL Server Analysis Services (SSAS). Data warehouse or business intelligence

solutions that use SSAS need a data-processing architecture that loads data from the data warehouse or

mart into SSAS. Chapter 9 focuses on how to use SSIS to load SSAS data cube structures through the out -

of - the - box SSAS processing components, as well as through scripting and the command - line tools. Also

included in Chapter 9 is a discussion on how to manage SSAS partitions, and how to deal with

incremental data updates.

❑

c01.indd 30c01.indd 30 9/24/09 11:26:44 AM9/24/09 11:26:44 AM