Datasheet

Chapter 1: SSIS Solution Architecture
30
different extraction requirements (such as optimization, incremental or targeted extraction, change data
capture, or data staging), and also addresses common challenges from the various source types, 32 - bit or
64 - bit platform.
If someone ever tells you that his or her source data is perfect, he or she is lying. Don t believe it! The fact
is that data sources are rarely pristine, and often require handling data anomalies, missing data,
typographical errors, and just plain bad data. Chapter 6 delves into the practical design steps to handle
data cleansing in SSIS using the fuzzy logic, text parsing, and scripting components for data cleansing
and text mining. Chapter 6 also demonstrates the use of the SSIS Data Profile Task to better understand
the source data that you are dealing with.
Data Warehouse ETL and Cube Processing
The Design section of this chapter already reviewed the overall ETL architecture of a data warehouse
and BI system. As a background to Chapter 7 (which discusses dimension tables) and Chapter 8 (which
discusses fact table ETL), databases designed for data warehousing are created in a structure called a
dimensional model , which involves two types of tables:
Dimension tables hold informational data or attributes that describe entities.
Fact tables capture metrics or numeric data that describe quantities, levels, sales, or other
statistics.
A data warehouse involves both dimension tables and fact tables. Therefore, your ETL processes need to
handle the transformation and loading of dimension tables and fact tables.
Chapter 7 focuses on loading data into data warehouse dimension tables, which requires handling
attribute changes, managing business keys, and dealing with surrogate keys. Chapter 7 dives into the
best practices for loading different dimension table types, and examines how to optimize the SSIS
process to scale to large and complex dimension tables.
Chapter 8 focuses on loading data warehouse fact tables, which often contain millions or billions of
rows. Loading data into a fact table requires designing an SSIS package that can scale and also handle
the various loading scenarios (such as table partitions, large record updates, and missing keys). Chapter
8 considers the design practices for loading the different types of fact tables.
A data warehouse is often accompanied by a set of cubes. A Microsoft cubing engine, which is a part of
SQL Server, is called SQL Server Analysis Services (SSAS). Data warehouse or business intelligence
solutions that use SSAS need a data-processing architecture that loads data from the data warehouse or
mart into SSAS. Chapter 9 focuses on how to use SSIS to load SSAS data cube structures through the out -
of - the - box SSAS processing components, as well as through scripting and the command - line tools. Also
included in Chapter 9 is a discussion on how to manage SSAS partitions, and how to deal with
incremental data updates.
c01.indd 30c01.indd 30 9/24/09 11:26:44 AM9/24/09 11:26:44 AM