Datasheet

Chapter 1: SSIS Solution Architecture
11
The tendency when developing a new integration or ETL system is to get it done as quickly as possible.
What often happens is that the overall architecture is not integrated well into an organization ’ s
environment. Maybe some time is saved (and that is even questionable), but in the end, more time and
money will be wasted.
A solution architecture should have several key data-processing objectives. The following apply to SSIS -
based solutions, but also relate generically to any data-processing solution architecture:
The solution should coordinate with other data - centric solutions in the enterprise. Do not build
a separate data silo, especially if your effort is a data warehouse or data mart that causes
multiple versions and variations of the data.
Source data that is required for the solution must be extracted as close to the source as possible
and not plugged at the end of a long dependency chain. (Be sure to follow the previous bullet
point).
The solution should have a centralized administration and execution strategy that coordinates
with other systems, or follows the practices of other corporate systems. This does not require
limiting a scale - out architecture, but simply that the support structures are centralized.
Real - time execution auditing is also needed to know what is happening and what did happen.
This information will go a long way in supporting a system. In addition, you should have a way
to track data back to the source. This tracking is critical for both data validation and
troubleshooting data errors.
The processes must have rollback layers. In other words, if your requirements necessitate a
complicated or long process, don t require the whole process to be re - run from the start if the
last part breaks. Plan for restarting at interim points after the issues are identified. Doing so also
enables you to easily compartmentalize changes in the ETL solution.
These objectives represent the larger picture of an overall solution architecture. Other aspects, of course,
are important and situational to what you are building in SSIS. Two common types of data-processing
efforts are discussed in the following sections: system integration and data warehouse ETL. These fit
well into the SSIS toolset.
Data Integration or Consolidation
One common use of SSIS is to integrate data between systems, or to synchronize data. For example, you
may want to create a business - to - business portal site, and you may need the site to interface with the
source data on the mainframe. In this case, you may get the data delivered in nightly extracts from the
mainframe and load it into your SQL Server table.
Another very common ETL task that DBAs face is receiving files from File Transfer Protocol (FTP)
servers (or on network shares) that must be processed and loaded into another system. This type of
process involves moving files, and then processing the data, which may involve de - duping (removing
duplicates), combining files, cleaning bad data, and so on. Two systems may also need to talk to one
another or pass business keys in order for records to be matched between environments.
Figure 1 - 5 shows an example solution architecture that integrates data between different systems in an
enterprise.
c01.indd 11c01.indd 11 9/24/09 11:26:33 AM9/24/09 11:26:33 AM