Datasheet
Chapter 1: SSIS Solution Architecture
23
Disk Volumes and Configuration
Following are some general principles to follow as you try to estimate your disk volumes:
Limit the use of internal server storage, and especially don ’ t put your databases on the boot/
system drive (
C: ).
Go with smaller, faster drives, and more of them, rather than bigger and slower drives (except
for backups and archive). You can get a lot more throughput for the drives because you can
stripe more drives.
Separate your staging and
TempDB databases on separate drives (even when using a storage area
network, or SAN) because, for ETL operations, you will create a bad I/O bottleneck if your
source or destinations share the same drives as staging.
Logs should also be on a separate drive because, even if your database is set up as simple
recovery (where the log file gets truncated), you will still generate a lot of log activity during
ETL operations.
If you estimate your destination database will be 1TB, you will probably need 3 – 4TB of raw
drive space to accommodate for logs, temp, staging, disk striping, redundancy (RAID), and so
on. Set this expectation upfront!
Storage Area Network (SAN) Versus Direct Attached Storage (DAS)
Both Storage Area Networks (SANs) and Direct Attached Storage (DAS) have their benefits and
drawbacks. SANs come at a higher price, but have the benefit of adding better redundancy, caching,
controller throughput, more drives in a stripe, fault tolerance (clusters in different cities), advanced disk
mirroring (where a mirror can be split and mounted on other servers), dual read in a mirror (where both
drives in a mirror can be read at the same time), and so on.
DAS has the benefit of cost (a fraction of the cost of a SAN), but can also achieve similar throughput
(and, in some cases, faster throughput, but without the caching) and easier control of the setup and
configuration.
For mission - critical ETL processes and databases, use a SAN. But if your solution doesn ’ t need that high
availability, you can consider DAS. Or, you can consider DAS for your staging environment, and SAN
for your production databases.
This is an SSIS ETL book, so you should consult the current best practices out there for
recommendations on drive configuration. However, just remember that ETL generates a lot of I/O in a
short amount of time and, therefore, you should watch out for recommendations that are targeted for
transactional systems.
A lot of varying and seemingly contradictory recommendations are out there, but each is based on a set
of assumptions for different types of data - centric solutions. Be careful to understand those assumptions
in your decision.
The next section is related to hardware, and addresses the question of where you should run your SSIS -
based ETL operations.
❑
❑
❑
❑
❑
c01.indd 23c01.indd 23 9/24/09 11:26:39 AM9/24/09 11:26:39 AM