Specifications

Data Management CHAPTER 6 115
any necessary data to each compute node so that it can process the query in parallel with
other compute nodes without requiring data from other locations during processing. This
feature, called data colocation, ensures that each compute node can execute its portion of the
parallel query with no effect on the query performance of the other compute nodes.
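The following minimal Python simulation makes colocation concrete. It assumes that both tables are distributed by hashing the same column (cust_id) across a hypothetical four-node appliance. Under that assumption, a customer row and that customer's orders always hash to the same node, so each node can join its own rows without requesting data from any other node. The table contents and helper names (node_for, distribute, colocated_join) are illustrative only and do not describe the internals of Parallel Data Warehouse.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical appliance size, chosen for illustration


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-column value to a compute node with a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_col, num_nodes=NUM_NODES):
    """Place each row on the node chosen by hashing its distribution column."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_col], num_nodes)].append(row)
    return nodes


def local_join(left_rows, right_rows, key):
    """Hash-join two row lists that already live on the same node."""
    index = defaultdict(list)
    for rrow in right_rows:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left_rows for rrow in index.get(lrow[key], [])]


def colocated_join(left_nodes, right_nodes, key, num_nodes=NUM_NODES):
    """Each node joins only the rows it already holds; nothing crosses nodes."""
    return {
        n: local_join(left_nodes.get(n, []), right_nodes.get(n, []), key)
        for n in range(num_nodes)
    }


customers = [{"cust_id": "C1", "name": "Ada"},
             {"cust_id": "C2", "name": "Lin"},
             {"cust_id": "C3", "name": "Sam"}]
orders = [{"cust_id": "C1", "amount": 10},
          {"cust_id": "C3", "amount": 25},
          {"cust_id": "C1", "amount": 5}]

# Both tables hash on cust_id, so matching rows share a node.
joined = colocated_join(distribute(customers, "cust_id"),
                        distribute(orders, "cust_id"), "cust_id")
for node, rows in joined.items():
    print(node, rows)
```

Because the join column and the distribution column are the same, every node can produce its share of the result in isolation. This is the property that lets the nodes work in parallel without affecting one another.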
Hub-and-Spoke Architecture
Rather than using Parallel Data Warehouse exclusively for a data warehouse, you can use a
hub-and-spoke architecture to support both a corporate data warehouse and special-purpose
data marts. These data marts reside on servers outside of the appliance. The data warehouse
at the hub is the primary data source for the spokes. A spoke can be a data mart, a host for
Analysis Services, or even a development or test environment. You can enforce business rules
and data quality standards for all data at the hub, and then you can quickly copy data as
needed from the Parallel Data Warehouse to the spokes residing outside the appliance.
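The hub-and-spoke flow can be sketched in Python as follows. The sketch assumes that the hub applies its business rules once, when data is loaded, and that each spoke receives a copy of only the rows it needs. The quality rules, the Spoke class, and the load_hub and publish functions are hypothetical stand-ins for illustration, not a Parallel Data Warehouse API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Spoke:
    """A hypothetical spoke: a data mart, an Analysis Services host, or a test system."""
    name: str
    wants: Callable[[Row], bool]          # selects the hub rows this spoke needs
    rows: List[Row] = field(default_factory=list)


def passes_quality_rules(row: Row) -> bool:
    """Hypothetical business rules, enforced once at the hub."""
    sales = row.get("sales")
    return (row.get("region") in {"East", "West"}
            and isinstance(sales, (int, float)) and sales >= 0)


def load_hub(raw_rows: List[Row]) -> List[Row]:
    """Admit only rows that meet the hub's data quality standards."""
    return [r for r in raw_rows if passes_quality_rules(r)]


def publish(hub_rows: List[Row], spokes: List[Spoke]) -> None:
    """Copy validated hub rows to every spoke that needs them."""
    for spoke in spokes:
        spoke.rows = [dict(r) for r in hub_rows if spoke.wants(r)]


raw = [{"region": "East", "sales": 100},
       {"region": "West", "sales": -5},    # rejected: negative sales
       {"region": "North", "sales": 40},   # rejected: unknown region
       {"region": "West", "sales": 70}]

hub = load_hub(raw)
east_mart = Spoke("east_mart", lambda r: r["region"] == "East")
test_env = Spoke("test_env", lambda r: True)
publish(hub, [east_mart, test_env])
```

Validation happens only in load_hub, so every spoke inherits the same cleaned data. That is the benefit of enforcing rules at the hub rather than separately in each data mart.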
Data Management
Loading, processing, and backing up terabytes of data with balanced hardware resources is
vitally important in a very large data warehouse. Parallel Data Warehouse uses carefully balanced
hardware to maximize the efficiency of each hardware component and avoid the need
to over-purchase hardware. Parallel Data Warehouse accomplishes this goal of balancing
speed and hardware by using a shared nothing (SN) architecture.
Beyond the shared nothing architecture, Parallel Data Warehouse differs from other editions
of SQL Server in several ways. For example, the SQL commands that create a database and tables
are slightly different from their standard Transact-SQL counterparts. In addition, although
Parallel Data Warehouse supports most of the SQL Server 2008 data types, there are a few
exceptions. Finally, the architecture requires a new approach to query processing and data
load processing.
Shared Nothing Architecture
An SN architecture is a type of architecture in which each node of a system uses its own CPU,
memory, and storage to avoid performance bottlenecks caused by resource contention with
other nodes. In Parallel Data Warehouse, each compute node contains its own data, CPU, and
storage to function as a self-sufcient and independent unit. Although the SN architecture
is gaining popularity as a data warehousing architecture, performance can still be slow when
a parallel query must rst move data among the nodes before execution. When a SQL join
operation requires data that is not already on the requisite compute nodes, Parallel Data
Warehouse copies data to these nodes temporarily for use during query execution.
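The following Python sketch models this temporary data movement under stated assumptions. Customers are hash-distributed on cust_id, but orders are stored by order_id. A join on cust_id therefore cannot run locally until the order rows are re-hashed into a temporary per-node copy. The names (NUM_NODES, redistribute, join_per_node) are hypothetical, and the model is conceptual rather than a description of how Parallel Data Warehouse moves data internally.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of compute nodes


def node_for(value, num_nodes=NUM_NODES):
    """Stable hash of a column value to a node number."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def distribute(rows, col, num_nodes=NUM_NODES):
    """Store each row on the node that owns the hash of `col`."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[col], num_nodes)].append(row)
    return nodes


def redistribute(nodes, col, num_nodes=NUM_NODES):
    """Build a temporary copy of the table hashed on `col`.

    Returns the copy and the number of rows that had to move to another node.
    The original placement in `nodes` is left unchanged.
    """
    temp = defaultdict(list)
    moved = 0
    for src, rows in nodes.items():
        for row in rows:
            dst = node_for(row[col], num_nodes)
            temp[dst].append(row)
            if dst != src:
                moved += 1
    return temp, moved


def join_per_node(left_nodes, right_nodes, col, num_nodes=NUM_NODES):
    """Each node joins only the rows it holds (after any redistribution)."""
    out = []
    for n in range(num_nodes):
        index = defaultdict(list)
        for rrow in right_nodes.get(n, []):
            index[rrow[col]].append(rrow)
        for lrow in left_nodes.get(n, []):
            out.extend({**lrow, **rrow} for rrow in index.get(lrow[col], []))
    return out


customers = distribute([{"cust_id": 1, "name": "Ada"},
                        {"cust_id": 2, "name": "Lin"},
                        {"cust_id": 3, "name": "Sam"}], "cust_id")
# Orders are stored by order_id, so they are not colocated with customers.
orders = distribute([{"order_id": 101, "cust_id": 1, "amount": 10},
                     {"order_id": 102, "cust_id": 3, "amount": 25},
                     {"order_id": 103, "cust_id": 1, "amount": 5},
                     {"order_id": 104, "cust_id": 2, "amount": 8},
                     {"order_id": 105, "cust_id": 3, "amount": 12}], "order_id")

temp_orders, moved = redistribute(orders, "cust_id")  # temporary, query-scoped copy
joined = join_per_node(customers, temp_orders, "cust_id")
print(f"{moved} order rows moved; {len(joined)} joined rows")
```

In this model, the moved count represents the overhead the paragraph describes. Every row that changes nodes must be transferred before the join can start. If orders had been distributed on cust_id instead, redistribute would report zero moved rows and the join could run in place.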