Specifications

any necessary data to each compute node so that it can process the query in parallel with

other compute nodes without requiring data from other locations during processing. This

feature, called data colocation, ensures that each compute node can execute its portion of the

parallel query with no effect on the query performance of the other compute nodes.
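
As a rough illustration of data colocation, the following Python sketch places the rows of two tables on nodes by their join key, so that each node can join its own rows without consulting any other node. The sample tables, the `node_for` function, and the modulo placement rule are assumptions made for this example; they do not describe how Parallel Data Warehouse assigns rows internally.

```python
# Illustrative sketch only: nodes are modeled as plain dictionaries, and
# placement by "key modulo node count" is an assumption for the example.
NODE_COUNT = 3

def node_for(key: int) -> int:
    """Pick the node that stores every row with this join key."""
    return key % NODE_COUNT

def distribute(rows, key_index):
    """Split rows across nodes according to the value in the key column."""
    nodes = {n: [] for n in range(NODE_COUNT)}
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes

def local_join(orders, customers):
    """Join the rows held by one node; no other node is consulted."""
    names = {cust_id: name for cust_id, name in customers}
    return [(order_id, names[cust_id])
            for order_id, cust_id in orders if cust_id in names]

# Hypothetical sample data: (customer_id, name) and (order_id, customer_id).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")]
orders = [(100, 1), (101, 3), (102, 4), (103, 2), (104, 1)]

customers_by_node = distribute(customers, key_index=0)
orders_by_node = distribute(orders, key_index=1)

# Each node joins independently; the final result is the union of the parts.
result = []
for n in range(NODE_COUNT):
    result.extend(local_join(orders_by_node[n], customers_by_node[n]))
```

Because both tables are placed by the same key, every order lands on the node that already holds its customer, and no rows need to move at query time.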

Hub-and-Spoke Architecture

Rather than using Parallel Data Warehouse exclusively for a data warehouse, you can use a

hub-and-spoke architecture to support both a corporate data warehouse and special purpose

data marts. These data marts reside on servers outside of the appliance. The data warehouse

at the hub is the primary data source for the spokes. A spoke can be a data mart, a host for

Analysis Services, or even a development or test environment. You can enforce business rules

and data quality standards for all data at the hub, and then you can quickly copy data as

needed from the Parallel Data Warehouse to the spokes residing outside the appliance.
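
The flow described above can be sketched in a few lines of Python: rows are validated once at the hub, and each spoke then receives a copy of the subset it needs. The `Hub` class, the sample rule, and the spoke names are hypothetical and serve only to illustrate the pattern.

```python
# Illustrative sketch only: Hub, its rule, and the spoke names are hypothetical.
class Hub:
    def __init__(self, rules):
        self.rules = rules  # business rules enforced once, at the hub
        self.rows = []      # validated warehouse data

    def load(self, row):
        """Accept a row only if it passes every rule."""
        if all(rule(row) for rule in self.rules):
            self.rows.append(row)
            return True
        return False

    def copy_to_spoke(self, predicate):
        """Return an independent copy of the rows a spoke needs."""
        return [dict(row) for row in self.rows if predicate(row)]

hub = Hub(rules=[lambda r: r["amount"] >= 0])
hub.load({"region": "east", "amount": 120})
hub.load({"region": "west", "amount": 75})
rejected = not hub.load({"region": "east", "amount": -5})  # fails the rule

# Each spoke holds its own copy; changing a spoke does not alter the hub.
spokes = {
    "east_data_mart": hub.copy_to_spoke(lambda r: r["region"] == "east"),
    "test_environment": hub.copy_to_spoke(lambda r: True),
}
```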

Data Management

Loading, processing, and backing up terabytes of data with balanced hardware resources is

vitally important in a very large data warehouse. Parallel Data Warehouse uses carefully

balanced hardware to maximize the efficiency of each hardware component and avoid the need

to over-purchase hardware. Parallel Data Warehouse accomplishes this goal of balancing

speed and hardware by using a shared nothing (SN) architecture.

In addition to the shared nothing architecture, Parallel Data Warehouse differs from other

editions of SQL Server in several ways. For example, SQL commands to create a database and tables

are slightly different from their standard Transact-SQL counterparts. In addition, although

Parallel Data Warehouse supports most of the SQL Server 2008 data types, there are a few

exceptions. Last, the architecture requires a new approach to query processing and data

load processing.

Shared Nothing Architecture

In an SN architecture, each node of a system uses its own CPU,

memory, and storage to avoid performance bottlenecks caused by resource contention with

other nodes. In Parallel Data Warehouse, each compute node contains its own data, CPU, and

storage to function as a self-sufficient and independent unit. Although the SN architecture

is gaining popularity as a data warehousing architecture, performance can still be slow when

a parallel query must rst move data among the nodes before execution. When a SQL join

operation requires data that is not already on the requisite compute nodes, Parallel Data

Warehouse copies data to these nodes temporarily for use during query execution.
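
A minimal Python sketch of this temporary data movement follows. Each node holds its own rows; because the hypothetical orders are placed by order ID rather than by customer ID, some of them sit on a different node from the matching customer and must be copied there before the join can run locally. The `Node` class, the placement rules, and the sample data are assumptions for illustration, not the product's actual mechanism.

```python
from dataclasses import dataclass, field

NODE_COUNT = 2

@dataclass
class Node:
    """One shared-nothing node: it owns its data and shares it with no one."""
    customers: dict = field(default_factory=dict)    # customer_id -> name
    orders: list = field(default_factory=list)       # (order_id, customer_id)
    temp_orders: list = field(default_factory=list)  # copies used by one query

def home_node(customer_id: int) -> int:
    """Assumed placement rule for customer rows."""
    return customer_id % NODE_COUNT

nodes = [Node() for _ in range(NODE_COUNT)]
for cust_id, name in [(1, "Ann"), (2, "Bob")]:
    nodes[home_node(cust_id)].customers[cust_id] = name

# Orders are placed by order_id, so they are not colocated with customers.
for order_id, cust_id in [(10, 1), (11, 2), (14, 2)]:
    nodes[order_id % NODE_COUNT].orders.append((order_id, cust_id))

def run_join(nodes):
    """Join orders to customers, moving order rows first where needed."""
    moved = 0
    # Step 1: copy each order to the node that holds its customer.
    for index, node in enumerate(nodes):
        for order in node.orders:
            target = home_node(order[1])
            nodes[target].temp_orders.append(order)
            if target != index:
                moved += 1  # this row crossed from one node to another
    # Step 2: every node joins only the rows it now holds.
    result = []
    for node in nodes:
        result.extend((oid, node.customers[cid]) for oid, cid in node.temp_orders)
    # Step 3: the copies are temporary and are discarded after the query.
    for node in nodes:
        node.temp_orders.clear()
    return result, moved

result, moved = run_join(nodes)
```

In this sample, two of the three order rows have to cross nodes before the join can start, which is the extra cost that colocated data avoids.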