Specifications
Warehouse software. When the assembly process is complete, the vendor ships the appliance
to you using shockproof pallets. When it arrives, you remove the appliance from the pallets,
plug it into a power source, and connect it to your network.
Parallel Data Warehouse is a data warehouse appliance that includes all server, networking,
and storage components required to host a data warehouse. In addition, your purchase of
Parallel Data Warehouse includes cables, power distribution units, and racks. Furthermore, the
components have redundancy to prevent downtime caused by a failure. The vendor installs all
software at the factory and configures Parallel Data Warehouse to balance CPU, memory, and
disk space. After you receive the Parallel Data Warehouse at your location, you use a
configuration tool that Parallel Data Warehouse includes to complete the network setup and
configure appliance settings for your environment. You can also install Microsoft or third-party
software to use when copying data between your corporate network and the appliance.
Processing Architecture
A traditional data warehouse deployment of SQL Server is an SMP architecture, in which
identical processors share memory on a single server. One physical instance of a database
processes all queries. You can improve performance by partitioning the data, thereby achieving
multi-threaded parallelization. You can add higher powered servers with more CPU, memory,
storage, and networking capacity to scale up, but the cost to scale up is high.
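As a rough illustration of partitioned, multi-threaded processing on a single shared-memory server, the sketch below splits one in-memory data set into partitions and aggregates them with a thread pool. It is a toy model, not SQL Server's query engine: the names `partition` and `parallel_sum` are hypothetical, and Python threads do not deliver true CPU parallelism for this kind of work, so only the structure is meaningful.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_partitions):
    """Split rows into contiguous chunks (a stand-in for table partitions)."""
    size = max(1, -(-len(rows) // n_partitions))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_sum(rows, n_partitions=4):
    """Sum each partition on its own thread, then combine the partial sums."""
    parts = partition(rows, n_partitions)
    # Every thread reads the same in-memory list: one server, shared memory.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)


sales_amounts = list(range(1, 101))
total = parallel_sum(sales_amounts)
```

All of the threads compete for the same server's CPU, memory, and storage, which is why the only way to grow this design is to buy a larger server.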
By contrast, Parallel Data Warehouse is an MPP architecture that uses multiple database
servers that operate together to process queries. Behind the scenes, each database server
runs one SQL Server instance with its own dedicated CPU, RAM, storage, and network
bandwidth. Each database managed by Parallel Data Warehouse is distributed across multiple
database servers that execute Parallel Data Warehouse queries in parallel. Parallel Data
Warehouse’s architecture includes a controlling server to coordinate these parallel queries
and all other database activity across the multiple database servers. This controlling server
also presents the distributed database as a single logical database to users. If you need to
scale out the MPP hardware, you can simply add inexpensive commodity servers and storage
rather than expensive high-end servers and storage.
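The coordination pattern described above can be sketched as a small scatter-gather model. This is a minimal illustration under stated assumptions, not how Parallel Data Warehouse is implemented: the classes `DatabaseServer` and `ControllingServer`, the modulo distribution on an order ID, and the sample rows are all hypothetical, and the servers are visited one after another here, whereas real servers would work concurrently.

```python
from collections import defaultdict


class DatabaseServer:
    """Hypothetical stand-in for one server with its own private storage."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # visible only to this server (nothing is shared)

    def partial_totals(self):
        """Aggregate only the rows stored locally on this server."""
        totals = defaultdict(int)
        for _order_id, region, amount in self.rows:
            totals[region] += amount
        return dict(totals)


class ControllingServer:
    """Presents the distributed rows as one logical table to the caller."""

    def __init__(self, n_servers):
        self.servers = [DatabaseServer(f"server{i}") for i in range(n_servers)]

    def load(self, rows):
        # Assumed distribution rule: order_id modulo the number of servers.
        for row in rows:
            order_id = row[0]
            self.servers[order_id % len(self.servers)].rows.append(row)

    def total_by_region(self):
        # Scatter: ask each server for its partial result.
        # Gather: merge the partial results into a single answer.
        merged = defaultdict(int)
        for server in self.servers:
            for region, amount in server.partial_totals().items():
                merged[region] += amount
        return dict(merged)


orders = [(1, "east", 100), (2, "west", 250), (3, "east", 50),
          (4, "west", 25), (5, "east", 10)]
control = ControllingServer(n_servers=3)
control.load(orders)
totals = control.total_by_region()
```

The caller only ever talks to `control`, which mirrors the single logical database that users see. In this model, scaling out means constructing the controller with more servers and redistributing the rows, rather than replacing any one server with a larger one.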
The Multi-Rack System
Parallel Data Warehouse is configured as a multi-rack system in which there is a control rack
and one or more data racks, as shown in Figure 6-1. Each rack is a collection of nodes, each of
which has a dedicated role within the appliance. These nodes transfer data among themselves
using an InfiniBand network that ships with the appliance. Only the nodes in the control rack
communicate with the corporate Ethernet network. The nodes in the data rack can export
tables to a corporate SMP SQL Server database by using the InfiniBand network.
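The network rules in this description can be captured in a few lines. The sketch below is only a model of the stated connectivity: every node attaches to the appliance's InfiniBand network, and only control-rack nodes also attach to the corporate Ethernet network. The `Node` type, the node names, and the network labels are hypothetical and do not reflect actual appliance configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical label, for illustration only
    rack: str  # "control" or "data"


def networks_for(node):
    """Return the networks a node attaches to under the rules described above."""
    networks = {"infiniband"}  # every node uses the appliance's InfiniBand network
    if node.rack == "control":
        networks.add("corporate_ethernet")  # only control-rack nodes
    return networks


def can_reach(node, network):
    """True if the node is attached to the given network."""
    return network in networks_for(node)


control_node = Node(name="control-1", rack="control")
data_node = Node(name="data-1", rack="data")
```

Under this model, `can_reach(data_node, "corporate_ethernet")` is false, so a data-rack node that exports tables does so over InfiniBand, as the text states.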