
Chapter 1: What’s in a Data Warehouse?
Although the vendor community has consolidated, innovation hasn’t ceased.
More cost-effective solutions have emerged, led by Microsoft, enabling
small and mid-sized businesses to implement data warehousing solutions.
Additionally, less expensive alternatives are emerging from a new set of
vendors, those within the open source community, including vendors such
as Pentaho and Jaspersoft. Open source business intelligence tools enable
corporate application vendors to embed data warehousing solutions into
their software suites. And other innovations have emerged, including data
warehouse appliances from vendors such as Netezza and DATAllegro
(acquired by Microsoft), and performance management appliances that
enable real-time performance monitoring. These innovative solutions can
also provide cost savings because they’re often plug-compatible with legacy
data warehouse solutions.
While time ticks by, you need to have a plan in place before you begin your
data warehousing process. Know the focus of what you’re trying to do and
the questions you’re likely to be asking. Will you be asking mostly about
sales activity? If so, put plans in place for regular monthly (or weekly or even
daily) extractions of data about customers, the products they buy, and the
amounts of money they spend. If you work at a bank and your business focus
is managing the risk across loan portfolios, for example, get information from
the bank’s applications that handle loan payments, delinquencies, and other
data you need; then, add in data from the credit bureau about your customers’
respective overall financial profiles.
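To make the idea of a regular extraction concrete, here's a minimal sketch of a monthly pull of sales activity by customer and product. Everything specific in it is an assumption for illustration only: the sales table, its column names, the sample rows, and the helper functions extract_monthly_sales and build_demo_source are all hypothetical, and an in-memory SQLite database stands in for whatever source application you actually extract from.

```python
import sqlite3


def extract_monthly_sales(conn, year_month):
    """Return (customer, product, total_amount) rows for one month, e.g. '2009-01'.

    Assumes a hypothetical 'sales' table with ISO-formatted sale_date strings.
    """
    query = """
        SELECT customer, product, SUM(amount) AS total_amount
        FROM sales
        WHERE strftime('%Y-%m', sale_date) = ?
        GROUP BY customer, product
        ORDER BY customer, product
    """
    return conn.execute(query, (year_month,)).fetchall()


def build_demo_source():
    """Create an in-memory stand-in for a source system (sample data is made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT, product TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("Acme", "widget", 100.0, "2009-01-05"),
            ("Acme", "widget", 50.0, "2009-01-20"),
            ("Acme", "gadget", 75.0, "2009-01-22"),
            ("Zenith", "widget", 30.0, "2009-02-02"),
        ],
    )
    return conn


if __name__ == "__main__":
    source = build_demo_source()
    # One extraction run for January 2009; a scheduler would repeat this each month.
    for row in extract_monthly_sales(source, "2009-01"):
        print(row)
```

Switching to weekly or daily extractions would mostly mean changing the date filter. The more important decision is the one described above: knowing which questions you plan to ask, so that you know which columns to pull in the first place.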
Is a Bigger Data Warehouse a Better Data Warehouse?
A common misconception that many data warehouse aficionados hold is that
the only good data warehouse is a big data warehouse — an enormously big
data warehouse. Many people even take the stance that unless they have
some astronomically large number of bytes stored, it isn’t truly a data
warehouse. “Five hundred gigabytes? Okay, that’s a real data warehouse; it would
be a better data warehouse, however, if it had at least a terabyte (1 trillion
bytes) of data. Twenty-five gigabytes? Sorry, that’s a data mart, not a data
warehouse.” (See Chapter 4 for a discussion of the differences between data
marts and data warehouses.)
The size of a data warehouse is a characteristic — almost a by-product — of
a data warehouse; it’s not an objective. No one should ever set out with a
mission to “build a 500-gigabyte data warehouse that contains (whatever).”