Figure 1-11 Data is usually stored in an Excel spreadsheet using the flat-file format.
To capture the customer information for each invoice, several fields hold
customer-specific information: customer name, address, city, and so on.
Because most firms sell to customers more than once, customer information
is often repeated across invoices. Duplicate information is one of the main
drawbacks of the flat-file format.
What is wrong with duplicate data? Initially, the duplicate data may not
appear to be a potential source of future problems, but upon further
examination, you discover the shortcomings. First is file size. Duplicate data
wastes space, both on the computer hard drive, where the file is stored, and in
the computer’s memory, where the data resides when it is being operated on.
Although the enormous amounts of memory that are standard with today’s
machines go a long way toward handling excessive demands, duplicate data
still wastes valuable computer space and resources. The duplicate information
adds no value. In fact, it leads to problems, particularly when
data needs to be updated. As you can see in Figure 1-11, a number of different
invoices have been recorded for CORRUL Corp. You can also see that the
information for CORRUL Corp. is repeated for every invoice. What if CORRUL
Corp.’s customer information changes, though? What if it acquires new
office space and you want to reflect this change of location in your data? You
would have to make the change in several different places, ensuring that
every invoice correctly maps back to its relevant customer information.
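The update burden described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the invoice numbers, cities, amounts, and the second customer are invented sample data, and `update_city` is a hypothetical helper, not something taken from Figure 1-11 or from Excel.

```python
# Hypothetical flat-file rows: every invoice repeats the customer's details.
invoices = [
    {"invoice": 1001, "customer": "CORRUL Corp.", "city": "Dallas", "amount": 250.0},
    {"invoice": 1002, "customer": "Other Co.", "city": "Austin", "amount": 90.0},
    {"invoice": 1003, "customer": "CORRUL Corp.", "city": "Dallas", "amount": 400.0},
    {"invoice": 1004, "customer": "CORRUL Corp.", "city": "Dallas", "amount": 125.0},
]


def update_city(rows, customer, new_city):
    """Change the city on every row for `customer`; return how many rows changed."""
    changed = 0
    for row in rows:
        if row["customer"] == customer:
            row["city"] = new_city
            changed += 1
    return changed


# One real-world change (a single move) requires touching every matching row.
rows_touched = update_city(invoices, "CORRUL Corp.", "Houston")
print(rows_touched)
```

With this sample data, one change of address means three separate edits, and the count grows with every new invoice recorded for the same customer.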
Although Excel offers excellent functions for finding and replacing data,
there is still a danger that you might not make all of the updates
correctly. Whenever you change the same information in several places, the
risk of introducing unintentional errors is always present. Such errors could
significantly affect your data analysis. For example, suppose that CORRUL Corp.
moved to a different city. Figure 1-12 demonstrates how easy it is to update
the data incorrectly.
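To make the effect on analysis concrete, here is a minimal Python sketch of a partial update. The rows, cities, and amounts are invented for illustration and are not the values shown in Figure 1-12; `cities_per_customer` and `revenue_by_city` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical flat-file rows for one customer, all recorded in the same city.
invoices = [
    {"invoice": 1001, "customer": "CORRUL Corp.", "city": "Dallas", "amount": 250.0},
    {"invoice": 1003, "customer": "CORRUL Corp.", "city": "Dallas", "amount": 400.0},
    {"invoice": 1004, "customer": "CORRUL Corp.", "city": "Dallas", "amount": 125.0},
]

# Simulate a manual update that misses the last row.
for row in invoices[:2]:
    row["city"] = "Houston"


def cities_per_customer(rows):
    """Collect the distinct cities recorded for each customer."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["customer"]].add(row["city"])
    return seen


def revenue_by_city(rows):
    """Sum invoice amounts per city, as a simple analysis might."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


# A customer with more than one recorded city signals an inconsistent update.
conflicts = {
    customer: sorted(cities)
    for customer, cities in cities_per_customer(invoices).items()
    if len(cities) > 1
}
print(conflicts)                  # {'CORRUL Corp.': ['Dallas', 'Houston']}
print(revenue_by_city(invoices))  # {'Houston': 650.0, 'Dallas': 125.0}
```

A single missed row is enough to split one customer's revenue across two cities, so any summary grouped by city would be wrong without raising an error.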