Datasheet

SQL and Relational Database Management Systems 1

The process of grouping the relevant data together, eliminating redundancies along

the way, is called

normalization

and is discussed in Chapter 2, ‘‘Fundamental SQL

Concepts and Principles.’’ It is not part of SQL

per se

, but it does impose limits on the SQL query

efﬁciency.

There is no theoretical limit to the number of rows a table could have, although some imple-

mentations impose restrictions. Also, there are (or at least ought to be) practical considerations

to the limits: data retrieval speed, amount of storage, and so on.

Relationships

Tables in RDBMS might or might not be related. As mentioned before, RDBMS is built upon

relationships between tables, but unlike in legacy databases (hierarchical and network), these

relations are based solely on the values in the table columns — these relationships are meaningful

in logical terms, not in low-level computer speciﬁc pointers. Let’s take the example of our ﬁctitious

order entry database (the one that you will design, build, and use throughout the book). The

ORDER HEADER table is related to the CUSTOMER table since both of these tables have a common set of

values. The ﬁeld

ORDHDR CUSTID FN (customer ID) in ORDER HEADER (and its values) corresponds

CUST ID N in CUSTOMER. The ﬁeld CUST ID N is declared to be a primary key for the CUSTOMER

table and also is the target for a foreign key from the ORDER HEADER table (in the form of the ﬁeld

ORDHDR CUSTID FN).

Primary key

The primary key holds more than one job in an RDBMS. As mentioned previously, this is an important

component of relationships. It also can carry ordinary data, such as a department number, part code,

or employee ID; but its primary role is to identify each record in a table uniquely.

In the days of legacy databases, the records were always stored in some predeﬁned order — if

such an order had to be broken (because somebody had inserted records in a wrong order or

business rule was changed), then the whole table (and, most likely, the whole database) had to

be rebuilt. The RDBMS abolishes this ﬁxed-order rule for the records, but it still needs some

mechanism of identifying the records uniquely, and the primary key, based on the idea of a ﬁeld

(or ﬁelds) that contains set unique values, serves exactly this purpose.

In an RDBMS, the primary key is just a particular case of

candidate key

—asetof

one or more columns that has unique value for every row in this table. Each table

can have any number of candidate keys, but only one of them can be declared a primary key.

By its very nature, the primary key cannot be empty. This means that in a table with a deﬁned

primary key, the primary key ﬁelds must contain non-null data for each record.

Although it is not a requirement to have a primary key on each and every table, it is

considered good practice to have one. In fact, many RDBMS implementations would