Datasheet

Part I ✦ Laying the Foundation

Another consequence of null’s unknown value is that a null is not equal to another null.

Therefore, to test for the presence of null, SQL uses the

IS NULL syntax.

Both of these behaviors can be overridden. SQL Server will ignore nulls in expression when

the connection setting

concat_null_yields_null is set to off. The ANSI nulls setting con-

trols whether null can be equal to another null.

Null Controversy

Most database developers design columns that allow nulls when they make sense.

Extreme database purists detest nulls and require that any relational database model not

allow nulls. One method of avoiding nulls is to separate any null columns into a separate

supertype/subtype table. This method merely replaces a nullable column with a nullable row,

which requires a left-outer join to test for the presence of data or retrieve data. The resulting

complexity affects not only performance but also data integrity. Instead of retrieving data

with a null column, the application developer must be aware of the subtype table and be flu-

ent with left-outer joins. This view focuses on a misguided understanding of a single data

store objective, data integrity, at the expense of performance and usability.

Performance

Presenting readily usable information is a key aspect of the Information Architecture

Principle. Although the database industry has achieved a high degree of performance, the

ability to scale that performance to very large databases with more connections is still an

area of competition between database engine vendors.

Because physical disk performance is the most significant bottleneck, the key to performance

is reducing the number of physical page reads or writes required to perform a task. The five

primary performance factors all seek to reduce the number of physical page reads.

Design

The database schema design can dramatically affect performance. The physical design must

consider the query path. An overly complicated design, resulting in too many tables, requires

additional joins when retrieving data, and additional writes when inserting or updating data.

Some database schemas discourage set-based processing by requiring that data move from

one bucket to another as it’s processed. If server-side application code includes several cur-

sors, maybe it’s not the fault of the application developer but the database designer.

Set-Based Processing

Relational algebra, the SQL language, and relational database engines are all optimized to

work with sets of data. Poorly written code, whether row-by-row cursor-based code, or just

poorly written SQL, is a common source of poor performance.

Writing excellent set-based queries requires a creative understanding of joins and sub-

queries, as discussed in Chapter 9, “Merging Data with Joins and Unions,” and Chapter 10,

“Including Subqueries and CTEs.”

Cross-

Reference

05_542567 ch01.qxp 9/27/06 9:58 PM Page 16