Datasheet

16
Part I Laying the Foundation
Another consequence of null’s unknown value is that a null is not equal to another null.
Therefore, to test for the presence of null, SQL uses the
IS NULL syntax.
Both of these behaviors can be overridden. SQL Server will ignore nulls in expression when
the connection setting
concat_null_yields_null is set to off. The ANSI nulls setting con-
trols whether null can be equal to another null.
Null Controversy
Most database developers design columns that allow nulls when they make sense.
Extreme database purists detest nulls and require that any relational database model not
allow nulls. One method of avoiding nulls is to separate any null columns into a separate
supertype/subtype table. This method merely replaces a nullable column with a nullable row,
which requires a left-outer join to test for the presence of data or retrieve data. The resulting
complexity affects not only performance but also data integrity. Instead of retrieving data
with a null column, the application developer must be aware of the subtype table and be flu-
ent with left-outer joins. This view focuses on a misguided understanding of a single data
store objective, data integrity, at the expense of performance and usability.
Performance
Presenting readily usable information is a key aspect of the Information Architecture
Principle. Although the database industry has achieved a high degree of performance, the
ability to scale that performance to very large databases with more connections is still an
area of competition between database engine vendors.
Because physical disk performance is the most significant bottleneck, the key to performance
is reducing the number of physical page reads or writes required to perform a task. The five
primary performance factors all seek to reduce the number of physical page reads.
Design
The database schema design can dramatically affect performance. The physical design must
consider the query path. An overly complicated design, resulting in too many tables, requires
additional joins when retrieving data, and additional writes when inserting or updating data.
Some database schemas discourage set-based processing by requiring that data move from
one bucket to another as it’s processed. If server-side application code includes several cur-
sors, maybe it’s not the fault of the application developer but the database designer.
Set-Based Processing
Relational algebra, the SQL language, and relational database engines are all optimized to
work with sets of data. Poorly written code, whether row-by-row cursor-based code, or just
poorly written SQL, is a common source of poor performance.
Writing excellent set-based queries requires a creative understanding of joins and sub-
queries, as discussed in Chapter 9, “Merging Data with Joins and Unions,” and Chapter 10,
“Including Subqueries and CTEs.
Cross-
Reference
05_542567 ch01.qxp 9/27/06 9:58 PM Page 16