44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 1 1 TE RI AL Database Fundamentals HT ED MA Before you start to look at accessing databases from C# code, there are a few basics that you need to know. It is necessary to have a formal definition of what is meant by the term database, and that’s the first thing you examine in this chapter.
44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 2 Chapter 1 What Is a Database? It’s fair to say that most computing applications make use of data in one form or another, whether in an obvious way or with more subtlety. In most cases this data is persistent, which means that it is stored externally to the application and doesn’t disappear when the application isn’t running.
44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 3 Database Fundamentals scenario does not apply. If you have a phone number and want to know to whom it belongs, you won’t find a phone book particularly useful. While it is technically possible, it’s not something you’d want to do unless you have a lot of time on your hands. And it would be a lot easier to simply dial the number and ask to whom you were speaking.
4063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 4 Chapter 1 The following sections examine tables; relationships; an important property of relational databases that emerges from these constraints — normalization; and one of the underlying mechanisms by which all this is achieved: keys. Tables Characteristically, an RDBMS splits the data stored in a database into multiple locations, each of which contains a specific set of data. These locations are called tables.
44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 5 Database Fundamentals Keys Within database tables it is often important to uniquely identify rows, especially when defining relationships. The position of a row isn’t enough here, because rows may be inserted or deleted, so any row’s position might change. The order of rows is also an ambiguous concept because rows may be ordered by one or more columns in a way that varies depending on how you are using the data in the table — this is not a fixed definition.
44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 6 Chapter 1 both important and extremely useful. For example, in a sales database you might want to record both the products on sale and the orders placed for products. You can envisage a single table containing all this information, but it is far easier to use multiple tables — one for products, and one for orders. Each row in the orders table would be associated with one or more rows in the products table.
44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 7 Database Fundamentals Figure 1-3: The PhoneBookEntry table with a primary key Showing one-to-many links as a line with a key at one end and linked circles at the other (the “infinity” symbol) is just one way to display this information. You may also see lines with a 1 at one end and an ellipsis at the other. In this book, however, you’ll see the format shown here throughout.
44063c01.qxd:WroxBeg 9/12/06 10:30 PM Page 8 Chapter 1 Object Oriented Database Management Systems There are some situations where the integration between applications and databases must be far stronger than is possible when using RDBMSes, again mostly in high-performance applications. One approach that’s been quite successful is for databases to store objects directly so that OOP applications can store and retrieve objects directly, without resorting to serialization techniques.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 9 Database Fundamentals In this section you look at each of these, getting a flavor for them but without going into too much depth at this stage. Joins In the earlier relationship discussion, it may have seemed like accessing related data from multiple tables might involve a convoluted procedure. In actual fact — luckily for us — that isn’t the case. It is possible to fetch data from multiple tables simultaneously, and end up with a single set of results.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 10 Chapter 1 in this way, the DBMS is capable of optimizing things further for you. It might, for example, cache view data so that retrieving its compound information becomes much faster than querying individual tables might be. In addition, views can be defined in some quite complicated ways using functions, including userdefined functions, such that your applications can retrieve highly processed data with ease.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 11 Database Fundamentals Indexes Indexes are another way of optimizing performance by letting the DBMS know how you intend to make use of data. An index is an internally maintained table in the database that enables quick access to a row (or rows) containing specific data, such as a particular column value, a column value that contains a certain word, and so on.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 12 Chapter 1 their usual account details, and this login is then forwarded on to the database by any applications that are used. An advantage here is that at no point does the application need to be aware of the security details entered by the user — it simply forwards them on from its context.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 13 Database Fundamentals that the DBMS is unaware of whether rows are “checked out” at any given time, which makes it impossible to implement pessimistic concurrency control. There are, however, ways in which optimistic concurrency control can be implemented, as you will see later in the book. Transactions It is often essential to perform multiple database operations together, in particular where it is vital that all operations succeed.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 14 Chapter 1 Backups After the key tasks of being able to add, edit, and delete data in a database, perhaps the most important function of a DBMS is to enable you to back that data up. However good computers are, and however good the software you use, there are some things that are impossible to predict. Hardware failure happens — it’s a fact of life, albeit an unpleasant one.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 15 Database Fundamentals A good management tool does far more than let you look at and edit data in tables. It enables you to create and manage databases, configure backups, handle mirroring and partitioning, add stored procedures and views, create database diagrams, and administer security.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 16 Chapter 1 DB2, Oracle, and SQL Server DB2, Oracle, and SQL Server are the three RDBMS heavy-hitters of the database world. DB2 (www306.ibm.com/software/data/db2) is made by IBM — in fact, there’s a whole family of DB2 products — and SQL Server (www.microsoft.com/sql) is made by Microsoft, which gives you as much an idea of the intended audience as anything else. Oracle (www.oracle.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 17 Database Fundamentals that doesn’t depend on the exact RDBMS you are using. However, in being standardized it suffers by not giving access to proprietary features, and it is slower because its instructions must be translated into native instructions. In this book, you learn specifically about SQL Server 2005 data access with C#, using the Express Edition for simplicity (and because it’s free to download and use).
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 18 Chapter 1 Edition interface. In addition, there is a free tool — Microsoft SQL Server Management Studio Express — that mimics the more advanced administration tool that ships with the full version of SQL Server 2005. The installation of this tool is shown in Appendix A. In this book you’ll use both of these techniques, and in most cases you’ll be writing no more SQL code than is necessary.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 19 Database Fundamentals statement, for example. SQL statements are often made up of several parts, which are known as clauses. SQL statements may also contain embedded (nested) statements known as subqueries. SQL statements are executed, and may return one or more results. Multiple statements can be executed as a batch, which simply means “execute statements sequentially in the order that they are written.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 20 Chapter 1 It is worth noting that this shorthand notation can result in some (minor) overhead and reduction in performance because the RDBMS must work out what the columns are for you. That isn’t something to worry too much about, and certainly not when prototyping code, but it may be something you return to when optimizing the performance of an application if you used it a lot when developing your code. The best practice is to avoid using *.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 21 Database Fundamentals useful in different circumstances. Briefly, the types of join you can perform are as follows (don’t worry too much about these definitions at this stage — they will be explained in more detail shortly): ❑ Cross join: Each row in table A is joined to each row in table B. ❑ Inner join: Rows in table A and table B are joined according to the data contained in rows and the criteria you choose to compare these rows by.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 22 Chapter 1 Figure 1-7: Result of a cross join between Product and ProductCategory tables There are six results because every row in Product (3) is combined with every row in ProductCategory (2): 3×2=6. Note that there are two ProductCategoryId columns — one from each table.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 23 Database Fundamentals Adding a where clause identical to the one used earlier for the cross-join query would result in a single row being returned — row 2 in Figure 1-8. Inner joins are powerful — even more so when you consider that they can link any number of tables together. For example: SELECT [Order].OrderId, [Order].CustomerName, [Order].CustomerAddress, Product.ProductId, Product.ProductName, Product.ProductCost, ProductCategory.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 24 Chapter 1 You’ll return to inner joins in subsequent chapters. One last note for now: It is possible to use operators other than = to join tables. For numeric fields you can, for example, use >. The results of this are more difficult to illustrate (and to explain the usefulness of), but it’s something to bear in mind for later. Understanding Outer Joins The other types of join that you can perform are outer joins.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 25 Database Fundamentals This time, the result set is different, as shown in Figure 1-11. Figure 1-11: The result of a right outer join between Product and ProductCategory based on ProductCategoryId columns The additional row (row 4 in Figure 1-11) includes the data from the new column in ProductCategory combined with a bunch of null values in the columns taken from Product.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 26 Chapter 1 This may be simple, but there are several points to note: ❑ The INTO keyword is optional, but it makes the query easier to read. If you want, however, you can omit it from your queries. ❑ The list of columns specified may be a subset of the columns defined in the table for several reasons: ❑ Some columns may allow null values, in which case adding data to them is optional.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 27 Database Fundamentals The INSERT keyword can also be used to insert multiple columns into a table at the same time. However, this cannot be achieved by specifying all of the column values as part of the statement. Instead, you must specify column values indirectly by using a SQL statement that returns a set of data — for example, a select query. In insert statements of this form you must omit the VALUES keyword.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 28 Chapter 1 This query would delete all the rows from a table called ProductA that didn’t have ProductName values that started with the letter A. Here’s another example: DELETE FROM ProductCategory WHERE ProductCategoryId = ‘3bd514c0-97a1-11da-a72b-0800200c9a66’ Because ProductCategory.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 29 Database Fundamentals For example, the following command, CREATE DATABASE, would create a new database within the DBMS: CREATE DATABASE MyDatabaseOfWonders Once created, you can add tables using additional SQL statements, although first you need to specify the name of the database where the statements will execute.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 30 Chapter 1 Then you would add the foreign key, in the form of another constraint: ALTER TABLE [dbo].[Product] WITH CHECK ADD CONSTRAINT [FK_Product_ProductCategory] FOREIGN KEY ([ProductCategoryId]) REFERENCES [dbo].[ProductCategory] ([ProductCategoryId]) GO Here the FK_Product_ProductCategory foreign key is added, linking the Product.ProductCategoryId column with the ProductCategory.ProductCategoryId column.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 31 Database Fundamentals The markup in XML documents consists of data enclosed in elements, where that data may consist of nested (child) elements. Every XML document contains a single root (or document) element, and elements nested within the root element make up a hierarchy of data. Apart from the root element, XML documents may contain a single XML declaration, and zero or more processor directives.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 32 Chapter 1 Storing XML Data In the XML document shown in the previous section, it is fairly obvious how the data might map to a database table. For instance, you could have a FoodStuff table containing columns for Category (possible a foreign key field linking to a separate Category table), text columns for Name, Size, and Rating, and a bit type column for IsNasty.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 33 Database Fundamentals This is not a complete, legal XML document as it stands (it has multiple root elements for one thing), but it would be easy to turn it into one.
44063c01.qxd:WroxBeg 9/12/06 10:31 PM Page 34 Chapter 1 Exercises 1. 2. Database tables must include primary keys. Is this statement true or false? Which of the following are actual types of joins between tables? a. Inner joins b. Sideways joins c. Internal joins d. Left outer joins e. Dovetail joins 3. If you wanted to perform two update queries in which either both queries must succeed or both must fail, what technology would you use? 4.