sequencing technologies improved steadily, but such technologies still

tended to concentrate on mining individual genes for information. During this

period, biologists were mostly sequencing DNA fragments that were a few

thousand nucleotides in length, simply because they were interested in specific

genes that they had started working on years before. Most of the bioinformatics

tools available today were created during that period. They include

• All basic sequence-alignment programs

• Phylogenetic and classification methods

• Various display tools adapted to relatively small sequence objects (such as protein sequences no more than a few thousand characters long)

Genomics: Getting all the genes at once

The determination of the first complete genome sequence terminated the

gene-by-gene routine and initiated the era of genomics: the genetic mapping,

physical mapping, and sequencing of entire genomes. As a consequence, the

DNA sequences we have to work with now are much longer: close to a

million bp in length for microbes and up to several billion bp in length for

animals and humans. This revolution called for the design of new bioinformatic

tools and databases capable of storing, querying, analyzing, and displaying these huge

objects in a user-friendly manner. Chapters 3, 5, and 7 present some of the

questions that biologists address at the genome scale, and show the relevant

bioinformatic tools in action.

In contrast to the early days of the gene-by-gene approach, DNA sequences

are now often obtained (along with the presumed protein sequences derived

from those DNA sequences) without any prior knowledge of what is actually

there. In essence, genes are both sequenced and discovered at the same time.

This development prompted the emergence of an entirely new branch of

bioinformatics devoted to the parsing of large DNA sequences into their

components (genes, transcription units, protein-coding regions, regulatory

elements, and so forth). This first pass is then followed by a longer phase of

genome annotation, in which the biological functions of these various elements

are (more or less tentatively) predicted. Part IV of this book presents

some of the most advanced of these techniques.
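
To give a feel for what "parsing a DNA sequence into its components" can mean, here is a deliberately naive sketch in Python. It scans both strands of a sequence in all six reading frames and reports open reading frames (ORFs): stretches that begin with an ATG start codon and run to the first in-frame stop codon. The function names (find_orfs, reverse_complement) and the min_codons cutoff are illustrative choices made for this example, not part of any tool described in this book, and real gene-finding programs rely on far more elaborate statistical models.

```python
# A naive ORF scanner: an illustration only, not a real gene finder.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return a list of (strand, start, end) tuples for candidate ORFs.

    Coordinates are 0-based, end-exclusive, and measured on the strand
    that was scanned (so '-' coordinates refer to the reverse complement).
    min_codons counts the start and stop codons; it is an arbitrary
    cutoff chosen for this sketch.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG seen in this frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=5))
```

On a real genome you would raise min_codons (short ORFs turn up by chance all the time) and then move on to the annotation phase: comparing each candidate against known sequences to guess at its function.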

Figure 1-10, representing the whole genome of the bacterium Rickettsia conorii,

illustrates this new level of complexity. This circular DNA molecule is

1.3 million bp long, on the small side for a bacterium. Each little rectangle in

the two most external circles of features (one circle per strand) corresponds

to a protein-coding gene in the circular genome. Each rectangle corresponds

to approximately 1000 bp. Nobody knew which genes — or which proteins —

were in that bacterium before the sequencing started. Almost everything we

know now about this bacterium (and many others we can describe as fairly

inaccessible, such as those thriving on the ocean floor near volcanic vents at

100°C) has been derived from bioinformatic analyses.

Chapter 1: Finding Out What Bioinformatics Can Do for You
