
the basic elements of RNA secondary structure; they’re made up of loops (the unpaired C-U in Figure 1-8) and stems (the paired regions).

Just for fun, verify for yourself that a palindromic RNA sequence (one that reads the same as its reverse complement) results in a perfect hairpin, with no loop. While attempting to pair as many nucleotides as possible, the RNA chain folds in space, resulting in a specific 3-D structure that’s dictated by its sequence. As with proteins, the linear sequence of the building blocks dictates the final 3-D shape. The biological function of RNA molecules derives from their 3-D shapes or from their sequence complementarity with specific genes.
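If you’d rather let the computer do the verifying, here is a minimal sketch, assuming an idealized string model: it checks whether a sequence equals its own reverse complement and then folds the chain in half, pairing the first base with the last, the second with the second-to-last, and so on. The helper names (`reverse_complement`, `is_palindromic`, `hairpin_pairs`) are hypothetical, and only standard Watson-Crick pairs (A-U, G-C) are counted. Keep in mind that a real hairpin needs a few unpaired nucleotides at its tip to bend back; this toy version ignores that physical constraint.

```python
# Idealized check: a "palindromic" RNA (equal to its reverse complement)
# pairs completely with itself when folded in half.
# Assumption: only Watson-Crick pairs count; G-U wobble pairs are ignored.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


def hairpin_pairs(seq: str) -> list[tuple[int, int]]:
    """Fold the chain in half: try to pair position i with position n-1-i.

    Returns the 0-based (i, j) pairs that are complementary. For a
    palindromic sequence, every position ends up paired (no loop left).
    """
    seq = seq.upper()
    n = len(seq)
    pairs = []
    for i in range(n // 2):
        j = n - 1 - i
        if RNA_COMPLEMENT[seq[i]] == seq[j]:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    for s in ("GGAUCC", "GGAUCA"):
        print(s, is_palindromic(s), hairpin_pairs(s))
```

For GGAUCC, all three possible pairs form (G-C, G-C, A-U), so nothing is left over to make a loop.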

Computing (predicting) the final fold of an RNA molecule from its sequence is a challenging problem that drove many historical developments in bioinformatics. The recent discovery that small RNA molecules can switch off the activity of a number of genes triggered a renewed interest in these sticky sequences. (Go directly to Chapter 12 if your main interest is RNA bioinformatics.)
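To get a feel for why folding is a computing problem, here is a minimal sketch of one classic, deliberately simplified approach: the Nussinov dynamic-programming algorithm, which literally tries to "pair as many nucleotides as possible." It is not the method used by modern RNA-folding tools, which minimize an estimated free energy instead of counting pairs. The scoring (one point per pair), the allowed pairs (A-U, G-C, and G-U wobble), the minimum loop size `MIN_LOOP = 3`, and the function names are all assumptions made for illustration. The result is reported in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones.

```python
# Toy Nussinov-style folding: maximize the number of base pairs.
# Assumptions (not from the text): Watson-Crick pairs plus G-U wobble pairs,
# every pair scores 1, and a pair must enclose at least MIN_LOOP unpaired bases.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum hairpin loop size


def can_pair(a: str, b: str) -> bool:
    """True if bases a and b can form a pair under the toy model."""
    return (a, b) in PAIRS


def nussinov(seq: str) -> tuple[int, str]:
    """Return (maximum number of pairs, dot-bracket structure) for an RNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] holds the maximum number of pairs in seq[i..j] (inclusive);
    # it stays 0 whenever j - i <= MIN_LOOP, because no pair fits there.
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

On GGGAAAUCC this toy model finds a three-pair stem closing a three-base AAA loop, which is exactly the loop-and-stem picture described at the start of this section. Even this stripped-down version needs time proportional to the cube of the sequence length, which hints at why realistic folding models are hard.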

More on nucleic acid nomenclature

Don’t panic if you get the impression that books, courses, and the technical literature all use many different words and abbreviations to designate the building blocks of nucleic acids: That’s actually true. For example, you’ll find “base,” “base pair,” “nucleoside,” and “nucleotide.” These different names designate slightly different chemical entities, but those differences are irrelevant for us just now. So far we’ve used the term nucleotide, abbreviated nt (as in “a 400-nt-long sequence”). This way of labeling a sequence gives the length of the DNA (or RNA) molecule in terms of the number of positions it has available for nucleotides. For instance, the sequence in Figure 1-5 is 5 nt long.

Notice that we say number of positions rather than number of nucleotides. A 400-nt-long DNA molecule has 400 positions for nucleotides, but it actually contains twice that many nucleotides (800) because, in double-stranded DNA, every position holds a pair of nucleotides. To make this clearer, DNA sequence sizes are often given in base pairs, abbreviated bp. Thus the DNA sequence in Figure 1-5 is 5 bp long. Larger units, such as kb (1,000 bp) and Mb (1,000,000 bp), are also used.
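The bookkeeping above boils down to two small rules, sketched here under the assumption that the molecule is fully double-stranded (every position holds a base pair). The function names `nucleotide_count` and `format_length` are hypothetical helpers for illustration, not part of any bioinformatics library.

```python
# Length conventions for nucleic acids, assuming a fully double-stranded
# DNA molecule unless told otherwise.

def nucleotide_count(length_bp: int, double_stranded: bool = True) -> int:
    """Number of individual nucleotides in a molecule of the given length."""
    # Each position of double-stranded DNA holds a pair, i.e. two nucleotides.
    return 2 * length_bp if double_stranded else length_bp


def format_length(length_bp: int) -> str:
    """Express a length in bp, kb (1,000 bp), or Mb (1,000,000 bp)."""
    if length_bp >= 1_000_000:
        return f"{length_bp / 1_000_000:g} Mb"
    if length_bp >= 1_000:
        return f"{length_bp / 1_000:g} kb"
    return f"{length_bp} bp"


if __name__ == "__main__":
    print(nucleotide_count(400))   # the 400-position molecule from the text
    print(format_length(5))        # the Figure 1-5 sequence
    print(format_length(1_500))
    print(format_length(3_000_000))
```

A single-stranded RNA of the same length would contain only 400 nucleotides, which is why the `double_stranded` flag exists.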

DNA Coding Regions: Pretending to Work with Protein Sequences

Of the hundreds of thousands of protein sequences found in current databases, only a small percentage correspond to molecules that have actually been isolated by somebody or experimented upon. That’s because determining

Chapter 1: Finding Out What Bioinformatics Can Do for You

05_089857 ch01.qxp 11/6/06 3:52 PM Page 23