
the basic elements of RNA secondary structure; they’re made up of loops (the unpaired C-U in Figure 1-8) and stems (the paired regions).
Just for fun, verify for yourself that a palindromic RNA sequence (one that is identical to its own reverse complement) results in a perfect hairpin, with no loop. This works on paper only; a real chain needs a few unpaired bases to make the turn. While attempting to pair as many nucleotides as possible, the RNA chain folds in space, resulting in a specific 3-D structure that’s dictated by its sequence. As with proteins, the linear sequence of the building blocks dictates the final 3-D shape. The biological function of RNA molecules derives from their 3-D shapes or from their sequence complementarity with specific genes.
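If you'd rather let the computer do the palindrome exercise, here is a minimal Python sketch. The function names are hypothetical, and it assumes uppercase RNA letters and strict Watson-Crick pairs (A-U, G-C). It folds the chain back on itself, pairing the first base with the last, the second with the second-to-last, and so on, and it writes the result in dot-bracket notation, where "(" and ")" mark paired bases and "." marks loop bases.

```python
# Watson-Crick complements for RNA; wobble pairs such as G-U are ignored here.
# Assumes uppercase input containing only A, C, G, U.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def is_palindromic(rna: str) -> bool:
    """True if the sequence is identical to its own reverse complement."""
    return rna == reverse_complement(rna)


def fold_back(rna: str) -> str:
    """Pair position i with position n-1-i, working from the ends inward.

    Pairing stops at the first mismatch; every base inside that point is
    reported as loop. Returns the structure in dot-bracket notation.
    """
    n = len(rna)
    structure = ["."] * n
    for i in range(n // 2):
        j = n - 1 - i
        if COMPLEMENT[rna[i]] != rna[j]:
            break  # first mismatch: the remaining inner bases form the loop
        structure[i], structure[j] = "(", ")"
    return "".join(structure)


print(is_palindromic("GGAUCC"), fold_back("GGAUCC"))        # True ((()))
print(is_palindromic("GGGAAACCC"), fold_back("GGGAAACCC"))  # False (((...)))
```

The palindrome GGAUCC pairs all the way to the middle, leaving no loop, while GGGAAACCC forms a three-pair stem closed by an AAA loop. This deliberately ignores the geometric limit on tight turns.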
Computing (predicting) the final fold of an RNA molecule from its sequence is a challenging problem that drove many historical developments in bioinformatics. The recent discovery that small RNA molecules can switch off the activity of a number of genes triggered renewed interest in these sticky sequences. (Go directly to Chapter 12 if your main interest is in RNA bioinformatics.)
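The idea of pairing "as many nucleotides as possible" can be turned directly into an algorithm. The Python sketch below follows the classic Nussinov base-pair-maximization approach, a deliberately simplified model. Modern folding programs minimize free energy with far more detailed rules. The function name, the allowed-pair set (Watson-Crick plus the G-U wobble pair), and the minimum hairpin loop of 3 unpaired bases are assumptions of this sketch.

```python
# Allowed base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair
# (an assumption of this sketch). Assumes uppercase A, C, G, U input.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(rna: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of nested base pairs in `rna`.

    Returns (number_of_pairs, dot_bracket_structure). Every hairpin loop
    must contain at least `min_loop` unpaired bases.
    """
    n = len(rna)
    # best[i][j] = maximum number of pairs within rna[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]

    def score(i: int, j: int) -> int:
        # Empty or single-base intervals hold no pairs.
        return best[i][j] if i < j else 0

    # Fill the table from short intervals to long ones.
    for length in range(min_loop + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            value = score(i + 1, j)  # option 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # option 2: i pairs with k
                if (rna[i], rna[k]) in PAIRS:
                    value = max(value, score(i + 1, k - 1) + 1 + score(k + 1, j))
            best[i][j] = value

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == score(i + 1, j):
            stack.append((i + 1, j))  # base i was left unpaired
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (rna[i], rna[k]) in PAIRS and best[i][j] == score(i + 1, k - 1) + 1 + score(k + 1, j):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))  # inside the new pair
                stack.append((k + 1, j))      # to the right of it
                break
    return score(0, n - 1), "".join(structure)


print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

With the 3-base loop requirement in place, the "perfect" palindrome GGAUCC manages only a single pair, which shows why a loop-free hairpin exists only on paper.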
More on nucleic acid nomenclature
Don’t panic if you get the impression that books, courses, and the technical literature use many different words and abbreviations to designate the building blocks of nucleic acids. That impression is correct: you’ll find “base,” “base pair,” “nucleoside,” and “nucleotide,” among others. These names designate slightly different chemical entities, but the differences are irrelevant for us just now.
So far we’ve used the term nucleotide, abbreviated nt (as in “a 400-nt-long sequence”). Used this way, nt gives the length of a DNA (or RNA) molecule as the number of positions it has available for nucleotides. For instance, the sequence in Figure 1-5 is 5 nt long.
Notice that we say number of positions rather than number of nucleotides. A 400-nt-long double-stranded DNA molecule has 400 positions for nucleotides, but it actually contains twice that many nucleotides (800) because every position holds a pair of them. To make this clearer, DNA sequence sizes are often given in base pairs, abbreviated bp. Thus the DNA sequence in Figure 1-5 is 5 bp long. Larger units, such as kb (1,000 bp) and Mb (1,000,000 bp), are also used.
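The bookkeeping behind these units fits in a few lines of Python. In this sketch the function names are hypothetical. It assumes that a double-stranded molecule holds two nucleotides per position and that kb and Mb mean 1,000 and 1,000,000 bp.

```python
def nucleotide_count(positions: int, double_stranded: bool = True) -> int:
    """Number of nucleotides in a molecule with `positions` positions."""
    # Each position of double-stranded DNA holds one base pair (2 nucleotides).
    return positions * 2 if double_stranded else positions


def format_size(length_bp: int) -> str:
    """Express a DNA length in bp, kb (1,000 bp), or Mb (1,000,000 bp)."""
    if length_bp >= 1_000_000:
        return f"{length_bp / 1_000_000:g} Mb"
    if length_bp >= 1_000:
        return f"{length_bp / 1_000:g} kb"
    return f"{length_bp} bp"


print(nucleotide_count(400))          # 800: the 400-nt double-stranded example
print(nucleotide_count(400, False))   # 400: single-stranded, one nucleotide per position
print(format_size(5), format_size(1500), format_size(3_000_000))  # 5 bp 1.5 kb 3 Mb
```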
DNA Coding Regions: Pretending to Work with Protein Sequences
Of the hundreds of thousands of protein sequences found in current databases, only a small percentage correspond to molecules that have actually been isolated by somebody or experimented upon. That’s because determining
Chapter 1: Finding Out What Bioinformatics Can Do for You