Datasheet

Reading protein sequences from N to C
The twenty amino-acid molecules found in proteins have different bodies
(their characteristic residues, listed in Table 1-1), but all have the same
pair of hooks: NH2 and COOH. These groups of atoms are used to form the
so-called peptide bonds between the successive residues in the sequence.
Figure 1-1 shows free individual amino acids floating about, displaying their
hooks for all to see.
Chapter 1: Finding Out What Bioinformatics Can Do for You
Seven additional amino acid codes
When you work with databases or analysis programs, you're likely to have
some unusual letters popping up now and then in your protein sequences.
These letters either designate exotic amino acids or denote various levels
of ambiguity (up to a total lack of information) about certain positions in
the sequence. We've listed these particular letters in the following table.
The B and Z codes (which are now becoming obsolete) indicated how hard it
was to distinguish between Asp and Asn (or Glu and Gln) in the early days of
protein sequence determination. In contrast, the J code shows how difficult
it is to distinguish between Ile and Leu using mass spectrometry, the latest
sequencing technique. The Pyl and Sec exotic amino acids are specified by
the UAG (Pyl) and UGA (Sec) stop codons read in a specific context. The X
code is still very much used as a placeholder letter when you don't know the
amino acid at a given position in the sequence. Alignment programs use "-"
to denote positions apparently missing from the sequence.
Seven Codes for Ambiguity or Exceptional Amino Acids
1-Letter Code   3-Letter Code   Meaning
B               Asx             Asparagine or aspartic acid (Asn or Asp)
J               Xle             Isoleucine or leucine
O (letter)      Pyl             Pyrrolysine
U               Sec             Selenocysteine
Z               Glx             Glutamine or glutamic acid (Gln or Glu)
X               Xaa             Any residue
-               ---             No corresponding residue (gap)
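
If you want to spot these letters in your own sequences, the table can be turned into a small lookup. The following Python sketch is only an illustration: the names STANDARD, EXTRA_CODES, and report_unusual are made up for this example and do not come from any particular database or analysis program. It assumes a sequence given as a plain string of one-letter codes.

```python
# The twenty standard one-letter amino acid codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# The seven extra codes from the table, mapped to the standard residues each
# one may stand for. An empty tuple means "no standard equivalent".
EXTRA_CODES = {
    "B": ("D", "N"),               # Asp or Asn
    "J": ("I", "L"),               # Ile or Leu
    "Z": ("E", "Q"),               # Glu or Gln
    "O": (),                       # pyrrolysine
    "U": (),                       # selenocysteine
    "X": tuple(sorted(STANDARD)),  # any residue
    "-": (),                       # alignment gap
}


def report_unusual(sequence):
    """Return (position, letter) pairs for every non-standard letter.

    Positions are 1-based, counted from the first letter of the string.
    Letters that are neither standard nor in the table raise ValueError.
    """
    hits = []
    for pos, letter in enumerate(sequence.upper(), start=1):
        if letter in STANDARD:
            continue
        if letter not in EXTRA_CODES:
            raise ValueError(f"unknown code {letter!r} at position {pos}")
        hits.append((pos, letter))
    return hits
```

For instance, calling report_unusual("MKBX-U") flags the letters at positions 3 to 6, and EXTRA_CODES["B"] then tells you that position 3 is either D or N.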