11.3: Gene and Protein Colinearity and Triplet Codons


    Serious efforts to understand how proteins are encoded began after Watson and Crick used the experimental evidence of Maurice Wilkins and Rosalind Franklin (among others) to determine the structure of DNA. Most hypotheses about the genetic code assumed colinearity, namely that genes (i.e., DNA sequences) and the polypeptides they encode are colinear.

    11.3.1. Colinearity of Genes and Proteins

    For genes and proteins, colinearity just means that the length of a DNA sequence in a gene is proportional to the length of the polypeptide encoded by the gene. The gene-mapping experiments in E. coli (which we have already discussed) certainly supported this hypothesis. Figure 11.2 illustrates the concept of the colinearity of genes and proteins.

    Figure 11.2: Colinearity of genes and proteins (polypeptides) in bacteria

    If the genetic code is colinear with the polypeptides it encodes, then a one-base codon obviously would not work because such a code would only account for four amino acids. A two-base genetic code also doesn’t work because it could only account for sixteen (\(4^2\)) of the twenty amino acids found in proteins. However, three-nucleotide codons could code for a maximum of sixty-four (\(4^3\)) amino acids, more than enough to encode the twenty amino acids. And of course, a four-base code also works; it satisfies the expectation that genes and proteins are colinear, with the “advantage” that there would be 256 (\(4^4\)) possible codons to choose from.
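The counting argument above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are made up for illustration, and the only inputs are the four bases and twenty amino acids mentioned in the text.

```python
BASES = 4          # the four nucleotides
AMINO_ACIDS = 20   # amino acids found in proteins


def codon_capacity(codon_length):
    """Number of distinct codons of the given length (4 ** length)."""
    return BASES ** codon_length


def shortest_codon_length(needed):
    """Smallest codon length whose capacity covers `needed` amino acids."""
    length = 1
    while codon_capacity(length) < needed:
        length += 1
    return length


for n in range(1, 5):
    enough = codon_capacity(n) >= AMINO_ACIDS
    print(f"{n}-base codons: {codon_capacity(n)} possible, enough for 20? {enough}")

print("shortest workable codon length:", shortest_codon_length(AMINO_ACIDS))
```

Run as written, the loop reports capacities of 4, 16, 64, and 256, and the last line reports 3 as the shortest codon length that can cover twenty amino acids.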

    11.3.2. How Is a Linear Genetic Code “Read” to Account for All Genes in the Genome of an Organism?

    George Gamow (a Soviet-American physicist working at the George Washington University) was the first to propose triplet codons to encode the twenty amino acids. This was the simplest hypothesis to account for the colinearity of genes and proteins, and for encoding twenty amino acids. Once the concept of colinearity was accepted, a remaining concern was: Is there enough DNA in an organism’s genome to fit all the codons needed to make all of its proteins? Assuming that genomes did not have a lot of extra DNA lying around, scientists still wondered how genetic information might be compressed into short DNA sequences consistent with colinearity and assumptions about the number of genes required by an organism. One idea assumed that there were twenty meaningful three-base codons (one for each amino acid) and forty-four meaningless codons, and that the meaningful codons in a gene (i.e., an mRNA) would be read and translated in an overlapping manner. A code where codons overlap by one base is shown in Figure 11.3 (below). You can figure out how compressed a gene could get with codons that overlapped by two bases.

    Figure 11.3: A single base overlapping genetic code would fit more genetic information in less DNA!
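To put numbers on the "genomic economy" of an overlapping code, here is a small sketch. It assumes that every codon after the first reuses the last `overlap` bases of the previous codon; the function names and the example sequences are hypothetical.

```python
def bases_needed(n_amino_acids, codon_len=3, overlap=0):
    """Bases required to encode n amino acids when adjacent codons share `overlap` bases."""
    if n_amino_acids == 0:
        return 0
    step = codon_len - overlap  # new bases contributed by each codon after the first
    return codon_len + (n_amino_acids - 1) * step


def read_codons(seq, codon_len=3, overlap=0):
    """Split seq into codons, advancing by (codon_len - overlap) bases each time."""
    step = codon_len - overlap
    return [seq[i:i + codon_len] for i in range(0, len(seq) - codon_len + 1, step)]


for k in (0, 1, 2):
    print(f"overlap {k}: 100 amino acids need {bases_needed(100, overlap=k)} bases")

print(read_codons("AUGCAU", overlap=0))  # two codons from six bases
print(read_codons("AUGCA", overlap=1))   # two codons from five bases
```

Under these assumptions, a 100-amino-acid polypeptide needs 300 bases with no overlap, 201 with a one-base overlap, and 102 with a two-base overlap.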

    As attractive as an overlapping codon hypothesis was in achieving genomic economies, it sank of its own weight almost as soon as it was floated! If you look carefully at the example above, you can see that each succeeding codon would have to start with a specific base: the last base of the codon before it. A look back at the table of sixty-four triplet codons quickly shows that only the sixteen codons that begin with a C can follow the first one in the illustration. Based on the amino-acid sequences already accumulating in the literature, it was clear that virtually any amino acid could follow another in a polypeptide. Therefore, overlapping genetic codes are untenable. The genetic code must be nonoverlapping!
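The restriction that sinks the overlapping code can be counted directly. The sketch below enumerates all sixty-four triplets and keeps those that could legally follow a given codon when adjacent codons share one or two bases. The example codon `GAC` is a made-up choice that ends in C; it is not taken from Figure 11.3.

```python
from itertools import product

BASES = "UCAG"
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # all 64 triplets


def allowed_successors(codon, overlap=1):
    """Codons that can follow `codon` if they must begin with its last `overlap` bases."""
    if overlap not in (1, 2):
        raise ValueError("overlap must be 1 or 2 for a triplet code")
    shared = codon[-overlap:]
    return [c for c in ALL_CODONS if c.startswith(shared)]


one_base = allowed_successors("GAC", overlap=1)
two_base = allowed_successors("GAC", overlap=2)
print(len(one_base), "of", len(ALL_CODONS), "codons can follow with a one-base overlap")
print(len(two_base), "of", len(ALL_CODONS), "codons can follow with a two-base overlap")
```

With a one-base overlap only 16 of the 64 triplets remain available at each step, and with a two-base overlap only 4, so neither scheme lets any amino acid follow any other.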

    Sydney Brenner and Francis Crick performed elegant experiments directly demonstrating the nonoverlapping genetic code. They showed that bacteria with a single base deletion (and, likewise, a double base deletion) in the coding region of a gene failed to make the expected protein. On the other hand, a bacterium containing a mutant version of a gene in which three bases were deleted was able to make the protein. But the protein it made was less active than the protein made by bacteria with genes that had no deletions.
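The reading-frame logic behind these deletion results can be sketched in code. The example assumes a nonoverlapping triplet code read from a fixed starting point; the 21-base message and the helper names are invented for illustration and are not data from the experiments.

```python
def codons(seq):
    """Split seq into successive nonoverlapping triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete_bases(seq, start, count):
    """Return seq with `count` bases removed, beginning at index `start`."""
    return seq[:start] + seq[start + count:]


def shared_tail(a, b):
    """Count the codons that two codon lists have in common, reading from the end."""
    matches = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        matches += 1
    return matches


gene = "AUGGCUCAAUUCCGUAAAGGC"  # hypothetical message: 7 codons
normal = codons(gene)
print("normal:    ", normal)

intact_downstream = {}
for n in (1, 2, 3):
    mutant = codons(delete_bases(gene, 4, n))  # deletion begins inside the second codon
    intact_downstream[n] = shared_tail(normal, mutant)
    print(f"{n} deleted: ", mutant, "| downstream codons intact:", intact_downstream[n])
```

In this toy message, deleting one or two bases shifts every downstream triplet, so none of the original downstream codons survive. Deleting three bases alters the codons at the deletion site but restores the original frame, leaving the last four codons unchanged.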

    CHALLENGE

    Here are two questions: (1) The protein made by the three-base deletion mutant in Brenner and Crick's experiment was less active than the one in normal cells. Why? (2) What would happen if four consecutive nucleotides were deleted from a gene?

    The next issue was whether there were only twenty meaningful codons and forty-four meaningless ones. If only twenty codons encoded amino acids, how would the translation machinery know the correct twenty to translate? What would prevent the translational machinery from “reading the wrong” triplets (i.e., reading an mRNA out of phase)?

    For example, if the translation machinery began reading an mRNA from the second or third base of a codon, wouldn’t it likely encounter a meaningless three-base sequence in short order? An alternate hypothesis speculated that the code was punctuated. That is, perhaps there were chemical equivalents of commas between the meaningful triplets. The commas would, of course, be additional nucleotides. In this punctuated code, the translation machinery would recognize the “commas” and would not translate any meaningless triplets, avoiding out-of-phase translation attempts. Of course, a code with a nucleotide comma after every triplet would increase the amount of DNA needed to specify a polypeptide by a third!

    Finally, Crick proposed the Comma-less Genetic Code. He divided the sixty-four triplets into twenty meaningful codons (encoding the amino acids) and forty-four meaningless ones that did not encode amino acids. His code was such that when the twenty meaningful codons are placed in any order, any triplet read out of phase (that is, across the boundary between two adjacent codons) would be among the forty-four meaningless codons. In fact, he could arrange several different sets of twenty and forty-four triplet codons with this property! Crick had cleverly shown how to read the triplets in correct sequence without nucleotide commas.
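The comma-less property is easy to test by brute force. The sketch below builds one set of twenty triplets using a simple ordering rule (first base ranks below the second, and the second ranks at or above the third) and then checks every pair of adjacent codons for out-of-phase matches. The ordering rule and the base ranking are assumptions chosen for illustration, not necessarily the set Crick wrote down.

```python
from itertools import product

BASES = "ACGU"  # any fixed ordering of the four bases works for this construction


def build_candidate_code():
    """Triplets xyz with rank(x) < rank(y) >= rank(z): an illustrative comma-free set."""
    rank = {base: i for i, base in enumerate(BASES)}
    return {
        x + y + z
        for x, y, z in product(BASES, repeat=3)
        if rank[x] < rank[y] >= rank[z]
    }


def is_comma_free(code):
    """True if no out-of-phase triplet spanning two adjacent codons is itself in the code."""
    for first, second in product(code, repeat=2):
        joined = first + second
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


code = build_candidate_code()
all_triplets = {"".join(p) for p in product(BASES, repeat=3)}

print("meaningful codons:", len(code))
print("meaningless codons:", len(all_triplets - code))
print("comma-free?", is_comma_free(code))
print("all 64 triplets comma-free?", is_comma_free(all_triplets))
```

The candidate set has twenty members, leaves forty-four triplets unused, and passes the check, while the full set of sixty-four triplets fails it. A triplet such as AAA can never belong to a comma-less code, because AAA followed by AAA reads as AAA in every phase.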

    Of course, we know now that while the genetic code is indeed comma-less, it is not comma-less in the sense that Crick had envisioned. What’s more, thanks to experiments to be described next, we know that ribosomes read the correct codons in the right order because they know exactly where to start reading the mRNA!


    This page titled 11.3: Gene and Protein Colinearity and Triplet Codons is shared under a not declared license and was authored, remixed, and/or curated by Gerald Bergtrom.
