14.2: The Complexity of Genomic DNA


By the 1960s, when Roy Britten and Eric Davidson were studying eukaryotic gene regulation, they knew that there was more than enough DNA to account for the genes needed to encode an organism. It was also likely that DNA was more structurally complex than originally thought. They knew that cesium chloride (CsCl) density gradient centrifugation separates molecules based on differences in density and that fragmented DNA separates into main and minor bands of different density in the centrifuge tube. The minor band was dubbed satellite DNA, recalling the Sputnik satellite that had recently been launched by the Soviet Union (or moons as satellites of planets!). DNA bands with different densities could not exist if the proportions of A, G, T, and C in DNA (already known to be species-specific) were the same throughout a genome. Instead, there must be regions of DNA that are richer in A-T than G-C pairs, and vice versa. Analysis showed that the satellite bands that moved farther on the gradient than the main band (i.e., that were denser) were indeed richer in G-C content. Those that lay above the main band were more A-T-rich.
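The link between base composition and band position can be sketched numerically. The snippet below is only an illustration: it assumes the widely quoted empirical linear relation between G-C fraction and buoyant density in CsCl (ρ ≈ 1.660 + 0.098 × GC, in g/cm³), which this section does not state, and the three short sequences are hypothetical stand-ins for real DNA fragments.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def buoyant_density(gc: float) -> float:
    """Approximate CsCl buoyant density (g/cm^3) for a given G-C fraction.

    Assumption: the empirical linear relation rho = 1.660 + 0.098 * GC,
    which is not given in this section.
    """
    return 1.660 + 0.098 * gc


# Hypothetical fragments, invented for illustration only.
main_band = "ATGCGTACGTTAGCCTAGGA"  # mixed composition
at_rich = "ATATATTTAAATATATAATT"    # A-T-rich satellite-like repeat
gc_rich = "GCGGCCGCGGGCCCGCGGCG"    # G-C-rich satellite-like repeat

for name, seq in [("A-T-rich", at_rich), ("main", main_band), ("G-C-rich", gc_rich)]:
    gc = gc_fraction(seq)
    print(f"{name:9s} GC = {gc:.2f}  density ~ {buoyant_density(gc):.3f} g/cm^3")
```

Under this assumed relation, the G-C-rich fragment comes out denser than the main-band fragment and the A-T-rich fragment less dense, which matches the band positions described above.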

Consider early estimates of how many genes it might take to make a human, mouse, chicken, or petunia: about a hundred thousand! Such estimates may have been based on notions of how many proteins a eukaryote requires. We now know that it takes fewer! Nevertheless, the genome of a typical eukaryote contains one hundred to one thousand times more DNA than necessary to account for even the inflated estimate of one hundred thousand genes. So, how do we explain this extra DNA? Britten and Davidson’s elegant experiments to measure DNA renaturation kinetics revealed some physical characteristics of genes and of what seemed to be excess DNA. Let’s look at these experiments in some detail.

14.2.1 The Renaturation Kinetic Protocol

The first step in a renaturation kinetic experiment is to shear DNA isolates to an average size of 10 kbp (kilobase pairs) by pushing high molecular weight DNA through a hypodermic needle at constant pressure. The resulting double-stranded DNA (dsDNA) fragments are next heated to \(100^{\circ}C\) to denature (separate) the two strands. The solutions are then cooled to \(60^{\circ}C\) to allow the single-stranded DNA (ssDNA) fragments to slowly re-form complementary double strands. At different times during incubation at \(60^{\circ}C\), the partially renatured DNA is sampled, and ssDNA and dsDNA are separated and quantified. The experiment is summarized below in Figure 14.1.

Figure 14.1: Renaturation kinetics: double-stranded DNA (dsDNA) is mechanically sheared to ~10 kbp fragments, heated to denature the DNA, and then cooled to allow the ssDNAs to find and renature with their complements.

14.2.2 Renaturation Kinetic Data

Britten and Davidson then plotted the percentage of DNA that had renatured over time. Figure 14.2 is a plot of data from a renaturation kinetics experiment using rat DNA, showing the percentage of dsDNA formed at different times (out to many days!).

Figure 14.2: Plot of rat dsDNA formed over time during renaturation of denatured DNA.

CHALLENGE

Why plot the time allowed for renaturation as the log of time rather than simply as time?

In this example, the DNA fragments could be placed in three main groups with different overall rates of renaturation. Britten and Davidson hypothesized that the dsDNA that had formed most rapidly was composed of sequences that must be more highly repetitive than the rest of the DNA. The rat genome also had a lesser amount of moderately repeated (middle repetitive) dsDNA fragments (which took longer to anneal than the highly repetitive fraction), and even less of a very slowly reannealing unique-sequence DNA fraction (which took the longest time to reanneal).

The latter sequences were so rare, in fact, that it could take days for them to re-form double strands, and they were classified as nonrepetitive, unique- (or nearly unique-) sequence DNA, as illustrated below in Figure 14.3.

Figure 14.3: The three “phases” of the curve in Figure 14.2 have here been highlighted to identify the three fractions of repeated and almost unique DNA sequences in the rat genome.

It became clear that the rat genome, and in fact most eukaryotic genomes, consist of different classes of DNA that differ in their redundancy. The graph shows that a surprisingly large fraction of the genome was repetitive, to a greater or lesser extent.
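A multi-phase curve of this kind can be mimicked with a small model. The sketch below assumes that each redundancy class renatures with ideal second-order kinetics, so that its single-stranded fraction after time t is 1 / (1 + k × c0 × t). That rate law, the function name, and every fraction and rate constant in the example are illustrative assumptions, not values taken from the rat data.

```python
def percent_reassociated(t_seconds, c0, components):
    """Percent of DNA back in double-stranded form after t_seconds.

    c0         -- starting DNA concentration (arbitrary units; an assumption)
    components -- list of (fraction_of_genome, rate_constant) pairs
    Each component is assumed to follow ideal second-order kinetics, so
    its single-stranded fraction is 1 / (1 + k * c0 * t).
    """
    still_single = sum(f / (1.0 + k * c0 * t_seconds) for f, k in components)
    return 100.0 * (1.0 - still_single)


# Hypothetical fractions and rate constants, chosen only to give three phases.
rat_like = [
    (0.50, 1e2),   # highly repetitive: renatures fastest
    (0.30, 1e-1),  # middle repetitive
    (0.20, 1e-4),  # unique sequence: renatures slowest
]

# Sample from one second out to roughly four months.
for exponent in range(0, 8):
    t = 10.0 ** exponent
    pct = percent_reassociated(t, 1.0, rat_like)
    print(f"t = 1e{exponent} s  ->  {pct:5.1f}% dsDNA")
```

Because the three hypothetical rate constants differ by orders of magnitude, the printed values rise in three separate steps, one per redundancy class.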


When renaturation kinetics were determined for E. coli DNA, only one “redundancy class” of DNA was seen (Figure 14.4).

Figure 14.4: Plot of *E. coli* dsDNA formed over time during renaturation of denatured DNA.

Based on E. coli gene-mapping studies and the small size of the E. coli “chromosome,” the reasonable assumption was that there is little room for “extra” DNA in a bacterial genome and that the single class of DNA on this plot must be unique-sequence DNA.

14.2.3 Genomic Complexity

Britten and Davidson defined the relative amounts of repeated and unique (or single-copy) DNA sequences in an organism’s genome as its genomic complexity. Thus, prokaryotic genomes have a lower genomic complexity than eukaryotic genomes. Using the same data as in the rat and E. coli plots above, Britten and Davidson demonstrated the difference between eukaryotic and prokaryotic genome complexity by a simple expedient. Instead of plotting the fraction of dsDNA formed against the time of renaturation, they plotted the percentage of reassociated DNA against the starting DNA concentration multiplied by the time allowed for renaturation (the CoT value). When CoT values from rat and E. coli renaturation data are plotted on the same graph, you get the CoT curves in Figure 14.5.
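One way to see why the product of concentration and time is a natural plotting variable is to write renaturation as an ideal second-order reaction, since two complementary single strands must collide to form a duplex. The symbols below are assumptions introduced for this sketch and are not defined in the section: \(C\) is the concentration of DNA still single-stranded at time \(t\), \(C_0\) is the starting concentration of denatured DNA, and \(k\) is a rate constant.

\[
\frac{dC}{dt} = -kC^{2}
\quad\Longrightarrow\quad
\frac{C}{C_0} = \frac{1}{1 + k\,C_0 t}
\]

In this idealized picture, the fraction still single-stranded depends on \(C_0\) and \(t\) only through their product \(C_0 t\), so experiments run at different DNA concentrations fall on the same curve. Half of the DNA has renatured when \(C_0 t_{1/2} = 1/k\), which gives a single number for comparing how quickly different DNA samples reanneal.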

Figure 14.5: DNA complexity is revealed by plotting rat and *E. coli* DNA renaturation kinetics as the percentage of reassociated dsDNA against CoT (starting DNA concentration × time of renaturation).

This deceptively simple extra calculation (from the same data!) allows direct comparison of the complexities of different genomes. These CoT curves tell us that ~100% of the bacterial genome consists of unique sequences (curve at the far right). The rat genome, by comparison, has two repetitive DNA redundancy classes and, at the right of the curve, only a small fraction of unique-sequence DNA. Prokaryotic genomes are indeed largely composed of unique (nonrepetitive) sequence DNA that must include single-copy genes (or operons) that encode proteins, ribosomal RNAs, and transfer RNAs.
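A common way to put a number on such a comparison is to read off the CoT value at which half of the DNA has reassociated. The sketch below is a hypothetical helper, not part of the original analysis: it interpolates linearly on a log10 CoT axis between the two sampled points that bracket 50%, and it is tested here on synthetic single-component data generated from an assumed ideal second-order curve with an invented rate constant.

```python
import math


def cot_half(c0t_values, percent_reassociated):
    """Estimate the CoT value at which 50% of the DNA has reassociated.

    Interpolates linearly in log10(CoT) between the two neighboring
    samples that bracket 50%. Assumes CoT values are positive and sorted
    in increasing order.
    """
    points = list(zip(c0t_values, percent_reassociated))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= 50.0 <= y1 and y1 > y0:
            frac = (50.0 - y0) / (y1 - y0)
            log_x = math.log10(x0) + frac * (math.log10(x1) - math.log10(x0))
            return 10.0 ** log_x
    raise ValueError("curve does not cross 50% within the sampled range")


# Synthetic single-component data (hypothetical rate constant k = 0.25),
# sampled at ten points per decade of CoT from 0.01 to 10,000.
k = 0.25
grid = [10.0 ** (e / 10.0) for e in range(-20, 41)]
curve = [100.0 * (1.0 - 1.0 / (1.0 + k * x)) for x in grid]

estimate = cot_half(grid, curve)
print(f"estimated half-reassociation CoT: {estimate:.2f} (ideal value is 1/k = {1.0 / k:.2f})")
```

For a genome with a single redundancy class, such as the E. coli curve, one value of this kind summarizes the whole curve. A multi-phase curve such as the rat's would need each phase treated separately.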


14.2.4 Functional Differences between CoT Classes of DNA

The next questions, of course, were what kinds of sequences are repeated and what kinds of sequences are “unique” in eukaryotic DNA? Eukaryotic satellite DNAs, transposons, and ribosomal RNA genes were early suspects.

To start answering these questions, satellite DNA was isolated from the CsCl gradients, made radioactive, and then heated to separate the DNA strands. In a separate experiment, renaturing rat DNA was sampled at different times of renaturation. The isolated CoT fractions were once again denatured and mixed with the heat-denatured radioactive satellite DNA probe. The mixture was then cooled a second time to allow renaturation. The experimental protocol is illustrated in Figure 14.6.

Figure 14.6: Isolated CoT fractions of eukaryotic DNA were separately mixed with radioactive satellite DNA isolated from a CsCl density gradient, then reheated to denature all of the DNA, and finally cooled to allow the satellite DNA to find complementary fragments in the CoT fractions.

The results of this experiment showed that radioactive satellite DNA only annealed to DNA from the low-CoT fraction of DNA. Satellite DNA is thus highly repeated in the eukaryotic genome.

In similar experiments, isolated radioactive rRNAs formed radioactive RNA-DNA hybrids when mixed and cooled with the denatured middle-CoT fraction of eukaryotic DNA. Thus, rRNA genes are moderately repetitive. With the advent of recombinant DNA technologies, the redundancy of other kinds of DNA was explored by probing renatured DNA fractions from renaturation kinetics experiments using cloned genes encoding rRNAs, mRNAs, transposons, and other sequences. Table 14.1 (below) summarizes the results of such experiments.

Table 14.1: Properties of different kinds of repetitive-sequence DNA.

The table compares properties (lengths, copy number, functions, percentage of the genome, location in the genome, etc.) of different kinds of repetitive-sequence DNA. The resulting observations, that most of a eukaryotic genome is made up of repeated DNA and that transposons can be as much as 80% of a genome, came as a surprise! We’ll focus next on the different kinds of transposable elements.

