# 4.2: Analysis of Renaturation curves with Multiple Components

• Contributed by Ross Hardison
• T. Ming Chu Professor (Biochemistry and Molecular Biology) at The Pennsylvania State University

In this section, the analysis in Section 4.2 is applied quantitatively in an example of renaturation of genomic DNA. If an unknown DNA has a single kinetic component, meaning that the fraction renatured increases from 0.1 to 0.9 as the value of $$C_0t$$ increases 100-fold, then one can calculate its complexity easily. Using equation (6), all one needs to know is its $$C_0t_{1/2}$$, plus the $$C_0t_{1/2}$$ and complexity of a standard renatured under identical conditions (initial concentration of DNA, salt concentration, temperature, etc.).

The same logic applies to the analysis of a genome with multiple kinetic components. Some genomes reanneal over a range of $$C_0t$$ values covering many orders of magnitude, e.g. from $$10^{-3}$$ to $$10^{4}$$. Some of the DNA renatures very fast; it has low complexity and, as we shall see, high repetition frequency. Other components in the DNA renature slowly; these have higher complexity and lower repetition frequency. The only new wrinkle to the analysis is to treat each kinetic component independently. This is a reasonable approach, since the DNA is sheared to short fragments, e.g. 400 bp, and it is unlikely that a fast-renaturing DNA will be part of the same fragment as a slow-renaturing DNA.

Some terms and abbreviations need to be defined here.

• f = fraction of genome occupied by a component
• $$C_0t_{1/2}$$ for pure component = (f) ($$C_0t_{1/2}$$ measured in the mixture of components)
• R = repetition frequency
• G = genome size. G can be measured chemically (e.g. amount of DNA per nucleus of a cell) or kinetically (see below).

One can read and interpret the $$C_0t$$ curve as follows. One first estimates the number of components in the mixture that makes up the genome. In the hypothetical example in Figure 4.5, three components can be seen, and another is inferred because 10% of the genome has renatured as quickly as the first assay can be done. The three observable components are the three segments of the curve, each with an inflection point at the center of a part of the curve that covers a 100-fold increase in $$C_0t$$ (sometimes called 2 logs of $$C_0t$$). The fraction of the genome occupied by a component, f, is measured as the fraction of the genome annealing in that component. The measured $$C_0t_{1/2}$$ is the value of $$C_0t$$ at which half the component has renatured. In Figure 4.5, component 2 renatures between $$C_0t$$ values of $$10^{-3}$$ and $$10^{-1}$$, and the fraction of the genome renatured increases from 0.1 to 0.3 over this range. Thus f is 0.3 - 0.1 = 0.2. The $$C_0t$$ value at half-renaturation for this component is the value seen when the fraction renatured reaches 0.2 (i.e. half-way between 0.1 and 0.3); this $$C_0t$$ value is $$10^{-2}$$, and it is referred to as the $$C_0t_{1/2}$$ for component 2 (measured in the mixture of components). Values for the other components are tabulated in Figure 4.5.

Figure 4.5.
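The shape of a curve like the one in Figure 4.5 can be sketched numerically. The following Python snippet is an illustration only. It assumes that each component follows ideal second-order kinetics, so that the fraction of a component renatured at a given $$C_0t$$ is $$C_0t / (C_0t + C_0t_{1/2})$$, and it treats the foldback fraction as already renatured at the first measurement. The function and variable names are hypothetical and do not come from the text.

```python
def fraction_renatured(c0t, components, foldback=0.0):
    """Fraction of the genome renatured at a given C0t.

    components: list of (f, c0t_half_mix) pairs, one per kinetic component.
    foldback:   fraction renatured before the first assay can be done.
    Assumption: each component renatures with ideal second-order kinetics.
    """
    total = foldback
    for f, c0t_half in components:
        # Half of the component has renatured when c0t equals c0t_half.
        total += f * c0t / (c0t + c0t_half)
    return total


# (f, C0t1/2 in the mixture) for components 2, 3 and 4 as read from Figure 4.5.
FIG_4_5 = [(0.2, 1e-2), (0.1, 1.0), (0.6, 1e3)]

if __name__ == "__main__":
    for exponent in range(-4, 6):
        c0t = 10.0 ** exponent
        value = fraction_renatured(c0t, FIG_4_5, foldback=0.1)
        print(f"C0t = 1e{exponent:+d}   fraction renatured = {value:.3f}")
```

Under these assumptions the computed fraction renatured at a $$C_0t$$ of $$10^{-2}$$ is close to 0.2, the midpoint read from the curve for component 2. The neighboring components overlap slightly in this idealized model, so the agreement is approximate rather than exact.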

All the components of the genome are present in the genomic DNA initially denatured. Thus the value for $$C_0$$ is for all the genomic DNA, not for the individual components. But once one knows the fraction of the genome occupied by a component, one can calculate the $$C_0$$ for each individual component, simply as $$C_0 \times f$$. Thus the $$C_0t_{1/2}$$ for the individual component is the $$C_0t_{1/2}$$ (measured in the mixture of components) × f. For example, the $$C_0t_{1/2}$$ for individual (pure) component 2 is $$10^{-2} \times 0.2 = 2 \times 10^{-3}$$.

Knowing the measured $$C_0t_{1/2}$$ for a DNA standard, one can calculate the complexity of each component.

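$$N_n = C_0t_{1/2,\,pure,\,n} \times \frac{N_{std}}{C_0t_{1/2,\,std}}$$

where n refers to the particular component (i.e. 1, 2, 3, or 4), and $$N_{std}$$ and $$C_0t_{1/2,\,std}$$ are the complexity and the measured $$C_0t_{1/2}$$ of the standard.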

The repetition frequency of a given component is the total number of base pairs in that component divided by the complexity of the component. The total number of base pairs in that component is given by $$f_n \times G$$.

$$R_n = \frac{f_n \times G}{N_n}$$

For the data in Figure 4.5, one can calculate the following values:

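| Component | f | $$C_0t_{1/2}$$, mix | $$C_0t_{1/2}$$, pure | N (bp) | R |
| --- | --- | --- | --- | --- | --- |
| 1 foldback | 0.1 | < $$10^{-4}$$ | < $$10^{-4}$$ | | |
| 2 fast | 0.2 | $$10^{-2}$$ | $$2 \times 10^{-3}$$ | 600 | $$10^{5}$$ |
| 3 intermediate | 0.1 | 1 | 0.1 | $$3 \times 10^{4}$$ | $$10^{3}$$ |
| 4 slow (single copy) | 0.6 | $$10^{3}$$ | 600 | $$1.8 \times 10^{8}$$ | 1 |
| std bacterial DNA | | | 10 | $$3 \times 10^{6}$$ | 1 |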
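The tabulated values can be reproduced with a few lines of arithmetic. The Python sketch below is a minimal illustration, assuming the standard has a $$C_0t_{1/2}$$ of 10 and a complexity of 3 × 10^6 bp (the bacterial DNA row) and that the genome size G is 3 × 10^8 bp, the value used in the worked example in this section. The function name `analyze_component` and the constant names are hypothetical.

```python
# Standard renatured under identical conditions (bacterial DNA row of the table).
STD_C0T_HALF = 10.0     # measured C0t1/2 of the standard
STD_COMPLEXITY = 3e6    # complexity of the standard, in bp
GENOME_SIZE = 3e8       # G in bp; assumed here, taken from the worked example


def analyze_component(f, c0t_half_mix):
    """Return (C0t1/2 pure, complexity N in bp, repetition frequency R)."""
    # C0t1/2 of the pure component = f x C0t1/2 measured in the mixture.
    c0t_half_pure = f * c0t_half_mix
    # Complexity scales with C0t1/2 relative to the standard.
    complexity = c0t_half_pure * STD_COMPLEXITY / STD_C0T_HALF
    # R = (base pairs in the component) / (complexity of the component).
    repetition = f * GENOME_SIZE / complexity
    return c0t_half_pure, complexity, repetition


# (f, C0t1/2 in the mixture) read from the curve for components 2, 3 and 4.
COMPONENTS = {
    "2 fast": (0.2, 1e-2),
    "3 intermediate": (0.1, 1.0),
    "4 slow": (0.6, 1e3),
}

if __name__ == "__main__":
    for name, (f, mix) in COMPONENTS.items():
        pure, n, r = analyze_component(f, mix)
        print(f"{name:15s} pure = {pure:.3g}   N = {n:.3g} bp   R = {r:.3g}")
```

The foldback component is left out because only upper bounds on its $$C_0t_{1/2}$$ values are available.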

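The genome size, G, can be calculated for the single-copy component, for which R = 1, as the ratio of its complexity to the fraction of the genome that it occupies.

$$G = \frac{N_{sc}}{f_{sc}}$$

where the subscript sc refers to the single-copy component. For the data in Figure 4.5, G = $$1.8 \times 10^{8}$$ bp / 0.6 = $$3 \times 10^{8}$$ bp.

E.g. if G = $$3 \times 10^{8}$$ bp, and component 2 occupies 0.2 of it, then component 2 contains $$6 \times 10^{7}$$ bp. But the complexity of component 2 is only 600 bp. Therefore it would take $$10^{5}$$ copies of that 600 bp sequence to comprise $$6 \times 10^{7}$$ bp, and we surmise that R = $$10^{5}$$.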
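The genome-size estimate and the copy-number check for component 2 can be written out as a short Python sketch. It assumes that the slowest component is single copy (R = 1), so that the relationship between repetition frequency, f, G and complexity reduces to G = N / f for that component. The function names `genome_size` and `copies` are hypothetical.

```python
def genome_size(n_single_copy, f_single_copy):
    """Kinetic estimate of G in bp, assuming R = 1 for the single-copy component."""
    return n_single_copy / f_single_copy


def copies(f, genome_size_bp, complexity):
    """Repetition frequency: base pairs in the component divided by its complexity."""
    return f * genome_size_bp / complexity


if __name__ == "__main__":
    # Component 4 (slow, single copy): N = 1.8e8 bp, f = 0.6.
    g = genome_size(1.8e8, 0.6)
    # Component 2 (fast): f = 0.2, N = 600 bp.
    r2 = copies(0.2, g, 600.0)
    print(f"G  = {g:.3g} bp")
    print(f"R2 = {r2:.3g} copies")
```

With the values from Figure 4.5 this gives a G of about 3 × 10^8 bp and an R of about 10^5 for component 2, matching the worked example.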

Exercise 4.1

If one substitutes the equation for $$N_n$$ and for G into the equation for $$R_n$$, a simple relationship for R can be derived in terms of $$C_0t_{1/2}$$ values measured for the mixture of components. What is it?

## Types of DNA in each kinetic component for complex genomes

Eukaryotic genomes usually have multiple components, which generates complex $$C_0t$$ curves. Figure 4.6 shows a schematic $$C_0t$$ curve that illustrates the different kinetic components of human DNA, and the following table gives some examples of members of the different components.

Figure 4.6.

Table 4.2. Four principal kinetic components of complex genomes

| Renaturation kinetics | $$C_0t$$ descriptor | Repetition frequency | Examples |
| --- | --- | --- | --- |
| too rapid to measure | "foldback" | not applicable | inverted repeats |
| fast renaturing | low $$C_0t$$ | highly repeated, > $$10^{5}$$ copies per cell | interspersed short repeats (e.g. human Alu repeats); tandem repeats of short sequences (centromeres) |
| intermediate renaturing | mid $$C_0t$$ | moderately repeated, 10 to $$10^{4}$$ copies per cell | families of interspersed repeats (e.g. human L1 long repeats); rRNA, 5S RNA, histone genes |
| slow renaturing | high $$C_0t$$ | low, 1-2 copies per cell, "single copy" | most structural genes (with their introns); much of the intergenic DNA |

N, R for repeated DNAs are averages for many families of repeats. Individual members of families of repeats are similar but not identical to each other.

The emerging picture of the human genome reveals approximately 30,000 genes encoding proteins and structural or functional RNAs. These are spread out over 22 autosomes and 2 sex chromosomes. Almost all have introns, some with a few short introns and others with very many long introns. Almost always a substantial amount of intergenic DNA separates the genes.

Several different families of repetitive DNA are interspersed throughout the intergenic and intronic sequences. Almost all of these repeats are vestiges of transposition events, and in some cases the source genes for these transposons have been found. Some of the most abundant families of repeats transposed via an RNA intermediate, and can be called retrotransposons. The most abundant repetitive family in humans is the Alu repeats, named for a common restriction endonuclease site within them. They are about 300 bp long, and about 1 million copies are in the genome. They are probably derived from a modified gene for a small RNA called 7SL RNA. (This RNA is involved in translation of secreted and membrane-bound proteins.) Genomes of species from other mammalian orders (and indeed all vertebrates examined) have roughly comparable numbers of short interspersed repeats independently derived from genes encoding other short RNAs, such as transfer RNAs.

Another prominent class of repetitive retrotransposons is the long L1 repeats. Full-length copies of L1 repeats are about 7000 bp long, although many copies are truncated from the 5' end. About 50,000 copies are in the human genome. Full-length copies of recently transposed L1s and their source genes have two open reading frames (i.e. can encode two proteins). One encodes a multifunctional protein similar to the product of the pol gene of retroviruses, which includes a functional reverse transcriptase. This enzyme may play a key role in the transposition of all retrotransposons. Repeats similar to L1s are found in all mammals and in other species, although the L1s within each mammalian order have features distinctive to that order. Thus both short interspersed repeats (or SINEs) and the L1 long interspersed repeats (or LINEs) have expanded and propagated independently in different mammalian orders.

Both types of retrotransposons are currently active, generating de novo mutations in humans. A small subset of SINEs have been implicated as functional elements of the genome, providing post-transcriptional processing signals as well as protein-coding exons for a small number of genes.

Other classes of repeats, such as L2s (long repeats) and MIRs (short repeats named mammalian interspersed repeats), appear to predate the mammalian radiation, i.e. they appear to have been in the ancestral eutherian mammal. Still other classes of repeats are transposable elements that move by a DNA intermediate.

Other common interspersed repeated sequences in humans