# 4.2: Reassociation kinetics measure sequence complexity

### Low complexity DNA sequences reanneal faster than do high complexity sequences

The components of complex genomes differ not only in repetition frequency (highly repetitive, moderately repetitive, single copy) but also in sequence complexity. Complexity (denoted by N) is the number of base pairs of unique or nonrepeating DNA in a given segment of DNA, or component of the genome. This is different from the length (L) of the sequence if some of the DNA is repeated, as illustrated in this example.

For example, consider a 1000 bp segment of DNA:

• 500 bp is sequence a, present in a single copy.
• 500 bp is sequence b (100 bp) repeated 5 times:

a b b b b b

|___________|__|__|__|__|__|

L = length = 1000 bp = a + 5b

N = complexity = 600 bp = a + b
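The same bookkeeping can be sketched in a few lines of Python. The function name and the (unit length, copy number) representation are illustrative assumptions, not part of the original treatment.

```python
# Each component is (unit_length_bp, copies). This representation is an
# illustrative assumption chosen to mirror the a + 5b example.
def length_and_complexity(components):
    """Return (L, N): total length and sequence complexity, both in bp."""
    total_length = sum(unit * copies for unit, copies in components)
    complexity = sum(unit for unit, _ in components)  # each unit counted once
    return total_length, complexity

# Sequence a: 500 bp, single copy; sequence b: 100 bp, repeated 5 times.
example = [(500, 1), (100, 5)]
L, N = length_and_complexity(example)
print(L, N)  # 1000 600
```

Length counts every copy of a repeat, whereas complexity counts each distinct sequence only once, which is why L and N diverge as soon as any component is repeated.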

Some viral and bacteriophage genomes have almost no repeated DNA, and L is approximately equal to N. But for many genomes, repeated DNA occupies 0.1 to 0.5 of the genome, as in this simple example. The key result for genome analysis is that less complex DNA sequences renature faster than do more complex sequences. Thus determining the rate of renaturation of genomic DNA allows one to determine how many kinetic components (sequences of different complexity) are in the genome, what fraction of the genome each occupies, and the repetition frequency of each component.

Before investigating this in detail, let's look at an example to illustrate this basic principle, i.e. the inverse relationship between reassociation kinetics and sequence complexity.

### Inverse Relationship between Reassociation Kinetics and Sequence Complexity

Let a, b, ... z represent a string of base pairs in DNA that can hybridize (see Fig. 4.2.). For simplicity in arithmetic, we will use 10 bp per letter.

• DNA 1 = ab (This is very low sequence complexity, 2 letters or 20 bp)
• DNA 2 = cdefghijklmnopqrstuv. (This is 10 times more complex (20 letters or 200 bp)).
• DNA 3 =izyajczkblqfreighttrainrunninsofastelizabethcottonqwftzxvbifyoudontbelieveimleavingyoujustcountthe

(This is 100 times more complex than DNA 1: 200 letters or 2000 bp.)

A solution of 1 mg DNA/ml is 0.0015 M (in terms of moles of bp per L) or 0.003 M (in terms of nucleotides per L). We'll use 0.003 M = 3 mM, i.e. 3 mmoles nts per L. (nts = nucleotides).

Consider a 1 mg/ml solution of each of the three DNAs. For DNA 1, this means that the sequence ab (20 nts) is present at 0.15 mM or 150 μM (calculated from 3 mM / 20 nt in the sequence). Likewise, DNA 2 (200 nts) is present at 15 μM, and DNA 3 (2000 nts) is present at 1.5 μM. Melt the DNA (i.e. dissociate it into separate strands) and then allow the solution to reanneal, i.e. let the complementary strands reassociate.

Since the rate of reassociation is determined by the rate of the initial encounter between complementary strands, the higher the concentration of those complementary strands, the faster the DNA will reassociate. So for a given overall DNA concentration, the simple sequence (ab) in low complexity DNA 1 will reassociate 100 times faster than the more complex sequence (izyajcsk ....trad) in the higher complexity DNA 3. Fast reassociating DNA is low complexity.
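The arithmetic behind this comparison can be checked with a short sketch. It assumes, as the text does, that the relative rate of reassociation scales with the molar concentration of each complete sequence; the function and variable names are illustrative.

```python
# Nucleotide concentration of a 1 mg/ml DNA solution, taken from the text.
NUCLEOTIDE_CONC_M = 0.003

def sequence_concentration_uM(sequence_length_nt):
    """Micromolar concentration of one complete copy of the sequence."""
    return NUCLEOTIDE_CONC_M / sequence_length_nt * 1e6

# Sequence lengths in nucleotides for the three example DNAs.
dnas = {"DNA 1": 20, "DNA 2": 200, "DNA 3": 2000}
conc = {name: sequence_concentration_uM(n) for name, n in dnas.items()}

for name, c in conc.items():
    print(f"{name}: {c:.1f} uM")

# Assumption: relative rate is proportional to sequence concentration.
relative_rate = conc["DNA 1"] / conc["DNA 3"]
print(f"DNA 1 reassociates about {relative_rate:.0f} times faster than DNA 3")
```

At a fixed mass concentration, every tenfold increase in sequence length dilutes each individual sequence tenfold, which is the whole basis of the inverse relationship.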

Fig. 4.2.

#### Kinetics of renaturation

In this section, we will develop the relationships among rates of renaturation, complexity, and repetition frequency more formally.

Figure 4.3.

The time required for half renaturation is inversely proportional to the rate constant.

Let $$C$$ = concentration of single-stranded DNA at time $$t$$ (expressed as moles of nucleotides per liter). The rate of loss of single-stranded (ss) DNA during renaturation is given by the following expression for a second-order rate process:

$\dfrac{-dC}{dt}= kC^2$

or

$\dfrac{dC}{C^2}=-kdt$

Integration and some algebraic substitution shows that

$\dfrac{C}{C_0}=\dfrac{1}{1+kC_0t} \;\;\; \label{1}$
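Written out, the integration behind this result runs as follows. This is the standard separation-of-variables step, taking $$C = C_0$$ at $$t = 0$$ as the boundary condition:

$\int_{C_0}^{C} \dfrac{dC}{C^2} = -k\int_{0}^{t} dt$

$\dfrac{1}{C_0} - \dfrac{1}{C} = -kt$

$\dfrac{1}{C} = \dfrac{1}{C_0} + kt$

Multiplying both sides by $$C_0$$ and inverting gives the expression for $$C/C_0$$ stated above.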

Thus, at half renaturation, when

$\dfrac{C}{C_0}=0.5 \; \text{at} \; t=t_{1/2}$

one obtains:

$C_0t_{1/2}=\dfrac{1}{k} \; \;\; \label{2}$

where $$k$$ is the rate constant in liters (mole nt)$$^{-1}$$ sec$$^{-1}$$.
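A minimal numerical sketch of these two relationships follows. The rate constant used here is a hypothetical value chosen only to make the arithmetic easy to follow.

```python
def fraction_single_stranded(k, c0t):
    """C/C0 from the second-order rate law: 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)

def c0t_half(k):
    """C0t value at which half of the DNA has renatured: 1 / k."""
    return 1.0 / k

k = 2.0  # hypothetical rate constant, liters (mole nt)^-1 sec^-1
half = c0t_half(k)
print(half)                               # 0.5
print(fraction_single_stranded(k, half))  # 0.5
```

Doubling k halves the C0t value needed to reach half renaturation, which is the inverse proportionality stated above.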

The rate constant for renaturation is inversely proportional to sequence complexity.

The rate constant, k, shows the following proportionality:

$k \propto \dfrac{\sqrt{L}}{N} \;\;\; \label{3}$

where

• L = length and
• N = complexity.

Empirically, the rate constant k has been measured as

$k = 3 \times 10^5 \dfrac{\sqrt{L}}{N}$

in 1.0 M Na$$^+$$ at $$T = T_m - 25^\circ \text{C}$$
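As a rough illustration, the empirical expression can be evaluated directly. The fragment length and complexity below are hypothetical example values, and the result applies only under the stated conditions (1.0 M Na+, 25 degrees C below the melting temperature).

```python
import math

def rate_constant(fragment_length_bp, complexity_bp):
    """Empirical k = 3e5 * sqrt(L) / N, in liters (mole nt)^-1 sec^-1.

    Valid only for the conditions quoted in the text
    (1.0 M Na+, T = Tm - 25 degrees C).
    """
    return 3e5 * math.sqrt(fragment_length_bp) / complexity_bp

# Hypothetical example: 400 bp fragments from a DNA of complexity 1,000,000 bp.
k = rate_constant(400, 1_000_000)
print(k)        # 6.0
print(1.0 / k)  # corresponding C0t at half renaturation
```

Because N appears in the denominator, a tenfold increase in complexity lowers k tenfold at a fixed fragment length.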

The time required for half renaturation (and thus $$C_0t_{1/2}$$) is directly proportional to sequence complexity.

From Equations \ref{2} and \ref{3},

$C_0 t_{1/2} \propto \dfrac{N}{\sqrt{L}} \;\;\;\; \label{4}$

For a renaturation measurement, one usually shears DNA to a constant fragment length L (e.g. 400 bp). Then L is no longer a variable, and

$C_0 t_{1/2}\propto N \;\;\;\; \label{5}$

The data for renaturation of genomic DNA are plotted as $$C_0 t$$ curves:

Figure 4.4.

Renaturation of a single component proceeds from 10% to 90% complete (fraction reassociated 0.1 to 0.9) over about 2 logs of $$C_0t$$ (e.g., 1 to 100 for E. coli DNA), as predicted by Equation \ref{1}.
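The "2 logs" figure can be checked from the second-order rate law. The sketch below solves that equation for the product of the rate constant and C0t at a given fraction renatured; the function name is illustrative.

```python
import math

def k_c0t_at_fraction_renatured(fraction_renatured):
    """k * C0t at which the given fraction of the DNA has reassociated.

    From C/C0 = 1 / (1 + k*C0t) with C/C0 = 1 - fraction_renatured,
    so k*C0t = fraction_renatured / (1 - fraction_renatured).
    """
    return fraction_renatured / (1.0 - fraction_renatured)

start = k_c0t_at_fraction_renatured(0.1)  # 10% renatured
end = k_c0t_at_fraction_renatured(0.9)    # 90% renatured
span_logs = math.log10(end / start)
print(round(span_logs, 2))  # 1.91
```

The ratio of the two C0t values is 81 regardless of k, so every single kinetic component spans just under two decades on a log plot. A transition that is much broader than this suggests more than one component.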

Sequence complexity is usually measured by a proportionality to a known standard

If you have a standard of known genome size, you can calculate $$N$$ from $$C_0t_{1/2}$$:

$\dfrac{N^{\text{unknown}}}{N^{\text{known}}} = \dfrac{C_0t_{1/2}^{\text{unknown}}}{C_0t_{1/2}^{\text{known}}} \;\;\;\; \label{6}$

A known standard could be

• E. coli with N = $$4.639 \times 10^6$$ bp
• pBR322 with N = 4362 bp
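The proportionality to a standard can be sketched as follows. The two half-renaturation values are hypothetical measurements invented for illustration, and the calculation assumes that the standard and the unknown were sheared to the same fragment length and renatured under identical conditions.

```python
def complexity_from_standard(c0t_half_unknown, c0t_half_known, n_known_bp):
    """N_unknown = N_known * (C0t1/2 of unknown / C0t1/2 of standard)."""
    return n_known_bp * c0t_half_unknown / c0t_half_known

E_COLI_N_BP = 4.639e6  # complexity of the E. coli standard, from the text

# Hypothetical measurements, not real data.
c0t_half_ecoli = 4.0
c0t_half_component = 400.0

n_component = complexity_from_standard(
    c0t_half_component, c0t_half_ecoli, E_COLI_N_BP
)
print(f"{n_component:.3e} bp")  # 4.639e+08 bp
```

In this made-up case the component renatures 100 times more slowly than the standard, so its complexity is estimated at 100 times that of E. coli.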

More complex DNA sequences renature more slowly than do less complex sequences. By measuring the rate of renaturation for each component of a genome, along with the rate for a known standard, one can measure the complexity of each component.