9.1: DNA Isolation, Sequencing, and Synthesis
(Learning goals written by Claude, Sonnet 4.6, Anthropic)
Genomic DNA, cDNA, and DNA Purification
- Distinguish genomic DNA (gDNA, the full chromosomal complement including introns, UTRs, promoters, enhancers, and regulatory sequences) from complementary DNA (cDNA, synthesized by reverse transcriptase from mature mRNA and therefore containing only exon-encoded sequences) — explaining why cDNA is required when expressing eukaryotic proteins in prokaryotic hosts (which lack spliceosomes), how the eukaryotic gene is organized into exons, introns, 5' and 3' UTRs, and distal regulatory elements, and how prokaryotic operons differ by producing polycistronic mRNAs from intron-free, contiguous gene sequences.
- Explain the chemical principles underlying silica-gel spin column DNA purification — describing how high-concentration chaotropic salts (guanidinium chloride, guanidinium thiocyanate) disrupt water structure, denature proteins, and enable bridging cations (Na⁺, guanidinium⁺) to mediate adsorption of the polyanion DNA to the negatively charged silica surface through hydrogen bonding and ion-ion interactions, how washing removes contaminants, and how low-salt buffer or water strips the bridging cations and elutes pure DNA — and interpret spectrophotometric purity data (A₂₆₀/A₂₈₀ ≈ 1.8 for pure DNA; A₂₆₀ = 1.0 corresponds to 50 μg/mL dsDNA) and explain the hyperchromic effect (37% increase in A₂₆₀ on denaturation due to base unstacking) and how its cooperative sigmoidal thermal denaturation curve defines the melting temperature Tm.
DNA Sequencing: From Sanger to Nanopore
- Explain the chain-termination (Sanger) sequencing mechanism — describing how ddNTPs (lacking the 3'-OH required for phosphodiester bond formation) are incorporated stochastically at ~1/100 the frequency of the corresponding dNTP, generating a nested set of extension fragments whose sizes map the position of each base, how the four reactions (one per ddNTP) are resolved by denaturing polyacrylamide gel electrophoresis (smallest = closest to primer = earliest termination = lowest gel position) or capillary electrophoresis with fluorescent ddNTPs to produce a chromatogram — and explain why Sanger sequencing (reads ~400–500 bp, ~$0.50–1/reaction) was succeeded by next-generation sequencing (NGS) methods that offer massively parallel, lower-cost, higher-throughput sequencing.
- Compare the detection principles of four NGS platforms — Illumina (sequencing by synthesis with fluorescently labeled, reversibly terminating dNTPs added one at a time to adapter-ligated, PCR-amplified clusters, imaged between each cycle, 100–150 bp reads), Roche 454 (pyrosequencing detecting light emission from pyrophosphate release on nucleotide incorporation into bead-bound PCR amplicons, up to 1 kb reads), Ion Torrent (detection of H⁺ released on phosphodiester bond formation using semiconductor pH sensors, ~200 bp reads), and nanopore sequencing (detection of characteristic base-specific current disruptions as ssDNA threads through a protein pore driven by transmembrane voltage and controlled by a helicase motor, real-time, long reads, detects epigenetic modifications) — and explain how expandomer technology addresses the signal-to-noise limitation of nanopore sequencing by using CuAAC click chemistry to attach each dNTP to a large macrocyclic polymer tether (~16 kDa) that physically spaces the bases as they transit the pore, enabling the Telomere-to-Telomere T2T Consortium to complete the full human genome sequence (2022) by resolving previously unsequenceable centromeric and telomeric repeats.
PCR, Gene Synthesis, and DNA Vaccine Applications
- Describe the PCR cycle — explaining how (1) denaturation at ~95°C melts the dsDNA template, (2) annealing at 50–65°C allows sequence-specific forward and reverse primers to bind the 3' end of each template strand, preventing template re-hybridization, and (3) extension at 72–80°C allows thermostable Taq polymerase to synthesize new strands 5'→3', doubling the target sequence each cycle for exponential (~2ⁿ) amplification — and compare standard PCR to qPCR (real-time detection using SYBR Green minor-groove binding or TaqMan hydrolysis probes for quantification during amplification) and RT-PCR (reverse transcription of mRNA to cDNA before PCR, enabling quantification of gene expression levels), noting limitations including contamination sensitivity, primer dimer formation, and non-specific annealing.
- Explain oligonucleotide and gene synthesis — describing how solid-phase phosphoramidite synthesis adds protected nucleoside phosphoramidites sequentially in the 3'→5' direction (opposite to biological synthesis) with cycles of deprotection, coupling, capping, and oxidation — connecting the practical ~200 bp limit for oligonucleotide fidelity to the need to assemble longer synthetic genes from overlapping oligomers — and explain how codon optimization (exploiting the degeneracy of the genetic code to replace rare codons with those matching abundant tRNAs, potentially improving expression 10- to 100-fold) and removal of mRNA secondary structures can dramatically improve heterologous protein expression, and how DNA vaccines (antigen-encoding gene in a non-replicative expression plasmid, delivered to host cells for in vivo antigen production and MHC-mediated immune priming) offer advantages over egg-based vaccines for rapidly emerging pathogens including influenza H5N1.
Genomic and complementary DNA
The ability to sequence an organism's DNA has revolutionized our understanding of biology and evolution. DNA can be isolated from living, dead, and even extinct species, and the "primary" sequence of A, G, C, and T bases in the molecule can be determined. We can read (sequence), write (synthesize), and edit (mutate) DNA at will. Before we explore how to purify, sequence (read), and synthesize (write) DNA, it's important to differentiate between two types of DNA: genomic DNA and complementary DNA (cDNA), which is made by reverse transcription of messenger RNA into DNA. Since mRNA has no nucleotides encoded by introns, cDNA provides just the coding sequences for protein.
Genomic deoxyribonucleic acid (gDNA) is chromosomal DNA, which does not include the extra-chromosomal DNA found in the mitochondria of eukaryotes or plasmids in bacteria (plasmids will be discussed in more detail in section 5.3 during the discussion of gene cloning and expression). Most organisms have the same genomic DNA in every cell (one exception is the genomic DNA for antibodies in B cells and T-cell receptors in T cells, which are altered as the cells become more terminally differentiated). It is also important to remember that only certain genes are active (expressed) in each cell. The subset of expressed genes is specific to a given differentiated cell type and enables the expression of specific cell functions. Liver cells, for example, don't express the gene for the protein opsin, which is expressed in retinal cells and is required for vision.
The genome of an organism (encoded by the genomic DNA) is the (biological) hereditary information passed from one generation of an organism to the next. That genome is transcribed to produce various RNAs, which are necessary for the function of the organism. RNA polymerase II transcribes precursor mRNA (pre-mRNA) in the nucleus. Pre-mRNA is then processed by splicing to remove introns, leaving the exons in the mature messenger RNA (mRNA). Additional processing includes the addition of a 5' cap and a poly(A) tail to the pre-mRNA. The mature mRNA may then be transported to the cytosol and translated by the ribosome into a protein. Other types of RNA include ribosomal RNA (rRNA) and transfer RNA (tRNA). These types are transcribed by RNA polymerase I and RNA polymerase III, respectively, and are essential for protein synthesis. The exception is 5S rRNA, the only rRNA transcribed by RNA polymerase III. cDNA is derived from mRNA and contains only exons, not introns.
Figure \(\PageIndex{1}\) shows the flow of information stored in eukaryotic DNA and its eventual expression in mRNA.
Red indicates coding exons, which are separated by gray introns. At the beginning and end of a gene sequence (encoded by three exons in the figure) are 5' and 3' untranslated regions (UTRs), which are also transcribed and are represented in the mature mRNA. Also, there are potential regulatory sequences (yellow) that are not transcribed on both sides of the transcribed part of the gene. The 5'-end promoter is where transcription factors and RNA polymerase assemble before transcription starts. In addition, there are regulatory enhancers and silencers that are more distal to the gene sequences. An open reading frame (ORF) is a region of the DNA that can be decoded into an mRNA and doesn't have a stop signal (codon) in it that would prematurely terminate translation.
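The ORF definition above can be illustrated with a minimal Python sketch. It scans the three forward reading frames of a DNA string for a start codon (ATG) and reports the stretch up to the first in-frame stop codon. The function name `find_orfs`, the `min_codons` cutoff, and the toy sequence are assumptions made for illustration; real gene finders also examine the reverse strand and, in eukaryotes, must account for introns.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    start and end are 0-based; end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        n_codons = (j - i) // 3  # codons before the stop
                        if n_codons >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return orfs

# Toy sequence (hypothetical): one ORF in the third reading frame.
toy = "CCATGGCTTGGAAATAAGG"
print(find_orfs(toy))
```

The cutoff `min_codons` simply filters out very short ATG-to-stop stretches, which occur frequently by chance.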
In contrast, complementary DNA (cDNA) is synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA) template in a reaction catalyzed by the enzyme reverse transcriptase. Reverse transcriptase is an enzyme found in retroviruses, such as HIV, which have RNA as their core genetic material. Upon entering the host cell, the RNA is reverse-transcribed to produce a cDNA copy, which can then be integrated into the host's genomic DNA. In biotechnology, reverse transcriptase is often used to create cDNA from the mRNA expressed in specific cells or tissues. In this way, eukaryotic genes can be cloned without introns in their structure. This is especially useful if the goal is to express the protein in a prokaryotic (bacterial) host. Recall that bacterial DNA contains no intron sequences within its chromosomal DNA. Thus, if you are using a prokaryotic system to express eukaryotic proteins, you must use cDNA, as the prokaryotic system will not be able to remove intron sequences following gene transcription. The term cDNA is also used, typically in a bioinformatics context, to refer to an mRNA transcript's sequence expressed as DNA bases (GCAT) rather than RNA bases (GCAU).
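The relationship between a gene, its spliced mRNA, and the two senses of "cDNA" described above can be sketched in a few lines of Python. The toy gene, the exon coordinates, and all function names are hypothetical; the sketch ignores UTRs, the 5' cap, and the poly(A) tail.

```python
def transcribe(coding_strand):
    """Pre-mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def splice(pre_mrna, exons):
    """Join exon segments; exons is a list of 0-based (start, end) pairs, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def mrna_to_cdna(mrna):
    """Bioinformatics sense of cDNA: the mRNA sequence written in DNA letters."""
    return mrna.replace("U", "T")

def first_strand_cdna(mrna):
    """Strand made by reverse transcriptase: reverse complement of the mRNA, as DNA."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(mrna))

# Toy gene (hypothetical): exon 1 + intron + exon 2
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, exons)
print(mrna)                     # intron removed
print(mrna_to_cdna(mrna))       # same sequence in DNA letters
print(first_strand_cdna(mrna))  # complementary strand, written 5'->3'
```

Because the intron is removed before reverse transcription, neither cDNA form contains it, which is why cDNA can be expressed in a bacterial host that cannot splice.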
The gene organization of prokaryotes differs in that they lack introns. In addition, some genes for a common pathway, for example, are contiguous in the DNA. These stretches of DNA are called operons. Transcription from an operon produces a polycistronic RNA transcript. The words cis and trans are used in chemistry to describe R groups on the same side (cis) or opposite sides (trans) of a double bond. In DNA, cis-elements are in a single DNA section, while trans-elements usually refer to proteins (away from the gene) binding to the DNA. Hence, the term polycistronic is used for bacterial operons (with multiple genes sequentially arranged in the DNA sequence). Figure \(\PageIndex{2}\) shows the organization of prokaryotic gene structure.
DNA Extraction/Purification
The first DNA isolation was done in 1869 by Friedrich Miescher. Now, purification kits are available from multiple manufacturers.
DNA can be isolated from whole tissue or cell cultures. Let's consider just DNA extraction from cells grown in the lab. Cells are collected by centrifugation and then treated with detergents, such as sodium dodecyl sulfate, to lyse the cell membranes. Proteases and DNase-free RNase can be added to digest proteins and RNA.
Methods involving phenol/chloroform extractions:
In older methods, a mixture of phenol and chloroform or phenol/chloroform/isoamyl alcohol is used to extract DNA from the solution. Students who have performed liquid-liquid extractions in chemistry lab courses should recognize that the mixture will form a biphasic system with water. Nonpolar substances like lipids and cellular debris will partition into the nonpolar phase (chloroform/phenol) or into the interface between them (as suspended insoluble material). Chloroform is very dense because it contains three chlorine atoms, so the organic phase settles below the aqueous phase. Phenol is somewhat soluble in water (8 g/100 g water) but very soluble in chloroform. When the phases are mixed during extraction, the dissolved phenol alters water properties sufficiently to shift the delicate equilibrium of proteins from the native to the denatured state, leading to aggregation and precipitation. On settling, DNA will remain in the aqueous phase. The use of chloroform/phenol in DNA extractions has a potential problem. Phenol (hydroxylated benzene) can lose one electron from the oxygen atom, forming a free radical that can be stabilized by resonance with pi electrons in the aromatic ring. Free radicals can damage DNA, so most new purification methods do not use phenol/chloroform extractions.
Most methods involve precipitating the extracted DNA at some point in the purification process using cold ethanol or isopropanol. DNA is to a first approximation a long polyanion so it would be very difficult to purify "naked" DNA from solution since the extensive negative charges on the DNA would prevent aggregate and precipitate formation. This is not a problem if the ionic strength of the medium is sufficiently high so that positively charged counter-ions can shield the negative charges from each other, allowing precipitation.
Methods involving adsorption chromatography using silica gel:
Nucleic acids bind or adsorb to a solid phase (silica or other), depending on the pH and the salt concentration of the buffer solution. Small spin columns are used when small amounts are required (e.g., for isolating a recombinant plasmid from bacteria). This method relies on the fact that nucleic acids bind to the solid-phase silica gel under certain conditions and are released when those conditions are altered. These features are illustrated in Figure \(\PageIndex{3}\).

Figure \(\PageIndex{3}\): DNA-Silica gel interactions. Image by Squidonius
The DNA solution is applied to a small spin column containing silica gel, which is then placed in a mini-centrifuge. On spinning, the nucleic acid will bind to the silica gel membrane as the solution passes through. After multiple "spin" washes to remove nonspecific cellular components from the column, the DNA is eluted with a low-salt elution buffer (or simply water). Unlike RNA, which degrades very quickly, DNA is quite stable and can be stored for long periods at -20 °C.
Many student readers have used silica gel chromatography or silica gel thin-layer chromatography to separate and analyze organic mixtures. These techniques are usually performed in a mixture of organic solvents (e.g., hexane/ethyl acetate). Using silica gel to purify DNA from an aqueous solution might seem strange, so we will briefly explore how DNA binds.
In silica gel, each silicon atom is tetrahedrally bonded (sp3 hybridization) to four oxygen atoms, with each oxygen atom covalently linked to two silicon atoms. At the particle's surface, the oxygen atoms are capped with H atoms, so the entire surface contains a "sea" of OH groups, and therefore hydrogen bond donors and acceptors. At higher pH values, some of these OH groups ionize to form O- ions. If salt concentrations are high enough, the percentage of O- ions increases since they are stabilized by the cations in the salt (shifting the equilibrium to the ionized state).
DNA, a long negatively charged anion, can bind to the silica surface using two types of noncovalent interactions. It can form hydrogen bonds with the silica gel's surface hydroxyl groups. In addition, it can interact with the surface through ion-ion interactions mediated by bridging cations (like Na+ from the high-concentration salt solution), as illustrated in the figure above. The binding solutions used in the spin column adsorption steps have high concentrations of chaotropic salts that disrupt water structure and hydrogen bonding. The salts also denature proteins and, in effect, dehydrate the DNA. Some chaotropic salts include sodium iodide, sodium perchlorate, guanidinium thiocyanate, and guanidinium chloride. The sodium or guanidinium acts as a bridging cation, allowing the adsorption of the negative DNA to the negative charges on the silica gel surface. Sodium acetate and Tris-HCl are included to buffer pH from 6-7. Now, it becomes easy to understand how pure water or low salt concentration solutions elute the bound DNA after extensive column washing, since pure water or low salt solutions would strip the bound intermediary cations from the silica column.
After isolation, the DNA is dissolved in a slightly alkaline buffer, usually Tris-EDTA or ultra-pure water. EDTA binds divalent cations, such as Ca2+, which activate nucleases. Modifications to these standard techniques are often made when the tissue being used is difficult to break down, contaminants persist in the lysis solution and inhibit further reactions, or the sample is extremely limited, as is often the case in forensic investigations. In addition, different commercial kits will be tailored to isolate larger genomic DNA or smaller plasmid DNA.
The purity of a DNA preparation is usually determined by measuring the absorbance of the solution at 230, 260 (peak absorbance of nucleic acids), and 280 nm (peak absorbance of proteins), often using an instrument that requires a tiny droplet of solution. Figure \(\PageIndex{4}\) below shows the relative absorbance spectra of proteins and nucleic acids.

Figure \(\PageIndex{4}\): Relative absorbance spectra of proteins and nucleic acids. Brianna Bibel. The Bumbling Chemist https://thebumblingbiochemist.com/36...purity-ratios/
The A260/A280 ratio provides a measure of protein contamination, with a value around 1.8 indicating "pure" DNA. The A260/A230 gives information on protein and other solution contamination. A pure DNA solution with A260 = 1.0 has a concentration of 50 μg/mL (50 ng/μL).
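The arithmetic behind these readings can be sketched in Python. The conversion factor (A260 = 1.0 corresponds to 50 μg/mL dsDNA) and the target ratio of about 1.8 come from the text; the function names, the `dilution_factor` parameter, the ±0.1 tolerance, and the example readings are assumptions chosen for illustration.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # from the text: A260 = 1.0 -> 50 ug/mL dsDNA

def dsdna_concentration(a260, dilution_factor=1.0):
    """Concentration in ug/mL of the undiluted dsDNA sample."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor

def a260_a280_ratio(a260, a280):
    """Purity ratio; about 1.8 is expected for 'pure' DNA."""
    return a260 / a280

def looks_pure(a260, a280, target=1.8, tolerance=0.1):
    """Assumed acceptance window of target +/- tolerance (not a standard)."""
    return abs(a260_a280_ratio(a260, a280) - target) <= tolerance

# Hypothetical readings from a sample diluted 10-fold before measurement
a260, a280 = 0.45, 0.25
print(dsdna_concentration(a260, dilution_factor=10))  # ug/mL in the original sample
print(a260_a280_ratio(a260, a280))
print(looks_pure(a260, a280))
```

A ratio well below the target would suggest protein carry-over, since proteins absorb more strongly at 280 nm than at 260 nm.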
When the temperature of a dsDNA solution is increased, the A260 increases by about 37% over a certain range. This is called the hyperchromic effect, which occurs when the bases in DNA unstack on denaturation of dsDNA to single-stranded DNA as the interstrand hydrogen bonds break. Figure \(\PageIndex{5}\) below shows a graph of A260 vs temperature. The temperature at the midpoint of the cooperative change in absorbance, where 50% of the DNA molecules are denatured, is the melting temperature (Tm).
Given the cooperative nature of unfolding, it is less likely that the population consists of individual molecules that are each 50% denatured (i.e., a single molecule being 50% double-stranded and 50% single-stranded). Because single-stranded DNA absorbs more strongly at 260 nm than double-stranded DNA, a lower concentration (33 μg/mL) of fully single-stranded DNA gives an A260 = 1.
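To make the midpoint of the melting curve concrete, the following sketch builds a toy sigmoidal A260 curve and recovers its midpoint temperature by linear interpolation. Only the 37% hyperchromic increase comes from the text; the logistic shape, the 85 °C midpoint, the `width` parameter, and the function names are assumptions, not measured data.

```python
import math

def melting_curve(temps, tm=85.0, width=1.5, a_native=1.0, hyperchromicity=0.37):
    """Toy sigmoidal A260 curve: the fraction denatured follows a logistic function."""
    curve = []
    for t in temps:
        frac_denatured = 1.0 / (1.0 + math.exp(-(t - tm) / width))
        curve.append(a_native * (1.0 + hyperchromicity * frac_denatured))
    return curve

def estimate_tm(temps, absorbances):
    """Interpolate the temperature where A260 is halfway between its min and max."""
    half = (min(absorbances) + max(absorbances)) / 2.0
    for i in range(len(temps) - 1):
        a0, a1 = absorbances[i], absorbances[i + 1]
        if a0 <= half <= a1 and a1 > a0:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("absorbance never crosses the midpoint")

temps = [60 + 0.5 * i for i in range(81)]  # 60 to 100 degrees C in 0.5 degree steps
a260 = melting_curve(temps)
print(round(estimate_tm(temps, a260), 1))
```

With real data the baselines before and after the transition usually slope, so the plateau values would need to be fitted rather than taken as the raw minimum and maximum.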
DNA Sequencing Techniques
DNA sequencing determines the order of nucleotides in a DNA sequence. It includes any method or technology used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
Knowledge of DNA sequences has become indispensable for basic biological research and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology, and biological systematics. Comparing healthy and mutated DNA sequences can diagnose various diseases, including cancers, characterize antibody repertoires, and guide patient treatment. Having a quick way to sequence DNA enables faster, more individualized medical care and allows more organisms to be identified and cataloged.
The rapid speed of modern DNA sequencing technology has been instrumental in sequencing complete DNA sequences, or genomes, of numerous types and species of life, including the human genome and those of many animal, plant, and microbial species.
The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with a DNA sequencer, DNA sequencing has become easier and faster.
The canonical structure of DNA has four bases: thymine (T), adenine (A), cytosine (C), and guanine (G). DNA sequencing is the determination of the physical order of these bases in a DNA molecule. However, epigenetic processes often modify DNA bases to control gene expression. Thus, more modified bases may be present in a DNA molecule than the standard four bases. For example, in some viruses (specifically, bacteriophages), cytosine may be replaced by hydroxymethylcytosine or hydroxymethylglucosylcytosine. In eukaryotic DNA, variant bases with methyl groups or phosphosulfate may be found as shown in Figure \(\PageIndex{6}\) below. Depending on the sequencing technique, a particular modification, e.g., 5mC (5-methylcytosine), common in humans, may or may not be detected.
Early DNA sequencing methods
The first method for determining DNA sequences involved a location-specific primer extension strategy established by Ray Wu at Cornell University in 1970. DNA polymerase catalysis and specific nucleotide labeling, which figure prominently in current sequencing schemes, were used to sequence the cohesive ends of lambda phage DNA. Between 1970 and 1973, Wu, R. Padmanabhan, and colleagues demonstrated that this method could be used to determine any DNA sequence using synthetic location-specific primers. Frederick Sanger then adopted this primer-extension strategy to develop more rapid DNA sequencing methods at the MRC Center, Cambridge, UK. He published a method for "DNA sequencing with chain-terminating inhibitors" in 1977. Walter Gilbert and Allan Maxam at Harvard also developed sequencing methods, including one for "DNA sequencing by chemical degradation." In 1973, Gilbert and Maxam reported the sequence of 24 base pairs using a method known as wandering-spot analysis. Sequence advancements were aided by the concurrent development of recombinant DNA technology, allowing DNA samples to be isolated from sources other than viruses.
Maxam-Gilbert sequencing requires radioactive labeling at one 5' end of the DNA and purification of the DNA fragment to be sequenced. Chemical treatment then generates breaks at a small proportion of one or two of the four nucleotide bases in each of the four reactions (G, A+G, C, C+T). The concentration of the modifying chemicals is controlled to introduce, on average, one modification per DNA molecule. Thus, a series of labeled fragments is generated from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.
The technical aspects of Maxam-Gilbert sequencing caused it to go out of favor once the Sanger sequencing method had been well established, as described below.
Sanger Sequencing Method
The chain-termination method developed by Frederick Sanger and coworkers in 1977 soon became the method of choice, owing to its relative ease and reliability. When it was invented, the chain-terminator method used fewer toxic chemicals and lower amounts of radioactivity than the Maxam-Gilbert method. Because of its comparative ease, the Sanger method was soon automated and used in the first generation of DNA sequencers.
The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified dideoxynucleotide triphosphates (ddNTPs), the latter of which terminate DNA strand elongation. These chain-terminating nucleotides lack the 3'-OH group required to form a phosphodiester bond between two nucleotides, causing DNA polymerase to cease the extension of DNA when a ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labeled for detection in automated sequencing machines.
The DNA sample is divided into four separate sequencing reactions, containing all four standard deoxynucleotides (dATP, dGTP, dCTP, and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP), while the other added nucleotides are ordinary ones, as shown in Figure \(\PageIndex{7}\) below.
The dideoxynucleotide concentration should be approximately 100-fold lower than that of the corresponding deoxynucleotide (e.g., 0.005 mM ddTTP : 0.5 mM dTTP) so that enough fragments of every length are produced while some chains still extend through the complete sequence. Four separate reactions are needed in this process to test all four ddNTPs. This is illustrated in Figure \(\PageIndex{8}\) below.
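A rough calculation shows why the ddNTP:dNTP ratio matters. The sketch below assumes, as a simplification, that the polymerase incorporates ddNTP and dNTP with equal efficiency, so the chance of termination at each matching position equals the ddNTP fraction of the pool. The function names are hypothetical, and real polymerases discriminate between the two substrates to varying degrees.

```python
def termination_probability(dd_mM, d_mM):
    """Chance of incorporating the ddNTP at one matching position (equal-efficiency assumption)."""
    return dd_mM / (dd_mM + d_mM)

def fraction_terminated_at(k, p):
    """Fraction of chains that stop exactly at the k-th occurrence of the base."""
    return (1 - p) ** (k - 1) * p

def fraction_still_extending(n, p):
    """Fraction of chains that read through n occurrences of the base without stopping."""
    return (1 - p) ** n

p = termination_probability(0.005, 0.5)  # concentrations from the text
print(round(p, 4))
print(round(fraction_still_extending(100, p), 2))

# With a 1:1 ratio almost nothing extends past the first few occurrences
p_high = termination_probability(0.5, 0.5)
print(fraction_still_extending(10, p_high))
```

Under this model, a 1:100 ratio leaves a substantial share of chains extending even after 100 occurrences of the base, whereas a 1:1 ratio would yield only very short fragments.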
Following rounds of template DNA extension from the bound primer, the resulting DNA fragments are heat-denatured and separated by size using gel electrophoresis. This technique was frequently performed using a denaturing polyacrylamide-urea gel with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands may then be visualized by autoradiography or UV light, and the DNA sequence can be directly read off the X-ray film or gel image, as shown in Figure \(\PageIndex{9}\) below.

Figure \(\PageIndex{9}\): Traditional Sanger Sequencing Gel. Sequence visualized by autoradiography. Each lane contains a single reaction that has all four regular nucleotides and a small amount of one of the dideoxynucleotides (ddNTPs). Over time, the ddNTPs will be incorporated at each position containing that specific nucleotide. The gel can then be read from bottom to top, as the smallest fragments (those terminated closest to the primer at the 5'-end) will run the farthest distance in the gel. The sequence of this fragment is 5'-TACGAGATATATGGCGTTAATACGATATATTGGAACTTCTATTGC-3'. Image by John Schmidt
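Reading a four-lane gel from bottom to top can be mimicked with a short Python sketch. Each lane is modeled as the list of fragment lengths that end in that lane's ddNTP, and the sequence is recovered by sorting all bands from shortest to longest. The function names are hypothetical, fragment lengths ignore the primer, and the example uses only the first ten bases of the sequence given in the caption.

```python
def sanger_lanes(synthesized):
    """Map each base to the fragment lengths terminated by the matching ddNTP.

    synthesized is the newly made strand, 5'->3'; lengths exclude the primer.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(synthesized, start=1):
        lanes[base].append(position)
    return lanes

def read_gel(lanes):
    """Read bands from the shortest fragment (bottom of the gel) to the longest (top)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

seq = "TACGAGATAT"  # first ten bases of the sequence in the caption
lanes = sanger_lanes(seq)
print(lanes)
print(read_gel(lanes))
```

Each fragment length appears in exactly one lane, which is what allows the four lanes to be merged into a single ordered sequence.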
Automation of the Sanger sequencing method became possible with the shift from radioactively to fluorescently tagged nucleotides. In automated sequencing, capillary gel electrophoresis is used rather than slab gel electrophoresis to separate the samples. The output from capillary electrophoresis is fluorescent peak trace chromatograms, as shown in Figure \(\PageIndex{10}\) below. Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch. Batch runs may occur up to 24 times a day, greatly increasing the speed at which samples can be sequenced and analyzed. Common challenges of DNA sequencing with the Sanger method include poor quality in the first 15-40 bases due to primer binding, and deteriorating sequencing trace quality after 400-500 bases.
Figure \(\PageIndex{10}\): Side-by-Side Comparison of Gel Electrophoresis and Capillary Electrophoresis. The left-hand diagram shows the traditional autoradiogram of Sanger sequencing samples. The right-hand diagram shows the same reactions using fluorescently tagged ddNTPs separated by capillary electrophoresis. The chromatogram output is shown on the far right. Image by Abizar
Sanger sequencing is the method that prevailed from the 1980s until ~2005. Over that period, great advances were made in the technique, such as fluorescent labeling, capillary electrophoresis, and general automation. These developments enabled much more efficient sequencing, resulting in lower costs. In mass production form, the Sanger method is the technology that produced the first draft human genome sequence in 2001, ushering in the age of genomics.
Microfluidic Sanger Sequencing
Microfluidic Sanger sequencing is a lab-on-a-chip application for DNA sequencing, in which the Sanger sequencing steps (thermal cycling, sample purification, and capillary electrophoresis) are integrated on a wafer-scale chip using nanoliter-scale sample volumes (Figure \(\PageIndex{11}\)). This technology generates long, accurate sequence reads while obviating many of the significant shortcomings of the conventional Sanger method (e.g., high consumption of expensive reagents, reliance on expensive equipment, and personnel-intensive manipulations) by integrating and automating the Sanger sequencing steps.
Next-generation sequencing (NGS)
Next-generation sequencing (NGS), or high-throughput sequencing, is the catch-all term for several modern sequencing technologies. These technologies allow for the sequencing of DNA and RNA much more quickly and cheaply than the previously used Sanger sequencing and, as such, revolutionized the study of genomics and molecular biology. We present information from specific companies that have developed these new technologies without endorsements. Such technologies include:
Illumina Sequencing - In NGS, vast numbers of short reads are sequenced in a single stroke using the lab-on-a-chip technology described above. To do this, the input sample must be cleaved into short sections. In Illumina sequencing, 100-150 bp reads are used. Somewhat longer fragments are ligated to generic adaptors and annealed to a slide using the adaptors. PCR is performed to amplify each read, creating a spot containing many copies of the same read. The copies are then separated into single-stranded DNA to be sequenced, as shown in Figure \(\PageIndex{12}\) below.
Figure \(\PageIndex{12}\): Procedure for Illumina Sequencing. (A) The slide with PCR-amplified DNA fragments is flooded with nucleotides and DNA polymerase. These nucleotides are fluorescently labeled, with each color corresponding to a specific base. The nucleotides also carry a terminator, so that only one base is added at a time. (B) An image of the slide is taken. At each reaction location, a fluorescent signal indicates that a specific base has been added. (C) The data is recorded, and the slide is then prepared for the next cycle. In preparation, the terminators are removed to allow the next base to be added, and the fluorescent label is cleaved off to prevent it from contaminating the next image. The process is repeated, adding one nucleotide at a time (G, A, T, or C) and imaging in between. All of the sequence reads will be the same length, as a single base is added at each cycle. Image modified from EMBL-EBI
Roche 454 sequencing is similar to the Illumina process but can produce much longer reads. Like Illumina, it does this by sequencing multiple reads simultaneously by reading optical signals as bases are added. As with Illumina, the DNA or RNA is fragmented into shorter reads, in this case up to 1 kb (1,000 bp). Generic adaptors are added to the ends and annealed to beads, with one DNA fragment per bead. The fragments are then amplified by PCR using adaptor-specific primers. Each bead is then placed in a single well of a slide, with each well containing a single bead and many PCR copies of a single sequence. The wells also contain DNA polymerase and sequencing buffers (Figure \(\PageIndex{13}\)).
Figure \(\PageIndex{13}\): Procedure for Roche 454 Sequencing. (A) Once the PCR product is bound to the bead, the slide is flooded with one of the four dNTPs. Where this nucleotide is next in the sequence, it is added to the sequence read. If that single base repeats, then more will be added. So if we flood with guanine nucleotides, and the next base in the sequence is G, one G will be added; however, if the next part of the sequence is GGGG, then four Gs will be added. (B) The addition of each nucleotide releases a light signal. These signal locations are detected and used to determine which beads the nucleotides are added to. (C) The dNTP mix is washed away. The next dNTP mix is now added, and the process is repeated, cycling through the four dNTPs. All of the sequence reads from 454 sequencing will be of different lengths, because different numbers of bases will be added with each cycle. Image modified by EMBL-EBI
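The flow-by-flow logic described in the caption can be sketched in a few lines of Python. This is only an illustrative model, not the instrument's actual software: the function name `flowgram`, the flow order `"TACG"`, and the example read are assumptions made for the sketch. Each flow offers one dNTP species, and the whole run of matching bases is incorporated at once, so the signal for a flow is proportional to the homopolymer length.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Return one (base, count) pair per nucleotide flow.

    `read` is the strand being synthesized, written 5'->3'.
    `flow_order` is an assumed cyclic order in which dNTPs are offered.
    In each flow, every consecutive base matching the offered dNTP is
    incorporated, so a homopolymer such as GGGG gives a count of 4.
    """
    flows = []
    position = 0
    flow_index = 0
    while position < len(read) and flow_index < max_flows:
        base = flow_order[flow_index % len(flow_order)]
        count = 0
        # Incorporate the full run of matching bases in this single flow.
        while position < len(read) and read[position] == base:
            count += 1
            position += 1
        flows.append((base, count))
        flow_index += 1
    return flows


if __name__ == "__main__":
    for base, count in flowgram("GGGGATTC"):
        print(base, count)
```

Because the number of bases added per flow depends on the sequence, two reads subjected to the same number of flows generally end up with different lengths, which is the point made at the end of the caption.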
Newer technologies, such as the Ion Torrent Technology, detect sequence data using electrical signals on a semiconductor chip rather than optically reading dye-labeled nucleotides. This is possible as the addition of a dNTP to the DNA polymer causes the release of an H+ ion (Figure \(\PageIndex{14}\)). As in other kinds of NGS, the input DNA or RNA is fragmented, this time to ~200 bp. Adaptors are added, and one molecule is placed on a bead. The molecules are amplified on the bead by emulsion PCR. Each bead is placed into a single well of a slide. This is illustrated in Figure \(\PageIndex{14}\) below.
Figure \(\PageIndex{14}\): Ion Torrent Sequencing Technology. (A) Similar to 454 sequencing, the slide is flooded with a single dNTP species, along with buffers and polymerase. The pH is monitored in each well after the addition of the specific dNTP. The pH will decrease when a dNTP is incorporated into the polymer, leading to the release of a proton (H+). The changes in pH allow the determination of different dNTP species. (C) The pH change, if any, is used to determine how many bases (if any) were added with each cycle. Image modified from EMBL-EBI
Nanopore Sequencing
In this technique, flow cells are constructed that contain nanopores in a nonlipid membrane that is electrically insulating. When a voltage is applied across a membrane separating two salt solutions, a sensor chip can detect the current through each nanopore channel. When a larger molecule moves through the pore, a disruption in basal current occurs. Computer algorithms have been developed to detect base-specific changes in the current as each base (even a chemically modified one) moves through the pore. The sequence is then decoded in real time.
Single-stranded DNA or RNA can be driven through the pore by a transmembrane potential in a process similar to electrophoresis. The enzyme DNA helicase, a motor protein, can be attached to the outer part of the pore protein. This enzyme binds to single-stranded DNA and moves along it, requiring ATP. Because the helicase is attached to the pore protein, it is the single-stranded DNA that moves, which allows control of its movement through the pore.
The nanopores are made of real membrane proteins (which we discuss in Chapter 11). One example is α-hemolysin, a heptamer with an inner pore diameter of 1 nm. When embedded in real cells, it can allow the flow of K+ (diameter around 250 pm, or 0.25 nm) and other ions across the cell membrane, altering the osmotic balance and lysing the cell. The pore size of the proteins used in nanopore sequencing allows single-stranded DNA to flow through it. An interactive iCn3D model of a protein used for nanopore sequencing, Curli transport lipoprotein CsgG (4uv3), is shown in the membrane bilayer in Figure \(\PageIndex{15}\).
Figure \(\PageIndex{15}\): Curli transport lipoprotein CsgG (4uv3) for DNA nanopore sequencing. (Copyright; author via source).
Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...LC5K3vSjYjtHN7
Nanopore technologies have enabled the production of small, handheld DNA sequencing devices that can be plugged into a USB port on a laptop and used in the field under real-time collection conditions. Future modifications might replace protein pores with synthetic solid-state nanopores. For instance, the membrane might be made of graphene with pores of a specific size.
Figure \(\PageIndex{16}\) shows an animation of a single-stranded DNA moving through a protein pore (blue) assisted by a motor protein (magenta).
Recent Updates: 12/6/2025
DNA sequencing using Expandomers
One problem with nanopore sequencing is the signal-to-noise ratio as each base transits the pore. The distance between bases in a strand is only 3.4 Å, so transit of each base through the pore is very quick, crowding the signal, causing lower resolution (i.e., lower signal-to-noise ratio). How to solve the problem? One way would be to increase the distance between the bases. This might at first seem ludicrous, since it would require a significant change in the structure of the single-stranded DNA passing through the pore and a new synthetic method to achieve it. Yet this feat has been accomplished by Mark Kokoris et al., using remarkable chemistry and genetic engineering.
They used click chemistry (see Chapter 7.5.6 Click Chemistry and Bioorthogonal Reactions) to attach a modified dNTP containing alkynes to a larger polymer to form an expandomer. Deoxynucleotides have an average molecular weight of 328 and a -1 charge for the monophosphate. In contrast, the expandomer has an average molecular weight of 16K and a charge of -38 (similar in size and charge to an oligodeoxynucleotide about 35 bases long). Figure \(\PageIndex{xx}\) below shows (left) the mechanism of the copper-catalyzed azide-alkyne cycloaddition (CuAAC) click chemistry and (right) a click chemistry-modifiable dCTP.
Figure \(\PageIndex{xx}\): (left) Mechanism of the Copper-Catalyzed Azide-Alkyne Cycloaddition (CuAAC) Click Chemistry and (right) click chemistry-modifiable dCTP. Left: mechanism from the Organic Chemistry Portal. https://www.organic-chemistry.org/na...chemistry.shtm.
A cartoon representation of a typical expandomer (for dCTP) is shown in Figure \(\PageIndex{xx}\) below.
Figure \(\PageIndex{xx}\): A cartoon representation of a typical expandomer (for dCTP)
A chemical structure of a typical expandomer for X-dCTP is shown in Figure \(\PageIndex{xx}\) below.
Figure \(\PageIndex{xx}\): Chemical structure of an expandomer for dCTP. After Sequencing by Expansion (SBX) – a novel, high-throughput single-molecule sequencing technology, https://www.biorxiv.org/content/10.1....639056v2.full.
The covalent bond between the P and N in the X-dNTP is broken by acid-catalyzed hydrolysis, allowing the macrocyclic, symmetrically synthesized reporter tether to be linearized and expanded, thereby enabling each expandomer to be threaded through the nanopore. Each expandomer base in the synthesized sequence is separated by one tether length.
The expandomer is made from the target DNA strand to be sequenced, which hybridizes to a solid phase extension oligomer (EO) from which the expandomer will be synthesized (in a 5' to 3' direction). Figure \(\PageIndex{xx}\) shows the extension oligo (EO) connected through a concentrator-leader-spacer to the solid support
Figure \(\PageIndex{xx}\): Extension oligo (EO) connected through a concentrator-leader-spacer to the solid support
The leader carries a high negative charge, which helps concentrate the structure on the positive potential side of the membrane pore. The concentrator is highly hydrophobic, facilitating interaction with the lipid bilayer. Both features dramatically increase the diffusion of the EO structure to the nanopore. After the entire complementary strand is synthesized, the expandomer can be cleaved from the solid support by photolysis.
The genetically engineered expandomer synthase (XP synthase), a mutated form (36 amino acid substitutions) of Dpo4 polymerase, a DNA polymerase IV from Saccharolobus solfataricus (a hyperthermophile), is used. It has a more open active site that can accommodate even damaged bases, such as thymine dimers. It reads just short stretches of DNA, so it is distributive rather than processive (reading long stretches before the polymerase dissociates from the chain), and has a very high error rate. Hence, it can accommodate the large expanded version of each incoming deoxynucleotide in its active site.
Let's look in more detail at the components of the expandomer.
Translocation Control Element (TCE)
The structure in Figure xx above shows that the TCE has a more hydrophobic polyethylene glycol chain (-O-CH2-CH2-O-) and a polyphosphate polyanionic chain. It is likely that at low voltage, the hydrophobic chain folds over and closes the pore, whereas at higher transpore voltage, it moves through the pore. With repeated high-voltage pulses, the expandomer ratchets through the pore, one expandomer base section per pulse.
Base Reporter
The TCE likely positions one arm of the Base Reporter in the pore barrel, thereby inhibiting ion flow through the pore to an extent determined by the specific base in the XNTP and the exact structure of the reporter used for each base.
Enhancer
The polyamine repeat in the enhancer confers a significant net positive charge, promoting interaction between the expandomer and XP synthase. Mutations in the XP synthase increase the negative charge density at the open binding site of the synthase.
Here is a link to an animation showing DNA sequencing using expandomers. The method and the video are from Roche. Expandomer sequencing was recently used to sequence an entire human genome in less than 4 hours!
Single Molecule, Real-Time (SMRT) sequencing
In this technique, either RNA or DNA is converted to dsDNA. Deoxynucleotide "adapters" are added to connect the 5'-end of strand 1 to the 3'-end of strand 2 and another adapter to connect the 3'-end of strand 1 to the 5'-end of strand 2, resulting in a "circular" ss-DNA molecule. This single molecule is then drawn into a nanophotonic nanowell made in a thin metal film deposited on glass. The dimensions of each well allow only a single circular ss-DNA molecule. Hundreds of different circular ss-DNA molecules entering individual wells are shown in Figure \(\PageIndex{17}\). The blue stretches represent the adapters.
The wells are approximately 100 nm in diameter. DNA polymerase and dNTPs can be added to the nanowells, each of which contains a single immobilized circular ssDNA molecule. The DNA is immobilized by biotinylation or by attachment to magnetic beads, which interact with streptavidin-coated wells. When confined to the wells, the apparent concentration of reactants for the polymerization can be quite high, allowing robust DNA polymerase activity.
DNA sequencing using real-time fluorescence monitoring can be done in a massively parallel fashion. The fluorophore is connected to the terminal phosphate of the dNTP. When DNA polymerase forms a phosphodiester bond, the fluorophore is released as a leaving group, leaving natural, unmodified DNA to continue growing. The YouTube video below shows the entire process of single-molecule real-time sequencing.
The four main advantages of Next Generation Sequencing (NGS) over classical sequencing are described below.
Sample size
NGS is cheaper, quicker, needs significantly less DNA, and is more accurate and reliable than Sanger sequencing. Let us look at this more closely. For Sanger sequencing, a large amount of template DNA is required per read. Several strands of template DNA are needed for each base being sequenced (i.e., for a 100bp sequence, you'd need many hundreds of copies; for a 1000 bp sequence, you'd need many thousands of copies), as a strand that terminates on each base is needed to construct a full sequence. In NGS, a sequence can be obtained from a single strand. In both sequencing methods, multiple staggered copies are used for contig construction and sequence validation.
Speed
NGS is quicker than Sanger sequencing in two ways. Firstly, in some NGS methods, the chemical reaction and signal detection are combined, whereas in Sanger sequencing, they are separate processes. Secondly, and more significantly, only one read (up to ~1 kb) can be taken at a time in Sanger sequencing. In contrast, NGS is massively parallel, allowing 300 Gb (gigabases) of DNA to be read on a single chip during a single run.
Cost
The reduced time, labor, and reagent requirements of NGS mean the costs are much lower. The first human genome sequence cost about $2.7 billion in 2003. Using modern Sanger sequencing methods, aided by data from the known sequence, a full human genome still cost $300,000 in 2006. Sequencing a human genome with NGS today costs roughly $1,000.
Accuracy
Repeats are intrinsic to NGS, as each read is amplified before sequencing, and because it relies on many short overlapping reads, each section of DNA or RNA is sequenced multiple times. Also, because it is much quicker and cheaper, it is possible to repeat more than Sanger sequencing. More repeats mean greater coverage, leading to a more accurate and reliable sequence, even if individual reads are less accurate in NGS.
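The link between coverage and accuracy can be illustrated with a small Python sketch. It assumes the reads have already been aligned to common coordinates and padded with `-` where a read does not cover a position; the function names, the toy reads, and the simple majority-vote rule are illustrative assumptions, not a description of any particular variant-calling pipeline.

```python
from collections import Counter


def mean_coverage(num_reads, read_length, genome_length):
    """Average number of times each base is sequenced (reads x length / genome)."""
    return num_reads * read_length / genome_length


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, pre-aligned reads.

    '-' marks positions a read does not cover; uncovered columns become 'N'.
    """
    result = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


if __name__ == "__main__":
    reads = [
        "ACGTTAGC--",
        "-CGTAAGCTA",  # contains one miscalled base (A instead of T) at index 4
        "--GTTAGCTA",
    ]
    print(consensus(reads))                            # ACGTTAGCTA
    print(mean_coverage(1_000_000, 150, 5_000_000))    # 30.0
```

In the toy example, the single miscalled base is outvoted by the two reads that agree, which is why many overlapping reads can yield a reliable sequence even when individual reads contain errors.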
Nanopore and single-molecule real-time (SMRT) sequencing were recently employed to complete the full human genome sequence (2022). Previous genomic sequences lacked regions with highly repetitive sequences at centromeres and telomeres. The Telomere-to-Telomere (T2T) Consortium performed the analysis, adding 200 megabases of new sequence information missing from the previous best sequence.
DNA Synthesis Techniques
DNA synthesis is the natural or artificial creation of deoxyribonucleic acid (DNA) molecules. The term DNA synthesis can refer to DNA replication (which will be covered in more detail in Chapter XX), the polymerase chain reaction (PCR), or gene synthesis (the physical creation of artificial gene sequences).
Polymerase Chain Reaction (PCR)
The polymerase chain reaction (PCR) is widely employed in the basic and biomedical sciences. PCR is a laboratory technique used to amplify specific segments of DNA for a wide range of laboratory and/or clinical applications. Building on Panet and Khorana's successful amplification of DNA in vitro, Kary Mullis and coworkers developed PCR in the early 1980s, work that was recognized with the Nobel Prize only a decade later. Allowing more than a billion-fold amplification of specific target regions, it has become instrumental in many applications, including cloning genes, diagnosing infectious diseases, and screening pregnant women for deleterious genetic abnormalities.
Fundamentals
The main components of PCR are a template, primers, free nucleotide bases, and the DNA polymerase enzyme. The DNA template contains the specific region you wish to amplify, such as the DNA extracted from a hair sample. Primers, or oligonucleotides, are short strands of single-stranded DNA complementary to the 3' end of each target region. A forward and a reverse primer are required, one for each complementary strand of DNA. DNA polymerase is the enzyme that carries out DNA replication. Thermostable analogs of DNA polymerase I, such as Taq polymerase, which was originally found in a bacterium that grows in hot springs, are a common choice due to their resistance to the heating and cooling cycles necessary for PCR.
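The relationship between the two primers and the target region can be made concrete with a short Python sketch. It is a simplified illustration only: the function names, the example sequence, and the fixed primer length are assumptions, and real primer design also weighs melting temperature, GC content, and secondary structure. The forward primer copies the start of the top strand (so it anneals to the 3' end of the bottom strand), and the reverse primer is the reverse complement of the end of the top strand (so it anneals to the 3' end of the target on the top strand).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(target, length=20):
    """Return (forward, reverse) primers, both written 5'->3'.

    `target` is the top strand of the region to amplify, written 5'->3'.
    """
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical target sequence, used only for illustration.
    target = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    fwd, rev = design_primers(target, length=6)
    print(fwd)  # ATGGCC
    print(rev)  # CTATCG
```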
PCR takes advantage of the complementary base pairing, double-stranded nature, and melting temperature of DNA molecules. This process involves cycling through three sequential temperature-dependent steps: DNA melting (denaturation), annealing, and enzyme-driven DNA replication (elongation). Denaturation begins by heating the reaction to about 95 °C, disrupting the hydrogen bonds that hold the two strands of template DNA together. Next, the reaction temperature is reduced to 50-65 °C, depending on the physicochemical properties of the primers, enabling annealing of complementary base pairs. The primers, which are added to the solution in excess, bind to the 3' end of the target region on each template strand and prevent re-hybridization of the template strand with itself. Lastly, enzyme-driven DNA replication, or elongation, begins by setting the reaction temperature to the level that optimizes DNA polymerase's activity, around 75 to 80 °C. At this point, DNA polymerase, which needs a double-stranded primer-template region to begin replication, synthesizes a new DNA strand by adding free nucleotides from solution to the 3' end of each primer, so that the new strand grows in the 5' to 3' direction, producing two full sets of complementary strands. The newly synthesized DNA is now identical to the template strand and will be used as such in the progressive PCR cycles. The steps in PCR are animated in the video below.
Figure \(\PageIndex{17}\) below illustrates the steps involved in PCR amplification of target DNA.
Figure \(\PageIndex{18}\) shows a video animation of a PCR reaction.
Figure \(\PageIndex{18}\): Video animation of a PCR reaction.
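The dependence of the annealing temperature on the primers can be illustrated with a widely used rule of thumb that is not part of the text above: the Wallace rule, which estimates the melting temperature of a short oligonucleotide as 2 °C per A or T plus 4 °C per G or C. The Python sketch below is only a rough estimate under that assumption; the function names and the 5 °C offset for choosing an annealing temperature are illustrative conventions, and more accurate nearest-neighbor models are normally used in practice.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) of a short primer by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def suggested_annealing_temp(forward, reverse, offset=5):
    """Assumed convention: anneal a few degrees below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


if __name__ == "__main__":
    # Hypothetical primer sequences, used only for illustration.
    print(wallace_tm("ATGGCCATTGTAATGGGC"))  # 54
```

GC-rich primers melt at higher temperatures because each G-C pair contributes three hydrogen bonds rather than two, which is why the annealing step is tuned to the primer pair.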
Given that previously synthesized DNA strands serve as templates, DNA amplification via PCR increases exponentially, with the number of target copies doubling at the end of each replication step. The exponential replication of the target DNA eventually plateaus around 30 to 40 cycles, mainly due to reagent limitation, but can also be due to inhibitors of the polymerase reaction found in the sample, self-annealing of the accumulating product, and accumulation of pyrophosphate molecules.
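The doubling described above can be written as a one-line model, sketched here in Python. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling each cycle), and the model deliberately ignores the plateau, so it only applies to the exponential phase.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` rounds of PCR.

    efficiency = 1.0 means every template is copied each cycle (doubling);
    lower values model incomplete amplification. Plateau effects are ignored.
    """
    return initial_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    # One template molecule after 30 ideal cycles: 2**30 copies.
    print(pcr_copies(1, 30))  # 1073741824.0
```

Thirty ideal cycles give 2^30, or about 1.07 billion copies per starting molecule, consistent with the billion-fold amplification mentioned earlier.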
Real-Time PCR
At its inception, PCR technology was limited to qualitative and semi-quantitative analysis because it could not quantify nucleic acids. At that time, the DNA product was separated by size on an agarose gel for electrophoresis to verify successful amplification of the target gene. Ethidium bromide, a molecule that fluoresces when bound to dsDNA, could give a rough estimate of DNA amount by roughly comparing the brightness of separated bands, but was not sensitive enough for rigorous quantitative analysis.
Improvements in fluorophore development and instrumentation led to thermocyclers that could measure DNA during the reaction rather than only as an end product. This process, known as real-time PCR, or quantitative PCR (qPCR), has allowed for detecting dsDNA during amplification. qPCR thermocyclers can excite fluorophores at specific wavelengths, detect their emission with a photodetector, and record the values. The sensitive collection of numerical values during amplification has strongly enhanced quantitative analytical power.
Two main types of fluorophores are used in qPCR: those that bind specifically to a given target sequence and those that do not. The sensitivity of fluorophores has been an important aspect of qPCR development. One of the most effective and widely used non-specific markers, SYBR Green, after binding to the minor groove of dsDNA, exhibits a 1000-fold increase in fluorescence compared to being free in solution (Video 5.1). However, if greater specificity is desired, a sequence-specific oligonucleotide, or hybridization probe, can be added that binds to the target gene at some point downstream of the primer (on its 3' side). These hybridization probes contain a reporter molecule at the 5' end and a quencher molecule at the 3' end. The quencher molecule effectively inhibits the reporter's fluorescence while the probe is intact. However, when the advancing DNA polymerase reaches the hybridization probe, its 5' to 3' exonuclease activity cleaves the probe, separating the reporter from the quencher and allowing the dye to fluoresce (Video 5.1).
Reverse-Transcription PCR
Since its advent, PCR technology has been continually expanded, and reverse-transcription PCR (RT-PCR) is among the most important advances. Real-time PCR is frequently confused with reverse-transcription PCR, but it is a separate technique. In RT-PCR, the DNA amplified is derived from mRNA using reverse-transcriptase enzymes to produce a cDNA copy of the gene. Using primer sequences for genes of interest, traditional PCR can be performed on cDNA to qualitatively assess gene expression. Currently, reverse-transcription PCR is commonly used in combination with real-time PCR, allowing one to quantitatively measure relative changes in gene expression across different samples.
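One common way to turn real-time RT-PCR data into a relative expression change is the comparative threshold-cycle calculation, often written as 2^(-ΔΔCt). It is not described in the text above, so the Python sketch below should be read as an illustration under stated assumptions: Ct is the cycle at which fluorescence crosses a fixed threshold, a reference (housekeeping) gene is measured in both samples, and both amplicons are assumed to double every cycle. The function name and the example Ct values are hypothetical.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression of a target gene (sample vs. control) by 2^(-ddCt).

    Each delta-Ct normalizes the target gene to a reference gene in the same
    sample; the difference between the two delta-Ct values is then converted
    to a fold change, assuming perfect doubling per cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles earlier
    # in the treated sample after normalization, i.e. 2**3 = 8-fold more cDNA.
    print(fold_change(22.0, 18.0, 25.0, 18.0))  # 8.0
```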
Figure \(\PageIndex{19}\) shows a video animation of the use of the reverse-transcription polymerase chain reaction (RT-PCR) in COVID-19 testing.
Figure \(\PageIndex{19}\): Video animation of the Reverse Transcription Polymerase Chain Reaction (RT-PCR)
Issues of Concern
One disadvantage of PCR technology is its extreme sensitivity. Trace amounts of RNA or DNA contamination in the sample can produce extremely misleading results. Another disadvantage is that primers designed for PCR require sequence data and can therefore only be used to identify the presence or absence of a known pathogen or gene. Another limitation is that the primers used for PCR can sometimes anneal non-specifically to sequences that are similar, but not identical, to the target gene.
Another potential issue with PCR is primer dimer (PD) formation. PD is a potential by-product consisting of primer molecules that have hybridized with each other due to complementary base pairing in the primers. The DNA polymerase amplifies the PD, leading to competition for PCR reagents that could be used to amplify the target sequences.
Clinical Significance
PCR amplification is an indispensable tool with various applications within medicine. Often, it is used to test for the presence of specific alleles, such as when screening prospective parents for carrier status. It can also be used to directly diagnose disease and detect mutations in the developing embryo. For example, the first such use of PCR was to diagnose sickle cell anemia by detecting a single gene mutation.
Additionally, PCR has greatly revolutionized the diagnostic potential for infectious diseases, as it can rapidly determine the identity of microbes that were traditionally unable to be cultured or that required weeks for growth. Pathogens routinely detected by PCR include Mycobacterium tuberculosis, human immunodeficiency virus, herpes simplex virus, Treponema pallidum (the cause of syphilis), and many others. Moreover, qPCR is used to test the qualitative presence of microbes and quantify bacterial, fungal, and viral loads.
The sensitivity of diagnostic tools for mutations in oncogenes and tumor suppressor genes has been improved by at least 10,000-fold through PCR, enabling earlier diagnosis of cancers such as leukemia. PCR has also enabled more nuanced and individualized therapies for cancer patients. Additionally, PCR can be used for tissue typing, which is vital for organ transplantation, and it has even been proposed as a replacement for antibody-based blood typing tests. PCR also has clinical applications in prenatal testing for genetic diseases and/or clinical pathologies. Samples are obtained either via amniocentesis or chorionic villus sampling.
In forensic medicine, short pieces of highly polymorphic, repeating DNA, called short tandem repeats (STRs), are amplified and used to compare specific gene variations and differentiate individuals.[9] Primers specific to the loci of these STRs are used to amplify them by PCR. Various loci in the human genome contain STRs, and the statistical power of this technique is enhanced by analyzing multiple sites.
Gene Synthesis
Artificial gene synthesis, sometimes known as DNA printing, is a method in synthetic biology used to create artificial genes in the laboratory. Solid-phase DNA synthesis differs from molecular cloning and the polymerase chain reaction (PCR) in that it does not require preexisting DNA sequences. Therefore, making a completely synthetic double-stranded DNA molecule with no apparent limits on either nucleotide sequence or size is possible.
The method has generated functional bacterial or yeast chromosomes containing approximately one million base pairs. Creating novel nucleobase pairs beyond the two base pairs in nature could greatly expand the genetic code.
Har Gobind Khorana and coworkers demonstrated the synthesis of the first complete gene, a yeast tRNA, in 1972. Synthesis of the first peptide- and protein-coding genes was performed in the laboratories of Herbert Boyer and Alexander Markham, respectively.
Commercial gene synthesis services are now available. Approaches often combine organic chemistry and molecular biology techniques, and entire genes may be synthesized "de novo" without the need for a template DNA. Gene synthesis is an important tool in many fields of recombinant DNA technology, including heterologous gene expression, vaccine development, gene therapy, and molecular engineering. The synthesis of nucleic acid sequences can be more economical than classical cloning and mutagenesis procedures. It is also a powerful and flexible engineering tool for creating and designing new DNA sequences and protein functions.
Gene optimization
While the ability to efficiently and cost-effectively produce increasingly long stretches of DNA is a technological driver of this field, increasing attention is being focused on improving the design of genes for specific purposes. Early in the genome sequencing era, gene synthesis was used as an (expensive) source of cDNAs that were predicted by genomic or partial cDNA information but were difficult to clone. This practice has become less urgent as higher-quality sources of sequence-verified cloned cDNA have become available.
Producing large amounts of protein from gene sequences can sometimes prove difficult. Many of the most interesting proteins are normally expressed at very low levels in wild-type cells. Redesigning these genes can improve gene expression in many cases. Rewriting the open reading frame is possible because of the genetic code's degeneracy. Thus, it is possible to change up to about a third of the nucleotides in an open reading frame and still produce the same protein. The number of alternate designs possible for a given protein is astronomical. For a typical protein sequence of 300 amino acids, over \(10^{150}\) codon combinations will encode an identical protein. Codon optimization, or replacing rarely used codons with more common codons, sometimes has dramatic effects. Further optimizations, such as removing RNA secondary structures, can also be included. At least in the case of E. coli, protein expression is maximized by predominantly using codons that are recognized by tRNAs that retain amino acid charging during starvation. Computer programs are used to optimize this task. A well-optimized gene can improve protein expression 2 to 10-fold; in some cases, more than 100-fold improvements have been reported. Because of the many nucleotide changes to the original DNA sequence, the only practical way to create the newly designed genes is to use gene synthesis.
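The way degeneracy multiplies the number of possible gene designs can be sketched in Python. The table below lists how many codons encode each amino acid in the standard genetic code (61 sense codons in total); the function names and the example peptide are assumptions made for illustration. The number of synonymous open reading frames is simply the product of these counts over the protein sequence.

```python
import math

# Number of codons for each amino acid in the standard genetic code
# (one-letter amino acid codes; the 3 stop codons are excluded).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_designs(protein):
    """Number of distinct coding sequences that encode `protein`."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)


def log10_designs(protein):
    """Base-10 logarithm of the design count, convenient for long proteins."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein)


if __name__ == "__main__":
    # Hypothetical tripeptide Met-Lys-Leu: 1 x 2 x 6 = 12 possible encodings.
    print(synonymous_designs("MKL"))  # 12
```

Because the count is a product, every additional residue (other than Met or Trp) multiplies the design space, which is why the total becomes astronomically large for proteins of a few hundred residues.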
Oligonucleotide synthesis
Oligonucleotides are chemically synthesized using building blocks called nucleoside phosphoramidites. These can be normal or modified nucleosides with protecting groups to prevent their amine, hydroxyl, and phosphate groups from interacting incorrectly. One phosphoramidite is added at a time: the 5' hydroxyl group is deprotected, a new base is added, and the process is repeated. The chain grows in the 3' to 5' direction, which is backward relative to DNA biosynthesis in vivo. In the end, all the protecting groups are removed. Figure \(\PageIndex{20}\) below shows the solid-phase DNA synthesis reaction.
Nevertheless, because this is a chemical process, some incorrect reactions occur at each step, resulting in defective products. The longer the oligonucleotide sequence being synthesized, the more defects accumulate, so this process is only practical for producing short nucleotide sequences. The current practical limit is about 200 nucleotides for an oligonucleotide with sufficient quality to be used directly for a biological application. HPLC can be used to isolate products with the proper sequence. Meanwhile, many oligos can be synthesized in parallel on gene chips. However, for optimal performance in subsequent gene-synthesis procedures, they should be prepared individually and at larger scales.
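Why length limits quality can be seen from a simple model, sketched here in Python. It assumes that every coupling step succeeds independently with the same probability; the 99% coupling efficiency used in the example is an illustrative assumption, not a figure from the text, and the function name is hypothetical. An oligonucleotide of n residues requires n - 1 couplings, so the fraction of full-length product is the efficiency raised to the power n - 1.

```python
def full_length_yield(length, coupling_efficiency=0.99):
    """Expected fraction of chains that reach full length.

    Assumes each of the (length - 1) coupling steps succeeds independently
    with probability `coupling_efficiency` (an idealized model).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(n, round(full_length_yield(n), 3))
```

Under this assumed efficiency, only a small fraction of 200-residue chains would be full length, which is consistent with the practical limit noted above and with the need for purification such as HPLC.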
DNA synthesis and synthetic biology
The significant drop in the cost of gene synthesis in recent years, driven by increased competition among companies providing this service, has enabled the production of entire bacterial plasmids that previously did not exist. The field of synthetic biology uses technology to produce synthetic biological circuits, stretches of DNA engineered to alter gene expression within cells and drive the cell to produce a desired product.
The ability to synthetically produce DNA will enable the development of environmental, medical, and commercially relevant products. For example, in 2015, Novartis, in collaboration with Synthetic Genomics Vaccines Inc. and the US Biomedical Advanced Research and Development Authority, announced that they had effectively created a synthetic DNA influenza vaccine. New synthetic DNA vaccines promise to provide an alternative to conventional egg-based vaccines, which can be plagued by low efficacy.
DNA vaccines can avoid many issues associated with egg-based vaccine production by generating viral proteins within host cells. To create a DNA vaccine, an antigen-encoding gene is cloned into a non-replicative expression plasmid, which is delivered to the host by traditional vaccination routes. Host cells that take up the plasmid express the vaccine antigen, which can be presented to immune cells via the major histocompatibility complex (MHC) pathways. CD4+ T helper cell activation following MHC class II presentation of secreted DNA vaccine protein is critical for the production of antigen-specific antibodies, as shown in Figure \(\PageIndex{21}\) below.
After two decades of research, DNA vaccine technology is maturing—several veterinary DNA vaccines are currently licensed for West Nile virus and melanoma, and, significantly, the first commercial DNA vaccine against H5N1 in chickens has recently been conditionally approved by the USDA. In addition, ongoing large animal trials of DNA vaccines against other diseases, such as HIV, hepatitis, and Zika virus, offer valuable insights that can be applied to influenza DNA vaccine design. Promising approaches have arisen from the numerous studies evaluating different DNA vaccine formulations and delivery systems. Still, a strategy that consistently elicits protection against influenza in large animal models has not yet emerged. Successful plasmid delivery and the use of appropriate adjuvants remain key challenges that need to be addressed before influenza DNA vaccines become effective for human use.
Summary
(Summary written by Claude, Sonnet 4.6, Anthropic)
This chapter surveys the major techniques for isolating, reading (sequencing), writing (synthesizing), and amplifying DNA — providing the technical foundation for modern molecular biology, genomics, diagnostic medicine, and synthetic biology — and illustrates how chemistry drives innovation in each area.
Genomic DNA and cDNA offer two fundamentally different windows into an organism's genetic information. Genomic DNA contains the complete chromosomal sequence, including promoters, enhancers, silencers, 5' and 3' untranslated regions, introns, and exons: the full regulatory and informational architecture of the genome. Eukaryotic genes are organized with coding exons separated by non-coding introns; the pre-mRNA transcript is processed by the spliceosome to remove introns and is also modified by 5' capping and 3' polyadenylation, and the resulting mature mRNA is exported to the cytoplasm for translation. cDNA is synthesized in vitro from mature mRNA using reverse transcriptase (originally identified as an enzyme encoded by retroviruses, a family that includes HIV), producing a double-stranded DNA that contains only exon sequences, without the intervening introns. This makes cDNA essential for expressing eukaryotic proteins in prokaryotic hosts: bacteria lack the spliceosomal machinery to process introns, so only intron-free cDNA allows correct protein production. In prokaryotes, gene organization differs fundamentally: genes lack introns, and related genes are often clustered into operons, which are polycistronic transcription units regulated by a single promoter that produce a single mRNA encoding multiple proteins in a common pathway.
DNA extraction and purification begin with cell lysis (using detergents such as SDS to dissolve membranes), followed by enzymatic digestion of proteins (protease) and RNA (RNase), and then isolation of DNA by one of two main approaches. The classical phenol/chloroform extraction exploits liquid-liquid partitioning: lipids and denatured proteins partition into the organic phase (phenol, which destabilizes native protein structure by altering water activity and causing denaturation and precipitation), while DNA remains in the aqueous phase. Phenol's tendency to form free radicals that damage DNA has made it less favored. Modern silica-gel spin column methods exploit a chemistry-based adsorption mechanism: under high-concentration chaotropic salt conditions (guanidinium thiocyanate, guanidinium chloride, NaI, NaClO₄), the chaotropic ions disrupt the hydration shell of DNA, and bridging cations (Na⁺, guanidinium⁺) mediate adsorption of the polyanionic DNA to the silica surface through a combination of hydrogen bonds (between DNA and silica's surface Si-OH groups) and electrostatic bridges; washing under the same high-salt conditions removes proteins and lipids, and elution with low-salt buffer or water strips the bridging cations and releases pure DNA. Purity is assessed spectrophotometrically: A₂₆₀/A₂₈₀ ≈ 1.8 indicates minimal protein contamination (proteins absorb at 280 nm due to aromatic amino acids), A₂₆₀/A₂₃₀ > 2.0 indicates minimal salt/organic contamination, and A₂₆₀ = 1.0 corresponds to 50 μg/mL dsDNA or 33 μg/mL ssDNA. The hyperchromic effect (37% increase in A₂₆₀ on melting of dsDNA to ssDNA) results from base unstacking — the π-π interactions between stacked aromatic bases in dsDNA reduce their UV absorbance, which recovers on denaturation. The cooperative sigmoidal thermal denaturation curve defines the melting temperature Tm (the temperature at which 50% of the population is denatured), which correlates linearly with GC content.
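The spectrophotometric rules above translate directly into arithmetic. The following sketch uses the conversion factors and ratio thresholds stated in the text. The function names, the dilution-factor argument, the assumption of a 1 cm path length, and the use of the thresholds as hard pass/fail cutoffs are simplifications introduced here for illustration.

```python
# Conversion factors from the text: A260 = 1.0 corresponds to
# 50 ug/mL dsDNA or 33 ug/mL ssDNA (a 1 cm path length is assumed here).
UG_PER_ML_PER_A260 = {"dsDNA": 50.0, "ssDNA": 33.0}


def dna_concentration(a260, kind="dsDNA", dilution_factor=1.0):
    """Estimate nucleic acid concentration in ug/mL from absorbance at 260 nm."""
    if kind not in UG_PER_ML_PER_A260:
        raise ValueError("kind must be 'dsDNA' or 'ssDNA'")
    return a260 * UG_PER_ML_PER_A260[kind] * dilution_factor


def purity_report(a230, a260, a280, min_260_280=1.8, min_260_230=2.0):
    """Compute the two purity ratios and compare them with the text's thresholds.

    Treating the thresholds as hard cutoffs is an illustrative simplification.
    """
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    return {
        "A260/A280": ratio_280,
        "A260/A230": ratio_230,
        "low_protein": ratio_280 >= min_260_280,
        "low_salt_organic": ratio_230 > min_260_230,
    }


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 10-fold before measurement.
    print(dna_concentration(0.5, "dsDNA", dilution_factor=10.0), "ug/mL")
    print(purity_report(a230=0.2, a260=0.5, a280=0.25))
```

A sample that fails the A260/A280 check would typically be re-purified before concentration estimates are trusted, since protein absorbance also contributes to the reading.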
DNA sequencing has undergone a revolutionary progression from Sanger's chain-termination method to massively parallel next-generation platforms. Sanger sequencing (1977) uses dideoxynucleotide triphosphates (ddNTPs) that lack the 3'-OH required for phosphodiester bond formation; when incorporated stochastically at ~1/100 the rate of the corresponding dNTP, they terminate chain elongation, generating a nested population of fragments ending at every position in the sequence. The four reactions (one per ddNTP), each producing all possible extension products terminating at one base type, are resolved by denaturing polyacrylamide gel electrophoresis (the smallest fragments, which terminate closest to the primer, migrate farthest through the gel) or capillary electrophoresis, with sequences read from the gel image or fluorescent chromatogram. Automation with fluorescently labeled ddNTPs (four distinct fluorophores, one per base) and capillary electrophoresis enabled Sanger sequencing to produce the first draft human genome (2001, ~$2.7 billion). Next-generation sequencing (NGS) platforms achieve throughput of 300 Gb per run at ~$1,000 per human genome through massively parallel sequencing of millions of fragments simultaneously. Illumina sequencing uses sequencing-by-synthesis, in which reversibly terminating fluorescent dNTPs are added one at a time to flow-cell-anchored PCR clusters and the flow cell is imaged after each base-addition cycle (100–150 bp reads). Roche 454 pyrosequencing detects the light emitted from luciferin oxidation coupled to the ATP produced from pyrophosphate (PPᵢ) released on each dNTP incorporation (up to 1 kb reads). Ion Torrent sequencing detects the H⁺ released during phosphodiester bond formation using a semiconductor pH sensor, making it the first electronic (non-optical) sequencing approach.
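The logic of reading a Sanger gel can be sketched in a few lines. This toy model is deterministic: it assumes every possible termination product is present in each of the four ddNTP reactions, and it represents each fragment only by the number of nucleotides added to the primer. The function names are hypothetical and the model ignores band intensity, gel resolution, and primer length.

```python
def chain_termination_lanes(new_strand):
    """Return, for each ddNTP reaction, the lengths of fragments ending in that base.

    `new_strand` is the sequence synthesized from the primer (5'->3').
    A fragment of length n ends at position n, so it appears in the lane
    of the ddNTP matching the base at that position.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        if base not in lanes:
            raise ValueError(f"unexpected base {base!r} at position {position}")
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Reconstruct the sequence by reading bands from shortest to longest fragment."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = chain_termination_lanes("GATTACA")
    for base in "ACGT":
        print(base, lanes[base])
    print("read:", read_gel(lanes))
```

Reading from the shortest fragment upward corresponds to reading the gel from the band that migrated farthest back toward the wells, which gives the newly synthesized strand in the 5'→3' direction.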
Nanopore sequencing passes single-stranded DNA through a protein pore (e.g., α-hemolysin or engineered CsgG), driven by a transmembrane voltage and controlled by a helicase motor; each base produces a characteristic disruption in the ionic current through the pore that is decoded in real time, and even chemically modified bases (e.g., 5-methylcytosine) produce distinctive signals. Expandomer sequencing extends nanopore technology by using CuAAC click chemistry to attach each nucleotide to a large macrocyclic polymer tether (~16 kDa, charge -38), physically spacing each base by one tether length as it transits the pore and dramatically improving the signal-to-noise ratio; an engineered expandomer synthase (36 mutations from Dpo4 polymerase) incorporates the bulky modified nucleotides during a distributive synthesis step. In SMRT (single-molecule real-time) sequencing, circular ssDNA templates are sequenced in nanowells using a DNA polymerase that releases fluorophore-tagged pyrophosphate as each nucleotide is incorporated. Together, the nanopore and SMRT platforms enabled the Telomere-to-Telomere (T2T) Consortium to complete the full human genome sequence (2022), adding 200 Mb of previously unresolvable centromeric and telomeric repetitive sequence.
PCR and DNA synthesis provide the tools to amplify specific sequences and create novel DNA molecules. PCR exploits the dsDNA melting temperature, primer-mediated specificity, and thermostable polymerases (Taq, from the hot-spring bacterium Thermus aquaticus) to amplify target sequences exponentially through cycles of denaturation (95°C), primer annealing (50–65°C), and extension (72–80°C). Each cycle doubles the number of target copies, yielding >10⁹ copies after 30 cycles. Quantitative PCR (qPCR) tracks amplification in real time using either non-specific dsDNA-binding fluorophores (SYBR Green, which increases fluorescence 1000-fold on binding dsDNA) or sequence-specific TaqMan hydrolysis probes (5'-reporter/3'-quencher oligonucleotides that are cleaved by the 5'→3' exonuclease activity of Taq polymerase during extension, releasing reporter fluorescence). RT-PCR combines reverse transcription of mRNA into cDNA with subsequent PCR amplification to quantify gene expression levels. Clinical applications of PCR include the diagnosis of infectious diseases, the detection of cancer mutations with 10,000-fold improved sensitivity, forensic DNA fingerprinting using short tandem repeat (STR) profiles, and prenatal genetic screening. Oligonucleotide synthesis uses solid-phase phosphoramidite chemistry in which 5'-DMT-protected nucleoside-3'-phosphoramidites are added sequentially (3'→5', opposite to biological synthesis) through four-step cycles of deprotection, coupling, capping (to block failed additions), and oxidation; the practical limit of ~200 bp is set by cumulative coupling inefficiencies. Longer genes are assembled from overlapping oligonucleotides using PCR-based assembly. 
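The doubling arithmetic behind PCR is easy to check numerically. The sketch below models each cycle as multiplying the copy number by (1 + efficiency), where an efficiency of 1.0 is the ideal doubling described above. The `efficiency` parameter and both function names are assumptions added for illustration; real reactions plateau as reagents are consumed, which this model does not capture.

```python
import math


def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after `cycles` rounds, each multiplying by (1 + efficiency).

    efficiency=1.0 is perfect doubling; lower values model incomplete cycles.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(target_copies, initial_copies=1, efficiency=1.0):
    """Smallest whole number of cycles needed to reach `target_copies`."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    if target_copies <= initial_copies:
        return 0
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(f"copies after 30 ideal cycles: {pcr_copies(1, 30):.3e}")
    print("ideal cycles to exceed 1e9 copies:", cycles_to_reach(1e9))
    # 0.9 is an assumed, illustrative per-cycle efficiency.
    print("cycles at 90% efficiency:", cycles_to_reach(1e9, efficiency=0.9))
```

The same exponential relationship underlies qPCR quantification: a sample that starts with more template crosses a given fluorescence threshold in fewer cycles.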
Gene optimization — redesigning the codons in an open reading frame to match the host organism's codon usage preferences (exploiting genetic code degeneracy, with up to ~10¹⁵⁰ possible codon combinations for a 300-aa protein) and eliminating inhibitory mRNA secondary structures — can improve heterologous protein expression 10- to 100-fold. DNA vaccines offer a synthetic-biology application: an antigen-encoding gene cloned into a non-replicative expression plasmid is delivered to host cells, where the host's own biosynthetic machinery produces the antigen for MHC-mediated immune presentation to CD4⁺ T helper cells, driving antibody production — avoiding egg-based vaccine production limitations and enabling rapid response to emerging pandemic strains.
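The size of the design space created by genetic code degeneracy can be counted directly. The sketch below builds the standard genetic code from its conventional compact representation and multiplies the number of synonymous codons at each position of a protein. The function names and the example peptide are hypothetical, and the count covers synonymous encodings only, not which of them express well in a given host.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order (TTT, TTC, TTA, ...); '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons encoding each amino acid (stop codons excluded).
DEGENERACY = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        DEGENERACY[_aa] = DEGENERACY.get(_aa, 0) + 1


def synonymous_encodings(protein):
    """Count the distinct DNA sequences that encode `protein` (one-letter codes)."""
    total = 1
    for residue in protein.upper():
        if residue not in DEGENERACY:
            raise ValueError(f"unknown amino acid {residue!r}")
        total *= DEGENERACY[residue]
    return total


def log10_encodings(protein):
    """Base-10 logarithm of the count, convenient for long proteins."""
    return math.log10(synonymous_encodings(protein))


if __name__ == "__main__":
    peptide = "MLSRAG"  # hypothetical example peptide
    print(peptide, "->", synonymous_encodings(peptide), "possible coding sequences")
    print(f"log10 = {log10_encodings(peptide):.2f}")
```

Because the count is a product over residues, it grows exponentially with protein length, which is why codon optimization relies on host codon-usage preferences and heuristics rather than exhaustive search.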




