3.1: DNA Structure
Viewing the three-dimensional structures of molecules, like DNA, remained a challenge in the early part of the twentieth century. Following the discovery of X-rays in 1895, researchers soon proposed that the atomic and molecular structure of a crystalline compound could be determined by beaming X-rays at it and observing the resulting diffraction pattern. From this work, the field of X-ray crystallography was born. X-ray crystallography has been fundamental in numerous scientific breakthroughs, including determining the size of atoms, the lengths and types of chemical bonds, and revealing the structure and function of biological molecules, such as proteins and nucleic acids. Dorothy Crowfoot Hodgkin used X-ray crystallography to determine the structures of cholesterol (1937), penicillin (1946), and vitamin B12 (1956), for which she was awarded the Nobel Prize in Chemistry in 1964. Rosalind Franklin, the British X-ray crystallographer, was one of the first to produce high-quality X-ray diffraction images of DNA. Her work helped reveal the intricacies of the DNA helix. Today, over 130,000 X-ray crystal structures of proteins, nucleic acids, and other biological molecules have been determined.
Introduction
The first recorded discovery of DNA came in 1869, when the Swiss physician Friedrich Miescher described an unknown substance found in the pus of wounded soldiers. Because it was found in the nuclei of cells, he named this substance "nuclein". In 1878, Albrecht Kossel isolated the nucleic acid component of nuclein. In 1909, Phoebus Levene identified the base, sugar, and phosphate components of a nucleotide and would later suggest that DNA consists of a string of four nucleotides (A, C, T, G) linked together through the phosphate groups. In 1910, Kossel received the Nobel Prize in Physiology or Medicine for his discovery of the four bases of DNA, as well as of uracil in RNA. In 1928, Frederick Griffith showed that a substance obtained from pathogenic Streptococcus pneumoniae strains could be taken up by nonpathogenic S. pneumoniae cells, "transforming" them into pathogenic cells. While we now know that Griffith introduced DNA into these bacteria, he did not know the chemical identity of this transforming molecule. It wasn't until 1944 that O. Avery, C. MacLeod, and M. McCarty, using improved purification techniques, showed that the transforming molecule was, in fact, DNA. In spite of these results, DNA was not readily accepted as the genetic material. Confirmation came in 1952 from the experiments of Alfred Hershey and Martha Chase, who determined that DNA, rather than protein, is passed from bacteriophage into E. coli cells. For his contributions to determining DNA as the “stuff of genes”, Hershey shared the 1969 Nobel Prize in Physiology or Medicine with Max Delbrück and Salvador E. Luria.
Understanding how DNA is used in biotechnology requires that you understand the structure of this molecule. After completing this section, you will be able to:
- Discuss who "discovered the DNA helix"
- Explain the structure of a nucleotide
- Describe how DNA is structured
- Describe the forms of DNA found in cells: chromatin, chromosomes, and plasmids
- Compare and contrast prokaryotic and eukaryotic DNA
- Explain DNA packaging in eukaryotes
The History of the DNA Helix
The history of determining the structure of the DNA helix is long and rich. In 1937, William Astbury used X-ray crystallography, a method for investigating molecular structure by observing the patterns formed by X-rays shot through a crystal of a substance, to show that DNA had a regular structure. In 1952, Rosalind Franklin, a researcher at King's College London, where Maurice Wilkins was also studying DNA, together with her graduate student, Raymond Gosling, produced one of the most well-known X-ray diffraction images of DNA, known as "Photo 51" (Figure \(\PageIndex{1}\)). This image helped solidify the theory that DNA was a double helix. At the same time, James Watson and Francis Crick, working together at the University of Cambridge, had started to build a model of the DNA helix. Using data from other research groups, including that of Rosalind Franklin and Erwin Chargaff, they eventually pieced together the puzzle of the DNA helix and published their seminal work in the journal Nature in 1953. In 1962, James Watson, Francis Crick, and Maurice Wilkins were awarded the Nobel Prize in Physiology or Medicine for their work on DNA structure. Because Nobel Prizes are not awarded posthumously, Rosalind Franklin did not receive this award, having died of ovarian cancer in 1958. Several controversies still surround the discovery of the DNA helix, including the story that James Watson was shown Photo 51 without Rosalind Franklin's permission. Crick and Watson also received a copy of a report containing several of Franklin's crystallographic calculations. How much these data contributed to their final DNA model may never be known. However, to attribute the discovery of the DNA helix to James Watson and Francis Crick alone would discount the critical contributions of researchers like William Astbury, Erwin Chargaff, Raymond Gosling, and Rosalind Franklin. Our present knowledge of DNA structure rests on the work of all of them.
Interestingly, in 1927 the Russian biologist Nikolai Koltsov proposed the presence of a "giant hereditary molecule" in cells made up of "two mirror strands that would replicate in a semi-conservative fashion using each strand as a template". This was a remarkable theory, since it was proposed long before Watson, Crick, Gosling, and Franklin worked out the structure of the DNA double helix.
DNA is Made of Nucleotides and Forms a Double Helix
The DNA molecule is a nucleic acid polymer made of subunits called nucleotides. A nucleotide (Figure \(\PageIndex{2}\)) consists of three chemical groups:
- a pentose sugar
- one to three phosphate groups
- a nitrogenous base
The sugar component of a nucleotide is a pentose sugar made of five carbons. The carbons are numbered clockwise from the oxygen as 1', 2', 3', 4', and 5' (e.g. 1' is read as “one prime”). In the case of DNA, this pentose sugar is called deoxyribose as it lacks an oxygen at the 2' carbon. The nitrogenous bases of DNA are adenine, guanine, cytosine, and thymine (Figure \(\PageIndex{2}\)). Adenine and guanine are known as the purines, whereas cytosine and thymine are classified as pyrimidines. An introduction to nucleic acids can be found in Chapter 2.7: Nucleotides.


Making a DNA molecule requires linking nucleotides so that they form a long polynucleotide chain. Nucleotides are joined by forming a phosphodiester bond between the phosphate group attached to the 5' carbon of one nucleotide and the hydroxyl group attached to the 3' carbon of the next nucleotide. The resulting single strand of DNA has its sugar–phosphate groups lined up as a “backbone” with the nucleotide bases sticking out from this backbone. However, DNA is not a single strand but two strands that form a coiled double helix (Figure \(\PageIndex{3}\)). The two DNA strands of the helix have opposite orientations, or run "anti-parallel" to each other. Hydrogen bonds form between complementary bases (i.e., adenine with thymine; cytosine with guanine), linking the two strands together along their length. Refer back to Chapter 2.7: Nucleotides to read more about the structure of the DNA helix.
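The pairing and anti-parallel rules described above can be illustrated with a short Python sketch. It is only an illustration: the function names and the example sequence are hypothetical, and the hydrogen-bond counts (two for A-T, three for G-C) are the ones listed in the glossary of this section.

```python
# Complementary base pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds contributed by the base pair each base takes part in:
# two for an A-T pair, three for a G-C pair.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Because the two strands are anti-parallel, the paired bases are
    reversed so that the result reads in the 5' to 3' direction.
    """
    paired = [COMPLEMENT[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(paired))


def hydrogen_bond_count(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


example = "ATGCCG"  # hypothetical sequence, written 5' to 3'
print(complementary_strand(example))  # CGGCAT
print(hydrogen_bond_count(example))   # 16
```

Applying `complementary_strand` twice returns the original sequence, which reflects the fact that each strand fully determines its partner.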
DNA Comes in Many Forms: Chromatin, Chromosomes, and Plasmids
Stretched end-to-end, the DNA in a single human cell would come to a length of about 2 meters. The DNA of E. coli is about 1000 times as long as the bacterial cell itself. As a result, DNA must be packaged in a very ordered way to fit and function within a cell. The strategy used to package DNA depends on whether the cell is prokaryotic or eukaryotic.
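The lengths quoted above can be checked with a rough back-of-the-envelope calculation. The Python sketch below is only an estimate under stated assumptions that do not come from this section: a rise of about 0.34 nm per base pair for the common B-form helix, roughly 6.4 billion base pairs in a diploid human cell, roughly 4.6 million base pairs in the E. coli chromosome, and an E. coli cell about 2 micrometers long.

```python
# Assumed values (approximate, not taken from this section):
RISE_PER_BP_NM = 0.34        # helix length per base pair, B-form DNA
HUMAN_DIPLOID_BP = 6.4e9     # base pairs in one diploid human cell
ECOLI_BP = 4.6e6             # base pairs in the E. coli chromosome
ECOLI_CELL_LENGTH_M = 2e-6   # length of one E. coli cell


def contour_length_m(base_pairs: float) -> float:
    """Length in meters of a fully stretched DNA double helix."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


human_length_m = contour_length_m(HUMAN_DIPLOID_BP)
ecoli_length_m = contour_length_m(ECOLI_BP)
ecoli_ratio = ecoli_length_m / ECOLI_CELL_LENGTH_M

print(f"Human DNA per cell: about {human_length_m:.1f} m")
print(f"E. coli chromosome: about {ecoli_length_m * 1e3:.1f} mm")
print(f"E. coli DNA is roughly {ecoli_ratio:.0f} times the cell length")
```

With these assumed inputs the human estimate comes out slightly above 2 meters and the E. coli ratio is several hundred, the same order of magnitude as the "about 1000 times" figure above.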
Prokaryotic DNA
Prokaryotic cells, because of the absence of a nucleus, package the majority of their DNA into a region of the cytoplasm known as the nucleoid. In most prokaryotic cells, the DNA found within the nucleoid is a large circular piece of double-stranded DNA called a bacterial chromosome or genophore (Figure \(\PageIndex{4}\)). However, other bacteria may have linear chromosomes found within this region.
To ensure the bacterial chromosome can fit into the nucleoid, it is packaged through supercoiling. Supercoiling happens in both prokaryotic and eukaryotic cells when the DNA helix is subject to strain by being either underwound or overwound (Figure \(\PageIndex{5}\)). A good analogy for supercoiling is the twists found in a telephone cord. Most DNA found in bacterial cells is underwound, or negatively supercoiled, because it makes the separation of the two DNA strands easier during DNA replication and transcription. Several structural proteins and enzymes within the nucleoid help to maintain the supercoiled nature of the bacterial chromosome. One of these enzymes is known as a topoisomerase.
The bacterial chromosome contains the genes responsible for the day-to-day functions of the cell. However, in addition to the chromosome, bacteria also have smaller pieces of circular DNA known as plasmids. A plasmid is a small, extrachromosomal DNA molecule that is found outside of the nucleoid and can replicate independently of the bacterial chromosome. Naturally occurring plasmids contain genes for specialized functions, such as antibiotic resistance, virulence, nitrogen fixation, or conjugation. They are significantly smaller than the bacterial chromosome, typically several thousand base pairs, although plasmids as large as several hundred thousand base pairs have been found.
In addition, plasmids can be artificially constructed and introduced into bacterial cells. These types of plasmids (Figure \(\PageIndex{6}\)) serve as tools in genetics and biotechnology labs, where they are used to clone and amplify genes for commercial applications. Plasmids that are sold commercially contain several regions in common with naturally occurring ones, such as antibiotic resistance genes. In addition, they will have a region for the insertion of genes, referred to as a multiple cloning site or MCS. In plasmids designed for gene expression, the MCS is flanked by a promoter sequence and a termination signal (a polyadenylation signal in plasmids intended for eukaryotic cells) so that the inserted gene will be properly transcribed into mRNA by the host cell. More about these flanking regions of DNA can be found in Chapter 3.3 Transcription of RNA.
Eukaryotic DNA
If the DNA of all 46 human chromosomes were stitched together into a single molecule, it would be over 2 meters long. This is a problem that all eukaryotic cells have: the DNA is too long to fit into a nucleus. As such, eukaryotic cells must employ a unique packaging strategy to fit their DNA inside the nucleus. Eukaryotes begin to package their DNA by converting it into a form called chromatin. Chromatin is a complex of DNA and proteins. Eukaryotic cells have two types of chromatin: euchromatin and heterochromatin. In euchromatin, the DNA is found associated with proteins known as histones. A histone is often referred to as an octamer, a complex of 8 small, positively charged protein subunits made up of two copies each of H2A, H2B, H3, and H4 (Figure \(\PageIndex{7}\)). To form euchromatin, the DNA helix wraps around a histone octamer almost two times, creating a structure called a nucleosome. The DNA is then secured to the histone octamer by the histone protein H1. The interaction between the DNA and the histone to form a nucleosome is the result of the attraction between negatively charged DNA and positively charged histone proteins. Each nucleosome is linked to the next one by a short stretch of "linker" DNA that is free of histones and is essentially the bare DNA helix. The length of linker DNA varies from cell type to cell type but is consistent within a cell, producing nucleosomes at regular intervals. Because of its appearance under the electron microscope, euchromatin is frequently referred to as "beads on a string", with the "bead" as the nucleosome and the "string" as the linker DNA (Figure \(\PageIndex{7}\)). Once formed, certain sections of euchromatin condense further when neighboring nucleosomes stack compactly onto each other to form a 30-nm-wide fiber called heterochromatin. This condensation into heterochromatin is thought to be the result of interactions between the H1 proteins.
Eukaryotic cells, in the interphase stage of the cell cycle, will contain a mixture of heterochromatin and euchromatin. However, as the cell progresses towards division (i.e. mitosis or meiosis), the DNA is packaged beyond the heterochromatin level to form a chromosome. A chromosome ("chromo" = color; "soma" = body) is a highly organized thread-like structure composed of chromatin and associated proteins. The chromosome represents the highest level of DNA organization found within a eukaryotic cell. In dividing cells, condensing into a chromosome begins after the DNA has been replicated and concludes during metaphase (Figure \(\PageIndex{8}\)). Euchromatin is condensed into heterochromatin. The heterochromatin forms loops that are 300 nm wide. These loops are condensed further to form the 700 nm wide chromatid. During metaphase, the chromosome is made of two chromatids, known as sister chromatids, joined at a region of heterochromatin called a centromere. Once cell division is complete, the DNA will de-condense back to its euchromatin and heterochromatin forms.
Animation: DNA Packaging
The form of chromatin in the eukaryotic cell has a direct correlation to the function of its DNA. The majority of chromatin in the nucleus is euchromatin and represents the form of DNA where transcription occurs. Stretches of euchromatin are rich in genes and are transcriptionally active in the cell. DNA in the form of heterochromatin has fewer genes and is thought to be inaccessible to the transcription machinery. As such, it represents a more inactive or "silent" form of DNA. For more information about transcription, refer to Chapter 3.3 Transcription of RNA. All eukaryotic cells will have stretches of heterochromatin that do not de-condense to euchromatin but persist as "permanent" heterochromatin. Examples of these regions are the centromere and telomeres of the chromosome.
The DNA helix is made of two anti-parallel polynucleotide strands that resemble a "spiral staircase". The sugar-phosphate groups of the nucleotides join to form an outer "backbone" that coils like the rails, with the bases projecting to the inside of the helix, like the steps of the staircase.
The major concepts to remember are:
- the two polynucleotide chains of the DNA helix run in opposite directions to one another (i.e. are anti-parallel)
- the bases of the DNA helix pair to one another in a complementary fashion - C with G and A with T
- DNA can be found in a prokaryotic cell as a circular chromosome, called a genophore, and as small pieces of circular DNA, known as plasmids
- DNA in eukaryotic cells is found as chromatin or chromosomes
- Eukaryotic cells have two forms of chromatin: euchromatin and heterochromatin
- Euchromatin is made of a DNA helix wrapped around histone protein complexes
- Heterochromatin is formed by condensing euchromatin further into larger coils
- Heterochromatin is condensed further to form the chromatids of a chromosome
Glossary
Adenine (A): a nitrogenous base that pairs with thymine (T) in DNA through two hydrogen bonds
Base pair: a pair of complementary nitrogenous bases in DNA, consisting of adenine-thymine (A-T) or guanine-cytosine (G-C)
Chromatin: a form of organized DNA found within the nucleus of a cell; found as two types: euchromatin and heterochromatin
Chromosome: the most highly condensed and organized form of DNA; a thread-like structure composed of DNA and associated proteins
Complementary base pairing: the specific pairing of nitrogenous bases in DNA (A pairs with T, G pairs with C) through hydrogen bonding
Cytosine (C): a nitrogenous base that pairs with guanine (G) in DNA through three hydrogen bonds
Deoxyribonucleic Acid (DNA): a molecule that carries genetic instructions for the growth, development, function, and reproduction of living organisms
Deoxyribose: a five-carbon sugar found in DNA nucleotides
Double Helix: a twisted ladder-like structure of DNA, consisting of two strands running in opposite directions
Euchromatin: a type of chromatin; composed of the DNA helix wound around proteins called histones
Gene: a segment of DNA that contains the instructions for synthesizing a specific protein or RNA molecule
Genophore: the chromosome equivalent found in prokaryotes and viruses
Guanine (G): a nitrogenous base that pairs with cytosine (C) in DNA through three hydrogen bonds
Heterochromatin: a type of chromatin; composed of euchromatin wound into a large solenoid structure through its association with nuclear proteins
Histone: a complex of assembled histone proteins that provides structural support for euchromatin; classified into 5 types: H1, H2A, H2B, H3, and H4
Nucleosome: the section of euchromatin that is composed of a DNA helix wrapped around a histone
Nucleotide: the basic building block of DNA, consisting of one to three phosphate groups, a deoxyribose sugar, and a nitrogenous base
Phosphodiester bond: the bond found between nucleotides in DNA and RNA
Purine: the organic bases adenine and guanine found in nucleic acids
Pyrimidine: the organic bases cytosine, uracil and thymine found in nucleic acids
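Supercoiling: the coiling of the DNA helix upon itself that results when the helix is underwound or overwound
Thymine (T): a nitrogenous base that pairs with adenine (A) in DNA through two hydrogen bonds
Topoisomerase: an enzyme that helps maintain the supercoiled state of DNA by adding or removing supercoils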
Transcription: the process of copying a gene’s DNA sequence into messenger RNA (mRNA)

