A critical clue to understanding the structure of nucleic acids came from the work of Erwin Chargaff (1905–2002). When analyzing DNA from various sources, he found that the relative amounts of G, C, T and A nucleotides varied between organisms but were the same (or very similar) for organisms of the same type or species. On the other hand, the ratios of A to T and of G to C were always equal to 1, no matter where the DNA came from. Knowing these rules, James Watson (b. 1928) and Francis Crick (1916–2004) built a model of DNA that fit what was known about the structure of nucleotides and structural data from Rosalind Franklin (1920–1958)206. Franklin obtained these data by pulling DNA into oriented fibers, in which many molecules are aligned parallel to one another. By passing a beam of X-rays through these fibers she was able to obtain a diffraction pattern. This pattern reflects the structure of the DNA molecules and defines key parameters that constrain any model of the molecule’s structure. By building a model of the molecule that would produce the observed X-ray data, Watson and Crick were able to draw conclusions about the structure of a DNA molecule.
To understand this process, let us consider the chemical nature of a nucleotide and a nucleotide polymer like DNA. First, the nucleotide bases in DNA (A, G, C and T) have a number of similar properties. Each nucleotide has three hydrophilic regions: the negatively charged phosphate group, a sugar which has a number of O–H groups, and a hydrophilic edge of the base (where the N–H and N groups lie). While the phosphate and sugar are three-dimensional moieties, the bases are flat; the atoms in the rings all lie in one plane. The upper and lower surfaces of the rings are hydrophobic (non-polar) while the edges have groups that can interact via hydrogen bonds. This means that the amphipathic factors that favor the assembly of lipids into bilayer membranes are also at play in nucleic acid structure. To reduce the bases’ interactions with water, Watson and Crick stacked them on top of one another in their model, hydrophobic surface next to hydrophobic surface. This left each base’s hydrophilic edge, with -C=O and -N-H groups that can act as H-bond acceptors and donors, to be dealt with. How were these hydrophilic groups to be arranged? Their insight led to a direct explanation for why Chargaff’s rules were universal: they recognized that pairs of nucleotide bases, in the two DNA strands, could be arranged in an anti-parallel and complementary orientation. So what does that mean? Each DNA polymer strand has a directionality to it; it runs from the 5’ phosphate group at one end to the 3’ hydroxyl group at the other. Each nucleotide monomer is connected to the next through a phosphodiester linkage, in which its 5’ phosphate group is attached to the 3’ hydroxyl group at the end of the existing strand. When the two strands are arranged in opposite orientations, that is, anti-parallel to one another (one running 5’→3’ and the other 3’←5’), the bases attached to the sugar-phosphate backbones interact with one another in a highly specific way.
An A can form two hydrogen bonding interactions with a T on the opposite (anti-parallel) strand, while a G can form three hydrogen bonding interactions with a C. A key feature of this arrangement is that the lengths of the A::T and G:::C base pairs are almost identical. The hydrophobic surfaces of the bases are stacked on top of each other, while the hydrophilic sugar and phosphate groups are in contact with the surrounding aqueous solution. The possible repulsion between negatively charged phosphate groups is neutralized (or shielded) by positively charged ions present in the solution from which the X-ray measurements were made.
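The pairing rules described above can be sketched in a few lines of Python. This is only an illustration, not a model of real chemistry: the sequence `top` is made up, and the function names (`partner_strand`, `duplex_composition`, `hydrogen_bonds`) are hypothetical names chosen for this example. The sketch builds the anti-parallel partner of a strand, counts the bases across both strands, and tallies the hydrogen bonds using the two-per-A::T and three-per-G:::C figures given in the text.

```python
from collections import Counter

# Watson-Crick pairing partners for the four DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the pair each base takes part in
# (A::T has two, G:::C has three, as described in the text).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand: str) -> str:
    """Return the anti-parallel complementary strand, written 5'->3'."""
    # Complement each base while reading the input backwards, so the
    # result is also written in the conventional 5'->3' direction.
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_composition(strand: str) -> Counter:
    """Count each base across both strands of the double helix."""
    return Counter(strand) + Counter(partner_strand(strand))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding the strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


top = "ATGCGGATTC"  # made-up example sequence, written 5'->3'
counts = duplex_composition(top)

print("partner strand:", partner_strand(top))
print("A/T ratio:", counts["A"] / counts["T"])
print("G/C ratio:", counts["G"] / counts["C"])
print("hydrogen bonds:", hydrogen_bonds(top))
```

Whatever sequence is used for `top`, the A to T and G to C ratios across the two strands come out as 1, because every A on one strand is matched by a T on the other and every G by a C. That is Chargaff's observation, recovered here from the pairing rule alone. The ratios need not equal 1 within a single strand.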
In their final model Watson and Crick depicted what is now known as B-form DNA. This is the usual form of DNA in a cell. However, under different salt conditions, DNA can adopt two other double-helical forms, known as A and Z. While the A and B forms of DNA are "right-handed" helices, the Z-form of DNA is a left-handed helix. We will not concern ourselves with these other forms of DNA, leaving that to more advanced courses.
As soon as the Watson-Crick model of DNA structure was proposed, its explanatory power was obvious. Because the A::T and G:::C base pairs are of almost the same length, the sequence of bases along the length of a DNA molecule (written, by convention, in the 5’ to 3’ direction) has little effect on the overall three-dimensional structure of the molecule. That implies that essentially any possible sequence can be found, at least theoretically, in a DNA molecule. If information were encoded in the sequence of nucleotides along a DNA strand, any information could be placed there and that information would be as stable as the DNA molecule itself. This is similar to the storage of information in various modern computer memory devices: any type of information can be stored, because storage does not involve any dramatic change in the basic structure of the storage material. The structure of a flash memory drive is not altered by whether it contains photos of your friends, a song, a video, or a textbook. At the same time, the double-stranded nature of the DNA molecule’s structure and the complementary nature of base pairing (A to T and G to C) suggested a simple model for DNA (and information) replication - that is, pull the two strands of the molecule apart and build new (anti-parallel) strands using the two original strands as templates. This model of DNA replication is facilitated by the fact that the two strands of the parental DNA molecule are held together by weak hydrogen bonding interactions, so no covalent bonds need to be broken to separate them. In fact, at physiological temperatures DNA molecules often open up over short stretches and then close, a process known as DNA breathing207. This makes the replication of the information stored in the molecule conceptually straightforward (even though the actual biochemical process is complex). The existing strands determine the sequence of nucleotides on the newly synthesized strands.
The newly synthesized strand can, in turn, direct the synthesis of a second strand, identical to the original strand. Finally, the double-stranded nature of the DNA molecule means that any information within the molecule is, in fact, stored in a redundant fashion. If one strand is damaged, that is, its DNA sequence is lost or altered, the second undamaged strand can be used to repair that damage. A number of mutations in DNA are repaired using this type of mechanism (see below).
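The logic of templated replication and repair can be made concrete with a short Python sketch. It captures only the flow of information, not the biochemistry, which (as noted above) is far more complex. The sequence is made up, the function names (`template_copy`, `replicate`, `repair`) are hypothetical, and marking a damaged position with the letter `N` is an assumption made purely for illustration.

```python
# Watson-Crick pairing partners for the four DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_copy(template: str) -> str:
    """Build the anti-parallel complementary strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplex):
    """Separate the two strands and use each as a template for a new partner.

    Each daughter duplex keeps one parental strand and gains one new strand.
    """
    strand_1, strand_2 = duplex
    daughter_a = (strand_1, template_copy(strand_1))
    daughter_b = (template_copy(strand_2), strand_2)
    return daughter_a, daughter_b


def repair(damaged: str, intact_partner: str) -> str:
    """Fill positions marked 'N' (illustrative damage marker) from the partner."""
    # The intact partner strand specifies what the damaged strand should be.
    expected = template_copy(intact_partner)
    return "".join(e if d == "N" else d for d, e in zip(damaged, expected))


original = "ATGCGGATTC"  # made-up example sequence, written 5'->3'
parent = (original, template_copy(original))

daughter_a, daughter_b = replicate(parent)
print(daughter_a == parent, daughter_b == parent)

damaged = "ATGNNGATTC"  # two positions lost on one strand
print(repair(damaged, parent[1]) == original)
```

Both daughter molecules carry the same sequence information as the parent, and the lost positions on the damaged strand are recovered from the intact partner. Both results follow directly from the complementary base-pairing rule; no other source of information is needed.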
206 An interesting depiction of this process is provided by the movie “Life Story” http://en.wikipedia.org/wiki/Life_Story_(TV_film)
207 Dynamic approach to DNA breathing: http://www.ncbi.nlm.nih.gov/pubmed/23345902