We have presented DNA replication (the same, apparently homologous process is used in all known organisms) in as conceptually simple terms as we can, but it is important to keep in mind that the actual machinery involved is complex. In part this complexity arises because the process is topologically constrained and needs to be highly accurate. In the bacterium Escherichia coli over 100 genes are involved in the processes of DNA replication and repair. To ensure that replication is controlled and complete, replication begins at specific sequences along the DNA strand, known as origins of replication or origins for short. Origin DNA sequences are recognized by specific DNA binding proteins. The binding of these proteins initiates the assembly of an origin recognition complex (ORC). Various proteins then bind to the DNA to locally denature it (unwind and separate the two strands) and to block the single strands from reannealing. This leads to the formation of a replication bubble. Multiprotein complexes, known as replication forks, then assemble on the two DNA strands. Using a single replication origin and two replication forks moving in opposite directions, a rapidly growing E. coli can replicate its ~4,700,000 base pairs of DNA (which are present in the form of a single circular DNA molecule) in ~40 minutes. Each replication fork moves along the DNA adding ~1000 base pairs of DNA per second to the newly formed DNA polymer. While a discussion of the exact mechanisms involved is beyond our scope here, it is also critical that DNA replication is complete before a cell attempts to divide.
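The ~40 minute figure follows directly from the genome size and the fork speed quoted above. As a rough check, here is a minimal sketch in Python. It assumes both forks run at a constant rate without pausing, which is a simplification, and the function and variable names are ours, chosen for illustration.

```python
GENOME_BP = 4_700_000       # approximate E. coli genome size quoted in the text
FORK_RATE_BP_PER_S = 1000   # approximate synthesis rate per fork quoted in the text
N_FORKS = 2                 # one origin, two forks moving in opposite directions


def replication_time_minutes(genome_bp, fork_rate_bp_per_s, n_forks):
    """Time to copy the genome, assuming every fork moves at a constant rate."""
    seconds = genome_bp / (fork_rate_bp_per_s * n_forks)
    return seconds / 60


minutes = replication_time_minutes(GENOME_BP, FORK_RATE_BP_PER_S, N_FORKS)
print(f"Estimated replication time: {minutes:.1f} minutes")
```

With a single fork the same calculation gives roughly twice the time, which is one way to see why bidirectional replication from the origin matters for a rapidly growing cell.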
DNA synthesis (replication) is a highly accurate process; the polymerase makes about one error for every 10,000 bases it adds. But that level of error would almost certainly be highly deleterious, and in fact most of these errors are quickly recognized as mistakes. To understand how, remember that correct AT and GC base pairs have the same molecular dimensions. This means that incorrect AG, CT, AC, and GT base pairs are either too long or too short. By responding to base pair length, molecular machines can recognize a mistake in base pairing as a structural defect in the DNA molecule. When a mismatched base pair is formed and recognized, the DNA polymerase stops forward synthesis, reverses its direction, and removes the region of the DNA containing the mismatched base pair using a “DNA exonuclease” activity. It then resynthesizes the region, (hopefully) correctly. This process is known as proof-reading; the proof-reading activity of the DNA polymerase complex reduces the total DNA synthesis error rate to ~1 error per 1,000,000,000 (10^9) base pairs synthesized.
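To get a feel for what these two error rates mean for a cell, we can ask how many mistakes to expect when one genome-length strand is synthesized. The sketch below uses only the numbers quoted in the text; treating errors as independent events with a fixed per-base probability is our simplifying assumption, and the names are illustrative.

```python
BASES_SYNTHESIZED = 4_700_000             # one genome-length new strand (E. coli, from the text)
RAW_ERROR_RATE = 1 / 10_000               # polymerase alone: ~1 error per 10,000 bases
PROOFREAD_ERROR_RATE = 1 / 1_000_000_000  # with proof-reading: ~1 error per 10^9 bases


def expected_errors(bases_synthesized, error_rate):
    """Expected number of errors, assuming a fixed, independent per-base error rate."""
    return bases_synthesized * error_rate


raw = expected_errors(BASES_SYNTHESIZED, RAW_ERROR_RATE)
proofread = expected_errors(BASES_SYNTHESIZED, PROOFREAD_ERROR_RATE)
improvement = RAW_ERROR_RATE / PROOFREAD_ERROR_RATE

print(f"Expected errors without proof-reading: {raw:.0f}")
print(f"Expected errors with proof-reading:    {proofread:.4f}")
print(f"Fold improvement in accuracy:          {improvement:,.0f}")
```

Under these assumptions, hundreds of errors per strand without proof-reading drop to well under one error per strand with it.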
At this point let us consider nomenclature, which can seem arcane and impossible to understand, but in fact obeys reasonably straightforward rules. An exonuclease is an enzyme that can bind to the free end of a nucleic acid polymer and remove nucleotides by hydrolyzing the phosphodiester bond. A 5' exonuclease cuts off a nucleotide located at the 5' end of the molecule, while a 3' exonuclease cuts off a nucleotide located at the molecule’s 3' end. An intact circular nucleic acid molecule has no free ends and so is immune to the effects of an exonuclease. To break the bond between two nucleotides in the interior of a nucleic acid molecule (or in a circular molecule, which has no ends), one needs an endonuclease activity.
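These naming rules can be made concrete with a toy model in which a nucleic acid strand is a string written 5' to 3'. This is only an illustrative sketch: the function names and the `circular` flag are our own inventions, and real nucleases have sequence and structure preferences that are ignored here.

```python
def exonuclease_5prime(strand, circular=False):
    """Remove one nucleotide from the 5' end (the start of the string)."""
    if circular or not strand:
        return strand  # a circular molecule has no free end to bind
    return strand[1:]


def exonuclease_3prime(strand, circular=False):
    """Remove one nucleotide from the 3' end (the end of the string)."""
    if circular or not strand:
        return strand  # a circular molecule has no free end to bind
    return strand[:-1]


def endonuclease(strand, position, circular=False):
    """Cut the bond just before index `position`.

    A linear strand becomes two fragments; a circular strand is opened
    into a single linear molecule that now has free ends.
    """
    if circular:
        return [strand[position:] + strand[:position]]
    if not 0 < position < len(strand):
        raise ValueError("cut site must lie in the interior of a linear strand")
    return [strand[:position], strand[position:]]


print(exonuclease_5prime("ATGC"))                # one nucleotide shorter at the 5' end
print(exonuclease_3prime("ATGC"))                # one nucleotide shorter at the 3' end
print(exonuclease_5prime("ATGC", circular=True)) # unchanged: no free end
print(endonuclease("ATGC", 2, circular=True))    # circle opened into a linear strand
```

Note that once an endonuclease has opened a circular molecule, the resulting linear molecule is again a substrate for exonucleases.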
As you think about the processes involved, you come to realize that once DNA synthesis begins, it is important that it continues without interruption. But the interactions between nucleic acid chains are based on weak H-bonding interactions, and the enzymes involved in the DNA replication process can be expected to dissociate from the DNA because of the effects of thermal motion. Imagine the whole system jiggling and vibrating, held together by relatively weak interactions. We can characterize how well a DNA polymerase molecule remains productively associated with a DNA molecule in terms of the number of nucleotides it adds to a new molecule before it falls off; this is known as its “processivity”. So if you think of the DNA replication complex as a molecular machine, you can design ways to ensure that the replication complex has high processivity, basically by keeping it bound to the DNA. One set of such machines is the polymerase sliding clamp and clamp loader (see: http://youtu.be/QMhi9dxWaM8). The DNA polymerase complex is held onto the DNA by a doughnut-shaped protein, known as a sliding clamp, that encircles the DNA double helix and is strongly bound to the DNA polymerase. So the question is, how does a protein come to encircle a DNA molecule? The answer is that the clamp protein is added to DNA by another protein molecular machine known as the clamp loader. Once closed around the DNA the clamp can move freely along the length of the DNA molecule, but it cannot leave the DNA. The clamp’s sliding movement along DNA is diffusive, that is, driven by thermal motion. Its movement is given a direction because the clamp is attached to the DNA polymerase complex, which is adding monomers to the growing nucleic acid polymer. This moves the replication complex (inhibited from diffusing away from the DNA by the clamp) along the DNA in the direction of synthesis.
Processivity is increased since, in order to leave the DNA, the polymerase has to disengage from the clamp, or the clamp has to be removed by the clamp loader acting in reverse, that is, acting as an unloader.
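One way to make the idea of processivity quantitative is a simple probabilistic sketch. Suppose that before each nucleotide addition the polymerase has a fixed probability of falling off the DNA; the number of nucleotides added before dissociation then follows a geometric distribution. This model and the two dissociation probabilities below are hypothetical values chosen purely for illustration. They are not measurements, and the real clamp mechanism is not this simple.

```python
import random

# Hypothetical per-step dissociation probabilities (illustrative only).
P_OFF_WITHOUT_CLAMP = 0.1
P_OFF_WITH_CLAMP = 0.001


def mean_processivity(p_off):
    """Expected nucleotides added before falling off, for a fixed per-step p_off."""
    return (1 - p_off) / p_off


def simulate_processivity(p_off, rng):
    """Simulate one binding event; return the number of nucleotides added."""
    added = 0
    while rng.random() >= p_off:  # polymerase stays bound and adds a nucleotide
        added += 1
    return added


rng = random.Random(0)  # fixed seed so repeated runs give the same result
runs = [simulate_processivity(P_OFF_WITHOUT_CLAMP, rng) for _ in range(20_000)]
simulated_mean = sum(runs) / len(runs)

print(f"Simulated mean without clamp: {simulated_mean:.2f}")
print(f"Predicted mean without clamp: {mean_processivity(P_OFF_WITHOUT_CLAMP):.2f}")
print(f"Predicted mean with clamp:    {mean_processivity(P_OFF_WITH_CLAMP):.2f}")
```

In this sketch the clamp does nothing except lower the chance of dissociation at each step, yet the expected run length grows roughly in inverse proportion to that chance. That is the sense in which keeping the polymerase bound to the DNA increases processivity.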