In laboratories around the world there is an intense desire to sequence more genomes:
- those of a wide variety of organisms to aid in establishing evolutionary relationships;
- those of pooled populations of microorganisms in, for example, sea water, soil, and the large intestine;
- those of other humans to look for genes that predispose to disease and for genetic patterns in various ethnic groups.
All of the sequenced genomes listed in Genome Sizes were determined using the dideoxy method invented by Frederick Sanger and described elsewhere. However, a great effort is now being expended to find ways to sequence DNA more rapidly (and more cheaply).
The Genome Sequencer
Several new methods are being developed and one is already commercially available (the Genome Sequencer 20 System). Its method is called pyrosequencing or sequencing by synthesis. It works like this.
- The DNA to be sequenced is broken up into fragments of ~100 base pairs and denatured to form single-stranded DNA (ssDNA).
- Single ssDNA fragments are attached to microscopic beads, which are separated from each other.
- The polymerase chain reaction (PCR) is run on each bead so that each becomes coated with ~10 million identical copies of that fragment.
- The beads are placed singly into separate, microscopic wells (~200,000 of them).
- Each well receives a cocktail of reagents:
  - DNA polymerase: for adding deoxyribonucleotides to the ssDNA
  - adenosine phosphosulfate (APS)
  - ATP sulfurylase: an enzyme that forms ATP from adenosine phosphosulfate (APS) and pyrophosphate (PPi)
  - luciferase: an ATPase that catalyzes the conversion of luciferin to oxyluciferin with the liberation of light
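The reagents in this cocktail form a coupled chain: each nucleotide that DNA polymerase adds releases one PPi, ATP sulfurylase combines PPi with APS to form ATP, and luciferase uses that ATP to produce light. The following Python sketch only does the bookkeeping for that chain. It assumes an idealized one-to-one yield at every step, and the names `WellSignal`, `reagent_cascade`, and `light_per_atp` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class WellSignal:
    ppi: int      # pyrophosphate molecules released by DNA polymerase
    atp: int      # ATP molecules formed by ATP sulfurylase from APS + PPi
    light: float  # light emitted by luciferase, in arbitrary units


def reagent_cascade(nucleotides_added: int, light_per_atp: float = 1.0) -> WellSignal:
    """Idealized bookkeeping for one template copy in one well.

    Assumption: every step is 100% efficient, so one nucleotide added
    gives one PPi, one ATP, and `light_per_atp` units of light.
    """
    if nucleotides_added < 0:
        raise ValueError("nucleotides_added must be non-negative")
    ppi = nucleotides_added        # one PPi split off per nucleotide added
    atp = ppi                      # ATP sulfurylase: APS + PPi -> ATP
    light = atp * light_per_atp    # luciferase: ATP + luciferin -> light
    return WellSignal(ppi=ppi, atp=atp, light=light)


print(reagent_cascade(3))
```

Under these assumptions the light output scales linearly with the number of nucleotides added, which is what makes the signal usable for counting bases.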
After a primer is annealed to the end of the ssDNA, synthesis is ready to begin. As is always true of DNA synthesis, incoming nucleotides are added to the 3' end of the growing chain (left). The nucleotides are supplied as the four deoxynucleoside triphosphates. As each nucleotide is added, a molecule containing two phosphate groups, called pyrophosphate (PPi), is split off.
Figure 5.12.1 Pyrosequencing run and the data produced by a single well
The sequencing run:
- Each of the thousands of wells is flooded with one of the four deoxyribonucleotides: dTTP, dCTP, dGTP, or a stand-in for dATP. Because dATP itself would trigger the luciferin reaction, deoxyadenosine alpha-thiotriphosphate (dATPαS) is used in its place. DNA polymerase ignores the difference and uses it whenever a T is encountered on the ssDNA template, but luciferase does not recognize it.
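- In any well where the incoming nucleotide is complementary to the next unpaired base on the template, the nucleotide is added to the 3' end of the growing strand and pyrophosphate is liberated. ATP sulfurylase uses this pyrophosphate and APS to form ATP, which drives the luciferase reaction and produces light.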
- The amount of light is proportional to the number of that nucleotide added. So if, for example, the incoming nucleotide is dGTP, and there is a string of 3 Cs on the template, the light emitted will be 3 times brighter than if only one C is present.
- A detector picks up the light (if any) from each well and the data are recorded.
- Then each of the remaining 3 nucleotides is added in turn.
- Then the sequence of 4 additions is repeated until synthesis is complete.
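The steps of the run can be sketched as a small simulation of a single well. This is an illustrative model, not the instrument's software: the flow order `TACG`, the function name `simulate_run`, and the convention that the template string lists bases in the order the polymerase reads them are all assumptions made here. The letter `A` in the flow order stands for dATPαS.

```python
# Watson-Crick partner of each template base
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical flow order; "A" stands for dATPαS, which polymerase uses like dATP
FLOW_ORDER = "TACG"


def simulate_run(template, flow_order=FLOW_ORDER, max_cycles=1000):
    """Return one (nucleotide, signal) pair per flow for a single well.

    `template` lists the template bases in the order the polymerase
    reads them (an assumption that keeps the indexing simple).
    The signal is the idealized count of nucleotides added in that flow.
    """
    template = template.upper()
    unknown = set(template) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unsupported bases in template: {sorted(unknown)}")

    position = 0   # index of the next unpaired template base
    flowgram = []
    for _ in range(max_cycles):
        if position >= len(template):
            break  # synthesis is complete
        for nucleotide in flow_order:
            added = 0
            # Extend through every consecutive base that pairs with this nucleotide
            while (position < len(template)
                   and COMPLEMENT[template[position]] == nucleotide):
                added += 1
                position += 1
            flowgram.append((nucleotide, added))
    return flowgram


print(simulate_run("CCCAT"))
# [('T', 0), ('A', 0), ('C', 0), ('G', 3), ('T', 1), ('A', 1), ('C', 0), ('G', 0)]
```

The run of three Cs on the template yields a signal of 3 in the G flow, matching the brightness example given in the steps.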
The diagram above also shows the type of data produced in a single well. The height of each peak of light production gives the number of additions that occurred when a particular nucleotide was supplied (bottom). Computer software then displays the template sequence (top) for each of the thousands of different fragments sequenced. With this technology, as many as 20 million base pairs of genome sequence can be determined in an instrument run of less than 6 hours.
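Reading a sequence back from the peak heights can be sketched as follows. This is a simplified illustration, not the vendor's base-calling software. It assumes that each peak height is close to a whole multiple of a known single-addition signal (`unit_signal`, a hypothetical parameter) and that simple rounding is good enough; the function name `call_bases` is likewise introduced here.

```python
# Watson-Crick partner of each base
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def call_bases(flowgram, unit_signal=1.0):
    """Convert (nucleotide, peak_height) pairs into two strings.

    Returns (new_strand, template):
      new_strand  the synthesized strand, in the order it was built
      template    the complementary template bases, in the order they were read
    """
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")
    pieces = []
    for nucleotide, height in flowgram:
        # Peak height divided by the single-addition signal gives the
        # number of identical nucleotides added during this flow
        count = round(height / unit_signal)
        if count > 0:
            pieces.append(nucleotide * count)
    new_strand = "".join(pieces)
    template = "".join(COMPLEMENT[base] for base in new_strand)
    return new_strand, template


noisy_flowgram = [("T", 0.1), ("A", 0.0), ("C", 0.05),
                  ("G", 2.9), ("T", 1.1), ("A", 0.9)]
print(call_bases(noisy_flowgram))
# ('GGGTA', 'CCCAT')
```

As a rough consistency check on the figures quoted in the text, ~200,000 wells each reading a fragment of ~100 bases comes to about 20 million bases per run.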