Bacteriophage are viruses that infect bacteria. Because of their very large number of progeny and ability to recombine in mixed infections (more than one strain of phage in an infection), they have been used extensively in high-resolution definition of genes. Much of what we know about genetic fine structure, prior to the advent of techniques for isolating and sequencing genes, derives from studies in bacteriophage.
Bacteriophage have been a powerful model genetic system, because they have small genomes, have a short life cycle, and produce many progeny from an infected cell. They provide a very efficient means for transfer of DNA into or between cells. The large number of progeny makes it possible to measure very rare recombination events.
Lytic bacteriophage form plaques on lawns of bacteria; these are regions of clearing where infected bacteria have lysed. Early work focused on mutants with different plaque morphology, e.g. T2 r, which shows rapid lysis and generates larger plaques, or on mutants with different host range, e.g. T2 h, which will kill both host strains B and B/2.
A cis-trans complementation test defines a cistron, which is a gene
Seymour Benzer used the rII locus of phage T4 to define genes by virtue of their behavior in a complementation test, and also to provide fundamental insight into the structure of genes (in particular, the arrangement of mutable sites; see the next section). The difference in plaque morphology between r and r+ phage is easy to see (large versus small, respectively), and Benzer isolated many r mutants of phage T4. The wild type, but not any rII mutants, will grow on E. coli strain K12(λ), whereas both wild type and mutant phage grow equally well on E. coli strain B. Thus the wild-type phenotype is readily detected by its ability to grow in strain K12(λ).
If E. coli strain K12(λ) is co-infected with 2 phage carrying mutations at different positions in rIIA, you get no multiplication of the phage (except the extremely rare wild type recombinants, which occur at about 1 in 10^6 progeny). In the diagram below, each line represents the chromosome from one of the parental phage.
phage 1 _|__x______|________|_
phage 2 _|_______x_|________|_
Likewise, if the two phage in the co-infection carry mutations at different positions in rIIB, you get no multiplication of the phage (except the extremely rare wild type recombinants, about 1 in 10^6).
phage 3 _|_________|_x______|_
phage 4 _|_________|______x_|_
However, if one of the co-infecting phage carries a mutation in rIIA and the other a mutation in rIIB, then you see multiplication of the phage, forming a very large number of plaques on E. coli strain K12(λ).
phage 1 _|__x______|________|_ Provides wt rIIB protein
phage 4 _|_________|______x_|_ Provides wt rIIA protein
Together these two phage provide all the phage functions: they complement each other. This is a positive complementation test. The first two examples show no complementation. Mutants that do not complement are placed in the same complementation group; they are different mutant alleles of the same gene. Benzer showed that there were two complementation groups (and therefore two genes) at the rII locus, which he called A and B.
Question 1.3. In the mixed infection with phage 1 and phage 4, you also obtain the rare wild type recombinants, but there are more recombinants than are seen in the co-infections with different mutant alleles of the same gene. Why?
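The logic of sorting mutants into complementation groups can be sketched in a few lines of Python. This is only an illustration: the dictionary `MUTANT_CISTRON` is a hypothetical encoding of the four phage in the diagrams above, and the function `complements` stands in for the actual experiment (co-infecting strain K12(λ) and scoring growth).

```python
# Hypothetical encoding of the four phage diagrammed above:
# each mutant is labeled with the cistron that carries its mutation.
MUTANT_CISTRON = {
    "phage 1": "rIIA",
    "phage 2": "rIIA",
    "phage 3": "rIIB",
    "phage 4": "rIIB",
}

def complements(m1, m2):
    """Stand-in for the trans test: co-infecting mutants multiply on
    K12(lambda) only if each supplies the wild-type product the other lacks."""
    return MUTANT_CISTRON[m1] != MUTANT_CISTRON[m2]

def complementation_groups(mutants, test):
    """Place mutants that fail to complement each other in the same group."""
    groups = []
    for m in mutants:
        for group in groups:
            if not test(m, group[0]):   # no complementation: same gene
                group.append(m)
                break
        else:
            groups.append([m])          # complements every group so far
    return groups

groups = complementation_groups(sorted(MUTANT_CISTRON), complements)
print(groups)
```

With these assumed inputs the four phage fall into two groups, corresponding to the two cistrons A and B.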
Benzer’s experiments analyzing the rII locus of bacteriophage T4 formalized the idea of a cis-trans complementation test to define a cistron, which is an operational definition of a gene. First, let’s define cis and trans when used to refer to genes. In the cis configuration, both mutations are on the same chromosome. In the trans configuration, each mutation is on a different chromosome.
Mutations in the same gene will not complement in trans, whereas mutations in different genes will complement in trans (Fig. 1.12). In the cis configuration, the other chromosome is wild type, and wild type will complement any recessive mutation.
The complementation group corresponds to a genetic entity we call a cistron; it is equivalent to a gene.
This test requires a diploid situation. This can be a natural diploid (2 copies of each chromosome) or a partial diploid, or merodiploid, e.g. one made by conjugating with a cell carrying an F' factor. Some bacteriophage carry pieces of the host chromosome; these are called transducing phage. Infection of E. coli with a transducing phage carrying a mutation in a host gene is another way to create a merodiploid in the laboratory for complementation analysis.
Figure 1.12. The complementation test defines the cistron and distinguishes between two genes.
Recombination within genes allows construction of a linear map of mutable sites that constitute a gene
Once the recombination analysis made it clear that chromosomes were linear arrays of genes, these were thought of as a "string of pearls" with the genes, or "pearls," separated by some non-genetic material (Fig. 1.13). This putative non-genetic material was thought to be the site of recombination, whereas the genes, the units of inheritance, were thought to be resistant to recombination. However, by examining the large number of progeny of bacteriophage infections, one can demonstrate that recombination can occur within a gene. This supports the second model shown in Fig. 1.13. Because of the tight packing of coding regions in phage genomes, recombination almost always occurs within genes in bacteriophage, but in genomes with considerable non-coding regions between genes, recombination can occur between genes as well.
Figure 1.13. Models for genes as either discrete mutable units separated by non-genetic material (top) or as part of a continuous genetic material (bottom).
The tests between these two models required screening for genetic markers (mutations) that are very close to each other. When two markers are very close to each other, the recombination frequency is extremely low, so enough progeny have to be examined to resolve map distances of, say, 0.02 centiMorgans = 0.02 map units = 0.02% recombinants. This means that 2 out of 10,000 progeny will show recombination between two markers that are 0.02 map units apart, and one has to examine at least 10,000 progeny to reliably score this recombination. That is the power of microbial genetics: you actually can select or screen through this many progeny, sometimes quite easily.
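The arithmetic behind this estimate can be made explicit. The short Python sketch below converts a map distance into an expected number of recombinants for a given number of progeny examined. The second function adds an assumption that is not in the text: that recombinants turn up independently and rarely, so that their count is roughly Poisson distributed. The function names are illustrative only.

```python
import math

def expected_recombinants(map_units, progeny):
    """Expected recombinant count, taking 1 map unit = 1% recombinant progeny."""
    return progeny * map_units / 100.0

def prob_at_least_one(map_units, progeny):
    """Chance of seeing one or more recombinants (assumes a Poisson model)."""
    return 1.0 - math.exp(-expected_recombinants(map_units, progeny))

for n in (1_000, 10_000, 100_000):
    expected = expected_recombinants(0.02, n)
    chance = prob_at_least_one(0.02, n)
    print(f"{n:>7} progeny: {expected:5.1f} expected, P(at least one) = {chance:.3f}")
```

For markers 0.02 map units apart, 10,000 progeny give an expectation of 2 recombinants and, under the Poisson assumption, roughly an 86% chance of seeing at least one. Screening only 1,000 progeny would usually find none.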
An example of recombination in phage is shown in Fig. 1.14. Wild type T2 phage forms small plaques and kills only E. coli strain B, whereas the h mutant kills both strains B and B/2. Thus different alleles of h can be distinguished by plating on a mixture of E. coli strains B and B/2. Phage carrying the mutant h allele generate clear plaques, since they kill both strains. Phage with the wild type h+ allele give turbid plaques, since the B/2 cells are not lysed but B cells are. When a mixture of E. coli strains B and B/2 is co-infected with both T2 hr and T2 h+r+, four types of plaques are obtained. Most have the parental phenotypes, clear and large or turbid and small. These plaques contain progeny phage that retain the parental genotypes T2 hr and T2 h+r+, respectively. The other two phenotypes are nonparental, i.e. clear and small or turbid and large. These are from progeny with recombinant genotypes, i.e. T2 hr+ and T2 h+r. In this mixed infection, recombination occurred between two phage genomes in the same cell.
Figure 1.14. Recombination in bacteriophage
The first demonstration of recombination within a gene came from work on the rIIA and rIIB genes of phage T4. These experiments from Seymour Benzer, published in 1955, used techniques like that diagrammed in Fig. 1.14. Remember that mutations in the r gene cause rapid lysis of infected cells, i.e. the length of the lytic cycle is shorter. The difference in plaque morphology between r and r+ phage is easy to see (large versus small, respectively). These two genes are very close together, and many mutations were independently isolated in each. This was summarized in the discussion on complementation above.
Consider the results of infection of a bacterial culture with two mutant alleles of gene rIIA:
T4rIIA6 _|_______________________x______|_
and T4rIIA27 _|_______x______________________|_
(x marks the position of the mutation in each allele).
Progeny phage from this infection include those with a parental genotype (in the great majority), and at a much lower frequency, two types of recombinants:
wild type T4 r+ _|______________________________|_
double mutant T4rIIA6 rIIA27 _|_______x_______________x______|_
The wild type is easily scored because it, and not any rII mutants, will grow on E. coli strain K12(λ), whereas both wild type and mutant phage grow equally well on E. coli strain B. Thus you can select for the wild type (and you will see only the desired recombinant). Finding the double mutants is more laborious, because they are obtained only by screening through the progeny, testing for phage that when backcrossed with the parental phage result in no wild type recombinant progeny.
Equal numbers of wild type and double mutant recombinants were obtained, showing that recombination can occur within a gene, and that this occurs by reciprocal crossing over. If recombination were only between genes, then no wild type phage would result. A large spectrum of recombination values was obtained in crosses for different alleles, just like you obtain for crosses between mutants in separate genes.
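Reciprocal crossing over between two alleles of the same gene can be modeled with simple string operations. In the sketch below, each chromosome is a string in which `x` marks a mutation, as in the diagrams above. The position assigned to rIIA6 is inferred from the double mutant diagram, and the crossover points (15 and 3) are arbitrary choices for illustration.

```python
def crossover(parent_a, parent_b, point):
    """Reciprocal exchange: the two products swap everything right of `point`."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

GENE_LENGTH = 30                                  # sites drawn in the diagrams
wild_type = "_" * GENE_LENGTH
rIIA27 = "_" * 7 + "x" + "_" * 22                 # mutation near the left end
rIIA6 = "_" * 23 + "x" + "_" * 6                  # inferred from the double mutant
double_mutant = "_" * 7 + "x" + "_" * 15 + "x" + "_" * 6

# A crossover between the two mutable sites gives both recombinant types.
between = crossover(rIIA27, rIIA6, 15)
print(between)

# A crossover outside the interval only regenerates the parental genotypes.
outside = crossover(rIIA27, rIIA6, 3)
print(outside)
```

Each exchange between the two sites yields one wild type and one double mutant chromosome, which is why the two recombinant classes are expected in equal numbers.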
Several major conclusions could be made as a result of these experiments on recombination within the rII genes.
- A large number of mutable sites occur within a gene, exceeding some 500 for the rIIA and rIIB genes. We now realize that these correspond to the individual base pairs within the gene.
- The genetic maps are clearly linear, indicating that the gene is linear. Now we know a gene is a linear polymer of nucleotides.
- Most mutations are changes at one mutable site (point mutations). Many such mutant genes can be restored to wild type by undergoing a reverse mutation at the same site (reversion).
- Other mutations cause the deletion of one or more mutable sites, reflecting a physical loss of part of the rII gene. Deletions of one or more mutable sites (base pairs) are extremely unlikely to revert back to the original wild type.
One gene encodes one polypeptide
One of the fundamental insights into how genes function is that one gene encodes one enzyme (or more precisely, one polypeptide). Beadle and Tatum reached this conclusion based on their complementation analysis of the genes required for arginine biosynthesis in fungi. They showed that a mutation in each gene led to a loss of activity of one enzyme in the multistep pathway of arginine biosynthesis. As discussed above in the section on genetic dissection, a large number of Arg auxotrophs (requiring Arg for growth) were isolated, and then organized into a set of complementation groups, where each complementation group represents a gene.
The classic work of Beadle and Tatum demonstrated a direct relationship between the genes defined by the auxotrophic mutants and the enzymes required for Arg biosynthesis. They showed that a mutation in one gene resulted in the loss of one particular enzymatic activity, e.g. in the generalized scheme below, a mutation in gene 2 led to a loss of activity of enzyme 2. This led to an accumulation of the substrate for that reaction (intermediate N in the diagram below). If there were 4 complementation groups for the Arg auxotrophs, i.e. 4 genes, then 4 enzymes were found in the pathway for Arg biosynthesis. Each enzyme was affected by mutations in one of the complementation groups.
M → N → O → P → Arg
enzyme 1 enzyme 2 enzyme 3 enzyme 4
gene 1 gene 2 gene 3 gene 4
Figure 1.15. A general scheme showing the relationships among metabolic intermediates (M, N, O, P), and end product (Arg), enzymes and the genes that encode them.
In general, each step in a metabolic pathway is catalyzed by an enzyme (identified biochemically) that is the product of a particular gene (identified by mutants unable to synthesize the end product, or unable to break down the starting compound, of a pathway). The number of genes that can generate auxotrophic mutants is (usually) the same as the number of enzymatic steps in the pathway. Auxotrophic mutants in a given gene are missing the corresponding enzyme. Thus Beadle and Tatum concluded that one gene encodes one enzyme. Sometimes more than one gene is required to encode an enzyme because the enzyme has multiple, different polypeptide subunits. Thus each polypeptide is encoded by a gene.
The metabolic intermediates that accumulate in each mutant can be used to place the enzymes in their order of action in a pathway. In the diagram in Fig. 1.15, mutants in gene 3 accumulated substance O. Feeding substance O to mutants in gene 1 or in gene 2 allows growth in the absence of Arg. We conclude that the defects in enzyme 1 or enzyme 2, respectively, are upstream of enzyme 3. In contrast, feeding substance O to mutants in gene 4 will not allow growth in the absence of Arg. Even though this mutant can convert substance O to substance P, it does not have an active enzyme 4 to convert P to Arg. The inability of mutants in gene 4 to grow on substance O shows that enzyme 4 is downstream of enzyme 3.
Question 1.4. Imagine that you are studying serine biosynthesis in a fungus. You isolate serine auxotrophs, do all the pairwise crosses of the mutants and discover that the auxotrophs can be grouped into three complementation groups, called A, B and C. You also discover that a different metabolic intermediate accumulates in members of each complementation group: substance A in auxotrophs in the A complementation group, substance B in the B complementation group and substance C in the C complementation group. Each of the intermediates is fed to auxotrophs from each of the three complementation groups as tabulated below. A + means that the auxotroph was able to grow in media in the absence of serine when fed the indicated substance; a - denotes no growth in the absence of serine.
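The feeding logic for the generalized pathway in Fig. 1.15 can be written out as a small Python sketch. It assumes a strictly linear pathway in which a mutant in gene k lacks only enzyme k. The function names are illustrative.

```python
# Linear pathway of Fig. 1.15: M -> N -> O -> P -> Arg.
# INTERMEDIATES[k - 1] is the substrate of enzyme k.
INTERMEDIATES = ["M", "N", "O", "P"]

def accumulates(gene):
    """A mutant in gene k accumulates the substrate of the missing enzyme k."""
    return INTERMEDIATES[gene - 1]

def grows_when_fed(gene, substance):
    """A fed intermediate supports growth without Arg only if every enzyme
    needed to convert it to Arg is still active, i.e. the block is upstream."""
    first_enzyme_needed = INTERMEDIATES.index(substance) + 1
    return gene < first_enzyme_needed

for gene in (1, 2, 3, 4):
    result = "+" if grows_when_fed(gene, "O") else "-"
    print(f"mutant in gene {gene}: accumulates {accumulates(gene)}, fed O: {result}")
```

Under these assumptions, mutants in genes 1 and 2 grow when fed substance O, whereas mutants in genes 3 and 4 do not, matching the reasoning above.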
mutant in complementation group A
mutant in complementation group B
mutant in complementation group C
In the biosynthetic pathway to serine in this fungus, what is the order of the enzymes encoded in the three complementation groups? Enzyme A is encoded by the gene that when altered generates mutants that fall into complementation group A, etc.
The gene and its polypeptide product are colinear
Once it was determined that a gene was a linear array of mutable sites, that genes are composed of a string of nucleotides called DNA (see Chapter 2), and that each gene encoded a polypeptide, it remained to be determined exactly how that string of nucleotides coded for a particular amino acid sequence. This problem was studied along several avenues, culminating in a major achievement of the last half of the 20th century: the deciphering of the genetic code. The detailed assignment of particular codons (triplets of adjacent nucleotides) will be discussed in Chapter 13. In the next few sections of this chapter, we will examine how some of the basic features of the genetic code were deciphered.
A priori, the coding units within a gene could encode both the composition and the address for each amino acid, as illustrated in Model 1 of Fig. 1.16. In this model, the coding units could be scrambled and still specify the same protein. In such a situation, the polypeptide would not be colinear with the gene.
Figure 1.16. Alternative models for gene and codon structure.
In an alternative model (Model 2 in Fig. 1.16), the coding units only specify the composition, but not the position, of an amino acid. The "address" of the amino acid is derived from the position of the coding unit within the gene. This model would predict that the gene and its polypeptide product would be colinear - e.g. mutation in the 5th coding unit would affect the 5th amino acid of the protein, etc.
Charles Yanofsky and his co-workers (1964) tested these two models and determined that the gene and the polypeptide product are indeed colinear. They used recombination frequencies to map the positions of different mutant alleles in the gene that encodes a particular subunit of the enzyme tryptophan synthase. They then determined the amino acid sequence of the wild type and mutant polypeptides. As illustrated in Fig. 1.17, the position of a mutant allele on the recombination map of the gene corresponds with the position of the amino acid altered in the mutant polypeptide product. For instance, allele A101 maps to one end of the gene, and the corresponding Glu → Val replacement is close to the N terminus of the polypeptide. Allele A64 maps close to the other end of the gene, and the corresponding Ser → Leu replacement is close to the C terminus of the polypeptide. This correspondence between the positions of the mutations in each allele and the positions of the consequent changes in the polypeptide shows that Model 1 can be eliminated and Model 2 is supported.
Figure 1.17. The polypeptide is colinear with the gene.
Mutable sites are base pairs along the double helix
The large number of mutable sites found in each gene, and between which recombination can occur, leads one to conclude that the mutable sites are base pairs along the DNA. Sequence determination of the wild type and mutant genes confirms this conclusion.
Single amino acids are specified by three adjacent nucleotides, which constitute a codon
This conclusion requires three pieces of information.
First of all, adjacent mutable sites specify amino acids. Reaching this conclusion required investigation of the fine structure of a gene, including rare recombination between very closely linked mutations within a gene. Yanofsky and his colleagues, working with mutations in the trpA gene of E. coli, encoding tryptophan synthase, showed that different alleles mutated in the same codon could recombine (albeit at very low frequency). (This is the same laboratory and same system that was used to show that a gene and its polypeptide product are colinear.) Thus recombination between two different alleles can occur within a codon, which means that a codon must have more than one mutable site. We now recognize that a mutable site is a nucleotide in the DNA. Thus adjacent mutable sites (nucleotides) encode a single amino acid.
Let’s look at this in more detail (Fig. 1.18). Yanofsky and colleagues examined two different mutant alleles of trpA, each of which caused alteration in amino acid 211 of tryptophan synthase. In the mutant allele A23, wild type Gly is converted to mutant Arg. In the mutant allele A46, wild type Gly is converted to mutant Glu.
GGA (Gly 211) → AGA (Arg 211) mutant allele A23
GGA (Gly 211) → GAA (Glu 211) mutant allele A46
A23 × A46: AGA × GAA → GGA (wild type Gly 211 in 2 out of 100,000 progeny)
Figure 1.18. Recombination can occur between two mutant alleles affecting the same codon.
Alleles A23 and A46 are not alternative forms of the same mutable site, because recombination to yield wild type occurs, albeit at a very low frequency (0.002%; the sites are very close together, in fact in the same codon!). If they involved the same mutable site, one would never see the wild-type recombinant.
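The same argument can be checked by enumerating every product of a single crossover within the codon. The Python sketch below uses the codons from Fig. 1.18. The allele `CGA`, mutated at the same site as A23, is hypothetical and is included only for contrast. The small codon dictionary lists just the standard assignments needed here.

```python
# Only the codon assignments needed for this example (standard genetic code).
CODONS = {"GGA": "Gly", "AGA": "Arg", "GAA": "Glu", "AAA": "Lys"}

def recombinants(allele_1, allele_2):
    """All products of a single reciprocal crossover at each internal point."""
    products = set()
    for point in range(1, len(allele_1)):
        products.add(allele_1[:point] + allele_2[point:])
        products.add(allele_2[:point] + allele_1[point:])
    return products

A23, A46 = "AGA", "GAA"            # mutated at the 1st and 2nd nucleotides
different_sites = recombinants(A23, A46)
print(sorted(different_sites))
print("wild type recovered:", "GGA" in different_sites)

# Hypothetical second allele altered at the SAME site as A23 (1st nucleotide).
same_site = recombinants(A23, "CGA")
print("wild type recovered:", "GGA" in same_site)
```

A crossover between the first and second nucleotides yields GGA (wild type Gly) together with the reciprocal double mutant AAA (Lys). Two alleles altered at the same nucleotide cannot regenerate GGA by crossing over.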
The second observation is that the genetic code is non-overlapping. This was shown by demonstrating that a mutation at a single site alters only one amino acid. This conflicts with the predictions of an overlapping code (see Fig. 1.19), and thus the code must be non-overlapping.
Figure 1.19. Predictions of the effects of nucleotide substitutions, insertions or deletions on polypeptides encoded by an overlapping, a punctuated, or a nonoverlapping, nonpunctuated code.
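The prediction for a single nucleotide substitution can be illustrated by counting how many coding units change under each reading scheme. In the Python sketch below, the sequence and the position of the substitution are hypothetical, and the fully overlapping code is modeled as a triplet starting at every nucleotide.

```python
def nonoverlapping_codons(seq):
    """Successive triplets; each nucleotide belongs to exactly one codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def overlapping_codons(seq):
    """Fully overlapping triplets; a new codon starts at every nucleotide."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2)]

def codons_changed(reader, wild, mutant):
    """Number of coding units that differ between two equal-length sequences."""
    return sum(a != b for a, b in zip(reader(wild), reader(mutant)))

wild = "AUGGCAUCCGGC"                       # hypothetical sequence
mutant = wild[:4] + "G" + wild[5:]          # substitution C -> G at index 4

print("non-overlapping:", codons_changed(nonoverlapping_codons, wild, mutant))
print("overlapping:    ", codons_changed(overlapping_codons, wild, mutant))
```

In this example one codon changes under the non-overlapping code, whereas three adjacent codons change under the overlapping code. The observation that a point mutation alters only one amino acid therefore argues against an overlapping code.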
The third observation is that the genetic code is read in triplets from a fixed starting point. This was shown by examining the effect of frameshift mutations. As shown in Fig. 1.19, a code lacking punctuation has a certain reading frame. Insertions or deletions of nucleotides are predicted to have a drastic effect on the encoded protein because they will change that reading frame. The fact that this was observed was one of the major reasons to conclude that the mRNA molecules encoded by genes are read in successive blocks of three nucleotides in a particular reading frame.
For the sequence shown in Fig. 1.20, insertion of an A shifts the reading frame, so all amino acids after the insertion differ from the wild type sequence. (The 4th amino acid is still a Gly because of degeneracy in the code: both GGC and GGG code for Gly.) Similarly, deletion of a U alters the entire sequence after the deletion.
Figure 1.20. Frameshift mutations show that the genetic code is read in triplets.
These observations show that the nucleotide sequence is read, or translated, from a fixed starting point without punctuation. An alternative model is that the group of nucleotides encoding an amino acid (the codon) could also include a signal for the end of the codon (Model 2 in Fig. 1.19). This could be considered a "comma" at the end of each codon. If that were the case, insertions or deletions would only affect the codon in which they occur. However, the data show that all codons, including and after the one containing the insertion or deletion, are altered. Thus the genetic code is not punctuated, but is read in a particular frame that is defined by a fixed starting point (Model 3 in Fig. 1.19). That starting point is a particular AUG, encoding methionine. (More about this will be covered in Chapter 13).
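The effect of a frameshift can be reproduced with a short translation routine. The Python sketch below builds the standard genetic code and reads an mRNA in successive triplets from its first nucleotide. The mRNA sequence is hypothetical (it is not the sequence in Fig. 1.20), as are the positions of the insertion and deletion.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids in UCAG order; "*" is a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna):
    """Read successive triplets from the first base, with no punctuation.
    A trailing partial codon is ignored; translation stops at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

wild = "AUGGCAUCCGGCAAGCUUGCC"            # hypothetical mRNA
insertion = wild[:6] + "A" + wild[6:]     # insert an A after the 2nd codon
deletion = wild[:6] + wild[7:]            # delete the U that starts codon 3

print("wild type:", translate(wild))
print("insertion:", translate(insertion))
print("deletion: ", translate(deletion))
```

The first two amino acids are unchanged in both mutants, but the sequence after the insertion or deletion is altered. In the deletion mutant the sixth amino acid happens to be Leu in both sequences, a coincidence of the same kind as the Gly noted for Fig. 1.20.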
The results of frameshift mutations are so drastic that the proteins are usually not functional. Hence a screen or selection for loss-of-function mutants frequently reveals these frameshift mutants. Simple nucleotide substitutions that lead to amino acid replacements often have very little effect on the protein, and hence cause little or only subtle change in phenotype.
A double mutant generated by crossing over between an insertion (+) and a nearby deletion (-) has an (almost) normal phenotype: the deletion restores the reading frame shifted by the insertion, so the second mutation effectively reverts the first.
A gene containing three closely spaced insertions (or deletions) of single nucleotides will produce a functional product. However, four or five insertions or deletions do not give a functional product (Crick, Barnett, Brenner and Watts-Tobin, 1961). This provided the best evidence that the genetic code is read in groups of three nucleotides (not two or four). Over the next 5 years the code was worked out (by 1966) and this inference was confirmed definitively.
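The reasoning about reading frames can be checked without translating anything, simply by asking whether the codons at the far end of a message are the same as in the wild type. In the Python sketch below, the sequence, the inserted base and the insertion positions are all hypothetical.

```python
def codons(seq):
    """Successive triplets from the first base; a partial last codon is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def insert_bases(seq, positions, base="A"):
    """Insert `base` before each listed index of the original sequence.
    Working right to left keeps the earlier indices valid."""
    for p in sorted(positions, reverse=True):
        seq = seq[:p] + base + seq[p:]
    return seq

def frame_restored(wild, mutant, n_tail=3):
    """True if the last n_tail codons of the mutant match the wild type."""
    return codons(wild)[-n_tail:] == codons(mutant)[-n_tail:]

wild = "AUGGCAUCCGGCAAGCUUGCC"          # hypothetical message
sites = [4, 6, 8, 10, 12]               # closely spaced insertion sites

results = {n: frame_restored(wild, insert_bases(wild, sites[:n]))
           for n in range(1, 6)}
print(results)

# One insertion (+) combined with one nearby deletion (-).
plus_minus = wild[:4] + "A" + wild[4:8] + wild[9:]
print("plus/minus restores frame:", frame_restored(wild, plus_minus))
```

Only the triple insertion, and the combination of one insertion with one deletion, return the downstream codons to the wild type frame. One, two, four or five insertions all leave the end of the message out of frame, as expected for a code read in triplets.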