Biology LibreTexts

1: Fundamental Properties of Genes

Genetic dissection by complementation

Genes are the hereditary units that when altered change a phenotype; genes are classically defined by their effects on phenotype. But in many cases more than one gene affects a phenotype. Metabolic pathways, such as synthesis of DNA, repair of DNA, synthesis of leucine, or breakdown of starch occur in multiple steps catalyzed by enzymes. Each subunit of each enzyme is encoded in a gene, and all those genes are needed for the efficient running of the pathway. Multiple genes also determine complex traits, such as susceptibility to substance abuse, diabetes, and other diseases, and probably less pressing concerns, such as retaining a healthy head of hair after you are 40.

Many pathways have been elucidated by isolating many mutants that are defective in the process, ideally enough to sample every gene involved (saturation mutagenesis), and grouping them according to the gene that is mutated. All the mutations in the same gene fall into the same complementation group. Two mutants complement each other if the normal phenotype is restored when they are together in a diploid. This occurs when the mutants have mutations in different genes. If one is examining mutants with a similar phenotype (e.g. inability to grow without leucine or inability to make DNA), then tests of all pairwise combinations of the mutants will place them into complementation groups: mutants complement between groups but not within groups. The complementation groups then define the genes in the process under study. This is a powerful method of genetic dissection of a pathway. Complementation is discussed in more detail in the section on the cis-trans test below.
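The grouping step described above can be sketched in a few lines of Python. This is only an illustration: the mutant names, the gene labels, and the helper `cross_result` are hypothetical stand-ins for real pairwise test results, and the sketch assumes clean, recessive loss-of-function mutations with no intragenic complementation.

```python
def complementation_groups(mutants, complements):
    """Group mutants so that members of a group fail to complement each other.

    complements(a, b) returns True when the diploid carrying mutants a and b
    shows the normal phenotype (the mutations are in different genes).
    """
    groups = []  # each group collects mutants assumed to hit the same gene
    for m in mutants:
        for group in groups:
            # Failure to complement a member places m in that group.
            if not complements(m, group[0]):
                group.append(m)
                break
        else:
            groups.append([m])
    return groups


# Hypothetical data: the gene each mutant actually hits (unknown to the experimenter).
hidden_gene = {"m1": "leuA", "m2": "leuB", "m3": "leuA", "m4": "leuC", "m5": "leuB"}


def cross_result(a, b):
    # Stand-in for the outcome of a real pairwise complementation test.
    return hidden_gene[a] != hidden_gene[b]


groups = complementation_groups(list(hidden_gene), cross_result)
print(groups)
```

Three groups emerge from five mutants, so under these assumptions the screen has identified three genes.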

Genetic methods in microorganisms

The genetic systems found in bacteria and fungi are particularly powerful. The small size of the genome (all the genetic material in an organism), the ability to examine both haploid and diploid forms, and the ease of large-scale screens have made them the method of choice for many investigations. Some of the key features will be summarized in this section.

Microorganisms such as bacteria and fungi have several advantages for genetic analysis. They have a haploid genome, thus an investigator can detect recessive phenotypes easily and rapidly. In the haploid (1N) state, only one allele is present for each gene, and thus its phenotype is the one observed in the organism.

Bacteria can carry plasmids and can be infected with viruses, each of which is capable of carrying copies of bacterial genes. Thus bacteria can be partially diploid, or merodiploid, for some genes. This allows one to test whether alleles are dominant or recessive.

Bacteria are capable of sexual transfer of genetic information, during which time homologous chromosomes can recombine. Thus one can use recombination frequency to map genes, analogous to the process in diploid sexual organisms. Indeed, a high frequency of recombination was essential in investigations of the fine structure of genes.

Bacteria grow, or increase in cell number, very rapidly. Generation times can be as short as 20 to 30 minutes. Thus many generations can be examined in a short time.

An investigator can obtain large quantities of mutant organisms for biochemical fractionation.

Bacterial genomes are small, ranging from about 0.580 (Mycoplasma genitalium) to 4.639 million base pairs (E. coli), with about 500 to 4300 genes, respectively. Compared to organisms with genomes 100 to 1000 times larger, this makes it easier to saturate the genome with mutations that disrupt some physiological process. Also, the smaller genome size, plus the availability of transducing phage, made it possible to isolate bacterial genes for intensive study.

Genomes of several bacteria are now completely sequenced, so all the genes, and their DNA sequences are known.

Yeast, such as Saccharomyces cerevisiae, are eukaryotic microorganisms that have both a haploid and a diploid phase to their life cycle, and thus have these same advantages as bacteria. Although its genome is larger (12 million base pairs), and it has 16 chromosomes, it is a powerful model organism for genetic and biochemical investigation of many aspects of molecular and cell biology. The genome of Saccharomyces cerevisiae is completely sequenced, revealing about 6100 genes.

One can use mutagens to increase the number of mutations, e.g. to modify bases, intercalate, etc. Specific mutagens will be considered in Part Two of the course.

Replica plating allows one to test colonies under different growth conditions. This is illustrated in Figure 1.8 for finding mutants with new growth factor requirements. Replica plating can be used to compare growth of cells on complete medium, minimal medium, and minimal medium supplemented with a specific growth factor, e.g. an amino acid like Arg (the abbreviation for arginine). Cells that grow on minimal medium supplemented with Arg, but not on minimal medium, are Arg auxotrophs. The word auxotroph means "increased growth requirements". These are cells that require some additional nutrient (growth factor) to grow. Prototrophs (usually the wild type cells) do not need the additional factor and grow on minimal medium. In this case, they still make their own Arg.
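The scoring logic of a replica-plating screen can be written out as a small decision function. This is a minimal sketch; the colony names and growth results are hypothetical, and real screens would also need to handle leaky or slow-growing colonies.

```python
def classify(grows_on_complete, grows_on_minimal, grows_on_minimal_plus_arg):
    """Classify one colony from its growth on three replica plates."""
    if not grows_on_complete:
        return "no growth on complete medium"
    if grows_on_minimal:
        return "prototroph"
    if grows_on_minimal_plus_arg:
        return "Arg auxotroph"
    return "auxotroph for another factor"


# Hypothetical colonies: (complete, minimal, minimal + Arg)
colonies = {
    "colony 1": (True, True, True),
    "colony 2": (True, False, True),
    "colony 3": (True, False, False),
}

for name, plates in colonies.items():
    print(name, "->", classify(*plates))
```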


Figure 1.8. Replica plating of microorganisms. Panel A shows the technique of replica plating to screen for drug sensitivity. Panel B illustrates its application to finding mutants with growth factor requirements.

Sometimes the trait one is selecting for is lethal to the organism. In this situation, one can screen for conditional mutants. These are mutants that grow under one condition and not under another condition. Conditional mutants that grow at a low temperature but not at a high temperature are called "temperature sensitive" or ts mutants. Conditional mutants are not necessarily associated with lethality. The dark ear tips, nose and feet of a Siamese cat are the phenotype of a temperature sensitive mutation in the c locus (determining fur color). The enzyme encoded is not functional at higher temperatures, but is functional at lower temperatures, such as in the extremities of the cat. Hence the fur on these parts of the Siamese cat’s body is pigmented.

Figure 1.9. Coat color in Siamese cats is determined by a temperature sensitive mutation in an enzyme needed for pigment formation. Siamese are homozygous chch, which encodes an enzyme that is active at low temperature (in the extremities of the cat) but inactive elsewhere.

Conjugation in bacteria

The ability to plate out large numbers of haploid bacteria or fungi on a Petri dish, and to examine a single colony (or clone) under a variety of conditions (with and without a growth factor, with and without a drug, or at high and low temperature), makes it relatively easy to screen through many individuals to find mutants with a particular phenotype. However, in order to carry out a complementation analysis, one needs to be able to combine the two haploid mutants in one cell. Many fungi, such as yeast, do this through a natural meiotic sporulation and mating process. Figure 1.6 illustrates the use of fungal matings in complementation.

Bacteria can also combine genetic information from two cells, although not by meiosis and fertilization, and only a part of the genome of one bacterium is transferred to another. The sexual transfer of information in E. coli uses plasmids called F (fertility) factors, either free in the cell or integrated into the chromosome (Hfr strains). Male E. coli cells have a large plasmid, the F or fertility factor. A plasmid is a circular, extrachromosomal DNA molecule that is not essential to the bacterium. The F plasmid can transfer DNA from the male cell to an F- or female cell, in a process called conjugation (Figure 1.10). The male and female cells are brought close together by attachments at pili, the cells join, and DNA is synthesized from the F plasmid and transferred into the recipient cell. Receipt of the F plasmid converts the female cell to a male cell.

In some strains of E. coli the F factor is integrated into the chromosome. In this case, the DNA transfer starts in the F region of the chromosome, but it also transfers adjacent chromosomal DNA. These are called Hfr strains, for their high frequency of recombination. The transferred DNA recombines with the DNA in the recipient cell.

Some F-related plasmids are a hybrid of F DNA and host bacterial DNA. These F’ plasmids appear to be derived from F factors but they have replaced some of the F DNA with bacterial DNA. Thus they are convenient carriers of parts of the E. coli genome.

This conjugal transfer can be used to create partial diploids, also called merodiploids, in E. coli. For some time after conjugation, a portion of two different copies of the chromosome is present in the same cells. Another method is to introduce F’ factors, carrying bacterial DNA, into another strain. These are two ways to do complementation analysis in E. coli.


Figure 1.10. F-factor mediated conjugal transfer of DNA in bacteria.

Gene mapping by conjugal transfer

Conjugal transfer can also be used for genetic mapping. By using many different Hfr strains, each with the F factor integrated at a different part of the E. coli chromosome, the positions of many genes were mapped. These studies showed that the genetic map of the E. coli chromosome is circular. During conjugal transfer, genes closer to the site of F integration are transferred first. By disrupting the mating at different times, one can determine which genes are closer to the integration site. Thus on the E. coli chromosome, genes are mapped in terms of minutes (i.e., the time it takes to transfer to the recipient).

For example, for an Hfr strain with the F factor integrated at 0 min on the E. coli map, conjugal transfer to a female recipient would transfer

  • leuACBD         at 1.7 min
  • pyrH               at 4.6 min
  • proAB              at 5.9 min
  • bioABFCD      at 17.5 min.

Use of Hfr strains with different sites of integration (initiation of transfer) allows the entire circular genome to be mapped (Figure 1.11). The 0/100 min position on the map is at thrABC.
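The interrupted-mating logic above can be illustrated with a short Python sketch that uses the four map positions quoted in the example. The function name, the constant-rate assumption, the single direction of transfer, and the 100 min map length are simplifying assumptions made for this illustration.

```python
# Map positions (minutes) quoted in the example above.
MAP_POSITION_MIN = {"leuACBD": 1.7, "pyrH": 4.6, "proAB": 5.9, "bioABFCD": 17.5}


def genes_transferred(interrupt_at_min, origin_min=0.0, map_length_min=100.0):
    """Genes expected in the recipient when mating is interrupted.

    Assumes transfer starts at origin_min, proceeds at a constant rate toward
    increasing map position, and wraps around the circular map.
    """
    entered = []
    for gene, position in MAP_POSITION_MIN.items():
        time_of_entry = (position - origin_min) % map_length_min
        if time_of_entry <= interrupt_at_min:
            entered.append((time_of_entry, gene))
    # Report genes in their order of entry.
    return [gene for _, gene in sorted(entered)]


print(genes_transferred(5))                   # F integrated at 0 min
print(genes_transferred(1, origin_min=5.0))   # hypothetical Hfr with F at 5 min
```

Changing `origin_min` mimics the use of a different Hfr strain: the same genes enter in a different order and at different times, which is what allows the whole circle to be covered.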


Figure 1.11. Circular genetic map of E. coli.


Bacteriophage are viruses that infect bacteria. Because of their very large number of progeny and ability to recombine in mixed infections (more than one strain of phage infecting the same bacterial cell), they have been used extensively in high-resolution definition of genes. Much of what we know about genetic fine structure, prior to the advent of techniques for isolating and sequencing genes, derives from studies in bacteriophage.

Bacteriophage have been a powerful model genetic system, because they have small genomes, have a short life cycle, and produce many progeny from an infected cell. They provide a very efficient means for transfer of DNA into or between cells.  The large number of progeny makes it possible to measure very rare recombination events.

Lytic bacteriophage form plaques on lawns of bacteria; these are regions of clearing where infected bacteria have lysed. Early work focused on mutants with different plaque morphology, e.g. T2 r, which shows rapid lysis and generates larger plaques, or on mutants with different host range, e.g. T2 h, which will kill both host strains B and B/2.

A cis-trans complementation test defines a cistron, which is a gene

Seymour Benzer used the rII locus of phage T4 to define genes by virtue of their behavior in a complementation test, and also to provide fundamental insight into the structure of genes (in particular, the arrangement of mutable sites; see the next section). The difference in plaque morphology between r and r+ phage is easy to see (large versus small, respectively), and Benzer isolated many r mutants of phage T4. The wild type, but not any rII mutants, will grow on E. coli strain K12(λ), whereas both wild type and mutant phage grow equally well on E. coli strain B. Thus the wild type phenotype is readily detected by its ability to grow in strain K12(λ).

If E. coli strain K12(λ) is co-infected with 2 phage carrying mutations at different positions in rIIA, you get no multiplication of the phage (except the extremely rare wild type recombinants, which occur at about 1 in 10^6 progeny). In the diagram below, each line represents the chromosome from one of the parental phage.

                                                                rIIA                rIIB

            phage 1                                    _|__x______|________|_


            phage 2                                    _|_______x_|________|_


Likewise, if the two phage in the co-infection carry mutations at different positions in rIIB, you get no multiplication of the phage (except the extremely rare wild type recombinants, about 1 in 10^6).

                                                                rIIA                rIIB

            phage 3                                    _|_________|_x______|_


            phage 4                                    _|_________|______x_|_

However, if one of the co-infecting phage carries a mutation in rIIA and the other a mutation in rIIB, then you see multiplication of the phage, forming a very large number of plaques on E. coli strain K12(λ).


                                                                rIIA                rIIB

            phage 1                                    _|__x______|________|_       Provides wt rIIB protein


            phage 4                                    _|_________|______x_|_       Provides wt rIIA protein


Together these two phage provide all the phage functions: they complement each other. This is a positive complementation test. The first two examples show no complementation. Mutants that do not complement are placed in the same complementation group; they are different mutant alleles of the same gene. Thus phage 1 and phage 2 fall into one complementation group, and phage 3 and phage 4 into another. Benzer showed that there were two complementation groups (and therefore two genes) at the rII locus, which he called A and B.


In the mixed infection with phage 1 and phage 4, you also obtain the rare wild type recombinants, but there are more recombinants than are seen in the co-infections with different mutant alleles. Why?

Benzer’s experiments analyzing the rII locus of bacteriophage T4 formalized the idea of a cis-trans complementation test to define a cistron, which is an operational definition of a gene. First, let’s define cis and trans when used to refer to genes. In the cis configuration, both mutations are on the same chromosome. In the trans configuration, each mutation is on a different chromosome.

Mutations in the same gene will not complement in trans, whereas mutations in different genes will complement in trans (Figure 1.12). In the cis configuration, the other chromosome is wild type, and wild type will complement any recessive mutation.

            The complementation group corresponds to a genetic entity we call a cistron; it is equivalent to a gene.

This test requires a diploid situation. This can be a natural diploid (2 copies of each chromosome) or a partial diploid, or merodiploid, e.g. by conjugating with a cell carrying an F' factor. Some bacteriophage carry pieces of the host chromosome; these are called transducing phage. Infection of E. coli with a transducing phage carrying a mutation in a host gene is another way to create a merodiploid in the laboratory for complementation analysis.
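The logic of the cis-trans test can be captured in a few lines of Python. This is a sketch under stated assumptions: every mutation is treated as a recessive loss of function, recombination is ignored (so the rare wild type recombinants are not modelled), and the function name is invented for this illustration.

```python
def grows_on_K12_lambda(phage_a, phage_b, required=("rIIA", "rIIB")):
    """Predict whether a mixed infection supplies every required function.

    Each phage is described by the set of genes in which it carries a mutation.
    A function is supplied if at least one co-infecting phage has a wild type
    copy of that gene.
    """
    return all(gene not in phage_a or gene not in phage_b for gene in required)


# trans: one mutation on each chromosome
print(grows_on_K12_lambda({"rIIA"}, {"rIIA"}))   # same gene: no complementation
print(grows_on_K12_lambda({"rIIA"}, {"rIIB"}))   # different genes: complementation

# cis: both mutations on one chromosome, the other chromosome is wild type
print(grows_on_K12_lambda({"rIIA", "rIIB"}, set()))
```

The cis configuration always gives growth in this model, which is why it serves as the control and only the trans result distinguishes one cistron from two.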

Figure 1.12. The complementation test defines the cistron and distinguishes between two genes.

Recombination within genes allows construction of a linear map of mutable sites that constitute a gene

Once the recombination analysis made it clear that chromosomes were linear arrays of genes, these were thought of as "string of pearls" with the genes, or "pearls," separated by some non‑genetic material (Figure 1.13). This putative non-genetic material was thought to be the site of recombination, whereas the genes, the units of inheritance, were thought to be resistant to recombination. However, by examining the large number of progeny of bacteriophage infections, one can demonstrate that recombination can occur within a gene. This supports the second model shown in Figure 1.13. Because of the tight packing of coding regions in phage genomes, recombination almost always occurs within genes in bacteriophage, but in genomes with considerable non-coding regions between genes, recombination can occur between genes as well.




Figure 1.13: Models for genes as either discrete mutable units separated by non-genetic material (top) or as part of a continuous genetic material (bottom).

The tests between these two models required screening for genetic markers (mutations) that are very close to each other. When two markers are very close to each other, the recombination frequency is extremely low, so enough progeny have to be examined to resolve map distances of, say 0.02 centiMorgans = 0.02 map units = 0.02 % recombinants.  This means that 2 out of 10,000 progeny will show recombination between two markers that are 0.02 map units apart, and obviously one has to examine at least 10,000 progeny to reliably score this recombination.  That's the power of microbial genetics ‑ you actually can select or screen through this many progeny, sometimes quite easily.
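The arithmetic in this paragraph can be checked with a short Python sketch. The second function goes slightly beyond the text: it assumes each progeny is independently recombinant with probability equal to the recombination frequency, and asks how many progeny give a chosen chance of seeing at least one recombinant. Both function names are illustrative.

```python
import math


def expected_recombinants(progeny, map_units):
    """Expected number of recombinant progeny (1 map unit = 1% recombinants)."""
    return progeny * map_units / 100


def progeny_needed(map_units, confidence=0.95):
    """Progeny to score so at least one recombinant appears with this probability.

    Assumes a simple binomial model with independent progeny.
    """
    p = map_units / 100
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(expected_recombinants(10_000, 0.02))
print(progeny_needed(0.02))
```

Under this simple model, somewhat more than 10,000 progeny are needed for a high chance of detecting even one recombinant at 0.02 map units, which is consistent with the "at least 10,000" stated above.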

An example of recombination in phage is shown in Figure 1.14. Wild type T2 phage forms small plaques and kills only E. coli strain B. Thus different alleles of h can be distinguished by plating on a mixture of E. coli strains B and B/2. Phage carrying the mutant h allele generate clear plaques, since they kill both strains. Phage with the wild type h+ allele give turbid plaques, since the B/2 cells are not lysed but B cells are. When a mixture of E. coli strains B and B/2 is co-infected with both T2 hr and T2 h+r+, four types of plaques are obtained. Most have the parental phenotypes, clear and large or turbid and small. These plaques contain progeny phage that retain the parental genotypes T2 hr and T2 h+r+, respectively. The other two phenotypes are nonparental, i.e. clear and small or turbid and large. These are from progeny with recombinant genotypes, i.e. T2 hr+ and T2 h+r. In this mixed infection, recombination occurred between two phage genomes in the same cell.
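Scoring such a cross amounts to counting the four plaque classes and dividing the nonparental classes by the total. The following Python sketch shows the calculation; the plaque counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def recombination_frequency(counts):
    """Fraction of progeny with nonparental (recombinant) plaque phenotypes.

    Parental classes:    clear large (h r), turbid small (h+ r+)
    Recombinant classes: clear small (h r+), turbid large (h+ r)
    """
    recombinant = counts["clear small"] + counts["turbid large"]
    return recombinant / sum(counts.values())


# Hypothetical plaque counts from one plate.
plaques = {"clear large": 420, "turbid small": 380, "clear small": 110, "turbid large": 90}

frequency = recombination_frequency(plaques)
print(f"{frequency:.1%} recombinants, i.e. {frequency * 100:.0f} map units")
```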

Figure 1.14. Recombination in bacteriophage


The first demonstration of recombination within a gene came from work on the rIIA and rIIB genes of phage T4. These experiments from Seymour Benzer, published in 1955, used techniques like that diagrammed in Figure 1.14. Remember that mutations in the r gene cause rapid lysis of infected cells, i.e. the length of the lytic cycle is shorter. The difference in plaque morphology between r and r+ phage is easy to see (large versus small, respectively). These two genes are very close together, and many mutations were independently isolated in each. This was summarized in the discussion on complementation above.

Consider the results of infection of a bacterial culture with two mutant alleles of gene rIIA.


            T4rIIA6                                   _|_______________________x______|_


            and  T4rIIA27                         _|_______x______________________|_

                        (x marks the position of the mutation in each allele).


Progeny phage from this infection include those with a parental genotype (in the great majority), and at a much lower frequency, two types of recombinants:


            wild type          T4 r+                          _|______________________________|_


            double mutant T4rIIA6 rIIA27            _|_______x_______________x______|_


The wild type is easily scored because it, and not any rII mutants, will grow on E. coli strain K12(λ), whereas both wild type and mutant phage grow equally well on E. coli strain B. Thus you can select for the wild type (and you will see only the desired recombinant). Finding the double mutants is more laborious, because they are obtained only by screening through the progeny, testing for phage that when backcrossed with the parental phage result in no wild type recombinant progeny.

Equal numbers of wild type and double mutant recombinants were obtained, showing that recombination can occur within a gene, and that this occurs by reciprocal crossing over. If recombination were only between genes, then no wild type phage would result. A large spectrum of recombination values was obtained in crosses for different alleles, just like you obtain for crosses between mutants in separate genes.
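Because only the wild type recombinants are selected on K12(λ), and the reciprocal double mutants arise in equal numbers, the total recombination frequency is usually estimated as twice the wild type frequency. A minimal Python sketch of that calculation follows; the plaque counts are hypothetical, and the function assumes both strains were plated from the same dilution of the progeny.

```python
def intragenic_recombination_frequency(plaques_on_K12_lambda, plaques_on_B):
    """Estimate recombination frequency between two rII alleles.

    plaques_on_K12_lambda: wild type recombinants (the only phage that grow there)
    plaques_on_B:          total progeny (wild type and mutants all grow on B)
    The factor of 2 accounts for the unscored double mutant recombinants.
    """
    return 2 * plaques_on_K12_lambda / plaques_on_B


# Hypothetical counts, scaled to the same volume of lysate.
freq = intragenic_recombination_frequency(10, 1_000_000)
print(f"{freq * 100:.4f} % recombinants")
```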

Several major conclusions could be made as a result of these experiments on recombination within the rII genes.

  1. A large number of mutable sites occur within a gene, exceeding some 500 for the rIIA and rIIB genes. We now realize that these correspond to the individual base pairs within the gene.
  2. The genetic maps are clearly linear, indicating that the gene is linear. Now we know a gene is a linear polymer of nucleotides.
  3. Most mutations are changes at one mutable site (point mutations). Many such mutant genes can be restored to wild type by undergoing a reverse mutation at the same site (reversion).
  4. Other mutations cause the deletion of one or more mutable sites, reflecting a physical loss of part of the rII gene. Deletions of one or more mutable sites (base pairs) are extremely unlikely to revert back to the original wild type.

One gene encodes one polypeptide

One of the fundamental insights into how genes function is that one gene encodes one enzyme (or more precisely, one polypeptide). Beadle and Tatum reached this conclusion based on their complementation analysis of the genes required for arginine biosynthesis in fungi. They showed that a mutation in each gene led to a loss of activity of one enzyme in the multistep pathway of arginine biosynthesis. As discussed above in the section on genetic dissection, a large number of Arg auxotrophs (requiring Arg for growth) were isolated, and then organized into a set of complementation groups, where each complementation group represents a gene.

The classic work of Beadle and Tatum demonstrated a direct relationship between the genes defined by the auxotrophic mutants and the enzymes required for Arg biosynthesis.  They showed that a mutation in one gene resulted in the loss of one particular enzymatic activity, e.g. in the generalized scheme below, a mutation in gene 2 led to a loss of activity of enzyme 2.  This led to an accumulation of the substrate for that reaction (intermediate N in the diagram below).  If there were 4 complementation groups for the Arg auxotrophs, i.e. 4 genes, then 4 enzymes were found in the pathway for Arg biosynthesis. Each enzyme was affected by mutations in one of the complementation groups.


            M   -->   N   -->   O   -->   P   -->   Arg

             enzyme 1  enzyme 2  enzyme 3  enzyme 4

              gene 1    gene 2    gene 3    gene 4


Figure 1.15. A general scheme showing the relationships among metabolic intermediates (M, N, O, P), and end product (Arg), enzymes and the genes that encode them.

In general, each step in a metabolic pathway is catalyzed by an enzyme (identified biochemically) that is the product of a particular gene (identified by mutants unable to synthesize the end product, or unable to break down the starting compound, of a pathway).  The number of genes that can generate auxotrophic mutants is (usually) the same as the number of enzymatic steps in the pathway.  Auxotrophic mutants in a given gene are missing the corresponding enzyme.  Thus Beadle and Tatum concluded that one gene encodes one enzyme.  Sometimes more than one gene is required to encode an enzyme because the enzyme has multiple, different polypeptide subunits. Thus each polypeptide is encoded by a gene.

The metabolic intermediates that accumulate in each mutant can be used to place the enzymes in their order of action in a pathway. In the diagram in Figure 1.15, mutants in gene 3 accumulated substance O. Feeding substance O to mutants in gene 1 or in gene 2 allows growth in the absence of Arg. We conclude that the defects in enzyme 1 or enzyme 2, respectively, are upstream of enzyme 3. In contrast, feeding substance O to mutants in gene 4 will not allow growth in the absence of Arg. Even though this mutant can convert substance O to substance P, it does not have an active enzyme 4 to convert P to Arg. The inability of mutants in gene 4 to grow on substance O shows that enzyme 4 is downstream of enzyme 3.
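The ordering argument can be turned into a simple rule: in a linear, unbranched pathway, the earlier an enzyme acts, the more of the accumulated intermediates will rescue a mutant lacking it. The Python sketch below applies that rule to the generalized scheme of Figure 1.15. The growth table extends the feeding results described above to all four intermediates and is an assumption made for illustration, as is the function name.

```python
def order_enzymes(rescued_by):
    """Order genes from first to last step of a linear pathway.

    rescued_by[gene] is the set of accumulated intermediates that allow a
    mutant in that gene to grow without the end product. A mutant blocked
    early is rescued by more intermediates than one blocked late.
    """
    return sorted(rescued_by, key=lambda gene: len(rescued_by[gene]), reverse=True)


# Assumed feeding results for M -> N -> O -> P -> Arg (mutants listed in arbitrary order).
rescued_by = {
    "gene 3": {"P"},
    "gene 1": {"N", "O", "P"},
    "gene 4": set(),
    "gene 2": {"O", "P"},
}

print(order_enzymes(rescued_by))
```

The rule relies on every mutant being rescued by a different number of intermediates; ties would indicate a branched pathway or incomplete data and would need to be examined by hand.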

Exercise 1.4.

Imagine that you are studying serine biosynthesis in a fungus.  You isolate serine auxotrophs, do all the pairwise crosses of the mutants and discover that the auxotrophs can be grouped into three complementation groups, called A, B and C.  You also discover that a different metabolic intermediate accumulates in members of each complementation group - substance A in auxotrophs in the A complementation group, substance B in the B complementation group and substance C in the C complementation group.  Each of the intermediates is fed to auxotrophs from each of the three complementation groups as tabulated below.  A + means that the auxotroph was able to grow in media in the absence of serine when fed the indicated substance; a - denotes no growth in the absence of serine. 



                 mutant in                 mutant in                 mutant in
                 complementation group A   complementation group B   complementation group C

substance A

substance B

substance C


In the biosynthetic pathway to serine in this fungus, what is the order of the enzymes encoded in the three complementation groups?  Enzyme A is encoded by the gene that when altered generates mutants that fall into complementation group A, etc. 

The gene and its polypeptide product are colinear

Once it was determined that a gene was a linear array of mutable sites, that genes are composed of a string of nucleotides called DNA (see Chapter 2), and that each gene encoded a polypeptide, the remaining question was exactly how that string of nucleotides coded for a particular amino acid sequence. This problem was studied along several avenues, culminating in a major achievement of the last half of the 20th century: the deciphering of the genetic code. The detailed assignment of particular codons (triplets of adjacent nucleotides) will be discussed in Chapter 13. In the next few sections of this chapter, we will examine how some of the basic features of the genetic code were deciphered.

A priori, the coding units within a gene could encode both the composition and the address for each amino acid, as illustrated in Model 1 of Figure 1.16. In this model, the coding units could be scrambled and still specify the same protein. In such a situation, the polypeptide would not be colinear with the gene.


Figure 1.16. Alternative models for gene and codon structure.

In an alternative model (Model 2 in Figure 1.16), the coding units only specify the composition, but not the position, of an amino acid.  The "address" of the amino acid is derived from the position of the coding unit within the gene.  This model would predict that the gene and its polypeptide product would be colinear - e.g. mutation in the 5th coding unit would affect the 5th amino acid of the protein, etc.

Charles Yanofsky and his co-workers (1964) tested these two models and determined that the gene and the polypeptide product are indeed colinear. They used recombination frequencies to map the positions of different mutant alleles in the gene that encodes a particular subunit of the enzyme tryptophan synthase. They then determined the amino acid sequence of the wild type and mutant polypeptides. As illustrated in Figure 1.17, the position of a mutant allele on the recombination map of the gene corresponds with the position of the amino acid altered in the mutant polypeptide product. For instance, allele A101 maps to one end of the gene, and the corresponding Glu --> Val replacement is close to the N terminus of the polypeptide. Allele A64 maps close to the other end of the gene, and the corresponding Ser --> Leu replacement is close to the C terminus of the polypeptide. This correspondence between the positions of the mutations in each allele and the positions of the consequent changes in the polypeptide shows that Model 1 can be eliminated and Model 2 is supported.


Figure 1.17. The polypeptide is colinear with the gene.

Mutable sites are base pairs along the double helix

The large number of mutable sites found in each gene, and between which recombination can occur, leads one to conclude that the mutable sites are base pairs along the DNA. Sequence determination of the wild type and mutant genes confirms this conclusion.

Single amino acids are specified by three adjacent nucleotides, which constitute a codon

This conclusion requires three pieces of information.

First of all, adjacent mutable sites specify amino acids. Reaching this conclusion required investigation of the fine structure of a gene, including rare recombination between very closely linked mutations within a gene. Yanofsky and his colleagues, working with mutations in the trpA gene of E. coli, encoding tryptophan synthase, showed that different alleles mutated in the same codon could recombine (albeit at very low frequency). (This is the same laboratory and same system that was used to show that a gene and its polypeptide product are colinear.) Thus recombination between two different alleles can occur within a codon, which means that a codon must have more than one mutable site. We now recognize that a mutable site is a nucleotide in the DNA. Thus adjacent mutable sites (nucleotides) encode a single amino acid.

Let’s look at this in more detail (Figure 1.18). Yanofsky and colleagues examined two different mutant alleles of trpA, each of which caused alteration in amino acid 211 of tryptophan synthase. In the mutant allele A23, wild type Gly is converted to mutant Arg. In the mutant allele A46, wild type Gly is converted to mutant Glu.


            GGA (Gly 211) --> AGA (Arg 211)  mutant allele A23

            GGA (Gly 211) --> GAA (Glu 211)  mutant allele A46

            A23 × A46    AGA × GAA   -->  GGA (wild type Gly 211 in 2 out of 100,000 progeny)

Figure 1.18. Recombination can occur between two mutant alleles affecting the same codon.


Alleles A23 and A46 are not alternative forms of the same mutable site, because recombination to yield wild type occurs, albeit at a very low frequency (0.002%; the sites are very close together, in fact in the same codon!). If they involved the same mutable site, one would never see the wild type recombinant.
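The argument can be made concrete by enumerating the products of a single crossover between the two mutant codons. The Python sketch below does this for AGA and GAA from Figure 1.18; the function name is illustrative, and the model considers only one crossover at a junction between adjacent nucleotides.

```python
def single_crossover_products(codon_a, codon_b):
    """All codons produced by one crossover between two aligned codons."""
    products = set()
    for cut in range(1, len(codon_a)):
        # Join the left part of one parent to the right part of the other.
        products.add(codon_a[:cut] + codon_b[cut:])
        products.add(codon_b[:cut] + codon_a[cut:])
    return products


# A23 (AGA, Arg) crossed with A46 (GAA, Glu)
print(sorted(single_crossover_products("AGA", "GAA")))

# Two different changes at the same site can never regenerate GGA.
print(sorted(single_crossover_products("AGA", "CGA")))
```

A crossover between the first and second positions yields the wild type GGA together with the reciprocal double mutant AAA, whereas two alleles altered at the same position only ever regenerate the parental codons.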

The second observation is that the genetic code is non-overlapping. This was shown by demonstrating that a mutation at a single site alters only one amino acid.  This conflicts with the predictions of an overlapping code (see Figure 1.19), and thus the code must be non-overlapping.
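The prediction being tested can be stated numerically: count how many triplets contain a given nucleotide under each kind of code. In the Python sketch below, a step of 3 represents a non-overlapping code and a step of 1 a fully overlapping one; the function name and the 12-nucleotide example length are assumptions made for illustration.

```python
def codons_containing(position, seq_len, step):
    """Number of triplet codons that include a 0-based nucleotide position.

    step=3 models a non-overlapping code; step=1 a fully overlapping code.
    """
    return sum(
        1
        for start in range(0, seq_len - 2, step)
        if start <= position < start + 3
    )


print(codons_containing(5, 12, step=3))  # non-overlapping
print(codons_containing(5, 12, step=1))  # fully overlapping
```

A substitution would therefore be expected to change up to three adjacent amino acids under a fully overlapping code, but only one under a non-overlapping code, which is what was observed.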


Figure 1.19. Predictions of the effects of nucleotide substitutions, insertions or deletions on polypeptides encoded by an overlapping, a punctuated, or a nonoverlapping, nonpunctuated code.


The third observation is that the genetic code is read in triplets from a fixed starting point. This was shown by examining the effect of frameshift mutations. As shown in Figure 1.19, a code lacking punctuation has a certain reading frame. Insertions or deletions of nucleotides are predicted to have a drastic effect on the encoded protein because they will change that reading frame. The fact that this was observed was one of the major reasons to conclude that the mRNA molecules encoded by genes are read in successive blocks of three nucleotides in a particular reading frame.

For the sequence shown in Figure 1.20, insertion of an A shifts the reading frame, so all amino acids after the insertion differ from the wild type sequence.  (The 4th amino acid is still a Gly because of degeneracy in the code: both GGC and GGG code for Gly.)  Similarly, deletion of a U alters the entire sequence after the deletion.



Figure 1.20. Frameshift mutations show that the genetic code is read in triplets.
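The effect of a frameshift can also be sketched in Python. The sequence below is hypothetical (it is not the sequence in Figure 1.20), and the `translate` helper is an assumption written for this example; the codon table is the standard genetic code, built from its conventional one-letter listing in U, C, A, G order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate successive triplets from position 0; ignore a trailing partial codon."""
    usable = len(mrna) - len(mrna) % 3
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3))


wild_type = "AUGGCAUUCGGACAUCUG"                  # hypothetical mRNA
insertion = wild_type[:3] + "A" + wild_type[3:]   # insert an A after the first codon
deletion = wild_type[:3] + wild_type[4:]          # delete the 4th nucleotide

print("wild type:", translate(wild_type))
print("insertion:", translate(insertion))
print("deletion: ", translate(deletion))
```

In this sketch every amino acid after the first codon changes in both mutants, which is the pattern expected for an unpunctuated triplet code read from a fixed start.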

These observations show that the nucleotide sequence is read, or translated, from a fixed starting point without punctuation.  An alternative model is that the group of nucleotides encoding an amino acid (the codon) could also include a signal for the end of the codon (Model 2 in Figure 1.19).  This could be considered a "comma" at the end of each codon.  If that were the case, insertions or deletions would only affect the codon in which they occur.  However, the data show that all codons, including and after the one containing the insertion or deletion, are altered.  Thus the genetic code is not punctuated, but is read in a particular frame that is defined by a fixed starting point (Model 3 in Figure 1.19).  That starting point is a particular AUG, encoding methionine.  (More about this will be covered in Chapter 13).

The results of frameshift mutations are so drastic that the proteins are usually not functional.  Hence a screen or selection for loss-of-function mutants frequently reveals these frameshift mutants.  Simple nucleotide substitutions that lead to amino acid replacements often have very little effect on the protein, and hence produce little or only subtle change in phenotype.

A double mutant generated by crossing over between an insertion (+) and a nearby deletion (-) has an (almost) normal phenotype, because the second mutation restores the reading frame that the first one shifted; in effect, each mutation reverts the other.

A gene containing three closely spaced insertions (or deletions) of single nucleotides will produce a functional product.  However, four or five insertions or deletions do not give a functional product (Crick, Barnett, Brenner and Watts-Tobin, 1961).  This provided the best evidence that the genetic code is read in groups of three nucleotides (not two or four).  Over the next 5 years the code was worked out (by 1966) and this inference was confirmed definitively.
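The bookkeeping behind this argument is simple enough to sketch in Python. This is only an illustration of the arithmetic, assuming single-nucleotide insertions (+1) and deletions (-1) that lie close together; the function name `frame_restored` is invented for the example.

```python
def frame_restored(indels):
    """Return True if the net shift is a multiple of three nucleotides.

    `indels` is a list of +1 (single-nucleotide insertion) and
    -1 (single-nucleotide deletion) values.
    """
    return sum(indels) % 3 == 0


cases = {
    "one insertion": [+1],
    "insertion plus deletion": [+1, -1],
    "three insertions": [+1, +1, +1],
    "three deletions": [-1, -1, -1],
    "four insertions": [+1] * 4,
    "five insertions": [+1] * 5,
}

for name, indels in cases.items():
    status = "in frame" if frame_restored(indels) else "out of frame"
    print(f"{name}: {status}")
```

Only combinations whose net change is zero or a multiple of three return the downstream sequence to the original reading frame, which is why three insertions can give a functional product while four or five do not.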

Central Dogma: DNA to RNA to protein

A few years after he and James Watson had proposed the double helical structure for DNA, Francis Crick (with other collaborators) proposed that a less stable nucleic acid, RNA, served as a messenger RNA that provided a transient copy of the genetic material that could be translated into the protein product encoded by the gene.  Such mRNAs were indeed found.  These and other studies led Francis Crick to formulate this “central dogma” of molecular biology (Figure 1.21).

This model states that DNA serves as the repository of genetic information.  It can be replicated accurately and indefinitely. The genetic information is expressed by the DNA first serving as a template for the synthesis of (messenger) RNA; this occurs in a process called transcription. The mRNA then serves as a template, which is read by ribosomes and translated into protein. The protein products can be enzymes that catalyze the many metabolic transformations in the cell, or they can be structural proteins. 


Figure 1.21. The central dogma of molecular biology.


Although there have been some additional steps added since its formulation, the central dogma has stood the test of time and myriad experiments.  It provides a strong unifying theme to molecular genetics and information flow in cell biology and biochemistry.

Although in many cases a gene encodes one polypeptide, other genes encode a functional RNA. Some genes encode tRNAs and rRNAs needed for translation, others encode other structural and catalytic RNAs.  Genes encode some product that is used in the cell, i.e. that when altered generates an identifiable phenotype. More generally, genes encode RNAs, some of which are functional as transcribed (or with minor alterations via processing) such as tRNAs and rRNAs, and others are messengers that are then translated into proteins. These proteins can provide structural, catalytic and regulatory roles in the cell.

Note the static role of DNA in this process. Implicit in this model is the idea that DNA does not provide an active cellular function, but rather it encodes macromolecules that are functional.  However, the expression of virtually all genes is highly regulated.  The sites on the DNA where this control is exerted are indeed functional entities, such as promoters and enhancers.  In this case, the DNA is directly functional (cis-regulatory sites), but the genes being regulated by these sites still encode some functional product (RNA or protein).

Studies of retroviruses led Dulbecco to argue that the flow of information is not unidirectional, but in fact RNA can be converted into DNA (some viral RNA genomes are converted into DNA proviruses integrated into the genome).  Subsequently Temin and Baltimore discovered the enzyme that can make a DNA copy of RNA, i.e. reverse transcriptase.

Transcription and mRNA structure

Several aspects of the structure of genes can be illustrated by examining the general features of a bacterial gene as now understood.

A gene is a string of nucleotides in the duplex DNA that encodes an mRNA, which itself codes for protein. Only one strand of the duplex DNA is copied into mRNA (Figure 1.22). Sometimes genes overlap, and in some of those cases each strand of DNA is copied, but each for a different mRNA. The strand of DNA that reads the same as the sequence of mRNA (with T in place of U) is the nontemplate strand. The strand that reads as the reverse complement of the mRNA is the template strand.
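The relationship between the two strands and the mRNA can be sketched in Python. The nine-nucleotide sequence is hypothetical, and the helper names `reverse_complement` and `transcribe` are assumptions for this example; all sequences are written 5' to 3'.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(template_strand):
    """Return the mRNA (5' to 3') copied from a template strand (5' to 3')."""
    return reverse_complement(template_strand).replace("T", "U")


nontemplate = "ATGGCATTC"                    # hypothetical nontemplate strand
template = reverse_complement(nontemplate)   # the strand that is actually copied
mrna = transcribe(template)

print("nontemplate:", nontemplate)
print("template:   ", template)
print("mRNA:       ", mrna)
```

The mRNA has the same sequence as the nontemplate strand, with U replacing T, even though the polymerase copies the template strand.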



Figure 1.22. Only one strand of duplex DNA codes for a particular product.


NOTE: The term "sense strand" has two opposite uses (unfortunately). Sidney Brenner first used it to designate the strand that served as the template to make RNA (bottom strand above), and this is still used in many genetics texts.  However, now many authors use the term to refer to the strand that reads the same as the mRNA (top strand above). The same confusion applies to the term "coding strand" which can refer to the strand encoding mRNA (bottom strand) or the strand "encoding" the protein (top strand).  Interestingly, "antisense" is used exclusively to refer to the strand that is the reverse complement of the mRNA (bottom strand).

Figure 1.22 helps illustrate the origin of terms used in gene expression.  Copying the information of DNA into RNA stays in the same "language" in that both of these polymers are nucleic acids, hence the process is called transcription.  An analogy would be writing exercises where you had to copy, e.g. a poem, from a book onto your paper - you transcribed the poem, but it is still in English.  Converting the information from RNA into protein is equivalent to converting from one "language" to another, in this case from one type of polymer (the nucleic acid RNA) to a different one (a polypeptide or protein).  Hence the process is called translation.  This is analogous to translating a poem written in French into English.  

Figure 1.23 illustrates the point that a gene may be longer than the region coding for the protein because of 5' and/or 3' untranslated regions.


Figure 1.23. Genes and mRNA have untranslated sequences at both the 5' and 3' ends.


Eukaryotic mRNAs have covalent attachment of nucleotides at the 5' and 3' ends, and in some cases nucleotides are added internally (a process called RNA editing).  Recent work shows that additional nucleotides are added post‑transcriptionally to some bacterial mRNAs as well.

Regulatory signals can be considered parts of genes

In order to express a gene at the correct time, the DNA also carries signals to start transcription (e.g. promoters), signals for regulating the efficiency of starting transcription (e.g. operators, enhancers or silencers), and signals to stop transcription (e.g. terminators).  Minimally, a gene includes the transcription unit, which is the segment of DNA that is copied into RNA in the primary transcript.  The signals directing RNA polymerase to start at the correct site, and other DNA segments that influence the efficiency of this process are regulatory elements for the gene. One can also consider them to be part of the gene, along with the transcription unit.

A contemporary problem - finding the function of genes

Genes were originally detected by the heritable phenotype generated by their mutant alleles, such as the white eyes in the normally red-eyed Drosophila or the sickle cell form of hemoglobin (HbS) in humans.  Now that we have the ability to isolate virtually any, and perhaps all, segments of DNA from the genome of an organism, the issue arises as to which of those segments are genes, and what is the function of those genes.  (The genome is all the DNA in the chromosomes of an organism.)  Earlier geneticists knew the function of the genes they were studying (at least in terms of some macroscopic phenotype), even when they had no idea what the nature of the genetic material was.  Now molecular biologists are confronted with the opposite problem - we can find and study lots of DNA, but which regions are functional?  Many computational approaches are being developed to guide this analysis, but eventually we come back to that classical definition, i.e. that appropriate mutations in any functional gene should generate a detectable phenotype.  The approach of biochemically making mutations in DNA in the laboratory and then testing for the effects in living cells or whole organisms is called "reverse genetics."

Additional Readings

  • Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., Lewontin, R. C. and Gelbart, W. M. (1993) An Introduction to Genetic Analysis, Fifth Edition (W. H. Freeman and Company, New York).
  • Cairns, J., Stent, G. S. and Watson, J. D., editors (1992) Phage and the Origins of Molecular Biology, Expanded Edition (Cold Spring Harbor Laboratory Press, Plainview, NY).
  • Brock, T. D. (1990) The Emergence of Bacterial Genetics (Cold Spring Harbor Laboratory Press, Plainview, NY).
  • Benzer, S. (1955) Fine structure of a genetic region in bacteriophage. Proceedings of the National Academy of Sciences, USA 47: 344-354.
  • Yanofsky, C. (1963) Amino acid replacements associated with mutation and recombination in the A gene and their relationship to in vitro coding data. Cold Spring Harbor Symposia on Quantitative Biology 18: 133-134.
  • Crick, F. (1970) Central dogma of molecular biology. Nature 227:561-563


Question 1.5. Calculating recombination frequencies

Corn kernels can be colored or white, determined by the alleles C (colored, which is dominant) or c (white, which is recessive) of the colored gene.  Likewise, alleles of the shrunken gene determine whether the kernels are nonshrunken (Sh, dominant) or shrunken (sh, recessive).  The geneticist Hutchison crossed a homozygous colored shrunken strain (CC shsh) to a homozygous white nonshrunken strain (cc ShSh) and obtained the heterozygous colored nonshrunken F1.  The F1 was backcrossed to a homozygous recessive white shrunken strain (cc shsh).  Four phenotypes were observed in the F2 progeny, in the numbers shown below.


               Phenotype                  Number of plants

               colored shrunken                 21,379
               white nonshrunken                21,096
               colored nonshrunken                 638
               white shrunken                      672


               a) What are the predicted frequencies of these phenotypes if the colored and shrunken genes are not linked?


               b) Are these genes linked, and if so, what is the recombination frequency between them?
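One way to set up the arithmetic for this kind of backcross is sketched below in Python, as a check on a hand calculation rather than a definitive answer key. The variable names are assumptions for this example; the counts come from the table above, and the recombinant classes are taken to be the two phenotypes that differ from the original parental strains.

```python
# Progeny counts from the table in Question 1.5.
counts = {
    "colored shrunken": 21379,
    "white nonshrunken": 21096,
    "colored nonshrunken": 638,
    "white shrunken": 672,
}

# The original parents were colored shrunken and white nonshrunken,
# so the other two phenotypes are treated as the recombinant classes.
recombinant_classes = ("colored nonshrunken", "white shrunken")

total = sum(counts.values())
recombinants = sum(counts[name] for name in recombinant_classes)
recombination_frequency = recombinants / total

# With unlinked genes, the four backcross classes are expected in equal numbers.
unlinked_expectation = {name: total / 4 for name in counts}

print(f"total progeny: {total}")
print(f"recombinant progeny: {recombinants}")
print(f"recombination frequency: {recombination_frequency:.2%}")
print(f"expected per class if unlinked: {unlinked_expectation['white shrunken']:.2f}")
```

Comparing the observed counts with the equal numbers expected for unlinked genes shows how far the data depart from independent assortment.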


Question 1.6. Constructing a linkage map:

Consider three genes, A, B and C, that are located on the same chromosome.  The arrangement of the three genes can be determined by a series of three crosses, each following two of the genes (referred to as two-factor crosses).  In each cross, a parental strain that is homozygous for the dominant alleles of the two genes (e.g. AB/AB) is crossed with a strain that is homozygous for the recessive alleles of the two genes (e.g. ab/ab), to yield an F1 that is heterozygous for both of the genes (e.g. AB/ab).  In this notation, the slash (/) separates the alleles of genes on one chromosome from those on the homologous chromosome.  The F1 (AB/ab) contains one chromosome from each parent.  It is then backcrossed to a strain that is homozygous for the recessive alleles (ab/ab) so that the fates of the parental chromosomes can be easily followed.  Let's say the resulting progeny in the F2 (second) generation showed the parental phenotypes (AB and ab) 70% of the time.  That is, 70% of the progeny showed only the dominant characters (AB) or only the recessive characters (ab), which reflect the diploid genotypes AB/ab and ab/ab, respectively, in the F2 progeny.  The remaining 30% of the progeny showed recombinant phenotypes (Ab and aB) reflecting the genotypes Ab/ab and aB/ab in the F2 progeny.  Similar crosses using F1's from parental AC/AC and ac/ac backcrossed to a homozygous recessive strain (ac/ac) generated recombinant phenotypes Ac and aC in 10% of the progeny.  And finally, crosses using F1's from parental BC/BC and bc/bc backcrossed to a homozygous recessive strain (bc/bc) generated recombinant phenotypes Bc and bC in 25% of the progeny.

a. What accounts for the appearance of the recombinant phenotypes in the F2 progeny?

b. Which genes are closer to each other and which ones are further away?

c. What is a linkage map that is consistent with the data given?
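The reasoning for ordering three genes from two-factor data can be sketched in Python. This is a minimal illustration that assumes the pair with the highest recombination frequency marks the two outside genes; the function name `gene_order` and the data layout are invented for the example.

```python
# Recombination percentages from the three two-factor crosses in Question 1.6.
pairwise = {
    frozenset("AB"): 30,
    frozenset("AC"): 10,
    frozenset("BC"): 25,
}


def gene_order(distances):
    """Order three genes, assuming the most distant pair lies on the outside."""
    outer = max(distances, key=distances.get)   # pair with the largest frequency
    genes = set().union(*distances)             # all gene names in the data
    (middle,) = genes - outer                   # the one gene not in the outer pair
    first, last = sorted(outer)
    return first, middle, last


order = gene_order(pairwise)
print("gene order:", "-".join(order))
```

The same function can be applied to any set of three pairwise frequencies, for example the three mutations in Question 1.8, by changing the keys and values in `pairwise`.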


Question 1.7. Why are the distances in the previous problem not exactly additive, e.g. why is the distance between the outside markers (A and B) not 35 map units (or 35% recombination)?  There are several possible explanations, and this problem explores the effects of multiple crossovers.  The basic idea is that the further apart two genes are, the more likely that recombination can occur multiple times between them.  Of course, two (or any even number of) crossover events between two genes will restore the parental arrangement, whereas three (or any odd number of) crossover events will give a recombinant arrangement, thereby effectively decreasing the observed number of recombinants in the progeny of a cross. 

For the case examined in the previous problem, with genes in the order A___C_______B, let the term ab refer to the frequency of recombination between genes A and B, and likewise let ac refer to the frequency of recombination between genes A and C, and cb refer to the frequency of recombination between genes C and B.

a) What is the probability that when recombination occurs in the interval between A and C, an independent recombination event also occurs in the interval between C and B?

b) What is the probability that when recombination occurs in the interval between C and B, an independent recombination event also occurs in the interval between A and C?

c) The two probabilities, or frequencies, in a and b above will effectively lower the actual recombination between the outside markers A and B to that observed in the experiment.  What is an equation that expresses this relationship, and does it fit the data in Question 1.6?

d) What is the better estimate for the distance between genes A and B in the previous problem?
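As a check on a hand-derived answer, the effect of double crossovers can be sketched in Python. This sketch assumes that crossovers in the two intervals occur independently (no interference) and that a chromosome scores as recombinant for the outside markers only when exactly one of the two intervals has a crossover; the function name is invented for the example.

```python
def observed_outer_frequency(ac, cb):
    """Recombination seen between outside markers when exactly one interval recombines."""
    return ac * (1 - cb) + cb * (1 - ac)


ac = 0.10   # recombination frequency between A and C (Question 1.6)
cb = 0.25   # recombination frequency between C and B (Question 1.6)

double_crossovers = ac * cb              # both intervals recombine; outside markers look parental
additive_distance = ac + cb              # simple sum of the two intervals
observed = observed_outer_frequency(ac, cb)

print(f"double crossovers: {double_crossovers:.3f}")
print(f"sum of intervals: {additive_distance:.2f}")
print(f"predicted observed frequency: {observed:.2f}")
```

Under these assumptions each double crossover removes a recombinant from both intervals' contribution to the outside-marker count, which is why the predicted frequency falls below the simple sum.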


Question 1.8  Complementation and recombination in microbes.

The State College Bar Association has commissioned you to study an organism, Alcophila latrobus, which thrives on Rolling Rock beer and is ruining the local shipments.  You find three mutants that have lost the ability to grow on Rolling Rock (RR).


a)  Recombination between the mutants can restore the ability to grow on RR.  From the following recombination frequencies, construct a linkage map for mutations 1, 2, and 3.


               Recombination between           Frequency

               1- and 2-                         0.100
               1- and 3-                         0.099
               2- and 3-                         0.001


b) The following diploid constructions were tested for their ability to grow on RR.  What do these data tell you about mutations 1, 2, and 3?


               Diploid                           Grow on RR?

               1)   1-  2+ / 1+  2-              yes
               2)   1-  3+ / 1+  3-              yes
               3)   2-  3+ / 2+  3-              no


Question 1.9  Using recombination frequencies and complementation to deduce maps and pathways in phage.

A set of four mutant phage that were unable to grow in a particular bacterial host (let's call it restrictive) were isolated; however, both mutant and wild type phage will grow in another, permissive host.  To get information about the genes required for growth on the restrictive host, this host was co-infected with pairs of mutant phage, and the number of phage obtained after infection was measured.  The top number for each co-infection gives the total number of phage released (grown on the permissive host) and the bottom number gives the number of wild-type recombinant phage (grown on the restrictive host).  The wild-type parental phage gives 10^10 phage after infecting either host.  The limit of detection is 10^2 phage.


Phenotypes of phage, problem 1.9:


Assays after co-infection with mutant phage:


Results of assays, problem 1.9:

                                        Number of phage

                        mutant 1     mutant 2     mutant 3     mutant 4

mutant 1  total         <10^2
          recombinants  <10^2

mutant 2  total         10^10        <10^2
          recombinants  5x10^6       <10^2

mutant 3  total         10^10        10^10        <10^2
          recombinants  10^7         5x10^6       <10^2

mutant 4  total         10^5         10^10        10^10        <10^2
          recombinants  10^5         5x10^6       10^7         <10^2


a)   Which mutants are in the same complementation group?  What is the minimum number of genes in the pathway for growth on the restrictive host?

b)   Which mutations have the shortest distance between them?

c)   Which mutations have the greatest distance between them?

d)   Draw a map of the genes in the pathway required for growth on the restrictive host.  Show the positions of the genes, the positions of the mutations and the relative distances between them. 


Question 1.10.  One of the classic experiments in bacterial genetics is the fluctuation analysis of Luria and Delbrück (1943, Mutations of bacteria from virus sensitivity to virus resistance, Genetics 28: 491-511). These authors wanted to determine whether mutations arose spontaneously while bacteria grew in culture, or if the mutations were induced by the conditions used to select for them. They knew that bacteria resistant to phage infection could be isolated from infected cultures. When a bacterial culture is infected with a lytic phage, initially it "clears" because virtually all the cells are lysed, but after several hours phage-resistant bacteria will start to grow.

Luria and Delbrück realized that the two hypotheses for the source of the mutations could be distinguished by a quantitative analysis of the number of the phage-resistant bacteria found in many infected cultures. The experimental approach is outlined in the figure below. Many cultures of bacteria are grown, then infected with a dose of phage T1 that is sufficient to kill all the cells, except those that have acquired resistance. These resistant bacteria grow into colonies on plates and can be counted.

a. What are the predictions for the distribution of the number of resistant bacteria in the two models? Assume that on average, about 1 in 10^7 bacteria are resistant to infection by phage T1.

b. What do results like those in the figure and table tell you about which model is correct?
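The contrast between the two hypotheses can be explored with a small simulation. The Python sketch below is a toy model, not a reconstruction of the original experiment: the population size, mutation rate, number of cultures and random seed are all scaled-down assumptions chosen so the code runs quickly (the rate is far higher than the 1 in 10^7 quoted above), and the function names are invented for the example.

```python
import random
from statistics import mean, variance


def spontaneous_culture(generations, mutation_rate, rng):
    """Grow one culture from a single sensitive cell; mutations arise during growth."""
    sensitive, resistant = 1, 0
    for _ in range(generations):
        daughters = 2 * sensitive
        # Each new daughter cell mutates to resistance with a small probability.
        new_mutants = sum(rng.random() < mutation_rate for _ in range(daughters))
        sensitive = daughters - new_mutants
        # Existing resistant cells divide, so early mutants leave large clones.
        resistant = 2 * resistant + new_mutants
    return resistant


def induced_culture(final_cells, resistance_probability, rng):
    """Resistance appears only at selection, independently in each final cell."""
    return sum(rng.random() < resistance_probability for _ in range(final_cells))


def fano_factor(counts):
    """Variance divided by mean; close to 1 for a Poisson-like distribution."""
    return variance(counts) / mean(counts)


rng = random.Random(1943)   # fixed seed (an arbitrary choice) for repeatability
GENERATIONS = 12            # assumed: each culture grows to 2**12 cells
MUTATION_RATE = 1e-3        # assumed: chosen so the toy cultures contain mutants
CULTURES = 100              # assumed number of parallel cultures

spontaneous = [spontaneous_culture(GENERATIONS, MUTATION_RATE, rng)
               for _ in range(CULTURES)]

final_cells = 2 ** GENERATIONS
# Match the average number of resistant cells so only the spread differs.
p_induced = mean(spontaneous) / final_cells
induced = [induced_culture(final_cells, p_induced, rng) for _ in range(CULTURES)]

print(f"spontaneous model: mean {mean(spontaneous):.1f}, "
      f"variance/mean {fano_factor(spontaneous):.1f}")
print(f"induced model:     mean {mean(induced):.1f}, "
      f"variance/mean {fano_factor(induced):.1f}")
```

Comparing the variance-to-mean ratio of the two simulated sets of cultures with the spread in the observed counts is one way to judge which model the data resemble.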


Figure for question 1.10.


The actual results from Luria and Delbrück are summarized in the following table. They examined 87 cultures, each with 0.2 ml of bacteria, for phage resistant colonies.


Number of resistant bacteria          Number of cultures

Interested students may wish to read about the re-examination of the origin of mutations by Cairns, Overbaugh and Miller (1988, The origin of mutants. Nature 335:142-145). Using a non-lethal selective agent (lactose), they obtained results indicating both pre-adaptive (spontaneous) mutations as well as some apparently induced by the selective agent.