Biology LibreTexts

8.6: Isolating Genes

Earlier in this chapter, we discussed methods such as column chromatography that are used to purify proteins of interest. Using combinations of these methods, it is possible to isolate a protein to a high degree of purity, enabling us to study its activity and properties. The corresponding problem is harder to solve for nucleic acids. Genomic DNA can be readily obtained from cells, but it is too complex to be analyzed as a whole. Individual genes are the units of DNA that correspond to proteins, so it makes more sense to isolate specific genes for study. Methods to isolate genes were not available until the 1970s, when the discovery of restriction enzymes and the invention of molecular cloning provided, for the first time, ways to obtain large quantities of specific DNA fragments. For the purpose of obtaining large amounts of a specific DNA fragment, molecular cloning has been largely replaced by direct amplification using the polymerase chain reaction, described later, but cloned DNAs are still very useful for a variety of reasons. The development of molecular cloning depended on the discovery of restriction endonucleases, described below.

Restriction enzymes

Restriction enzymes, or restriction endonucleases, are enzymes made by bacteria. These enzymes protect bacteria by degrading foreign DNA molecules that are carried into their cells by, for example, an invading bacteriophage. Each restriction enzyme recognizes a specific sequence in the DNA, most often four or six nucleotides long. Where these sequences occur in the bacterium's own DNA, they are chemically modified by methylation, so that they are not recognized and degraded. Where these sequences occur in foreign DNA, they are cut by the restriction enzyme.

The utility and importance of restriction enzymes lie in their ability to recognize specific sequences in DNA and cut near or (usually) at the site they recognize. More than 3,000 such enzymes are known. The sequences recognized by these enzymes are typically 4-8 base pairs long, and the most commonly used enzymes recognize sequences described as palindromic.

Figure 8.30 - A restriction enzyme bound to its recognition sequence on DNA. Wikipedia
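The length of a recognition sequence largely determines how often an enzyme cuts. As a rough sketch, assuming random DNA in which all four bases are equally frequent (an idealization that is not part of the text above, and from which real genomes deviate), a site of n base pairs is expected about once every 4^n base pairs. The function name is illustrative.

```python
def expected_spacing(site_length: int) -> int:
    """Average distance (in bp) between occurrences of a site of the given
    length, ASSUMING random DNA with equal A, C, G and T frequencies."""
    return 4 ** site_length


# Site lengths in the 4-8 bp range mentioned in the text.
for n in (4, 6, 8):
    print(f"{n}-bp site: about one occurrence every {expected_spacing(n):,} bp")
```

Under this assumption, an enzyme with a six-base site would cut on average once every 4,096 bp, which is why the choice of enzyme affects the typical size of the fragments obtained.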


In molecular biology, the term palindrome means that the sequence of the recognition site, when read in the 5’ to 3’ direction on the top strand, is exactly the same as the sequence of the bottom strand, also read 5’ to 3’. Consider the sequence recognized by the restriction enzyme known as Hind III (pronounced hin-dee-three). It is

5’ -A-A-G-C-T-T-3’
3’ -T-T-C-G-A-A-5’

On the top strand, the recognition sequence is

5’ AAGCTT 3’

which is the same as the bottom strand (read in the same 5’ to 3’ direction).
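The palindrome test described above amounts to checking that a sequence equals its own reverse complement. A minimal sketch follows; the function names are illustrative, and the second test sequence is a made-up non-palindromic variant used only for contrast.

```python
# Watson-Crick base pairing: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written in its own 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if both strands read the same in the 5' to 3' direction."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("AAGCTT"))  # Hind III site from the text
print(is_palindromic("AAGCTA"))  # made-up variant, for contrast
```

The first call reports that the Hind III site is palindromic; the second reports that the altered sequence is not, because its bottom strand reads TAGCTT.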

While all restriction enzymes must recognize and bind to particular DNA sequences, the exact spot at which they cut the DNA varies. Some enzymes leave a staggered sequence after cutting that has an overhang at the 5’ end of one strand of the duplex; some leave a staggered sequence after cutting that has an overhang at the 3’ end; and some cut both strands in the same place, leaving no overhanging sequence - called blunt end cutters.

Consider cutting a DNA sequence that contains the Hind III recognition site, which is

5’ -A-A-G-C-T-T-3’
3’ -T-T-C-G-A-A-5’

Embedded within a DNA sequence, the Hind III sequence would look like this (Ns correspond to any base and represent all of the DNA around the recognition site).

5’ -N-N-N-A-A-G-C-T-T-N-N-N-3’
3’ -N-N-N-T-T-C-G-A-A-N-N-N-5’

After cutting with Hind III, it would look as follows:

5’ -N-N-N-A-3’          5’ -A-G-C-T-T-N-N-N-3’
3’ -N-N-N-T-T-C-G-A-5’          3’ -A-N-N-N-5’

where gaps have been inserted to illustrate where cutting has occurred. Hind III cuts between the two adjacent A nucleotides at the 5’ end of the recognition sequence on each strand, and thus leaves 5’ overhangs (Figure 8.31).

Figure 8.31 - Result of cutting DNA with Hind III. Wikipedia
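The Hind III cut can be sketched as a string operation on the top strand. This is a simplified illustration, not a model of the enzyme: it tracks only the top strand, assumes the cut falls one base into the site (between the two A's, as described above), and uses a made-up test sequence in place of the N's.

```python
def digest_top_strand(seq: str, site: str = "AAGCTT", cut_offset: int = 1) -> list[str]:
    """Split the top strand at every occurrence of `site`.

    `cut_offset` is the number of site bases left on the upstream fragment
    (1 for Hind III, which cuts A^AGCTT).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])   # fragment ending at the cut
        start = cut
        pos = seq.find(site, pos + 1)      # look for the next site
    fragments.append(seq[start:])          # remainder after the last cut
    return fragments


# Made-up sequence: GGG and CCC stand in for the flanking N's.
print(digest_top_strand("GGGAAGCTTCCC"))
```

The upstream fragment ends in a single A and the downstream fragment begins with AGCTT, matching the top strand of the diagram above.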

The restriction enzyme Pst I, on the other hand, recognizes the sequence 5’ CTGCAG 3’, shown here embedded in surrounding DNA:

5’ -N-N-N-C-T-G-C-A-G-N-N-N-N-3’
3’ -N-N-N-G-A-C-G-T-C-N-N-N-N-5’

and cuts between the A and the G near the 3’ end of the recognition sequence.

5’ -N-N-N-C-T-G-C-A-3’          5’ -G-N-N-N-N-3’
3’ -N-N-N-G-5’          3’ -A-C-G-T-C-N-N-N-N-5’

As you can see, cutting DNA with Pst I leaves 3’ overhangs. Ends left by a restriction enzyme that overhang at either the 5’ end or the 3’ end are referred to as "sticky" because they can form proper base pairs with, and so be joined more readily to, a complementary sticky end. This means that you can take two unrelated pieces of DNA, cut them with the same restriction enzyme so that they have compatible sticky ends, and then "paste" them together using DNA ligase to form a new hybrid molecule, or recombinant.
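Whether a cut leaves a 5’ overhang, a 3’ overhang, or a blunt end follows from where the cut falls within the palindromic site. The sketch below assumes the enzyme cuts each strand at the same offset measured from that strand's own 5’ end, which is what the Hind III and Pst I diagrams above show; the function name and the offset convention are illustrative.

```python
def overhang(site: str, cut_offset: int) -> tuple[str, str]:
    """Classify the ends left by cutting a palindromic site.

    `cut_offset` is the number of site bases 5' of the cut on each strand.
    On the top strand, the bottom-strand cut therefore lies at
    len(site) - cut_offset. Returns (kind, single-stranded sequence 5'->3').
    """
    mirror = len(site) - cut_offset
    lo, hi = sorted((cut_offset, mirror))
    if cut_offset < mirror:
        kind = "5' overhang"
    elif cut_offset > mirror:
        kind = "3' overhang"
    else:
        kind = "blunt"
    return kind, site[lo:hi]


hind3 = overhang("AAGCTT", 1)   # Hind III: A^AGCTT
pst1 = overhang("CTGCAG", 5)    # Pst I:    CTGCA^G
print(hind3)
print(pst1)
print("compatible ends:", hind3 == pst1)
```

Hind III gives a four-base 5’ overhang (AGCT) and Pst I a four-base 3’ overhang (TGCA), so ends made by these two enzymes cannot pair with each other. Two fragments cut with the same enzyme carry identical overhangs, and because each overhang is its own reverse complement, they can base-pair. A cut at the center of the site (offset 3 for a six-base site) returns the blunt case with an empty overhang.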

Making Recombinant DNAs

Joining DNA fragments from different sources creates recombinant DNA. The ability to cut and paste DNA might seem like a purely technical feat, but one key application that arose from it is molecular cloning. In molecular cloning, a gene of interest is inserted into a vector, usually a plasmid, by cutting both the vector and the gene (called the insert) with the same enzyme to generate sticky ends and then joining the two pieces to generate a recombinant (Figure 8.32). A plasmid is a type of autonomously replicating, extrachromosomal DNA. It is quite simple to extract plasmids from bacterial cells, engineer them to contain the gene of interest, and re-introduce the recombinant plasmid into the bacteria. The idea was that when the plasmid DNA was replicated, the inserted gene would be copied along with it. Thus, by growing large numbers of bacteria carrying the plasmid, many copies of the gene of interest could be obtained, providing sufficient amounts of the gene to use in experiments. While we now have easier methods to accomplish this goal, cloned DNAs remain very useful. For example, it is possible to clone a gene that encodes a protein of interest so that it can be expressed at high levels in the cells into which the recombinant plasmid is introduced.

Figure 8.32 - Recombinant DNA construction. Wikipedia
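The cut-and-paste logic of cloning can be sketched with strings. This is a toy model under several assumptions: only the top strand is tracked, Hind III is used for both vector and insert, and the "plasmid" and "gene" sequences are short made-up examples. All function names are hypothetical.

```python
SITE = "AAGCTT"   # Hind III recognition sequence, from the text
CUT = 1           # Hind III cuts after the first A: A^AGCTT


def linearize(plasmid: str) -> str:
    """Open a circular plasmid (top strand) at its single Hind III site."""
    if plasmid.count(SITE) != 1:
        raise ValueError("this sketch expects exactly one site in the vector")
    cut = plasmid.find(SITE) + CUT
    return plasmid[cut:] + plasmid[:cut]


def excise(source: str) -> str:
    """Return the top-strand fragment between the first two Hind III cuts."""
    first = source.find(SITE)
    second = source.find(SITE, first + 1)
    if first == -1 or second == -1:
        raise ValueError("need two sites flanking the insert")
    return source[first + CUT:second + CUT]


def ligate(vector: str, insert: str) -> str:
    """Join two linear fragments whose ends were both made by Hind III."""
    sticky_start, sticky_end = SITE[CUT:], SITE[:CUT]
    for fragment in (vector, insert):
        if not (fragment.startswith(sticky_start) and fragment.endswith(sticky_end)):
            raise ValueError("ends are not Hind III sticky ends")
    return vector + insert   # circular product, written from an arbitrary origin


def circular_count(circle: str, site: str = SITE) -> int:
    """Count occurrences of `site` in a circular sequence, origin included."""
    return (circle + circle[:len(site) - 1]).count(site)


vector = "GCGTC" + SITE + "CCGGTG"                 # made-up circular plasmid
source = "TT" + SITE + "ATGGCCTAG" + SITE + "CC"   # made-up gene with flanking sites

recombinant = ligate(linearize(vector), excise(source))
print(recombinant)
print("Hind III sites in recombinant:", circular_count(recombinant))
```

Ligation regenerates a Hind III site at each junction, so the insert can later be cut back out with the same enzyme. The sketch ignores the fact that, with a single enzyme, the insert can ligate in either orientation.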

Whatever the purpose for which the recombinant plasmid is made, it typically carries an antibiotic resistance gene (or genes), called a selectable marker. Cells that take up the plasmid will be able to grow in the presence of the antibiotic. If bacterial cells to which the plasmid has been added are plated on agar containing the antibiotic, the cells which took up the plasmid will be able to grow, while the others will not.
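Selection with an antibiotic is, in effect, a filter on the cell population. The following sketch models it that way; the `Cell` class, the cell names, and the choice of ampicillin as the antibiotic are illustrative assumptions rather than details given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    resistance_genes: frozenset = frozenset()   # markers carried on a plasmid, if any


def colonies_on_plate(cells, antibiotic=None):
    """Return the cells able to grow on agar containing `antibiotic`.

    With no antibiotic in the agar, every cell grows.
    """
    if antibiotic is None:
        return list(cells)
    return [cell for cell in cells if antibiotic in cell.resistance_genes]


# Hypothetical transformation mix: only some cells took up the plasmid.
cells = [
    Cell("cell 1"),                                       # no plasmid
    Cell("cell 2", frozenset({"ampicillin"})),            # took up the plasmid
    Cell("cell 3"),                                       # no plasmid
]

survivors = colonies_on_plate(cells, antibiotic="ampicillin")
print([cell.name for cell in survivors])
```

Only the cell carrying the marker forms a colony on the selective plate, which is how plasmid-containing cells are identified among the many that did not take up DNA.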

Figure 8.33 - Restriction site map for the pUC 18/19 plasmids, a classic plasmid vector. Genes are identified by arrows. Numbers correspond to the pUC 18/19 numbering convention.

Expression cloning

As mentioned above, a gene of interest may be inserted into a vector, and the recombinant plasmid placed into a cell where the gene can be expressed. For instance, one might want to clone the gene coding for human growth hormone, insulin, or another medically important protein and have a bacterium or yeast make large quantities of it very cheaply. Because these are human proteins, it is not feasible to extract them in any quantity from human subjects.

To clone a gene so that it can be expressed, one needs to set up the proper conditions for the human protein to be made in the bacterial cells. This typically involves the use of specially designed plasmids. These plasmids have been engineered to (1) replicate in high numbers; (2) carry markers that allow researchers to identify cells carrying them (antibiotic resistance, for example); and (3) contain sequences (such as a promoter and a Shine-Dalgarno sequence) necessary for expression of the desired protein, with convenient sites for insertion of the gene of interest in the appropriate place relative to the control sequences. A plasmid that has all of these features is referred to as an expression vector. In addition to plasmids that can be used for expression in bacterial cells, expression vectors are also available that allow protein expression in a variety of eukaryotic cells.

Many sophisticated variations on such vectors have been created that have made it easy to produce and purify large amounts of any protein of interest for which the gene has been cloned. A handy feature in some expression vectors is a sequence encoding an affinity tag either up- or downstream of the gene being expressed. This sequence allows a short affinity tag (such as a run of histidine residues) to be fused onto the encoded protein. The tag can be used to readily purify the protein, as described in the section on affinity chromatography.
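At the sequence level, fusing a histidine tag means adding histidine codons in frame with the coding sequence. The sketch below illustrates the idea on a made-up coding sequence. The tag length of six, the choice of the CAC codon, and the function names are assumptions for illustration; in practice the tag is usually supplied by the vector rather than edited into the gene.

```python
HIS_CODON = "CAC"                     # CAC (like CAT) encodes histidine
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_c_terminal_his_tag(cds: str, n_his: int = 6) -> str:
    """Insert `n_his` histidine codons just before the stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    return cds[:-3] + HIS_CODON * n_his + cds[-3:]


def add_n_terminal_his_tag(cds: str, n_his: int = 6) -> str:
    """Insert `n_his` histidine codons just after the ATG start codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with ATG")
    return "ATG" + HIS_CODON * n_his + cds[3:]


gene = "ATGGCCTAA"   # made-up mini gene: Met-Ala-stop
print(add_c_terminal_his_tag(gene))
print(add_n_terminal_his_tag(gene))
```

In both cases the reading frame is preserved, so the tagged sequence encodes the original protein with six extra histidine residues at one end.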