Structural Features of DNA-Binding Proteins
Not every protein can bind specifically to DNA. Analysis of DNA-binding proteins shows that common structural motifs recur among them.
- helix-turn-helix: found in prokaryotic DNA binding proteins.
The figure shows two such proteins, the cro repressor from bacteriophage 434 and the lambda repressor from bacteriophage lambda. (Bacteriophages are viruses that infect bacteria.) Notice how specificity is achieved, in part, by the formation of specific H-bonds between the protein and the major groove of the operator DNA.
Figure: Lambda Repressor/DNA Complex
- Jmol: Lambda-Repressor Complex
- zinc finger: (eukaryotes) These proteins have a common sequence motif of
X3-Cys-X2-4-Cys-X12-His-X3-4-His-X4, in which X is any amino acid. Zn2+ is tetrahedrally coordinated by the two Cys side chains, which lie on one of two antiparallel beta strands, and the two His side chains, which lie on an alpha helix. The zinc finger, stabilized by the zinc, binds to the major groove of DNA.
Figure: zinc finger
- Jmol: Zif268:DNA Complex
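The sequence motif given above for zinc fingers can be turned into a simple pattern search. The following is a minimal sketch, assuming a one-letter protein sequence and treating X as any of the 20 standard residues. The function name `find_zinc_fingers` and the demo peptide are made up for illustration; the peptide is not a real protein sequence.

```python
import re

# X3-Cys-X2-4-Cys-X12-His-X3-4-His-X4, with X = any standard residue.
AA = "[ACDEFGHIKLMNPQRSTVWY]"
# The lookahead lets overlapping candidate motifs be reported as well.
ZINC_FINGER = re.compile(
    rf"(?=({AA}{{3}}C{AA}{{2,4}}C{AA}{{12}}H{AA}{{3,4}}H{AA}{{4}}))"
)

def find_zinc_fingers(protein):
    """Return (start_index, matched_text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in ZINC_FINGER.finditer(protein.upper())]

# Hypothetical peptide built to fit the motif (illustration only).
demo = "GGG" + "AAA" + "C" + "SS" + "C" + "K" * 12 + "H" + "TTT" + "H" + "EEEE" + "GGG"
for start, text in find_zinc_fingers(demo):
    print(start, text)
```

A sequence match of this kind only flags candidates. Whether a segment actually folds around Zn2+ and binds DNA depends on structure that a pattern search cannot see.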
Zn finger proteins, of which 900 are encoded in the human genome, can be used to actually repair specific mutations in cells. If carried out in a high enough percentage of mutant cells, such repair could cure specific genetic diseases, such as some forms of severe combined immunodeficiency disease. In this technique (Urnov et al., 2005), multiple linked Zn finger binding domains (either naturally occurring ones or mutant forms produced in the lab), each specific for a certain nucleotide sequence, are linked to a nonspecific endonuclease domain derived from the enzyme FokI. The nuclease is active only in dimeric form, so the active complex requires two endonuclease domains, each linked to four different Zn finger domains, to assemble at the target site. Specificity of binding comes from the Zn finger domains. The nuclease then cuts the DNA, and host cell repair mechanisms ensue. This process involves strand separation, homologous recombination of the cut region with homologous DNA within the cell, and repair of the break. If excess wild-type (non-mutated) DNA is added to the cells and used as the template, the normal DNA repair mechanism fixes the mutation. Urnov et al. have shown that up to 20% of cultured cells containing a mutation can be repaired in the lab. If the repaired cells gain a selective growth advantage, they would eventually replace the mutated cells.
- steroid hormone receptors: (eukaryotes) In contrast to most hormones, which bind to cell surface receptors, steroid hormones (derivatives of cholesterol) pass through the cell membrane and bind to cytoplasmic receptors through a hormone-binding domain. This changes the shape of the receptor, which then binds to a specific site on the DNA (the hormone response element) through a DNA-binding domain. In a structure analogous to the zinc finger, Zn2+ is tetrahedrally coordinated to four Cys in a globular structure, which binds as a dimer within the major groove to two identical but reversed sequences of DNA (a palindrome). (An example of a palindrome: Able was I ere I saw Elba.)
Consider the glucocorticoid receptor (GR) as a specific example. It binds DNA as a dimer. The two DNA-binding domains of the dimer associate with two adjacent major grooves of the DNA in the GR binding sequence (GBS), a short sequence of DNA within the promoter. Meijsing et al. found that the GBS not only acts as a binding site for GR, allowing transcription of genes, but also affects the conformation of the receptor, so that gene transcription is regulated in a second way. The group constructed luciferase reporter genes, which express the protein luciferase when they are transcribed, under the control of different GBSs. They found that relative transcriptional activity did not correlate with the relative binding affinity of GR for the GBS. GBSs that were much more active than others bound GR with affinities comparable to those of less active GBSs, while GBSs with similar transcriptional activity bound with different affinities. This shows that the GBS confers a unique function on the GR associated with it (i.e., transcription is not determined simply by whether or not GR is bound to the GBS). A "lever arm" of the receptor was found to undergo conformational changes when bound to DNA, and the changes were specific to the sequence to which it was bound. A second form of the protein, GR-γ, identical to the wild-type protein, GR-α, except in the lever arm, was also tested. The two forms had different transcriptional activities even though they bound to the same site on the DNA, showing that the lever arm and its conformation affect transcription.
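The "identical, but reversed" half-sites described above for hormone response elements are palindromes in the DNA sense: one half-site equals the reverse complement of the other, so each strand reads the same 5' to 3'. The sketch below checks this relationship. The function names are made up for illustration, and the half-sites AGAACA and TGTTCT are used here only as an illustrative inverted-repeat pair, not as data from the studies discussed above.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_dna_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)

def are_inverted_half_sites(left, right):
    """True if `right` is the reverse complement of `left`."""
    return right.upper() == reverse_complement(left)

print(reverse_complement("AGAACA"))
print(is_dna_palindrome("GAATTC"))
print(are_inverted_half_sites("AGAACA", "TGTTCT"))
```

Note the difference from a text palindrome such as "Able was I ere I saw Elba": a DNA palindrome does not read the same backwards on a single strand, but reads the same on the two complementary strands. This is what lets each subunit of a symmetric dimer see the same sequence.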
- leucine zippers (or scissors): (eukaryotes) These proteins contain stretches of 35 amino acids in which Leu is found repeatedly at 7-amino-acid intervals. These regions of the protein form amphiphilic helices, with the Leu residues on one face, one Leu after every two turns of the helix. Two of these proteins can form a dimer, stabilized by the binding of these nonpolar, leucine-rich amphiphilic helices to one another, forming a coiled coil, much as in the muscle protein myosin. The leucine zipper is the protein-binding domain of the protein. The DNA-binding domain is found in the first 30 N-terminal amino acids, which are basic and form an alpha helix when the protein binds to DNA. The leucine zipper thus functions to bring two DNA-binding proteins together, allowing the N-terminal basic helices to interact with the major groove of DNA in a base-specific fashion. Valine and isoleucine, along with leucine, are often found in stretches of amino acids that can interact to form other types of coiled coils.
- Jmol: Leucine Zipper
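The heptad spacing of leucines described above can be searched for directly in a protein sequence. This is a minimal sketch, assuming a one-letter sequence. The function name `leucine_heptad_runs`, the cutoff of four consecutive leucines (`min_repeats`) and the demo peptide are all choices made for illustration rather than values from the text.

```python
def leucine_heptad_runs(protein, min_repeats=4):
    """Return (start_index, leucine_count) for runs of Leu spaced 7 residues apart."""
    protein = protein.upper()
    runs = []
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        # Skip a Leu that merely continues a run begun 7 residues earlier.
        if start >= 7 and protein[start - 7] == "L":
            continue
        count = 1
        pos = start + 7
        while pos < len(protein) and protein[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs

# Hypothetical 35-residue stretch: one Leu every 7 residues (illustration only).
demo = "LAAAAAA" * 5
print(leucine_heptad_runs(demo))
```

A hit only indicates the right spacing. A real zipper also needs the segment to be helical and amphiphilic, and, as noted above, Val or Ile can occupy equivalent positions in other coiled coils.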
Just as zinc finger nucleases (ZFNs) have been used to induce repair of mutations, another study, in rats, used specially designed ZFNs to make breaks in double-stranded DNA that are repaired imperfectly by non-homologous end joining (NHEJ), resulting in site-specific mutations (Geurts et al., 2009). This process, a "knockout" of the gene, prevents production of the protein normally encoded by the target gene. Five- and six-finger ZFNs were used to achieve a high level of specificity in binding to the genes for three different proteins: green fluorescent protein (GFP), immunoglobulin M (IgM) and Rab38. The knockout was successful in 12% of the rats tested; these animals had no wild-type protein and no expression. The ZFNs were sufficiently specific that no mutations were observed at any of 20 predicted non-target sites. This study supports the viability of controlling transcription and expression for the treatment of disease, and it underscores the importance of specific binding.
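A rough calculation suggests why ZFNs with more fingers are more specific. The sketch below assumes that each finger reads about 3 base pairs, that two ZFNs must bind for the FokI domains to dimerize, and that the genome is a random sequence with equal base frequencies. It ignores the spacer between half-sites and the second strand, and the genome size of 3e9 bp is a round illustrative number, not a value from the study. The function names are hypothetical.

```python
def recognition_site_length(fingers_per_zfn, bp_per_finger=3, zfns_in_dimer=2):
    """Total base pairs read by a ZFN pair (assumes ~3 bp per finger)."""
    return fingers_per_zfn * bp_per_finger * zfns_in_dimer

def expected_random_sites(site_length_bp, genome_size_bp):
    """Expected copies of one specific site in a random, equal-frequency genome."""
    return genome_size_bp / 4 ** site_length_bp

GENOME_SIZE_BP = 3e9  # illustrative round number, not from the source

for fingers in (4, 5, 6):
    length = recognition_site_length(fingers)
    print(fingers, length, expected_random_sites(length, GENOME_SIZE_BP))
```

Under these assumptions the expected number of chance matches falls by a factor of 4 for every added base pair, so each added finger per ZFN tightens the estimate by several orders of magnitude. Real genomes are not random and fingers tolerate some mismatches, so this is only an order-of-magnitude argument.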
We have seen that two main factors contribute to the specific recognition of DNA by proteins: the formation of hydrogen bonds to specific nucleotide donors and acceptors in the major groove, and sequence-dependent deformations of the DNA helix into altered shapes with increased affinity for protein ligands. For example, the TATA-binding protein (TBP) can interact with a widened minor groove in the TATA box. New findings indicate that, in addition, proteins are able to use information in minor grooves that have become "narrowed" depending on the nucleotide sequence.
A-tracts can assume twisted conformations that allow inter-base-pair hydrogen bonding in the major groove, and this results in narrowing of the minor groove. Studies of minor groove geometry show that AT base pairs are concentrated in narrow minor grooves (width <5.0 Å), whereas CG base pairs are found more frequently in wide minor grooves.
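The comparison described above, AT content in narrow versus wide regions of the minor groove, can be sketched as a simple tally. The 5.0 Å cutoff comes from the text. Everything else is assumed for illustration: the function name, the one-width-per-base-pair input format, and the demo sequence and widths, which are invented numbers and not measurements.

```python
NARROW_THRESHOLD = 5.0  # Å, the cutoff quoted in the text

def at_fraction_by_groove_class(sequence, widths, threshold=NARROW_THRESHOLD):
    """Return the AT fraction among positions with narrow and with wide minor grooves."""
    if len(sequence) != len(widths):
        raise ValueError("need one groove width per base pair")
    counts = {"narrow": [0, 0], "wide": [0, 0]}  # [AT count, total]
    for base, width in zip(sequence.upper(), widths):
        key = "narrow" if width < threshold else "wide"
        counts[key][1] += 1
        if base in "AT":
            counts[key][0] += 1
    return {k: (at / total if total else None) for k, (at, total) in counts.items()}

# Invented example: an A-tract flanked by GC pairs, with made-up widths in Å.
demo_sequence = "GCAAATTTGC"
demo_widths = [6.0, 5.8, 4.9, 4.2, 3.8, 3.6, 3.9, 4.6, 5.7, 6.1]
print(at_fraction_by_groove_class(demo_sequence, demo_widths))
```

With real data, the widths would come from analysis of crystal structures of protein-DNA complexes rather than from a hand-written list.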
How does minor groove narrowing affect DNA recognition? Narrow minor grooves enhance the negative electrostatic potential of the DNA, making it a more specific and recognizable site. Electrostatic focusing causes electrostatic potentials to be most negative within the grooves. The backbone phosphates of the DNA are closer to the middle of the groove when it is narrow, which is why narrow minor grooves correlate with a more negative electrostatic potential.
Why is there a high concentration of arginine in these narrow minor grooves? A narrower and more negative minor groove provides binding sites for arginine. Arginines can bind, and in some cases insert themselves as part of short sequence motifs, which enhances the specificity of DNA shape recognition. Arg is preferred over Lys because the effective radius of the charge carrier in Arg is greater than that of the charge carrier in Lys. This would lead to a smaller desolvation energy for Arg, which would promote its binding to the narrowed minor groove. The discovery that arginines are concentrated in the narrow minor grooves of DNA is the basis of a new mode of DNA recognition, based on shape, that is used by a number of DNA-binding proteins. The authors conclude that "the role of DNA shape must be taken into consideration when annotating the entire genome and predicting transcription-factor-binding sites."
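The link between charge radius and desolvation cost can be illustrated with the Born model of ion solvation, in which the energy needed to strip solvent from a charged sphere is inversely proportional to its radius. Using this model here is an assumption made for illustration; the text does not say which model the authors used. The radii of 2.0 and 3.0 Å below are placeholders chosen only to show the trend and are not measured values for Lys or Arg.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23      # 1/mol

def born_desolvation_energy(radius_angstrom, charge=1, dielectric=80.0):
    """Born-model cost (kJ/mol) of moving a charged sphere from solvent to vacuum."""
    radius_m = radius_angstrom * 1e-10
    q = charge * E_CHARGE
    joules = q ** 2 / (8 * math.pi * EPSILON_0 * radius_m) * (1 - 1 / dielectric)
    return joules * AVOGADRO / 1000

# Placeholder radii (Å), chosen only to show the 1/radius trend.
smaller = born_desolvation_energy(2.0)
larger = born_desolvation_energy(3.0)
print(smaller, larger, smaller / larger)
```

In this model, a charge spread over a larger effective radius always costs less to desolvate, which is the direction of the argument made above for Arg versus Lys. A groove is not a vacuum, so the absolute numbers overstate the real penalty; only the trend is meaningful here.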