Common Structural Motifs in Proteins
Given the number of possible combinations of 1°, 2°, and 3° structures, one might guess that the 3D structure of each protein is quite distinctive. This is in general true. However, similar substructures recur across proteins. For instance, common secondary structures are often grouped together to form a motif (often called super-secondary structure). Often the same motif is found in proteins with similar functions (such as proteins that bind DNA, Ca2+, etc.). Let's explore some of the common motifs.
Helix-loop-helix motifs are found in DNA-binding proteins that regulate transcription and also in calcium-binding proteins, in which the motif is often called the EF hand. The loop region in calcium-binding proteins is enriched in Asp, Glu, Ser, and Thr. Why? The EF hand shown below is from calmodulin.
Here is a static and dynamic model of a basic helix-loop-helix from the c-Myc protein (1NKP). The iCn3D model shows the helices interacting with the major groove of DNA, which is shown in spacefill.
Here is a static and dynamic model of an EF hand from the calcium-binding protein calmodulin.
The EF hand can be envisioned as a hand gripping a ball (the calcium ion), with the finger and thumb representing alpha helices.
The EF hand motif of calmodulin is used in a variety of Ca2+-binding proteins. The figure below shows the alignment of the first 50 residues of human calmodulin with four other human calcium-binding proteins. The EF hand (F12-L39) of calmodulin consists of the second half of the first helix (F12-L18), an intervening loop (F19-T28), and the second helix (T29-L39). Sometimes it is annotated to encompass a larger stretch (residues 8-43).
Part A shows the degree of conservation of amino acids in this first Ca2+-binding EF hand. Part B shows the general conservation of key hydrophobic residues (F12, F19, I27) as well as additional residues of similar polarity (positions 36 and 39).
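The residue ranges quoted for the first EF hand can be checked against the sequence itself. The following is a minimal sketch, not part of the original analysis. It assumes that the 50-residue string below is the N-terminus of mature human calmodulin numbered from A1 (initiator Met removed) and that the second helix ends at L39. The helper name `segment` is hypothetical.

```python
# First 50 residues of mature human calmodulin (assumed; numbering starts at A1).
CALMODULIN_1_50 = "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD"


def segment(seq, start, end):
    """Return residues start..end using 1-based, inclusive numbering."""
    return seq[start - 1:end]


helix_1_end = segment(CALMODULIN_1_50, 12, 18)  # second half of first helix, F12-L18
loop = segment(CALMODULIN_1_50, 19, 28)         # intervening loop, F19-T28
helix_2 = segment(CALMODULIN_1_50, 29, 39)      # second helix, T29-L39 (assumed end)

# Fraction of loop residues that are Asp (D), Glu (E), Ser (S), or Thr (T).
polar_fraction = sum(loop.count(aa) for aa in "DEST") / len(loop)

print("helix 1 (end):", helix_1_end)
print("loop:         ", loop)
print("helix 2:      ", helix_2)
print("D/E/S/T fraction in loop:", polar_fraction)
```

Counting residues this way is one quick route into the "Why?" question above: the oxygen-containing side chains in the loop are the natural candidates for coordinating Ca2+.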
A series of strictly conserved amino acids are shown labeled in the iCn3D model below, which shows the first EF hand in human calmodulin.
Hover over the amino acid side chains that are coordinating the Ca2+ ion. Are they what you would expect?
A linear connectivity "wiring" diagram showing secondary structure elements joined by connecting regions is shown below. This particular wiring diagram shows a 2-residue beta strand, which is too short to be considered an actual strand.
A more complicated 2D topology map is shown below. In this case it is linear, given the small section of amino acids depicted. We will see more elaborate 2D topology maps for more complicated structures below.
It is presented on its side to save space.
Beta-hairpin or beta-turn
This motif is present in most antiparallel beta structures, both as an isolated ribbon and as part of beta sheets.
Here is a static and dynamic model of the beta hairpin from bovine pancreatic trypsin inhibitor (1k6u)
The figure below shows the 2D topology map for the beta-hairpin.
The Greek key symbol represents infinity and the eternal flow of things, and it partly resembles primitive keys. The Greek key motif in proteins can be seen in antiparallel beta sheets, in the ordering of four adjacent antiparallel beta strands, as shown in the diagram below. The figure also shows the repetitive Greek key pattern, which you will see many times if you visit Greece and tour its antiquities.
The figure below shows a partial 2D topology map of Staphylococcus nuclease (2SNS).
The beta-alpha-beta motif is a common way to connect two parallel beta strands, in contrast to beta hairpins, which connect antiparallel beta strands.
Here is a static and dynamic model of the beta-alpha-beta structure from triose phosphate isomerase.
another view: https://structure.ncbi.nlm.nih.gov/i...8L3VTKtYjVPy97
Here is the 1D wiring diagram for the first beta-alpha-beta motif in triose phosphate isomerase.
Here is the 2D topology diagram showing this motif.
Larger Structural Motifs - Protein Architecture
Some proteins combine larger secondary and supersecondary structural components, often in a repeated fashion to produce more complex structures. We've seen this with larger twisted sheets and beta barrels, such as the TIM barrel. Let's consider three of these, which can be considered examples of protein architectures without considering connectivity within the protein.
The Rossmann Fold
Structural motifs can serve particular functions within proteins such as enabling the binding of substrates or cofactors. For example, the Rossmann fold is responsible for binding to nucleotide cofactors such as nicotinamide adenine dinucleotide (NAD+) (Figure 2.25). The Rossmann fold is composed of six parallel beta strands that form an extended beta sheet. The first three strands are connected by α-helices resulting in a beta-alpha-beta-alpha-beta structure. This pattern is duplicated once to produce an inverted tandem repeat containing six strands. Overall, the strands are arranged in the order of 321456 (1 = N-terminal, 6 = C-terminal). Five stranded Rossmann-like folds are arranged in the order 32145. The overall tertiary structure of the fold resembles a three-layered sandwich wherein the filling is composed of an extended beta sheet and the two slices of bread are formed by the connecting parallel alpha helices.
Figure 2.25 The Rossmann Fold. (A) Structure of nicotinamide adenine dinucleotide (NAD+). (B) Cartoon diagram of the Rossmann fold (helices A-F red and strands 1-6 yellow) from the E. coli malate dehydrogenase enzyme. The NAD+ cofactor is shown bound as the space-filling molecule. (C) Schematic diagram of the six-stranded Rossmann fold.
Image modified from: Boghog
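The strand order 321456 described above can be explored with a few lines of code. The sketch below is illustrative only: the function names `sheet_neighbors` and `crossover_connections` are hypothetical, and the only input taken from the text is the spatial order of the strands. It reports which sequence-consecutive strands do not lie side by side in the sheet.

```python
# Spatial (edge-to-edge) order of strands in the sheet; labels give sequence order.
SIX_STRANDED = [3, 2, 1, 4, 5, 6]
FIVE_STRANDED = [3, 2, 1, 4, 5]


def sheet_neighbors(order):
    """Map each strand to the strands lying directly next to it in the sheet."""
    neighbors = {}
    for i, strand in enumerate(order):
        adjacent = []
        if i > 0:
            adjacent.append(order[i - 1])
        if i < len(order) - 1:
            adjacent.append(order[i + 1])
        neighbors[strand] = adjacent
    return neighbors


def crossover_connections(order):
    """Return sequence-consecutive strand pairs that are NOT adjacent in the sheet."""
    neighbors = sheet_neighbors(order)
    return [
        (s, s + 1)
        for s in sorted(order)
        if s + 1 in neighbors and s + 1 not in neighbors[s]
    ]


print(crossover_connections(SIX_STRANDED))
print(crossover_connections(FIVE_STRANDED))
```

In both orderings the only connection flagged is the one from strand 3 to strand 4, which is where the chain must travel from one edge of the sheet back to its middle to begin the second beta-alpha-beta-alpha-beta half.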
One of the features of the Rossmann fold is its cofactor-binding specificity. The most conserved segment of Rossmann folds is the first beta-alpha-beta segment. Since this segment is in contact with the ADP portion of dinucleotides such as FAD, NAD, and NADP, it is also called an "ADP-binding beta-alpha-beta fold".
Here is a model of the Rossmann fold of malate dehydrogenase (5KKA) from E. coli. The beta strands (yellow), connecting alpha helices (red), and coil (blue) of the Rossmann fold are shown in the context of the rest of the monomeric version of the protein, which is shown in gray.
The TIM barrel revisited
Interestingly, similar structural motifs do not always have a common evolutionary ancestor and can arise by convergent evolution. This is the case with the TIM barrel, a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone. The structure is named after triosephosphate isomerase, a conserved metabolic enzyme. TIM barrels are one of the most common protein folds. One of the most intriguing features of this class of proteins is that, although they all exhibit the same tertiary fold, there is very little sequence similarity between them. At least 15 distinct enzyme families use this framework to generate the appropriate active site geometry, always at the C-terminal end of the eight parallel beta-strands of the barrel.
Figure 2.26 The TIM Barrel. TIM barrels are considered α/β protein folds because they include an alternating pattern of α-helices and β-strands in a single domain. In a TIM barrel the helices and strands (usually 8 of each) form a solenoid that curves around to close on itself in a doughnut shape, topologically known as a toroid. The parallel β-strands form the inner wall of the doughnut (hence, a β-barrel), whereas the α-helices form the outer wall of the doughnut. Each β-strand connects to the next adjacent strand in the barrel through a long right-handed loop that includes one of the helices, so that the ribbon N-to-C coloring in the top view (A) proceeds in rainbow order around the barrel. The TIM barrel can also be thought of, then, as made up of 8 overlapping, right-handed β-α-β super-secondary structures, as shown in the side view (B).
Image modified from: WillowW
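The description of the TIM barrel as eight overlapping β-α-β units can be written out explicitly. This is a minimal sketch under stated assumptions: it uses the usual count of eight strand/helix pairs, the labels `b1`, `a1`, and so on are hypothetical names for the strands and helices, and the eighth unit is counted by wrapping around from strand 8 back to strand 1, which reflects closure of the barrel in space rather than a direct link along the chain.

```python
N_UNITS = 8  # usual number of strand/helix pairs in a TIM barrel

# Secondary structure elements in sequence order: b1, a1, b2, a2, ..., b8, a8.
elements = [
    f"{kind}{i}"
    for i in range(1, N_UNITS + 1)
    for kind in ("b", "a")
]

# Overlapping beta-alpha-beta units. Each strand ends one unit and starts the next.
# The last unit wraps around to b1 (spatial closure of the barrel, assumed here).
bab_units = [
    (f"b{i}", f"a{i}", f"b{i % N_UNITS + 1}")
    for i in range(1, N_UNITS + 1)
]

print(" ".join(elements))
for unit in bab_units:
    print("-".join(unit))
```

Listing the units this way makes the overlap visible: every strand appears twice, once as the end of one unit and once as the start of the next.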
As our knowledge continues to increase about the myriad of structural motifs found in nature's treasure trove of protein structures, we continue to gain insight into how protein structure is related to function and are better enabled to characterize newly acquired protein sequences using in silico technologies.
Beta helices are right-handed parallel helical structures that consist of a contiguous polypeptide chain with three parallel beta strands separated by three turns, forming a single rung of a larger helical structure, which in total might contain as many as nine rungs. The interstrand H-bonds are between parallel beta strands in separate rungs. These structures seem to be prevalent in pathogen (bacterial, viral, toxin) proteins that facilitate binding of the pathogen to a host cell.
Here is a static and dynamic model of the C-terminal fragment of the phage T4 GP5 beta helix.
Beta helices are found in the following organisms (with the diseases they cause in humans): Vibrio cholerae (cholera), Helicobacter pylori (ulcers), Plasmodium falciparum (malaria), Chlamydia trachomatis (venereal disease), Chlamydophila pneumoniae (respiratory infection), Trypanosoma brucei (sleeping sickness), Borrelia burgdorferi (Lyme disease), Bordetella parapertussis (whooping cough), Bacillus anthracis (anthrax), Neisseria meningitidis (meningitis), and Legionella pneumophila (Legionnaires' disease).
Proteins with a beta-propeller structure have 4-8 blade-shaped beta sheets arranged around a central axis, forming an active site shaped like a funnel.
Here is a model of the C-terminal domain of Tup1 (1ERJ), a yeast transcription factor, which has a seven-bladed beta propeller. Each blade contains a WD40 repeat sequence (around 40 amino acids) that often ends in tryptophan-aspartic acid (W-D). This particular protein has four WD dipeptide sequences, shown in sticks colored with CPK colors.
The funnel provides binding sites for proteins and other molecules, with the ones with more blades usually acting as enzymes.
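Locating WD dipeptides in a sequence is a simple string search. The sketch below is only an illustration: the function name `find_dipeptide` is hypothetical, and the toy sequence is made up for the example and is not the Tup1 sequence.

```python
def find_dipeptide(sequence, dipeptide="WD"):
    """Return the 1-based positions at which the dipeptide starts."""
    positions = []
    start = sequence.find(dipeptide)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = sequence.find(dipeptide, start + 1)
    return positions


# Made-up toy sequence for illustration only (NOT the Tup1 sequence).
toy_sequence = "MKGHSAWDLTTRGQVKIWDLNSGEPAAWDK"

print(find_dipeptide(toy_sequence))
```

Applied to a real WD40 protein, the spacing between hits gives a rough first look at where repeats might end, although not every repeat terminates in W-D.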
Domains are the fundamental unit of 3° structure. A domain can be considered a chain, or part of a chain, that can independently fold into a stable tertiary structure. Domains are units of structure but can also be units of function. Some proteins can be cleaved at a single peptide bond to form two separate domains. Often, these can fold independently of each other, and sometimes each unit retains an activity that was present in the uncleaved protein. Sometimes binding sites on the proteins are found in the interface between the structural domains. Many proteins seem to share functional and structural domains, suggesting that the DNA of each shared domain might have arisen from duplication of a primordial gene with a particular structure and function.
Evolution has led towards increasing complexity, which has required proteins of new structure and function. Increased and different functionalities in proteins have been obtained by adding domains to base proteins. Chothia (2003) has defined a domain in an evolutionary and genetic sense as "an evolutionary unit whose coding sequence can be duplicated and/or undergo recombination". Proteins range from small, with a single domain (typically 100-250 amino acids), to large, with many domains. From recent analyses of genomes, new protein functionalities appear to arise from the addition or exchange of other domains which, according to Chothia, result from:
- duplication of sequences that code for one or more domains
- divergence of duplicated sequences by mutations, deletions, and insertions that produce modified structures that may have useful new properties to be selected
- recombination of genes that results in novel arrangements of domains.
Structural analyses show that about half of all protein-coding sequences in genomes are homologous to other known protein structures. There appear to be about 750 different families of domains (i.e. small proteins derived from a common ancestor) in vertebrates, each with about 50 homologous structures. About 430 of these domain families are found in all the genomes that have been solved.
Proteins with multiple domains are also less likely to misfold if each domain can fold somewhat autonomously. In addition, multiple domains provide a myriad of binding sites, which increases the number of biological functions expressed in a single protein. Multidomain proteins can also express multiple catalytic activities, allowing a reaction product from one domain to diffuse to another catalytic domain (or to an interface between domains). This would reduce the dimensionality of the search for a substrate from 3D to something closer to a 1D or 2D search, enormously speeding up the net reaction. The process is often called substrate channeling.
The dynamic model below shows three domains of the enzyme pyruvate kinase (1pkn). These include a nucleotide (ADP/ATP) binding domain (blue) made of beta strands, a substrate binding domain (green) in the middle composed of alpha/beta structure, and a regulatory domain (red) composed of alpha/beta structure. These domains were analyzed by a web program called CATH-Gene3D.
The CATH programs offer a complete classification of protein structure based on the following hierarchy of organization: Class, Architecture, Topology, and Homologous Superfamilies - CATH.
- Class: the highest level of organization which consists of four classes - mainly alpha, mainly beta, alpha-beta, and few secondary structures
- Architecture (40 types): describes the shape of domain based on secondary structures but doesn't describe how they are connected. Ex: beta barrel, beta propeller
- Topology (or fold group, 1233 types): members in topology groups have a common fold or topology in the "core" of the domain structure.
- Homologous Superfamilies (2386 types): These groups are homologous in sequence or structure and derive from a common precursor gene/protein.
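The four-level hierarchy maps naturally onto a small data structure. The sketch below is an assumption-laden illustration, not the CATH software: the class name `CathCode` and its methods are hypothetical, the dotted four-number notation is assumed as the way a domain's classification is written, the class numbers 1-4 are assumed to follow the order of the four classes listed above, and the code string in the example is used only as a sample input.

```python
from dataclasses import dataclass

# Assumed numbering of the four classes, in the order listed in the text.
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha-beta",
    4: "few secondary structures",
}


@dataclass(frozen=True)
class CathCode:
    """One domain's position in the Class/Architecture/Topology/Homology tree."""
    cath_class: int
    architecture: int
    topology: int
    superfamily: int

    @classmethod
    def parse(cls, code):
        """Build a CathCode from a dotted string such as 'C.A.T.H'."""
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated levels, got {code!r}")
        return cls(*(int(p) for p in parts))

    def lineage(self):
        """Return the identifier at each level, from Class down to Superfamily."""
        numbers = [self.cath_class, self.architecture, self.topology, self.superfamily]
        return [".".join(str(n) for n in numbers[:depth]) for depth in range(1, 5)]

    def class_name(self):
        return CATH_CLASSES.get(self.cath_class, "unknown")


example = CathCode.parse("3.40.50.720")  # sample input string only
for level, identifier in zip(("Class", "Architecture", "Topology", "Homologous Superfamily"),
                             example.lineage()):
    print(f"{level}: {identifier}")
print("Class name:", example.class_name())
```

Two domains that share the first three numbers would, on this reading, have the same fold while belonging to different homologous superfamilies.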
An alternative computer program, Pfam, shows this enzyme as having two major domains: a pyruvate kinase beta barrel domain and a pyruvate kinase alpha/beta domain.
Pfam domains are determined by sequence analysis, while CATH domains are determined by structural comparisons. Domains determined by both programs show about a 75% overlap.
At a simpler level, domains are built from the kinds of motif structures we discussed above. Since proteins are tightly packed structures, their organization can be thought of as closely packed motifs, but not all possible combinations are found. For example, if you have one beta hairpin next to another to form a 2-unit Greek key, there are 24 possible ways to connect them, but only eight are common. The two below appear to account for more than the sum of the other 22.
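One way to arrive at a count of 24 is to enumerate the orderings of four strands within a sheet, since 4! = 24. This reading is an assumption made for illustration and is not stated in the text; the variable names are hypothetical.

```python
from itertools import permutations

strands = (1, 2, 3, 4)  # four strands from two hairpins, numbered in sequence order

# Every possible left-to-right ordering of the four strands in a sheet.
orderings = list(permutations(strands))

n_total = len(orderings)               # 4! orderings
n_common = 8                           # number described as common in the text
common_fraction = n_common / n_total   # share of arrangements that are common

print(n_total, "orderings;", n_common, "common;", round(common_fraction, 2))
```

Under this assumption only a third of the conceivable arrangements are common, and two of them outnumber the remaining 22 combined, which underlines how strongly packing and connectivity constraints restrict the folds actually observed.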
Here is an example of the architecture of a multi-domain protein, human Attractin-like protein 1. This protein is an example of a lectin, a carbohydrate-binding protein, which we will explore in a subsequent chapter. It binds Ca2+, so it is considered a C-type lectin.