4.3: Secondary Structural Motifs and Domains
Learning Goals
(Learning goals written by Claude, Sonnet 4.6, Anthropic)
Supersecondary Structural Motifs
- Identify and describe the defining features of four major supersecondary structural motifs — helix-turn-helix (HTH/EF-hand), β-hairpin, Greek key, and β-α-β — in terms of the secondary structure elements involved, the type of β-sheet connectivity they produce (parallel vs. antiparallel), and representative biological functions such as DNA binding, Ca²⁺ coordination, and nucleotide cofactor binding.
- Explain the structural and chemical basis for Ca²⁺ coordination by the EF-hand motif — including the identity of coordinating residues (Asp, Glu, Ser, Thr), the distorted pentagonal bipyramidal geometry arising from a bidentate glutamate ligand, and why the coordination number of 7 preferentially binds Ca²⁺ over the smaller Mg²⁺ — and connect this geometry to the conserved amino acid positions revealed by sequence alignment of calmodulin homologs.
Larger Structural Architectures
- Describe the Rossmann fold and TIM barrel as examples of larger α/β protein architectures built from repeated β-α-β supersecondary motifs, explain why TIM barrels always present their active sites at the C-terminal ends of the β-barrel strands, and account for the TIM barrel's prevalence across evolutionarily unrelated enzyme families as a case of convergent evolution.
- Explain the structural organization of β-helices and β-propellers, identify the biological contexts in which each is found (pathogen adhesion proteins and signal transduction/enzyme scaffolds, respectively), and predict what property of the β-propeller funnel architecture makes it suitable as a multi-ligand binding or catalytic scaffold.
Protein Domains, Evolution, and Classification
- Define a protein domain as an independently folding structural and functional unit, explain how domain duplication, divergence, and recombination drive the expansion of protein functional diversity across genomes, and describe the advantages of multidomain proteins for substrate channeling, autonomous folding, and combinatorial functional diversity.
- Interpret the CATH domain classification hierarchy (Class → Architecture → Topology → Homologous Superfamily) to organize protein structures from the broadest secondary structure content to sequence-level evolutionary relationships, and explain how AI-based tools such as the TED encyclopedia of domains extend domain classification to the hundreds of millions of AlphaFold-predicted structures, revealing novel folds, high-symmetry domains, and extruded repeat architectures not accessible through sequence-based approaches alone.
Common Structural Motifs
Given the number of possible combinations of 1°, 2°, and 3° structures, one might guess that the 3D structure of each protein is quite distinctive. This is, in general, true. However, similar substructures are found in proteins. For instance, common secondary structures are often grouped into structural motifs called supersecondary structures. The same motif is often found in proteins with similar functions (such as proteins that bind DNA, Ca2+, etc.). Let's explore some of the common motifs.
Alpha-loop-Alpha
These are found in DNA-binding proteins that regulate transcription and calcium-binding proteins, the motif of which is often called the EF-hand. The loop region in calcium-binding proteins is enriched in Asp, Glu, Ser, and Thr. Why? The EF-hand shown below is from Calmodulin.
Figure \(\PageIndex{1}\) shows an interactive iCn3D model of a basic helix-turn-helix from the c-Myc protein (1NKP). The iCn3D model shows helices interacting with the major groove of DNA, depicted in spacefill.
Figure \(\PageIndex{1}\): Basic helix-turn-helix from the c-Myc protein (1NKP). (Copyright; author via source).
Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...kDv9DGzWWWoMZ8
Figure \(\PageIndex{2}\) shows an interactive iCn3D model of the "EF-hand" from the calcium-binding protein calmodulin (1cll).
Figure \(\PageIndex{2}\): EF hand from Calmodulin (1cll): Secondary Structure Motif. (Copyright; author via source).
Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...vCYH7EX4sVvtu6
The EF Hand can be envisioned as a hand gripping a ball (calcium ion) with the index finger and thumb representing alpha helices, as shown in Figure \(\PageIndex{3}\).
The EF-hand motif of calmodulin is used in various Ca2+-binding proteins. Figure \(\PageIndex{4}\) shows the alignment of the first 50 residues of human calmodulin with four other human calcium-binding proteins. The EF-hand (F12-L39) of calmodulin consists of the second half of the first helix (F12-L18), an intervening loop (F19-T28), and the second helix (T29-L39). Sometimes, it is annotated to encompass a larger stretch (8-43).
Part A shows the degree of conservation of amino acids in this first Ca2+-binding EF-hand. Part B shows the general conservation of key hydrophobic residues (F12, F19, I27), as well as of residues with similar polarity (positions 36 and 39).
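As a rough check on the claim that the Ca2+-binding loop is rich in Asp, Glu, Ser, and Thr, the sketch below slices the loop residues (F19-T28) out of the first 50 residues of human calmodulin and counts those four residue types. The sequence string is an assumption: it is the mature human calmodulin sequence (UniProt P0DP23 without the initiator Met) typed in here for illustration, and it should be checked against the database entry before reuse. The helper name `residue_range` is hypothetical.

```python
from collections import Counter

# First 50 residues of mature human calmodulin (assumed, not taken from the text).
# Numbering starts at A1 after removal of the initiator Met, which places
# F12, F19, and I27 where the text puts them.
CALMODULIN_1_50 = "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD"


def residue_range(sequence: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) of a sequence."""
    return sequence[first - 1:last]


# Loop boundaries as annotated in the text (F19-T28).
loop = residue_range(CALMODULIN_1_50, 19, 28)
counts = Counter(loop)

# Residues whose side chains carry oxygen atoms that can act as Ca2+ ligands.
oxygen_donors = sum(counts[aa] for aa in "DEST")

print(loop)
print(f"Asp/Glu/Ser/Thr: {oxygen_donors} of {len(loop)} loop residues")
```

The same two lines can be pointed at any other sequence and residue range to compare loops from different EF-hand proteins.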
Figure \(\PageIndex{5}\) shows an interactive iCn3D model of a bound calcium ion and interacting amino acids in human calmodulin, with key amino acids labeled.
Figure \(\PageIndex{5}\): Bound calcium ion and interacting amino acids in human calmodulin (1cll) (Copyright; author via source).
Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/icn3d/share.html?bCf4mtNbk4kjkCHw6
Hover over the amino acid side chains that are coordinating the Ca2+ ion. Are they what you would expect?
You learned about transition metal complexes and their geometries (tetrahedral with four ligand interactions, trigonal bipyramidal with five, and octahedral with six) in introductory chemistry. Ca2+ (a nontransition Group II metal ion) has an ionic radius of about 100 pm, compared to about 76 and 68 pm for Fe2+ and Fe3+, respectively, in octahedral geometry with six ligand interactions. Hence, more ligand groups can crowd around the calcium ion. In the calmodulin EF hand, the coordination number is 7 with a distorted pentagonal bipyramidal geometry, allowing high affinity for the large Ca2+ ion and conformational flexibility. Mg2+, a smaller nontransition Group II metal ion, is typically octahedral with six ligands, so a site built for one of these ions binds the other poorly: the seven-coordinate EF hand favors Ca2+ over Mg2+, and Ca2+ fits poorly in Mg2+-binding molecules. Figure \(\PageIndex{5b}\) shows the distorted pentagonal bipyramidal geometry of Ca2+-EF hand interactions.
Figure \(\PageIndex{5b}\): Distorted pentagonal bipyramidal geometry of Ca2+-EF hand interactions. Image created by Google Gemini.
The geometry is significantly distorted because Glu31 acts as a bidentate ligand and "pinches" the geometry of the complex.
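To make the crowding argument concrete, the sketch below compares how far apart neighboring ligand atoms sit in an octahedron (90° between adjacent ligands) and in the equatorial plane of an ideal pentagonal bipyramid (72° between adjacent ligands). The metal-oxygen distances of 2.4 Å for Ca2+ and 2.1 Å for Mg2+ are assumed typical values, not numbers from this chapter, and the real EF-hand site is distorted rather than ideal, so this is only a geometric illustration.

```python
import math


def adjacent_ligand_spacing(metal_ligand_distance: float, angle_deg: float) -> float:
    """Distance between two ligands that subtend angle_deg at the metal ion.

    Both ligands are assumed to sit at the same distance from the metal,
    so the three atoms form an isosceles triangle.
    """
    half_angle = math.radians(angle_deg) / 2
    return 2 * metal_ligand_distance * math.sin(half_angle)


# Assumed typical metal-oxygen distances in angstroms (illustrative values).
METAL_OXYGEN_DISTANCE = {"Ca2+": 2.4, "Mg2+": 2.1}

OCTAHEDRAL_ANGLE = 90.0               # adjacent ligands in an octahedron
PENTAGONAL_EQUATORIAL_ANGLE = 72.0    # 360 / 5 in the equatorial plane

for ion, distance in METAL_OXYGEN_DISTANCE.items():
    octahedral = adjacent_ligand_spacing(distance, OCTAHEDRAL_ANGLE)
    pentagonal = adjacent_ligand_spacing(distance, PENTAGONAL_EQUATORIAL_ANGLE)
    print(f"{ion}: octahedral O-O {octahedral:.2f} A, "
          f"pentagonal equatorial O-O {pentagonal:.2f} A")
```

Under these assumed distances, five equatorial oxygens end up closer together around the smaller ion than around the larger one, which is consistent with Mg2+ favoring six ligands while Ca2+ tolerates seven.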
A linear connectivity ("wiring") diagram, which shows secondary structure elements joined by connecting regions, is shown in Figure \(\PageIndex{6}\). This wiring diagram includes a 2-residue beta strand, which is too short to be considered a true strand.
A more complicated 2D topology map is shown in Figure \(\PageIndex{7}\). In this case, it is linear, given the small section of amino acids depicted. We will see more complicated 2D topology maps with more complicated structures below.
It is presented on its side to save space on this page.
Beta-hairpin or beta-turn
This motif is present in most antiparallel beta structures, both as an isolated ribbon and as part of beta sheets.
Figure \(\PageIndex{8}\) shows an interactive iCn3D model of the beta hairpin from bovine pancreatic trypsin inhibitor (1k6u).
Figure \(\PageIndex{8}\): Beta hairpin from bovine pancreatic trypsin inhibitor (1k6u) (Copyright; author via source).
Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...eMFdHkGogJHCCA
Figure \(\PageIndex{9}\) shows the 2D topology map for the beta-hairpin.
Greek Key
The "Greek Key" symbol represents infinity and the eternal flow of things and resembles, in part, primitive keys. The Greek Key motif in proteins can be seen in the structure of antiparallel beta sheets, ordering four adjacent antiparallel beta strands, as shown in Figure \(\PageIndex{9}\). The figure also shows the repetitive Greek key, which you will see many times if you visit Greece and tour its antiquities.
Figure \(\PageIndex{10}\) shows a partial 2D topology map of Staphylococcus nuclease (2SNS).
Figure \(\PageIndex{11}\) shows an interactive iCn3D model of the Greek Key motif from Staphylococcus nuclease (2SNS). The involved beta strands are shown in yellow.
Figure \(\PageIndex{11}\): Greek Key motif from Staphylococcus nuclease (2SNS) (Copyright; author via source).
Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...x2ef4xpttXrFb9
Beta-Alpha-Beta
The motif is a common way to connect two parallel beta strands, unlike beta hairpins, which connect antiparallel beta strands.
Figure \(\PageIndex{12}\) shows an interactive iCn3D model of the beta-alpha-beta structure from triose phosphate isomerase (1amk).
Figure \(\PageIndex{13}\) shows the 1D wiring diagram for the first beta-alpha-beta motif in triose phosphate isomerase.
Figure \(\PageIndex{14}\) shows the 2D topology diagrams showing this motif.
Larger Structural Motifs - Protein Architecture
Some proteins combine larger secondary and supersecondary structural components, often repeatedly, to produce more complex structures. We've seen this with larger twisted sheets and beta barrels, such as the TIM barrel. Let's consider three of these as examples of protein architectures, without considering the protein's connectivity.
The Rossmann Fold
Structural motifs can serve particular functions within proteins, such as enabling the binding of substrates or cofactors. For example, the Rossmann fold is responsible for binding to nucleotide cofactors such as nicotinamide adenine dinucleotide (NAD+), as shown in Figure \(\PageIndex{15}\). The Rossmann fold comprises six parallel beta strands forming an extended beta sheet. The first three strands are connected by α-helices, resulting in a beta-alpha-beta-alpha-beta structure. This pattern is duplicated once to produce an inverted tandem repeat with six strands. Overall, the strands are arranged in the order of 321456 (1 = N-terminal, 6 = C-terminal). Five stranded Rossmann-like folds are arranged in the sequential order 32145. The overall tertiary structure of the fold resembles a three-layered sandwich, with the filling composed of an extended beta sheet and the connecting parallel alpha helices forming the two slices of bread.
Image modified from: Boghog
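The strand order 321456 can be read as a small data structure: the position of each strand across the sheet versus its position along the chain. The sketch below (function names are hypothetical) lists which strands are side by side in the sheet and flags any pair of sequence-consecutive strands that are not neighbors, which is where the chain has to cross over the sheet.

```python
# Spatial order of strands across the sheet, as given in the text
# (1 = N-terminal strand, 6 = C-terminal strand).
SHEET_ORDER = [3, 2, 1, 4, 5, 6]


def sheet_neighbors(order):
    """Map each strand to the strands lying directly next to it in the sheet."""
    neighbors = {}
    for index, strand in enumerate(order):
        adjacent = []
        if index > 0:
            adjacent.append(order[index - 1])
        if index < len(order) - 1:
            adjacent.append(order[index + 1])
        neighbors[strand] = adjacent
    return neighbors


def crossover_connections(order):
    """Sequence-consecutive strand pairs that are NOT side by side in the sheet."""
    neighbors = sheet_neighbors(order)
    return [
        (strand, strand + 1)
        for strand in sorted(order)
        if strand + 1 in neighbors and strand + 1 not in neighbors[strand]
    ]


print(sheet_neighbors(SHEET_ORDER))
print(crossover_connections(SHEET_ORDER))
print(crossover_connections([3, 2, 1, 4, 5]))  # five-stranded Rossmann-like order
```

Strand 1 sits in the middle of the sheet with strands 2 and 4 on either side, so the only connection that must span the sheet is the one from strand 3 to strand 4, which joins the two beta-alpha-beta-alpha-beta halves.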
One feature of the Rossmann fold is its cofactor-binding specificity. The most conserved segment of Rossmann folds is the first beta-alpha-beta segment. Since this segment interacts with the ADP portion of dinucleotides such as FAD, NAD, and NADP, it is also called an "ADP-binding beta-alpha-beta fold."
Figure \(\PageIndex{16}\) shows an interactive iCn3D model of the Rossmann fold of malate dehydrogenase (5KKA) from E. coli. The beta strands (yellow) connecting the alpha helices (red) and the coil (blue) of the Rossmann fold are shown in the context of the rest of the monomeric protein, which is shown in gray.
The TIM barrel revisited.
Interestingly, similar structural motifs do not always have a common evolutionary ancestor and can arise from convergent evolution. This is the case with the TIM Barrel, a conserved protein fold composed of eight α-helices and eight parallel β-strands that alternate along the peptide backbone. It is illustrated in Figure \(\PageIndex{17}\). The structure is named after triosephosphate isomerase, a conserved metabolic enzyme. TIM barrels are one of the most common protein folds. One of the most intriguing features among members of this class of proteins is that although they all exhibit the same tertiary fold, there is very little sequence similarity between them. At least 15 distinct enzyme families use this framework to generate the appropriate active site geometry, always at the C-terminal end of the eight parallel beta-strands of the barrel.
Figure \(\PageIndex{17}\) The TIM Barrel. TIM barrels are considered α/β protein folds because they include an alternating pattern of α-helices and β-strands in a single domain. In a TIM barrel, the helices and strands (usually 8 of each) form a solenoid that curves around to close in a doughnut shape, topologically known as a torus. The parallel β-strands form the inner wall of the doughnut (hence, a β-barrel), whereas the α-helices form the outer wall of the doughnut. Each β-strand connects to the next adjacent strand in the barrel through a long right-handed loop that includes one of the helices, so that the ribbon N-to-C coloring in the top view (A) proceeds in rainbow order around the barrel. The TIM barrel can also be thought of as composed of 8 overlapping, right-handed β-α-β supersecondary structures, as shown in the side view (B).
Image modified from: WillowW
Although the ribbon diagram of the TIM Barrel (Figure \(\PageIndex{17}\)) shows a hole in the protein's central core, the amino acid side chains are not shown in this representation. The protein's core is tightly packed, mostly with bulky hydrophobic amino acid residues. However, a few glycines are needed to allow wiggle room for the highly constrained center of the eight approximate repeats to fit together. The packing interactions between the strands and helices are also dominated by hydrophobicity, and the branched aliphatic residues valine, leucine, and isoleucine make up about 40% of the total residues in the β-strands.
Figure \(\PageIndex{18}\) below shows an interactive iCn3D model of the TIM barrel (1WYI) from Chapter 4.2.
As our knowledge of the myriad structural motifs found in nature's treasure trove of protein structures continues to increase, we gain insight into how protein structure relates to function and are better enabled to characterize newly acquired protein sequences using in silico technologies.
Beta Helices
These right-handed parallel helical structures consist of a contiguous polypeptide chain with three parallel beta strands separated by three turns, forming a single rung of a larger helical structure, which might contain as many as nine rungs. The interstrand H-bonds are between parallel beta strands in separate rungs. Beta helices are prevalent in proteins from pathogens (bacteria, viruses, and toxins), where they facilitate binding of the pathogen to a host cell.
Figure \(\PageIndex{19}\) shows an interactive iCn3D model of the C-terminal fragment of the phage T4 GP5 beta helix (4osd).
Beta helices are found in the following organisms (with the diseases they cause in humans): Vibrio cholerae (cholera), Helicobacter pylori (ulcers), Plasmodium falciparum (malaria), Chlamydia trachomatis (a sexually transmitted infection), Chlamydophila pneumoniae (respiratory infection), Trypanosoma brucei (sleeping sickness), Borrelia burgdorferi (Lyme disease), Bordetella parapertussis (whooping cough), Bacillus anthracis (anthrax), Neisseria meningitidis (meningitis), and Legionella pneumophila (Legionnaires' disease).
Beta Propellers
Proteins with this structure have 4-8 blade-shaped beta sheets arranged around a central axis, forming an active site shaped like a funnel.
Figure \(\PageIndex{20}\) shows an interactive iCn3D model of the C-terminal domain of Tup1 (1ERJ), a yeast transcription factor, which has a seven-bladed beta propeller. Each blade contains a WD40 repeat sequence (around 40 amino acids) that often ends in tryptophan-aspartic acid (W-D). This particular protein has four WD dipeptide sequences, shown as sticks in CPK colors.
The funnel provides binding sites for proteins and other molecules, and propellers with more blades usually act as enzymes.
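Locating the W-D dipeptides mentioned for Tup1 is a simple pattern search. The sketch below is a minimal illustration with a made-up toy sequence (it is not the Tup1 sequence), and the function name `wd_positions` is hypothetical. Real WD40 repeat detection relies on profile-based methods, since many repeats do not end in an exact W-D pair.

```python
import re


def wd_positions(sequence: str) -> list[int]:
    """Return the 1-based position of the Trp in every Trp-Asp (WD) dipeptide."""
    return [match.start() + 1 for match in re.finditer("WD", sequence)]


# Made-up toy sequence for illustration only; not a real protein.
TOY_SEQUENCE = "MKTAWDGSLLRVWDNGKAAWDPLTSWDEE"

positions = wd_positions(TOY_SEQUENCE)
print(f"{len(positions)} WD dipeptides; Trp residues at positions {positions}")
```

Running the same function on a sequence downloaded for a structure of interest gives the residue numbers to highlight in a viewer, subject to any numbering offset between the sequence file and the structure.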
Domains
Domains are the fundamental unit of tertiary (3°) structure. Domains can be considered a chain or part of a chain that can independently fold into a stable tertiary structure. Domains are units of structure but can also be units of function. Some proteins can be cleaved at a single peptide bond, forming two domains. Often, these can fold independently of each other, and sometimes, each unit retains an activity it had in the uncleaved protein. Sometimes, binding sites on proteins are found at the interface between structural domains. Many proteins appear to share functional and structural domains, suggesting that the DNA encoding each domain may have arisen from the duplication of a primordial gene with a particular structure and function.
Evolution has increased complexity, requiring proteins to have new structures and functions. Increased and diverse protein functionalities have been achieved by adding domains to base proteins. Chothia (2003) has defined a domain in an evolutionary and genetic sense as "an evolutionary unit whose coding sequence can be duplicated and/or undergo recombination." Proteins range from small with a single domain (typically from 100-250 amino acids) to large with many domains. From recent analyses of genomes, new protein functionalities appear to arise from the addition or exchange of other domains, which, according to Chothia, result from
- duplication of sequences that code for one or more domains
- divergence of duplicated sequences by mutations, deletions, and insertions that produce modified structures that may have helpful new properties to be selected
- recombination of genes that form a novel arrangement of domains.
Structural analyses show that about half of all protein-coding sequences in genomes are homologous to other known protein structures. There appear to be about 750 different families of domains (i.e., small proteins derived from a common ancestor) in vertebrates, each with about 50 homologous structures. About 430 of these domain families are found in all solved genomes.
Proteins with multiple domains are less likely to misfold if each domain can fold somewhat autonomously. In addition, they provide a myriad of binding sites, increasing the number of biological functions expressed by a single protein. Multidomain proteins can also express multiple catalytic activities, allowing a reaction product from one domain to diffuse to another catalytic domain (or to the interface between domains). This would reduce the dimensionality of the search for a substrate from 3D to more of a 1D or 2D search, enormously speeding up the net reaction. The process is often called substrate channeling.
Figure \(\PageIndex{21}\) shows an interactive iCn3D model of the three domains of the enzyme pyruvate kinase (1pkn). These include a nucleotide (ADP/ATP) binding domain (blue) made of beta strands, a substrate binding domain (green) in the middle composed of alpha/beta structure, and a regulatory domain (red) composed of alpha/beta structure. These domains were analyzed by a web program called CATH-Gene3D.
One ubiquitous domain is the Immunoglobulin Fold (IGF), also called the Immunoglobulin Domain (IgD). They are abundantly found in immune proteins, cell surface proteins involved in recognition, and other proteins. They are described in detail in Chapter 5.4: Recognition of Self and Nonself - The Immune System. Here are some images and iCn3D displays of proteins with Ig Domains in bacteria (left), viruses (center), and humans (right). (some have long load times)
| Intimin-190 (Int190) from Enteropathogenic E. coli (1E5U) | Ig Domain in SARS-CoV-2 Spike Glycoprotein (6VXX) | AlphaFold ID Carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) (P13688) |
| Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/icn3d/share.html?AcPowvr2Uz37Y1Rh7 | Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...hPGzYJF9wV7gt5 | Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...GXmU8dwdNQ2w69 |
Recent Updates: 11/1/2024
With recent advances in AI-driven prediction of protein structure and function, there is a greater need to refine and develop programs that can determine the domain structures of the over 200 million protein structures in the AlphaFold database. Until recently, there were two different ways to determine domain structures in proteins:
- based on the 3D structure of a protein. The program CATH does this.
- based on 1D (linear) sequences. An example of a program that uses this approach is Pfam.
CATH classifies protein structure based on the following hierarchy of organization: Class, Architecture, Topology, and Homologous Superfamilies.
- Class: the highest level of organization, which consists of four classes - mainly alpha, mainly beta, alpha-beta, and few secondary structures
- Architecture (40 types): describes the shape of the domain based on secondary structures, but doesn't describe how they are connected. Ex: beta-barrel, beta-propeller
- Topology (or fold group, 1233 types): Members of topology groups share a common fold or topology in the "core" of the domain structure.
- Homologous Superfamilies (2386 types): These groups are homologous in sequence or structure and derive from a common precursor gene or protein.
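The four levels map naturally onto a four-part identifier. The sketch below assumes the dotted numeric form in which CATH writes its codes (Class.Architecture.Topology.Superfamily); the two example codes, 3.40.50.720 for the NAD(P)-binding Rossmann-like domain and 3.20.20.70 for a TIM-barrel aldolase superfamily, are recalled from the CATH database rather than taken from this chapter and should be checked there. The class and method names are hypothetical.

```python
from dataclasses import dataclass

# Class numbers for the four classes named in the text (assumed numbering).
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


@dataclass(frozen=True)
class CathCode:
    cath_class: int
    architecture: int
    topology: int
    superfamily: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        """Parse a dotted code such as '3.40.50.720'."""
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {code!r}")
        return cls(*(int(part) for part in parts))

    def level(self, depth: int) -> str:
        """Dotted prefix for a depth (1 = Class, 2 = Architecture, 3 = Topology, 4 = Superfamily)."""
        numbers = (self.cath_class, self.architecture, self.topology, self.superfamily)
        return ".".join(str(number) for number in numbers[:depth])

    def deepest_shared_level(self, other: "CathCode") -> int:
        """Number of leading hierarchy levels two domains have in common (0 to 4)."""
        depth = 0
        while depth < 4 and self.level(depth + 1) == other.level(depth + 1):
            depth += 1
        return depth


rossmann_like = CathCode.parse("3.40.50.720")   # assumed example code
tim_barrel = CathCode.parse("3.20.20.70")       # assumed example code

print(CLASS_NAMES[rossmann_like.cath_class])
print(rossmann_like.level(3))
print(rossmann_like.deepest_shared_level(tim_barrel))
```

Comparing prefixes in this way shows at a glance that two domains can share a class while differing in architecture, which is the distinction the hierarchy is designed to capture.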
Pfam uses multiple alignments of sequences.
The pyruvate kinase example above shows three structural domains. Pfam finds two major domains: a pyruvate kinase beta-barrel domain and an alpha/beta domain. The domains determined by both programs show about a 75% overlap.
At a simpler level, domains are built from the kinds of motif structures we discussed above. Since proteins are highly compact structures, their organization can be thought of as a closely packed array of motifs, but not all possible combinations are observed. For example, if you have two beta hairpins next to each other to form a 2-unit Greek key, there are 24 possible ways to connect them, but only eight are common. The two below account for more than the sum of the other 22. These are shown in Figure \(\PageIndex{22}\).
Figure \(\PageIndex{23}\) shows an example of the architecture of the multi-domain protein, human Attractin-like protein 1. This protein is an example of a lectin, a carbohydrate-binding protein, which we will explore in a subsequent chapter. It binds Ca2+, so it is considered a C-Lectin. Three different programs were used to analyze the domain structure.
Figure \(\PageIndex{23}\): Architecture of the multi-domain protein, human Attractin-like protein 1
A new AI-based "TED - Encyclopedia of Domains" has been developed to identify and classify around 365 million domains in the AlphaFold database. Around 1/3 of these were not predicted through 1D sequence alignments and comparisons. Around 3/4 of the "nonredundant" domains were similar to domains predicted by the 3D structure alignments of CATH. TED identified new domain interactions between superfamilies and many new protein folds. New folds found across life likely suggest a common function, whereas new folds within a given lineage suggest evolutionary changes.
Some new folds had higher symmetry, including the beta-propeller, which repeats to achieve it (see below for lower C3 symmetry). Additional AI-based sequence (1D) information was used to help infer function. For example, putative Zn2+ finger-like binding sites containing 2 Cys and 2 His but lacking a traditional Zn-finger motif were found. Most putative heme-binding sites contained the CXXCH motif found in heme c. Inspection of the structures predicted to have these features supported their likely functions.
CATH and TED are both 3D structure-based, so comparing their domains and domain interactions is warranted. TED found over 27 million examples of interacting domains, involving about 14,000 interacting superfamily pairs, compared with around 200,000 examples and 5,000 pairs for CATH.
Figure \(\PageIndex{24}\) below shows the classification of TED domains using the CATH hierarchy.
Figure \(\PageIndex{24}\): Classification of TED domains using the CATH hierarchy. (i) The top 100 superfamilies in TED-100 for each CATH class, where more matches to CATH superfamilies have been identified through structural hits in TED compared with sequence hits in Gene3D. (ii) Proportion of domains matched to CATH classes (n = 238,569,631). Andy M. Lau et al. Exploring structural diversity across the protein universe with The Encyclopedia of Domains. Science 386, eadq4946 (2024). DOI:10.1126/science.adq4946. Author Accepted Manuscript (AAM) version available under a CC BY public copyright license. Manuscript published in Science, Volume 386, Issue 6721, 1 Nov 2024.
Table \(\PageIndex{1}\) below shows examples of novel domain folds and probable functional sites described in the paper by Lau et al.
| TED: A0A7M3WA57_TED05 - paired beta-strands in a closed, twisted hairpin with both termini adjacent. (4Ci from Lau et al.) | TED: E1Z635_TED02 - Alpha-helical variant, Eukaryotic. (4Dii from Lau et al.) | TED: A0A2J6RQN3 - tentative Zn2+-binding protein | TED: M5FA19 - tentative heme c-binding protein |
| Download iCn3D png file | Download iCn3D png file | Download iCn3D png file | Download iCn3D png file |
View this in iCn3D as follows:
- download the above files to your computer. IMPORTANT: If the file opens as an image in a new browser window, right-click the image and save the file to download it!
- open iCn3D
- File, Open File, iCn3D appendable, navigate to the folder with the downloaded png file, and select it.
Figure \(\PageIndex{25}\) below shows new examples of symmetry (C3) and extruded repeat domains found using TED.
Figure \(\PageIndex{25}\): Examples of high-symmetry domains and extruded repeats. Domains are identified as part of the novel domain identification pipeline, and high internal-symmetry domains are identified by scoring with the SymD program. Extruded repeats are domains with many ordered, cyclic repeats projecting along a single axis. Coloration follows pLDDT confidence bins as per the AFDB. Dark blue indicates very high confidence (pLDDT ≥ 90); blue indicates high confidence (90 > pLDDT ≥ 70); yellow indicates low confidence (70 > pLDDT ≥ 50); and orange indicates very low confidence (pLDDT < 50). Andy M. Lau et al. ibid.
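The confidence bins in this caption amount to a simple threshold lookup. The sketch below encodes the four bins exactly as the caption states them; the function name `plddt_bin` is hypothetical, and the 0 to 100 range check reflects the usual pLDDT scale rather than anything stated in the caption.

```python
def plddt_bin(plddt: float) -> tuple[str, str]:
    """Return (color, confidence label) for a pLDDT score, using the caption's bins."""
    if not 0 <= plddt <= 100:
        raise ValueError(f"pLDDT is expected to lie between 0 and 100, got {plddt}")
    if plddt >= 90:
        return ("dark blue", "very high")
    if plddt >= 70:
        return ("blue", "high")
    if plddt >= 50:
        return ("yellow", "low")
    return ("orange", "very low")


# Illustrative scores only; not taken from any particular model.
for score in (95.2, 81.0, 63.5, 42.0):
    color, confidence = plddt_bin(score)
    print(f"pLDDT {score:5.1f}: {confidence} confidence, shown in {color}")
```

Applying the function to per-residue scores makes it easy to tally what fraction of a predicted domain falls in each bin before interpreting its fold.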
Table \(\PageIndex{2}\) below shows iCn3D examples of higher symmetry and extruded repeat domains described in the paper by Lau et al.
| C11 symmetry-A0A1V6M2Y0 | C10 symmetry-A0A6C0LIE9 | Extruded-A0A1M5CF6 | Extruded-A0A833H0U1 |
| Download iCn3D png file | Download iCn3D png file | Download iCn3D png file | Download iCn3D png file |
View this in iCn3D as follows:
- download the above files to your computer. IMPORTANT: If the file opens as an image in a new browser window, right-click the image and save the file to download it!
- open iCn3D
- File, Open File, iCn3D appendable, navigate to the folder with the downloaded png file, and select it.
Individual protein annotations can also be browsed from the TED website (https://ted.cathdb.info).
TED structural domain assignments for AlphaFold Database v4 and associated codes are available for download at Zenodo. The deposition contains domain assignments for TED, PDB files for novel folds, and individual domain assignments from Chainsaw, Merizo, and UniDoc to facilitate further benchmarking efforts. Specifically:
- novel_folds_set_models.tar.gz contains PDB files for all novel-fold representatives identified in TED100.
- high_symmetry_folds_set_models.tar.gz contains PDB files of all highly symmetrical fold representatives identified in TED100.
Summary
(Summary written by Claude, Sonnet 4.6, Anthropic)
This chapter moves from individual secondary structure elements to the hierarchical organizational levels above them — supersecondary structural motifs, large-scale protein architectures, and domains — establishing the conceptual framework for understanding how structural complexity is built from recurring modular units and how that complexity is cataloged and interpreted.
Supersecondary structural motifs are defined as combinations of secondary structure elements that recur across evolutionarily unrelated proteins and are often associated with specific molecular functions. The helix-turn-helix motif — two α-helices connected by a short loop — mediates sequence-specific DNA binding in transcription factors such as c-Myc, with one helix inserting into the major groove. The structurally related EF-hand motif (helix-loop-helix) is the canonical Ca²⁺-binding module in calmodulin and numerous other Ca²⁺-sensing proteins. The Ca²⁺-coordinating loop is enriched in Asp, Glu, Ser, and Thr residues whose side chain oxygens serve as ligands; a bidentate glutamate contributes two of the seven coordination sites, generating a distorted pentagonal bipyramidal geometry uniquely suited to the large Ca²⁺ ion. The smaller Mg²⁺ prefers octahedral coordination (6 ligands) and cannot accommodate the EF-hand's coordination geometry, providing the basis for Ca²⁺ selectivity. Sequence alignments across Ca²⁺-binding protein families reveal conserved hydrophobic and polar positions that maintain the EF-hand framework even as peripheral sequence diverges. The β-hairpin (two antiparallel strands connected by a tight reverse turn) and Greek key motif (four antiparallel strands in a topology resembling the ancient decorative pattern) are the fundamental building blocks of antiparallel β-sheet proteins. The β-α-β motif, by contrast, connects two parallel β-strands via an intervening α-helix and is the structural unit from which all-parallel β-sheet architectures, including the Rossmann fold and TIM barrel, are assembled.
Larger structural architectures emerge from repetitive or elaborated arrangements of supersecondary motifs. The Rossmann fold consists of six parallel β-strands arranged as an extended sheet, with the first three strands connected by α-helices (β-α-β-α-β), the pattern then repeated and inverted to produce six strands in order 3-2-1-4-5-6. The resulting three-layer sandwich, with the β-sheet as the core "filling" and flanking helices as the "bread," creates a conserved ADP-binding cleft used to recognize the adenine dinucleotide portion of NAD⁺, FAD, and NADP⁺. The first β-α-β segment is the most conserved region. The TIM barrel, composed of eight alternating parallel β-strands and α-helices arranged as a toroidal (doughnut-shaped) structure, is one of the most prevalent protein folds known. The hydrophobic core between the β-barrel inner wall and the α-helix outer wall is tightly packed with branched aliphatic residues. Catalytic residues are invariably positioned at the C-terminal ends of the β-strands, regardless of the enzyme family, a remarkable convergent solution to the problem of creating a catalytic pocket in a stable scaffold. β-helices consist of a contiguous chain forming rungs of three parallel β-strands, producing a right-handed solenoid structure; they are prevalent in virulence factors and surface-adhesion proteins of pathogens responsible for diseases ranging from cholera to anthrax. β-propellers arrange four to eight blade-shaped β-sheets radially around a central axis, creating a funnel-shaped binding site; proteins with more blades typically function as enzymes, while smaller propellers mediate protein-protein interactions.
Protein domains are the fundamental structural and functional units of complex proteins — independently folding segments of ~100–250 amino acids that can often retain activity after proteolytic separation from the rest of the protein. Domains serve as the evolutionary currency of protein diversification: gene duplication creates copies of existing domains, divergence by mutation, deletion, and insertion produces new functions, and recombination assembles novel multidomain proteins. Approximately half of all protein-coding sequences in sequenced genomes are homologous to known structural domains, with ~750 domain families recognized in vertebrates. Multidomain proteins confer several advantages: each domain folds semi-autonomously (reducing the risk of misfolding), multiple binding sites are presented on a single chain, and sequential catalytic domains enable substrate channeling — passing reaction intermediates directly from one active site to another, reducing the dimensionality of the diffusion problem and accelerating overall throughput. Active sites are frequently located at interdomain interfaces, where structural flexibility and chemical contributions from two domains can be combined. The CATH classification hierarchy organizes protein domains by Class (secondary structure content), Architecture (spatial arrangement), Topology (fold connectivity), and Homologous Superfamily (common ancestry), encompassing ~2,386 superfamilies. 
The recent AI-driven TED (The Encyclopedia of Domains) has extended domain classification across the AlphaFold database, identifying roughly 365 million domains, approximately one-third of which had no sequence-based precedent. TED has revealed new high-symmetry folds (including novel C3, C10, and C11 symmetric architectures), extruded repeat domains, and previously unrecognized metal-binding motifs (including non-canonical Zn²⁺-finger-like and heme c-binding sites). Together, these findings substantially expand our catalogue of the structural universe and the functional space it encodes.


















