Biology LibreTexts

4.4: Protein Tertiary and Quaternary Structures - Part B

  • note: this chapter takes a little while to load because of the numerous iCn3D molecular models. Be patient and wait!

    Domains

    Domains are the fundamental unit of tertiary (3°) structure. A domain can be considered a chain or part of a chain that can independently fold into a stable tertiary structure. Domains are units of structure but can also be units of function. Some proteins can be cleaved at a single peptide bond to form two separate domains. Often, these can fold independently of each other, and sometimes each unit retains an activity that was present in the uncleaved protein. Sometimes binding sites on the proteins are found in the interface between the structural domains. Many proteins seem to share functional and structural domains, suggesting that the DNA encoding each shared domain might have arisen from duplication of a primordial gene with a particular structure and function.

    Evolution has led towards increasing complexity, which has required proteins of new structure and function. Increased and different functionalities in proteins have been obtained by adding domains to base proteins. Chothia (2003) has defined a domain in an evolutionary and genetic sense as "an evolutionary unit whose coding sequence can be duplicated and/or undergo recombination". Proteins range from small, with a single domain (typically 100-250 amino acids), to large, with many domains. From recent analyses of genomes, new protein functionalities appear to arise from the addition or exchange of other domains which, according to Chothia, result from

    • "duplication of sequences that code for one or more domains
    • divergence of duplicated sequences by mutations, deletions, and insertions that produce modified structures that may have useful new properties to be selected
    • recombination of genes that result in novel arrangement of domains."

    Structural analyses show that about half of all protein coding sequences in genomes are homologous to other known protein structures. There appear to be about 750 different families of domains (i.e. small proteins derived from a common ancestor) in vertebrates, each with about 50 homologous structures. About 430 of these domain families are found in all the genomes that have been solved.

    The dynamic model below shows three domains of the enzyme pyruvate kinase (1pkn). These include a nucleotide (ADP/ATP) binding domain (blue) made of beta strands, a substrate binding domain (green) in the middle composed of alpha/beta structure, and a regulatory domain (red) composed of alpha/beta structure. These domains were analyzed by a web program called CATH-Gene3D.

    The CATH programs offer a complete classification of protein structure based on the following hierarchy of organization: Class, Architecture, Topology, and Homologous Superfamilies - CATH.

    • Class: the highest level of organization which consists of four classes - mainly alpha, mainly beta, alpha-beta, and few secondary structures
    • Architecture (40 types): describes the shape of the domain based on secondary structures but doesn't describe how they are connected. Ex: beta barrel, beta propeller
    • Topology (or fold group, 1233 types): members in topology groups have a common fold or topology in the "core" of the domain structure.
    • Homologous Superfamilies (2386 types): These groups are homologous in sequence or structure and derive from a common precursor gene/protein.
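    The four levels listed above nest inside one another, which can be pictured as a four-field, dot-separated identifier. The Python sketch below assumes an identifier of the form C.A.T.H and assumes the classes are numbered 1 to 4 in the order given in the list above. The names `parse_cath_code` and `describe` and the example string are illustrative only and are not part of any CATH software.

```python
from typing import NamedTuple

# Class names in the order listed in the text; the 1-4 numbering is an assumption.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha-beta",
    4: "few secondary structures",
}


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Split a C.A.T.H-style identifier into its four integer levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {code!r}")
    c, a, t, h = (int(p) for p in parts)
    if c not in CLASS_NAMES:
        raise ValueError(f"unknown class number {c}")
    return CathCode(c, a, t, h)


def describe(code: str) -> dict:
    """Return the identifier prefix that names each level of the hierarchy."""
    cc = parse_cath_code(code)
    return {
        "class": CLASS_NAMES[cc.class_],
        "architecture": f"{cc.class_}.{cc.architecture}",
        "topology": f"{cc.class_}.{cc.architecture}.{cc.topology}",
        "homologous superfamily": code,
    }


# Illustrative identifier; each successive field narrows the previous level.
print(describe("3.40.50.720"))
```

    Two domains that share the first three fields would belong to the same topology (fold group) but not necessarily the same homologous superfamily.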

    An alternative computer program, Pfam, shows this enzyme as having 2 major domains, a pyruvate kinase beta barrel domain and a pyruvate kinase alpha/beta domain.

    Pfam domains are determined by sequence analysis, while CATH domains are determined by structural comparisons. Domains determined by both programs show about a 75% overlap.

    Structural Classes of Proteins

    Proteins can be divided into 3 classes, depending on their characteristic secondary structure. See the dynamic models below for each class.

    Alpha proteins - consist of predominately alpha helix.

    Example: myoglobin P562 (4mbn)

    Alpha/Beta proteins - consist of a combination of alpha and beta structure.

    These are the most common class. Example: Triose phosphate Isomerase

    Beta proteins - consist of predominately beta structure.

    Example: Superoxide Dismutase (2sod).

    Here are some resources about protein structure

    Here are some 3D structure resources, accessible through a sequence- or ID-based search and collated in Nature's Structural Biology Knowledge Base.

    • Biological Magnetic Resonance Data Bank
    • CATH - manually curated structural classification of protein domain structures
    • DisProt - Database of Protein Disorder
    • Gene3D - CATH domain assignments for protein sequences
    • NESG Functional Annotation Database - computational analysis of the function of proteins of unknown function
    • Membrane proteins of known 3D structure
    • RCSB PDB - Protein Data Bank USA
    • PDBe - Protein Databank Europe
    • PDBj - Protein Databank Japan
    • PDBsum - a pictorial database that provides an at-a-glance overview of the contents of each 3D structure deposited in the Protein Data Bank
    • PROCOGNATE - database of cognate ligands for the domains of enzyme structures in CATH, SCOP and Pfam
    • SCOP - Structural Classification of Proteins: detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structures are known
    • SMART (Simple Modular Architecture Research Tool) - allows the identification and annotation of genetically mobile domains and the analysis of domain architectures

    Quaternary Structure

    Primary structure is the linear sequence of the protein. Secondary structure is the repetitive structure formed from H-bonds among backbone amide H and carbonyl O atoms. Tertiary structure is the overall 3D structure of the protein. Quaternary structure is the overall structure that arises when tertiary structures associate with copies of themselves to form homodimers, homotrimers, or homopolymers, or associate with different proteins to form heteropolymers. Most protein subunits in a larger protein displaying quaternary structure are held together by noncovalent interactions (intermolecular forces), although in some, they are also held together by disulfide bonds (example: immunoglobulins).

    Here is a dynamic model of a homodimer, the variable domain of the T cell receptor delta chain (1tvd). Carefully rotate the model to see the two identical chains held together by noncovalent interactions.

    Here is a dynamic model of a heterodimer, reverse transcriptase (1rev). The two different subunits are shown in different colors.
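    The homo/hetero naming used above follows a simple rule: count the subunits and check whether they are all the same. A minimal Python sketch of that rule is given here, assuming each subunit is represented by a label in which identical chains share the same label. The function name `classify_assembly` and the labels "A" and "B" are hypothetical.

```python
def classify_assembly(chains: list[str]) -> str:
    """Name an assembly from its subunit labels.

    `chains` holds one label per polypeptide chain; identical chains are
    assumed to carry identical labels.
    """
    n = len(chains)
    if n == 0:
        raise ValueError("an assembly needs at least one chain")
    if n == 1:
        return "monomer (no quaternary structure)"
    # One distinct label means all subunits are identical.
    prefix = "homo" if len(set(chains)) == 1 else "hetero"
    names = {2: "dimer", 3: "trimer", 4: "tetramer", 6: "hexamer"}
    return prefix + names.get(n, f"{n}-mer")


print(classify_assembly(["A", "A"]))       # two identical chains
print(classify_assembly(["A", "B"]))       # two different chains
print(classify_assembly(["A", "A", "A"]))  # three identical chains
```

    The sketch says nothing about what holds the subunits together; noncovalent interactions and disulfide bonds are both compatible with any of these names.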

    Globular versus fibrillar structures

    Most proteins have a roughly spherical or "globular" tertiary structure. However, there are many proteins that form elongated fibrils with properties like elasticity, which measures the extent of deformation with a given force and subsequent return to the original state. Elastic molecules must store energy (go to a higher energy state) when the elongating force is applied, and the energy must be released on return to the equilibrium resting structure. Structures that can store energy and release it when subjected to a force have resiliency. Proteins that stretch with an applied force include elastin (in blood vessels, lungs and skin, where elasticity is required), resilin in insects (which stretches on wing beating), silk (found in spider webs) and fibrillin (found in most connective tissues and cartilage). Some proteins have high resiliency (90% in elastin and resilin), while others are only partially resilient (35% in silk, which has a tensile strength approaching that of stainless steel).

    In contrast to rubber, whose amorphous structure imparts elasticity, these proteins, although they have dissimilar amino acid sequences, seem to have a common structure inferred from their DNA sequences. In some (like fibrillin), the protein has a folded beta sheet domain which unfolds like a stretched accordion. Others (like elastin and spider silk) have a beta sheet domain and other secondary structures (alpha-helices and beta turns) along with Pro and Ala repetitions. Scientists are studying these structures to help in the synthesis of new elastic and resilient products.

    Here is an example of a fibrillar protein.

    Additional Materials we may want to Include:

    Tertiary and Quaternary Protein Structure

    The complete 3-dimensional shape of the entire protein (or sum of all the secondary structural motifs) is known as the tertiary structure of the protein and is a unique and defining feature for that protein (Figure 2.27). Primarily, the interactions among R groups create the complex three-dimensional tertiary structure of a protein. The nature of the R groups found in the amino acids involved can counteract the formation of the hydrogen bonds described for standard secondary structures such as the alpha helix. For example, R groups with like charges are repelled by each other and those with unlike charges are attracted to each other (ionic bonds). Uncharged nonpolar side chains can form hydrophobic interactions. Interaction between cysteine side chains can lead to the formation of disulfide linkages.

    This illustration shows a polypeptide backbone folded into a three-dimensional structure. Chemical interactions between amino acid side chains maintain its shape. These include an ionic bond between an amino group and a carboxyl group, hydrophobic interactions between two hydrophobic side chains, a hydrogen bond between a hydroxyl group and a carbonyl group, and a disulfide linkage.

    Figure 2.27 Tertiary Protein Structure. The tertiary structure of proteins is determined by a variety of chemical interactions. These include hydrophobic interactions, ionic bonding, hydrogen bonding and disulfide linkages.

    Image by: School of Biomedical Sciences Wiki


    All of these interactions, weak and strong, determine the final three-dimensional shape of the protein. When a protein loses its three-dimensional shape, it is usually no longer functional.

    In nature, some proteins are formed from several polypeptides, also known as subunits, and the interaction of these subunits forms the quaternary structure. Weak interactions between the subunits help to stabilize the overall structure. For example, insulin (a globular protein) has a combination of hydrogen bonds and disulfide bonds that cause it to be mostly clumped into a ball shape. Insulin starts out as a single polypeptide and loses some internal sequences during cellular processing, leaving two chains held together by disulfide linkages, as shown in Figure 2.14. Six of these two-chain molecules are then grouped further, forming an inactive hexamer (Figure 2.28). The hexamer form of insulin is a way for the body to store insulin in a stable and inactive conformation so that it is available for release and reactivation in the monomer form.

    Figure 2.28 The Insulin Hormone is a Good Example of Quaternary Structure. Insulin is produced and stored in the body as a hexamer (a unit of six insulin molecules), while the active form is the monomer. The hexamer is an inactive form with long-term stability, which serves as a way to keep the highly reactive insulin protected, yet readily available.

    Figure By: Isaac Yonemoto


    Predicting the folding pattern of a protein based on its primary sequence is an extremely difficult task due to the inherent flexibility of amino acid residues that can be utilized to form different secondary features. As described by Fujiwara et al., the SCOP classification (Structural Classification of Proteins) and SCOPe (the extended version) are major databases providing detailed and comprehensive descriptions of all known protein structures. SCOP classification is based on hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships, whereas the third, fold, describes geometrical relationships and structural motifs within the protein. Within the fold classification scheme, most proteins are assigned to one of four structural classes: (1) all α-helix, (2) all β-sheet, (3) α/β for proteins with dispersed patterns, and (4) α + β for proteins with regions that are predominated by one or the other pattern type.

    Based on their shape, function, and location, proteins can be characterized broadly as fibrous, globular, membrane, or disordered.

    Fibrous Proteins

    Fibrous proteins are characterized by elongated protein structures. These types of proteins often aggregate into filaments or bundles, forming structural scaffolds in biological systems. Within animals, the two most abundant fibrous protein families are α-keratin and collagen.

    α-keratin

    α-keratin is the key structural element making up hair, nails, horns, claws, hooves, and the outer layer of skin. Due to its tightly wound structure, it can function as one of the strongest biological materials and has various uses in mammals, from predatory claws to hair for warmth. α-keratin is synthesized through protein biosynthesis, utilizing transcription and translation, but as the cell matures and is full of α-keratin, it dies, creating a strong non-vascular unit of keratinized tissue.

    The first sequences of α-keratins were determined by Hanukoglu and Fuchs. These sequences revealed that there are two distinct but homologous keratin families, which were named type I and type II keratins. There are 54 keratin genes in humans, 28 of which code for type I, and 26 for type II. Type I proteins are acidic, meaning they contain more acidic amino acids, such as aspartic acid, while type II proteins are basic, meaning they contain more basic amino acids, such as lysine. This differentiation is especially important in α-keratins because in the synthesis of its sub-unit dimer, the coiled coil, one protein coil must be type I, while the other must be type II (Figure 2.29). Even within type I and II, there are acidic and basic keratins that are particularly complementary within each organism. For example, in human skin, K5, a type II α-keratin, pairs primarily with K14, a type I α-keratin, to form the α-keratin complex of the epidermis layer of cells in the skin.

    Coiled-coil dimers then assemble into protofilaments, a very stable, left-handed superhelical motif which further multimerises, forming filaments consisting of multiple copies of the keratin monomers (Figure 2.29). The major force that keeps the coiled-coil structures associated with one another is the hydrophobic interaction between apolar residues along the keratins' helical segments.

    Figure 2.29. Formation of an Intermediate Filament. Intermediate filaments are composed of an α-keratin superhelical complex. Initially, two keratin monomers (A) form a coiled-coil dimer structure (B). Two coiled-coil dimers join to form a staggered tetramer (C), the tetramers start to join together (D), ultimately forming a sheet of eight tetramers (E). The sheet of eight tetramers is then twisted into a left-handed helix forming the final intermediate filament (E). An electron micrograph of the intermediate filament is shown in the upper left-hand corner.

    Image by: US Gov


    Collagen

    The fibrous protein collagen is the most abundant protein in mammals, making up 25% to 35% of the whole-body protein content. It is found predominantly in the extracellular space within various connective tissues in the body. Collagen contains a unique quaternary structure of three protein strands wound together to form a triple helix. It is mostly found in fibrous tissues such as tendons, ligaments, and skin.

    Depending upon the degree of mineralization, collagen tissues may be rigid (bone), compliant (tendon), or have a gradient from rigid to compliant (cartilage). It is also abundant in corneas, blood vessels, the gut, intervertebral discs, and the dentin in teeth. In muscle tissue, it serves as a major component of the endomysium. Collagen constitutes one to two percent of muscle tissue and accounts for 6% of the weight of strong, tendinous muscles. The fibroblast is the most common cell that creates collagen. Gelatin, which is used in food and industry, is collagen that has been irreversibly hydrolyzed. In addition, partially and fully hydrolyzed collagen powders are used as dietary supplements. Collagen has many medical uses in treating complications of the bones and skin.

    The name collagen comes from the Greek κόλλα (kólla), meaning "glue", and the suffix -gen, denoting "producing". This refers to the compound's early use in the process of boiling the skin and tendons of horses and other animals to obtain glue.

    Over 90% of the collagen in the human body is type I. However, as of 2011, 28 types of collagen have been identified, described, and divided into several groups according to the structure they form. The five most common types are:

    • Type I: skin, tendon, vasculature, organs, bone (main component of the organic part of bone)
    • Type II: cartilage (main collagenous component of cartilage)
    • Type III: reticulate (main component of reticular fibers), commonly found alongside type I
    • Type IV: forms basal lamina, the epithelium-secreted layer of the basement membrane
    • Type V: cell surfaces, hair, and placenta

    Here we will focus on the unique attributes of Collagen Type I. Collagen Type I has an unusual amino acid composition and sequence:

    • Glycine is found at almost every third residue.
    • Proline makes up about 17% of collagen.
    • Collagen contains two uncommon derivative amino acids not directly inserted during translation. These amino acids are found at specific locations relative to glycine and are modified post-translationally by different enzymes, both of which require vitamin C as a cofactor (Figure 2.30).
      • Hydroxyproline derived from proline
      • Hydroxylysine derived from lysine - depending on the type of collagen, varying numbers of hydroxylysines are glycosylated (mostly having disaccharides attached).
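    The "glycine at almost every third residue" pattern noted above can be checked on a sequence with a few lines of Python. This is only a sketch: the function names and the toy one-letter sequences are made up for illustration, and real collagen chains would be expected to score close to, but not necessarily exactly, 1.0.

```python
def glycine_repeat_fraction(seq: str, offset: int = 0) -> float:
    """Fraction of every-third positions, starting at `offset`, that are glycine (G)."""
    positions = range(offset, len(seq), 3)
    if not positions:
        raise ValueError("sequence too short for this offset")
    hits = sum(1 for i in positions if seq[i] == "G")
    return hits / len(positions)


def best_glycine_frame(seq: str) -> int:
    """Return the offset (0, 1, or 2) whose every-third positions hold the most glycine."""
    return max(range(3), key=lambda off: glycine_repeat_fraction(seq, off))


toy = "GPPGAPGEKGPP"  # hypothetical sequence with G at every third position
print(glycine_repeat_fraction(toy, 0))
print(glycine_repeat_fraction(toy, 1))
print(best_glycine_frame("AGPPGAPGEK"))
```

    Trying all three offsets matters because the repeat need not start at the first residue of the sequence being examined.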

    Figure 2.30. Hydroxylation of Proline and Lysine During the Post-Translational Modification of Collagen Type I. The enzymes prolyl hydroxylase and lysyl hydroxylase are required for the hydroxylation of proline (A) and lysine (B) residues, respectively. (Note: While position 3 is shown above, prolyl residues may alternatively be hydroxylated at the 4-position). The hydroxylase enzymes modify amino acid residues after they have been incorporated into the protein as a post-translational modification and require vitamin C (ascorbate) as a cofactor. (C) Further modification of the hydroxylysine residues by glycosylation can lead to the incorporation of the disaccharide (galactose-glucose) at the hydroxy oxygen.


    Most collagen forms in a similar manner. The synthesis process for Collagen Type I is described below and showcases the complexity of protein folding and processing (Figure 2.31).

    1. Inside the cell
      1. Two types of alpha chains are formed during translation on ribosomes along the rough endoplasmic reticulum (RER): alpha-1 and alpha-2 chains. These peptide chains (known as preprocollagen) have registration peptides on each end and a signal peptide.
      2. Polypeptide chains are released into the lumen of the RER.
      3. Signal peptides are cleaved inside the RER and the chains are now known as pro-alpha chains.
      4. Hydroxylation of lysine and proline amino acids occurs inside the lumen. This process is dependent on ascorbic acid (vitamin C) as a cofactor.
      5. Glycosylation of specific hydroxylysine residues occurs.
      6. A triple-helical structure is formed inside the endoplasmic reticulum from two alpha-1 chains and one alpha-2 chain.
      7. Procollagen is shipped to the Golgi apparatus, where it is packaged and secreted by exocytosis.
    2. Outside the cell
      1. Registration peptides are cleaved and tropocollagen is formed by procollagen peptidase.
      2. Multiple tropocollagen molecules form collagen fibrils, via covalent cross-linking (aldol reaction) by lysyl oxidase which links hydroxylysine and lysine residues. Multiple collagen fibrils form into collagen fibers.
      3. Collagen may be attached to cell membranes via several types of protein, including fibronectin, laminin, fibulin and integrin.

    Figure 2.31. Synthesis of Collagen Type I. Polypeptide chains are synthesized in the endoplasmic reticulum and released into the lumen where they are hydroxylated and glycosylated. The procollagen triple helix is formed and transported through the Golgi apparatus where it is further processed. Procollagen is secreted into the extracellular matrix where it is cleaved into tropocollagen. Tropocollagen assembles into a collagen fibril where crosslinking and hydrogen bonding occur to form the final collagen fiber.

    Image modified from: E.V. Wong and Encyclopedia Britannica


    Vitamin C deficiency causes scurvy, a serious and painful disease in which defective collagen prevents the formation of strong connective tissue. Gums deteriorate and bleed, with loss of teeth; skin discolors, and wounds do not heal. Prior to the 18th century, this condition was notorious among long-duration military, particularly naval, expeditions during which participants were deprived of foods containing vitamin C.

    An autoimmune disease such as lupus erythematosus or rheumatoid arthritis may attack healthy collagen fibers. Cortisol stimulates degradation of collagen into amino acids, suggesting that stress can worsen these disease states.

    Many bacteria and viruses secrete virulence factors, such as the enzyme collagenase, which destroys collagen or interferes with its production.

    Globular Proteins

    Globular proteins or spheroproteins are spherical ("globe-like") proteins and are one of the common protein types. Globular proteins are somewhat water-soluble (forming colloids in water), unlike the fibrous or membrane proteins. There are multiple fold classes of globular proteins, since there are many different architectures that can fold into a roughly spherical shape.

    The term globin can refer more specifically to proteins including the globin fold. The globin fold is a common three-dimensional fold in proteins and defines the globin-like protein superfamily (Figure 2.32). This fold typically consists of eight alpha helices, although some proteins have additional helix extensions at their termini. The globin fold is found in its namesake globin protein families: hemoglobins and myoglobins, as well as in phycocyanins. Because myoglobin was the first protein whose structure was solved, the globin fold was thus the first protein fold discovered. Since the globin fold contains only helices, it is classified as an all-alpha protein fold.

    Figure 2.32 The Globin Fold. (A) An example of the globin fold, the oxygen-carrying protein myoglobin (PDB ID 1MBA) from the mollusc Aplysia limacina. (B) Structure of the tetrameric hemoglobin protein containing a total of four globin folds.

    Image A by: Wikipedia Image B by: Zephyris


    The term globular protein is quite old (dating probably from the 19th century) and is now somewhat archaic given the hundreds of thousands of known proteins and the more elegant and descriptive structural motif vocabulary. The spherical shape arises from the protein's tertiary structure. The molecule's apolar (hydrophobic) amino acids are buried in the molecule's interior, whereas polar (hydrophilic) amino acids face outwards, allowing dipole-dipole interactions with the solvent, which explains the molecule's solubility.
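    One common way to put numbers on the hydrophobic/hydrophilic distinction is a hydropathy scale. The Python sketch below uses the Kyte-Doolittle scale, which is not mentioned in this chapter and is brought in here only as an illustration. The sliding-window average flags hydrophobic and polar stretches of a sequence, but whether a stretch is actually buried depends on the folded 3D structure, which this calculation does not see. The toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average hydropathy of a stretch of one-letter amino acid codes."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def hydropathy_profile(seq: str, window: int = 3) -> list[float]:
    """Sliding-window mean hydropathy along a sequence."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        mean_hydropathy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


# Toy sequence: an apolar stretch (IVL) followed by a charged stretch (KRD).
for start, score in enumerate(hydropathy_profile("IVLKRD", window=3)):
    print(start, round(score, 2))
```

    Windows with positive averages are candidates for the apolar interior, and windows with negative averages are candidates for the solvent-exposed surface.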

    Unlike fibrous proteins which play a predominant structural function, globular proteins can act as:

    • Enzymes, which catalyze organic reactions taking place in the organism under mild conditions and with great specificity. Different esterases fulfill this role.
    • Messengers, which transmit messages to regulate biological processes. This function is performed by hormones, e.g., insulin.
    • Transporters of other molecules through membranes.
    • Stores of amino acids.
    • Regulators, since regulatory roles are also performed by globular proteins rather than fibrous proteins.
    • Structural proteins, e.g., actin and tubulin, which are globular and soluble as monomers but polymerize to form long, stiff fibers.

    Many of the proteins that will be detailed in later chapters will fall into this class of proteins.

    Membrane Proteins

    Membrane proteins are proteins that are part of, or interact with, biological membranes. They include: 1) integral membrane proteins, which are part of or permanently anchored to the membrane, and 2) peripheral membrane proteins, which are attached temporarily to the membrane via integral proteins or the lipid bilayer. The integral membrane proteins are further classified as transmembrane proteins, which span the membrane, or integral monotopic proteins, which are attached to only one side of the membrane.

    Membrane proteins, like soluble globular proteins, fibrous proteins, and disordered proteins, are common. Reflecting their importance in medicine, membrane proteins are the targets of over 50% of all modern medicinal drugs. It is estimated that 20–30% of all genes in most genomes encode membrane proteins. Compared to other classes of proteins, determining membrane protein structures remains a challenge, in large part due to the difficulty in establishing experimental conditions that can preserve the correct conformation of the protein in isolation from its native environment (Figure 2.33).

    Membrane proteins perform a variety of functions vital to the survival of organisms:

    • Membrane receptor proteins relay signals between the cell's internal and external environments.
    • Transport proteins move molecules and ions across the membrane. They can be categorized according to the Transporter Classification database.
    • Membrane enzymes may have many activities, such as oxidoreductase, transferase or hydrolase.
    • Cell adhesion molecules allow cells to identify each other and interact, for example, proteins involved in the immune response.


    Integral membrane proteins are permanently attached to the membrane. Such proteins can be separated from the biological membranes only using detergents, nonpolar solvents, or sometimes denaturing agents. They can be classified according to their relationship with the bilayer:

    • Integral polytopic proteins are transmembrane proteins that span across the membrane more than once. These proteins may have different transmembrane topology. These proteins have one of two structural architectures:
      • helix bundle proteins, which are present in all types of biological membranes;
      • beta barrel proteins, which are found only in outer membranes of Gram-negative bacteria, and outer membranes of mitochondria and chloroplasts.
    • Bitopic proteins are transmembrane proteins that span across the membrane only once. Transmembrane helices from these proteins have significantly different amino acid distributions to transmembrane helices from polytopic proteins.
    • Integral monotopic proteins are integral membrane proteins that are attached to only one side of the membrane and do not span the whole way across.
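    The three categories above differ only in how many times the chain crosses the bilayer. As a minimal Python sketch, assuming the protein is already known to be an integral membrane protein and that the number of membrane-spanning segments has been determined elsewhere (experimentally or by prediction), the naming rule looks like this. The function name is hypothetical.

```python
def classify_integral_protein(n_spans: int) -> str:
    """Name an integral membrane protein by its number of membrane-spanning segments."""
    if n_spans < 0:
        raise ValueError("number of spans cannot be negative")
    if n_spans == 0:
        # Attached to one side of the membrane without crossing it.
        return "integral monotopic"
    if n_spans == 1:
        return "bitopic"
    return "polytopic"


for spans in (0, 1, 7):
    print(spans, classify_integral_protein(spans))
```

    The count alone does not distinguish helix bundle from beta barrel architectures; that requires knowing the secondary structure of the spanning segments.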

    Figure 2.34 Schematic representation of the different types of interaction between monotopic membrane proteins and the cell membrane. 1. interaction by an amphipathic α-helix parallel to the membrane plane (in-plane membrane helix) 2. interaction by a hydrophobic loop 3. interaction by a covalently bound membrane lipid (lipidation) 4. electrostatic or ionic interactions with membrane lipids.


    Peripheral membrane proteins are temporarily attached either to the lipid bilayer or to integral proteins by a combination of hydrophobic, electrostatic, and other non-covalent interactions. Peripheral proteins dissociate following treatment with a polar reagent, such as a solution with an elevated pH or high salt concentrations.

    Integral and peripheral proteins may be post-translationally modified, with added fatty acid, diacylglycerol or prenyl chains, or GPI (glycosylphosphatidylinositol), which may be anchored in the lipid bilayer.

    Disordered Proteins

    An intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure (Figure 2.35). IDPs cover a spectrum of states from fully unstructured to partially structured and include random coils, (pre-)molten globules, and large multi-domain proteins connected by flexible linkers. They constitute one of the main types of protein (alongside globular, fibrous and membrane proteins).

    Figure 2.35 Conformational flexibility in SUMO-1 protein (PDB:1a5r). The central part shows relatively ordered structure. Conversely, the N- and C-terminal regions (left and right, respectively) show ‘intrinsic disorder’, although a short helical region persists in the N-terminal tail. Ten alternative NMR models were morphed. Secondary structure elements: α-helices (red), β-strands (blue arrows).

    Image by: Lukasz Kozlowski


    The discovery of IDPs has challenged the traditional protein structure paradigm, that protein function depends on a fixed three-dimensional structure. This dogma has been challenged over the last twenty years by increasing evidence from various branches of structural biology, suggesting that protein dynamics may be highly relevant for such systems. Despite their lack of stable structure, IDPs are a very large and functionally important class of proteins. In some cases, IDPs can adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs are different from structured proteins in many ways and tend to have distinct properties in terms of function, structure, sequence, interactions, evolution and regulation.

    In the 1930s–1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate biological functions of proteins. When stating that proteins have just one uniquely defined configuration, Mirsky and Pauling did not recognize that Fischer's work would have supported their thesis with his 'Lock and Key' model (1894). These publications solidified the central dogma of molecular biology in that the sequence determines the structure which, in turn, determines the function of proteins. In 1950, Karush wrote about 'Configurational Adaptability', contradicting all the assumptions and research in the 19th century. He was convinced that proteins have more than one configuration at the same energy level and can choose one when binding to other substrates. In the 1960s, Levinthal's paradox suggested that the systematic conformational search of a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales (i.e. seconds to minutes). Curiously, for many (small) proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro. As stated in Anfinsen's Dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure (the amino acid sequence), is kinetically accessible and stable under a range of (near) physiological conditions, and can therefore be considered as the native state of such "ordered" proteins.
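    The scale of Levinthal's argument is easy to reproduce with back-of-the-envelope arithmetic. The Python sketch below assumes, purely for illustration, 3 possible conformations per residue and 10^13 conformations sampled per second; neither number comes from this chapter, and other textbook treatments use different values. The calculation works in logarithms so that very long chains do not overflow.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_log10_years(
    n_residues: int,
    conformations_per_residue: float = 3.0,  # assumed, illustrative
    samples_per_second: float = 1e13,        # assumed, illustrative
) -> float:
    """log10 of the years needed to sample every conformation exactly once."""
    log10_conformations = n_residues * math.log10(conformations_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


# A 100-residue chain under the assumptions above.
print(f"about 10^{levinthal_log10_years(100):.0f} years")
```

    Under these assumptions an exhaustive search for a 100-residue chain would take on the order of 10^27 years, which is why the observed refolding in seconds to minutes implies that folding cannot proceed by random exhaustive search.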

    During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets, indicating that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were "disordered". Nuclear magnetic resonance spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles. It is now generally accepted that proteins exist as an ensemble of similar structures with some regions more constrained than others. Intrinsically Unstructured Proteins (IUPs) occupy the extreme end of this spectrum of flexibility, whereas IDPs also include proteins of considerable local structure tendency or flexible multidomain assemblies. These highly dynamic disordered regions of proteins have subsequently been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis.

    In many disordered proteins, the binding affinity for their receptors is regulated by post-translational modification. It has therefore been proposed that the flexibility of disordered proteins accommodates the different conformational requirements for binding the modifying enzymes as well as their receptors. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling, transcription, and chromatin remodeling.

    Flexible linkers

    Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the domains they connect to twist and rotate freely to recruit their binding partners via protein domain dynamics. They also allow their binding partners to induce larger-scale conformational changes by long-range allostery.

    Linear motifs

    Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules (RNA, DNA, sugars, etc.). Many roles of linear motifs are associated with cell regulation, for instance in the control of cell shape, the subcellular localization of individual proteins, and regulated protein turnover. Post-translational modifications such as phosphorylation often tune the affinity of individual linear motifs for specific interactions, frequently by several orders of magnitude. Unlike globular proteins, IDPs do not have spatially disposed active pockets. Nevertheless, in 80% of the IDPs (about three dozen) subjected to detailed structural characterization by NMR, there are linear motifs termed PreSMos (pre-structured motifs): transient secondary structural elements primed for target recognition. In several cases it has been demonstrated that these transient structures become full and stable secondary structures, e.g., helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.

    Coupled folding and binding

    Many unstructured proteins undergo transitions to more ordered states upon binding to their targets. The coupled folding and binding may be local, involving only a few interacting residues, or it might involve an entire protein domain. It was recently shown that coupled folding and binding allows the burial of a large surface area that would be possible for fully structured proteins only if they were much larger. Moreover, certain disordered regions might serve as "molecular switches" that regulate biological functions by switching to an ordered conformation upon molecular recognition, such as small-molecule binding, DNA/RNA binding, or ion interactions.

    Disorder in the bound state (fuzzy complexes)

    Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulation of the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing. Intrinsically disordered proteins adopt many different structures in vivo according to the cell's conditions, creating a structural or conformational ensemble.

    Therefore, their structures are strongly function-related. However, only a few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions (IDRs) within an otherwise well-structured protein. The term intrinsically disordered protein (IDP) therefore includes proteins that contain IDRs as well as fully disordered proteins.

    The existence and kind of protein disorder are encoded in the amino acid sequence. In general, IDPs are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids, a combination usually referred to as low hydrophobicity. This property leads to favorable interactions with water. Furthermore, high net charges promote disorder because of electrostatic repulsion between like-charged residues. Thus disordered sequences cannot bury a sufficient hydrophobic core to fold into stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide clues for identifying the regions that undergo coupled folding and binding (see Coupled folding and binding above).
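
    The two sequence properties described above, low hydrophobicity and high net charge, can be estimated directly from a sequence. The following is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale and a simple charge count (Lys and Arg as +1, Asp and Glu as -1, His ignored). The two example sequences are hypothetical and were made up for illustration; this is not a validated disorder predictor.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def net_charge_per_residue(seq):
    """(K + R - D - E) / length; a crude estimate that ignores His and termini."""
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return (positive - negative) / len(seq)


# Hypothetical sequences, invented for this example only.
ordered_like = "VLIAFMLVAGIWLKDVAL"
disordered_like = "SEEKPKESDEEGSPKKEE"

for name, seq in [("ordered-like", ordered_like), ("disordered-like", disordered_like)]:
    print(name, round(mean_hydropathy(seq), 2), round(net_charge_per_residue(seq), 2))
```

    The disordered-like sequence has a negative mean hydropathy and a larger absolute net charge per residue than the ordered-like one, which matches the qualitative trend described in the paragraph above.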

    Many disordered proteins contain regions without any regular secondary structure. These regions can be termed flexible, in contrast to structured loops. While the latter are rigid and contain only one set of Ramachandran angles, IDPs involve multiple sets of angles. The term flexibility is also used for well-structured proteins, but it describes a different phenomenon in the context of disordered proteins. Flexibility in structured proteins is bound to an equilibrium state, while it is not so in IDPs. Many disordered proteins also contain low complexity sequences, i.e. sequences with over-representation of a few residues. While low complexity sequences are a strong indication of disorder, the reverse is not necessarily true; that is, not all disordered proteins have low complexity sequences. Disordered proteins have a low content of predicted secondary structure.
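
    One common way to quantify "over-representation of a few residues" is the Shannon entropy of the amino acid composition within a sliding window. The sketch below assumes this entropy measure; the window length and the threshold are hypothetical defaults chosen for illustration, not established cutoffs.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Composition entropy in bits; 0 for a single repeated residue."""
    n = len(seq)
    return sum((c / n) * log2(n / c) for c in Counter(seq).values())


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Start positions of windows whose entropy falls below the threshold.

    Both `window` and `threshold` are illustrative assumptions.
    """
    return [
        start
        for start in range(len(seq) - window + 1)
        if shannon_entropy(seq[start:start + window]) < threshold
    ]


# Hypothetical sequence: a diverse stretch followed by a glutamine-rich run.
example = "ACDEFGHIKLMN" + "QQQQQQQQQQQQ"
print(low_complexity_windows(example))
```

    Windows that overlap the glutamine-rich run are flagged, while the diverse stretch at the start is not. As noted above, a low-complexity hit suggests disorder, but the absence of one does not rule it out.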

    References

    OpenStax, Proteins. OpenStax CNX. Sep 30, 2016 http://cnx.org/contents/bf17f4df-605c-4388-88c2-25b0f000b0ed@2.

    File:Chirality with hands.jpg. (2017, September 16). Wikimedia Commons, the free media repository. Retrieved 17:34, July 10, 2019 from commons.wikimedia.org/w/index.php?title=File:Chirality_with_hands.jpg&oldid=258750003.

    Wikipedia contributors. (2019, July 6). Zwitterion. In Wikipedia, The Free Encyclopedia. Retrieved 21:48, July 10, 2019, from en.Wikipedia.org/w/index.php?title=Zwitterion&oldid=905089721

    Wikipedia contributors. (2019, July 8). Absolute configuration. In Wikipedia, The Free Encyclopedia. Retrieved 15:28, July 14, 2019, from en.Wikipedia.org/w/index.php?title=Absolute_configuration&oldid=905412423

    Structural Biochemistry/Enzyme/Active Site. (2019, July 1). Wikibooks, The Free Textbook Project. Retrieved 16:55, July 16, 2019 from en.wikibooks.org/w/index.php?title=Structural_Biochemistry/Enzyme/Active_Site&oldid=3555410.

    Structural Biochemistry/Proteins. (2019, March 24). Wikibooks, The Free Textbook Project. Retrieved 19:16, July 18, 2019 from en.wikibooks.org/w/index.php?title=Structural_Biochemistry/Proteins&oldid=3529061.

    Fujiwara, K., Toda, H., and Ikeguchi, M. (2012) Dependence of α-helical and β-sheet amino acid propensities on the overall protein fold type. BMC Structural Biology 12:18. Available at: https://bmcstructbiol.biomedcentral.com/track/pdf/10.1186/1472-6807-12-18

    Wikipedia contributors. (2019, July 16). Keratin. In Wikipedia, The Free Encyclopedia. Retrieved 17:50, July 19, 2019, from en.Wikipedia.org/w/index.php?title=Keratin&oldid=906578340

    Wikipedia contributors. (2019, July 13). Alpha-keratin. In Wikipedia, The Free Encyclopedia. Retrieved 18:17, July 19, 2019, from en.Wikipedia.org/w/index.php?title=Alpha-keratin&oldid=906117410

    Open Learning Initiative. (2019) Integumentary Levels of Organization. Carnegie Mellon University. In Anatomy & Physiology. Available at: https://oli.cmu.edu/jcourse/webui/syllabus/module.do?context=4348901580020ca6010f804da8baf7ba.

    Wikipedia contributors. (2019, July 16). Collagen. In Wikipedia, The Free Encyclopedia. Retrieved 03:42, July 20, 2019, from en.Wikipedia.org/w/index.php?title=Collagen&oldid=906509954

    Wikipedia contributors. (2019, July 2). Rossmann fold. In Wikipedia, The Free Encyclopedia. Retrieved 16:01, July 20, 2019, from https://en.Wikipedia.org/w/index.php?title=Rossmann_fold&oldid=904468788

    Wikipedia contributors. (2019, May 30). TIM barrel. In Wikipedia, The Free Encyclopedia. Retrieved 16:46, July 20, 2019, from en.Wikipedia.org/w/index.php?title=TIM_barrel&oldid=899459569

    Wikipedia contributors. (2019, July 16). Protein folding. In Wikipedia, The Free Encyclopedia. Retrieved 18:30, July 20, 2019, from https://en.Wikipedia.org/w/index.php?title=Protein_folding&oldid=906604145

    Wikipedia contributors. (2019, June 11). Globular protein. In Wikipedia, The Free Encyclopedia. Retrieved 18:49, July 20, 2019, from en.Wikipedia.org/w/index.php?title=Globular_protein&oldid=901360467

    Wikipedia contributors. (2019, July 11). Intrinsically disordered proteins. In Wikipedia, The Free Encyclopedia. Retrieved 19:52, July 20, 2019, from en.Wikipedia.org/w/index.php?title=Intrinsically_disordered_proteins&oldid=905782287