Protein structure is commonly presented in a hierarchical manner. While this is an over-simplification, it is a good place to start. When we think about how a polypeptide folds, we have to think about the environment it will inhabit and how it interacts with itself and with other polypeptides. In a protein composed of multiple polypeptides, we need to consider how it comes to interact with those other polypeptides (often termed subunits). As we think about polypeptide structure it is common to see the terms primary, secondary, tertiary, and quaternary structure. The primary structure of a polypeptide is the sequence of amino acids along the polypeptide chain, written from its N- or amino terminus to its C- or carboxyl terminus. As we will see below, the secondary structure of a polypeptide consists of local folding motifs: the α-helix, the β-sheet, and connecting domains. The tertiary structure of a polypeptide is the overall three-dimensional shape a polypeptide takes in space (as well as how its R-groups are oriented). Quaternary structure refers to how the various polypeptides and co-factors combine and are arranged to form a functional protein. In a protein that consists of a single polypeptide and no co-factors, tertiary and quaternary structures are the same. As a final complexity, a particular polypeptide can be part of a number of different proteins. This is one way in which a gene can play a role in a number of different processes and be involved in the generation of a number of different phenotypes.
Polypeptide synthesis (translation), like almost all processes that occur within the cell, is a stochastic process, meaning that it is based on random collisions between molecules. In the specific case of translation, the association of the mRNA with ribosomal components occurs stochastically; similarly, the addition of a new amino acid depends on the collision of the appropriate amino acid-charged tRNA with the RNA-ribosome complex. Since there are many different amino acid-charged tRNAs in the cytoplasm, the ribosomal complex must productively bind only the amino-acyl-tRNA that the mRNA specifies, that is, the tRNA with the right anticodon. This enables its attached amino acid to interact productively, leading to the addition of the amino acid to the C-terminus of the growing polypeptide chain. Illustrations of polypeptide synthesis rarely show this stochastic aspect. From 12 to 21 amino acids are added per second in bacterial cells (and about half that rate in mammalian cells)244.
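To get a feel for what these rates mean, the short Python sketch below estimates how long it takes to synthesize a polypeptide. It assumes a constant elongation rate and ignores initiation, pausing, and termination; the 300-residue length is a hypothetical example, and the mammalian rates are simply half the bacterial ones, as stated above.

```python
def synthesis_time_seconds(length_aa, rate_aa_per_s):
    """Time to add `length_aa` amino acids at a constant elongation rate."""
    if rate_aa_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_aa / rate_aa_per_s


# Hypothetical polypeptide length (an assumption, not a value from the text).
LENGTH = 300

# Rates from the text: 12 to 21 amino acids per second in bacteria,
# about half that in mammalian cells.
BACTERIAL_RATES = (12.0, 21.0)
MAMMALIAN_RATES = (6.0, 10.5)

for label, (slow, fast) in (("bacterial", BACTERIAL_RATES),
                            ("mammalian", MAMMALIAN_RATES)):
    t_fast = synthesis_time_seconds(LENGTH, fast)
    t_slow = synthesis_time_seconds(LENGTH, slow)
    print(f"{label}: {t_fast:.1f} to {t_slow:.1f} s for {LENGTH} residues")
```

Under these assumptions a 300-residue polypeptide takes roughly 14 to 25 seconds to make in a bacterium and roughly 29 to 50 seconds in a mammalian cell.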
Now you might wonder whether there are errors in polypeptide synthesis as there are in nucleic acid synthesis. In fact there are. For example, if a base is skipped by the ribosomal system, the reading frame will be thrown off. Typically, this leads to a completely different sequence of amino acids being added to the end of the polypeptide (down-stream of the skip). The shifted frame generally soon encounters a stop codon, which terminates translation and releases a polypeptide that cannot fold correctly and is (generally) rapidly degraded245. Similarly, if the wrong amino acid is inserted at a particular position and it disrupts normal folding, the polypeptide could be degraded. What limits the effects of mistakes during translation is that most proteins (unlike DNA molecules) have finite and relatively short half-lives; the half-life is the time an average polypeptide exists before it is degraded by various enzymes. Normally (but not always) this limits the damage that a mis-translated polypeptide can do to the cell and organism.
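The effect of a skipped base can be illustrated with a few lines of Python. This is a minimal sketch: the mRNA sequence is a made-up example chosen so that the shifted frame quickly reaches a stop codon, the function names are hypothetical, and the codon table is the standard genetic code written in one-letter amino acid symbols ("*" marks a stop codon).

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# (first base varies slowest, third base fastest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read codons from the first base until a stop codon or the end."""
    polypeptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon terminates translation
            break
        polypeptide.append(aa)
    return "".join(polypeptide)


def skip_base(mrna, index):
    """Return the sequence as read if the base at `index` is skipped."""
    return mrna[:index] + mrna[index + 1:]


# Hypothetical mRNA: AUG GCU GAA GUG ACU AAG GGC UAA
MRNA = "AUGGCUGAAGUGACUAAGGGCUAA"

print(translate(MRNA))                # normal reading frame
print(translate(skip_base(MRNA, 3)))  # first base of the second codon skipped
```

In this example the normal frame gives the seven-residue polypeptide MAEVTKG, while skipping one base after the start codon gives MLK: two incorrect amino acids followed by a premature stop (UGA) that was invisible in the original frame.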
Factors influencing polypeptide folding and structure: Polypeptides are synthesized, and they fold, in a vectorial, that is, directional manner. Synthesis occurs in an N- to C-terminal direction, and the newly synthesized polypeptide exits the ribosome through a tunnel about 10 nm long and 1.5 nm in diameter. This tunnel is narrow enough to block the folding of the newly synthesized polypeptide chain. As the polypeptide emerges from the tunnel it begins to fold. At the same time it encounters the crowded cytoplasmic environment; the newly synthesized polypeptide needs to avoid low affinity, non-specific, and non-physiologically significant interactions with other cellular components246. If the polypeptide is part of a multi-subunit protein, it must also "find" its correct partner polypeptides, which again is a stochastic process. If the polypeptide does not fold correctly, it will not function correctly and may even damage the cell or the organism. A number of degenerative neurological disorders are due, at least in part, to the accumulation of misfolded polypeptides (see below).
We can think of the folding process as a “drunken” walk across an energy landscape, with movements driven by intermolecular interactions and collisions with other molecules. The successful goal of this process is to find the lowest point in the landscape, the energy minimum of the system. This is generally assumed to be the native or functional state of the polypeptide. That said, this native state is not necessarily static, since the folded polypeptide (and the final protein) will be subject to thermal fluctuations; it is possible that it will move between various states with similar, but not identical, stabilities. Calculating the final folded state of a polypeptide is an extremely complex problem. Generally two approaches are taken to characterizing the structure of a functional protein. In the first, the structure of the protein is determined directly by X-ray crystallography or Nuclear Magnetic Resonance spectroscopy. In the second, if the structure of a homologous protein is known (and we will consider homologous proteins later on), it can be used as a framework to model the structure of a previously unsolved protein.
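The idea of a random walk on an energy landscape can be made concrete with a toy simulation. The sketch below is not a model of a real polypeptide: the one-dimensional landscape, its energy values, and the function name are all hypothetical, and the acceptance rule (the Metropolis criterion, a standard technique from statistical physics) is our choice for illustration. Downhill steps are always taken; uphill steps are taken with a probability that depends on the available thermal energy (kT).

```python
import math
import random

# Hypothetical one-dimensional energy landscape (arbitrary units).
# Index 1 is a shallow local minimum; index 5 is the global minimum.
LANDSCAPE = [5.0, 3.0, 4.0, 6.0, 2.0, 0.0, 1.0, 4.0]


def random_walk(landscape, start, kT, steps, seed=0):
    """Metropolis-style walk; returns the final position on the landscape."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    pos = start
    for _ in range(steps):
        trial = pos + rng.choice((-1, 1))  # random "collision" left or right
        if trial < 0 or trial >= len(landscape):
            continue  # the landscape has walls at both ends
        dE = landscape[trial] - landscape[pos]
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / kT).
        if dE <= 0 or rng.random() < math.exp(-dE / kT):
            pos = trial
    return pos


# With almost no thermal energy, a walker that starts in the local
# minimum (index 1) cannot climb out of it.
print(random_walk(LANDSCAPE, start=1, kT=1e-9, steps=1000))

# With more thermal energy the walker can cross barriers and explore.
print(random_walk(LANDSCAPE, start=1, kT=2.0, steps=1000))
```

With very little thermal energy the walker stays where it started, in the local minimum; with more thermal energy it can cross barriers, but it also spends less of its time sitting at the very bottom of the landscape.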
There are a number of constraints that influence the folding of a polypeptide. The first is the peptide bond itself. All polypeptides contain a string of peptide bonds. It is therefore not surprising that there are common patterns in polypeptide folding. The first of these common patterns to be recognized, the α-helix, was discovered by Linus Pauling and Robert Corey in 1951. This was followed shortly thereafter by their description of the β-sheet. The forces that drive the formation of the α-helix and the β-sheet will be familiar. They are the same forces that underlie water structure.
In an α-helix and a β-sheet, all of the possible H-bonds involving the peptide bond's donor and acceptor groups (–N–H : O=C– with “:” indicating a H-bond) are formed within the polypeptide. In the α-helix these H-bond interactions run parallel to the polypeptide chain. In the β-sheet they occur between polypeptide chains. The interacting strands within a β-sheet can run parallel or anti-parallel to one another, and can occur within a single polypeptide chain or between different polypeptide chains. In an α-helix, the R-groups point outward from the helix axis. In β-sheets the R-groups point in an alternating manner either above or below the sheet. While most amino acids can take part in either α-helix or β-sheet structures, the imino acid proline cannot: once proline is part of a peptide bond its nitrogen carries no H, so its presence in a polypeptide chain leads to a break in the pattern of intrachain H-bonds. It is worth noting that some polypeptides can adopt functionally different structures: for example, in one form (PrPC) the prion protein contains a high level of α-helix (42%) and essentially no β-sheet (3%), while an alternative form (PrPSc), associated with the disease scrapie, contains high levels of β-sheet (43%) and 30% α-helix (see below)247.
Peptide bond rotation and proline: Although drawn as a single bond, the peptide bond behaves more like a double bond, or rather like a bond and a half. In the case of a single bond, there is free rotation around the bond axis in response to molecular collisions. In contrast, rotation around a peptide bond requires more energy to move from the trans to the cis configuration and back again, that is, it is more difficult to rotate around the peptide bond because it involves the partial breakage of the bond. In addition, in the cis configuration the R groups of adjacent amino acids are on the same side of the polypeptide chain. If these R groups are both large they can bump into each other. If they get too close they will repel each other. The result is that usually the polypeptide chain will be in the trans arrangement. In both α-helix and β-sheet configurations, the peptide bonds are in the trans configuration because the cis configuration disrupts their regular organization.
Peptide bonds involving a proline residue have a different problem. The amino group is “locked” into a particular shape by the ring and therefore inherently destabilizes both α-helix and β-sheet structures (see above). In addition, peptide bonds involving prolines are found in the cis configuration ~100 times as often as those between other amino acids. This cis configuration leads to a bend or kink in the polypeptide chain. The energy involved in the rotation around a peptide bond involving a proline is much higher than that of a standard peptide bond; so high, in fact, that there are protein catalysts, peptidyl-prolyl isomerases, that facilitate the cis-trans rotation.
Hydrophobic R-groups: Many polypeptides and proteins exist primarily in an aqueous (water-based) environment. Yet, a number of their amino acid R-groups are hydrophobic. That means that their interactions with water will decrease the entropy of the system, by leading to the organization of water molecules around the hydrophobic group, a thermodynamically unfavorable situation. This is very much like the process that drives the assembly of lipids into micelles and bilayers. A typical polypeptide, with hydrophobic R-groups along its length, will, in aqueous solution, tend to collapse onto itself so as to minimize (although not always completely eliminate) the interactions of its hydrophobic residues with water. In practice this means that the first step in the folding of a newly synthesized polypeptide is, generally, the collapse of the polypeptide so that the majority of its hydrophobic R-groups are located internally, out of contact with water. In contrast, where there are no (or few) hydrophobic R-groups in the polypeptide, the polypeptide will tend to adopt an extended configuration. On the other hand, if a protein comes to be embedded within a membrane (we will consider how this occurs later on), then the hydrophobic R-groups will tend to be located on the surface of the folded polypeptide that interacts with the hydrophobic interior of the lipid bilayer. Hopefully this makes sense to you, thermodynamically.
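One crude way to ask how hydrophobic a stretch of polypeptide is uses a hydropathy scale, which assigns each amino acid a number: positive for hydrophobic R-groups, negative for hydrophilic ones. The sketch below uses the Kyte-Doolittle scale, which is our choice for illustration and is not discussed in this text; the two sequences and the function names are hypothetical. Averaging the values over a sequence, or over a sliding window along it, gives a rough indication of which regions would be expected to avoid water. It is a heuristic, not a prediction of how a polypeptide folds.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def mean_hydropathy(seq):
    """Average hydropathy of a sequence written in one-letter code."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=5):
    """Mean hydropathy of each window of `window` residues along `seq`."""
    return [
        mean_hydropathy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical sequences, chosen only to show the two extremes.
GREASY = "LLVVIIAAFF"  # all hydrophobic R-groups
POLAR = "KKEEDDRRSS"   # all charged or polar R-groups

print(round(mean_hydropathy(GREASY), 2))
print(round(mean_hydropathy(POLAR), 2))
print([round(v, 2) for v in hydropathy_profile(GREASY + POLAR)])
```

The first sequence has a strongly positive average and the second a strongly negative one; in the combined sequence the sliding-window profile falls from positive to negative as the window moves from the hydrophobic half into the hydrophilic half.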
The path to the native (that is, most stable, functional) state is not necessarily a smooth or predetermined one. The folding polypeptide can get "stuck" in a local energy minimum; there may not be enough energy (derived from thermal collisions) for it to get out again. If a polypeptide gets stuck, structurally, there are active mechanisms to unfold it and let the process leading to the native state proceed again. This process of partial unfolding is carried out by proteins known as chaperones. An important point to recognize: chaperones do not determine the native state of a polypeptide. There are many types of protein chaperones; some interact with specific polypeptides as they are synthesized and attempt to keep them from getting into trouble, that is, folding in an unproductive way. Others can recognize inappropriately folded polypeptides and, through coupling to ATP hydrolysis, catalyze the unfolding of the polypeptide, allowing the polypeptide a second (or third or ... ) chance to fold correctly. In the “simple” eukaryote, the yeast Saccharomyces cerevisiae, there are at least 63 distinct molecular chaperones248.
chaperone video http://youtu.be/b39698t750c
One class of chaperones is known as “heat shock proteins.” The genes that encode these proteins are activated (expressed) in response to increased temperature (as long as the increase is not so severe that it kills the cell immediately). At these higher temperatures, the native protein can unfold and misfold; that is, it can denature. Given what you know about polypeptide/protein structure, you should be able to develop a plausible model by which to regulate the expression of heat shock genes. Once expressed, heat shock proteins recognize denatured polypeptides, couple ATP hydrolysis reactions to unfold them, and then release them, giving them another chance to refold correctly.
Heat shock proteins help an organism adapt. In classic experiments, when bacteria were grown at temperatures sufficient to turn on the expression of the genes that encode heat shock proteins, the bacteria had a higher survival rate when re-exposed to elevated temperatures compared to bacteria that had been grown continuously at lower temperature. Heat shock response-mediated survival at higher temperatures is an example of the ability of an organism to adapt to its environment - it is a physiological response. The presence of the heat shock system itself, however, is likely to be a selectable trait, encouraged by temperature variation in the environment. It is the result of evolutionary factors.
By now you might be asking yourself, how do chaperones recognize unfolded or abnormally folded proteins? Well, unfolded proteins will tend to have hydrophobic amino acid side chains exposed on their surface. Because of that they will also tend to aggregate. Chaperones recognize and interact with these surface hydrophobic regions.
Acidic and basic R-groups: Some amino acid R-groups contain carboxylic acid or amino groups and so act as weak acids and bases. Depending on the pH of their environment these groups may be uncharged, positively charged, or negatively charged. Whether a group is charged or uncharged can have a dramatic effect on the structure, and therefore the activity, of a protein. By regulating pH, an organism can modulate the activity of specific proteins. There are, in fact, compartments within eukaryotic cells that are maintained at low pH in part to regulate protein structure and activity. In particular, it is common for the internal regions of vesicles associated with endocytosis to become acidic (through the ATP-dependent pumping of H+ across their membrane), which in turn activates a number of enzymes (located within the vesicle) involved in the hydrolysis of proteins and nucleic acids.
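How strongly pH affects the charge of an R-group can be estimated with the Henderson-Hasselbalch relationship, which gives the fraction of a weak acid or base that carries its H+ at a given pH. The sketch below is illustrative only: the function names are hypothetical, the pKa of about 6.0 for the histidine side chain is an approximate textbook value (the actual value depends on the group's surroundings within a folded protein), and the two pH values, roughly 7.2 for cytoplasm and 5.0 for an acidified vesicle, are assumptions chosen for the example.

```python
def fraction_protonated(pH, pKa):
    """Fraction of groups carrying their H+ (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def mean_charge(pH, pKa, basic):
    """Average charge per group.

    A basic group is +1 when protonated and 0 otherwise;
    an acidic group is 0 when protonated and -1 otherwise.
    """
    f = fraction_protonated(pH, pKa)
    return f if basic else f - 1.0


# Assumed, approximate values used only for illustration.
HIS_PKA = 6.0         # histidine side chain (a weak base)
CYTOPLASM_PH = 7.2
ACIDIC_VESICLE_PH = 5.0

for label, pH in (("cytoplasm", CYTOPLASM_PH),
                  ("acidic vesicle", ACIDIC_VESICLE_PH)):
    charge = mean_charge(pH, HIS_PKA, basic=True)
    print(f"{label} (pH {pH}): mean histidine charge {charge:+.2f}")
```

Under these assumptions a histidine side chain is almost entirely uncharged in the cytoplasm (about +0.06 on average) but mostly positively charged in the acidic vesicle (about +0.91). A shift of this kind, occurring at a few key positions, is one way a change in pH can alter a protein's structure and activity.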
Subunits and prosthetic groups: Now you might find yourself asking, if most proteins are composed of multiple polypeptides, but polypeptides are synthesized individually, how are proteins assembled in a cytoplasm crowded with other proteins and molecules? This is a process that often involves specific chaperone proteins that bind to a newly synthesized polypeptide and either stabilize its folding or hold it until it interacts with the other polypeptides to form the final, functional protein. The absence of appropriate chaperones can make it difficult to assemble multisubunit proteins into functional proteins in vitro.
Many functional proteins also contain non-amino acid-based components, known generically as co-factors. A protein minus its co-factors is known as an apoprotein. Together with its co-factors, it is known as a holoprotein. Generally, without its co-factors, a protein is inactive and often unstable. Co-factors can range in complexity from a single metal ion to quite complex molecules, such as vitamin B12. The retinal group of bacteriorhodopsin and the heme group (with its central iron ion) are co-factors. In general, co-factors are synthesized by various anabolic pathways, and so they represent the activities of a number of genes. So a functional protein can be the direct product of a single gene, many genes, or (indirectly) entire metabolic pathways. At the same time, the formation of a protein can be dependent upon chaperones, which are themselves products of other genes.
Questions to answer & to ponder
•How does entropy drive protein folding and assembly?
•Why does it matter that rotation around a peptide bond is constrained?
•How might changing the pH of a solution alter a protein's structure and activity?
•What happens to a typical protein if you place it in a hydrophobic solvent?
•What would be your prediction for the structure of a polypeptide if all of its R-groups were hydrophilic?
•How might a chaperone recognize a misfolded polypeptide?
•How would a chaperone facilitate the assembly of a protein composed of multiple polypeptides?
•Summarize the differences in structure between a protein that is soluble in the cytoplasm and one that is buried in the membrane.
•Why might proteins that require co-factors misfold in the absence of the co-factor?
•How might surface hydrophobic R-groups facilitate protein-protein interactions?
•Suggest a reason why cofactors would be necessary in biological systems (proteins).
•Map the ways that a mutation in a gene encoding a chaperone could influence a cell or organism.
245 Quality control by the ribosome following peptide bond formation: http://www.ncbi.nlm.nih.gov/pubmed/19092806
246 Remember, all molecules interact with each other via van der Waals interactions.
247 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC47901/ and prion disease: https://en.wikipedia.org/wiki/Prion
248 An atlas of chaperone–protein interactions in Saccharomyces cerevisiae: implications to protein folding pathways in the cell: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2710862/