Protein structure is commonly presented in a hierarchical manner. While this is an oversimplification, it is a good place to start. When we think about how a polypeptide folds, we have to think about the environment it will inhabit and how it interacts with itself and with other polypeptides. In a protein composed of multiple polypeptides, we also need to consider how it comes to interact with those other polypeptides (often termed subunits). As we think about polypeptide structure, it is common to see the terms primary, secondary, tertiary, and quaternary structure. The primary structure of a polypeptide is the sequence of amino acids along the polypeptide chain, written from its N- or amino terminus to its C- or carboxyl terminus. As we will see below, the secondary structure of a polypeptide consists of local folding motifs: the α-helix, the β-sheet, and connecting domains. The tertiary structure of a polypeptide is the overall three-dimensional shape a polypeptide takes in space (as well as how its R-groups are oriented). Quaternary structure refers to how the various polypeptides and co-factors combine and are arranged to form a functional protein. In a protein that consists of a single polypeptide and no co-factors, tertiary and quaternary structures are the same. As a final complexity, a particular polypeptide can be part of a number of different proteins. This is one way in which a gene can play a role in a number of different processes and be involved in the generation of a number of different phenotypes.
246. If the polypeptide is part of a multi-subunit protein, it must also "find" its correct partner polypeptides, which again is a stochastic process. If the polypeptide does not fold correctly, it will not function correctly and may even damage the cell or the organism. A number of degenerative neurological disorders are due, at least in part, to the accumulation of misfolded polypeptides (see below).
We can think of the folding process as a “drunken” walk across an energy landscape, with movements driven by intermolecular interactions and collisions with other molecules. The goal of this process is to find the lowest point in the landscape, the energy minimum of the system. This is generally assumed to be the native, or functional, state of the polypeptide. That said, this native state is not necessarily static: the folded polypeptide (and the final protein) will be subject to thermal fluctuations and may move between various states with similar, but not identical, stabilities. The challenge in calculating the final folded state of a polypeptide is that it is an extremely complex problem. Generally, two approaches are taken to characterize the structure of a functional protein. In the first, the structure of the protein is determined directly by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. In the second, if the structure of a homologous protein is known (and we will consider homologous proteins later on), it can be used as a framework to model the structure of a previously unsolved protein.
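The “drunken walk” picture can be made concrete with a Metropolis Monte Carlo walk on a toy one-dimensional landscape. This is a cartoon under stated assumptions, not a way to predict real folds: the landscape values, the temperature-like parameter `kT`, and the function name are all hypothetical. Downhill moves are always accepted, while uphill moves are accepted with probability exp(-ΔE/kT), standing in for the energy supplied by thermal collisions.

```python
import math
import random

# Hypothetical toy landscape: energy (arbitrary units) at each lattice position.
# Position 3 is a shallow local minimum; position 11 is the global minimum,
# standing in for the native state.
LANDSCAPE = [9, 7, 5, 4, 5, 6, 7, 6, 4, 2, 1, 0, 1, 3, 6, 9]


def metropolis_walk(landscape, start, kT, steps, seed=0):
    """Random walk that always accepts downhill steps and accepts uphill
    steps with Boltzmann probability exp(-dE / kT).

    Returns (final_position, lowest_energy_position_visited).
    """
    rng = random.Random(seed)
    x = start
    best = x
    for _ in range(steps):
        proposal = x + rng.choice((-1, 1))  # a random "collision" nudges the chain
        if not 0 <= proposal < len(landscape):
            continue  # moves off the edge of the landscape are rejected
        dE = landscape[proposal] - landscape[x]
        if dE <= 0 or rng.random() < math.exp(-dE / kT):
            x = proposal
        if landscape[x] < landscape[best]:
            best = x
    return x, best


if __name__ == "__main__":
    for kT in (0.05, 1.0):
        final, best = metropolis_walk(LANDSCAPE, start=0, kT=kT, steps=5000)
        print(f"kT={kT}: ended at {final}, lowest energy visited at {best}")
```

With a very small `kT` the walk from position 0 typically stalls in the shallow minimum at position 3; with `kT = 1.0` it can climb the barrier at position 6 and usually reaches the global minimum at position 11, although any single trajectory depends on the random seed.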
There are a number of constraints that influence the folding of a polypeptide. The first is the peptide bond itself. All polypeptides contain a string of peptide bonds, so it is not surprising that there are common patterns in polypeptide folding. The first of these common patterns to be recognized, the α-helix, was discovered by Linus Pauling and Robert Corey in 1951. This was followed shortly thereafter by their description of the β-sheet. The forces that drive the formation of the α-helix and the β-sheet will be familiar: they are hydrogen bonds, the same interactions that underlie the structure of water.
In an α-helix and a β-sheet, all of the possible H-bonds involving the peptide bond's donor and acceptor groups (–N–H : O=C–, with “:” indicating an H-bond) are formed within the polypeptide. In the α-helix these H-bonds link nearby residues within the same stretch of chain and run roughly parallel to the helix axis. In the β-sheet they form between adjacent strands. The interacting strands within a β-sheet can run parallel or anti-parallel to one another, and can occur within a single polypeptide chain or between different polypeptide chains. In an α-helix, the R-groups point outward from the helix axis. In β-sheets, the R-groups point alternately above and below the sheet. While all other amino acids can take part in either α-helix or β-sheet structures, the imino acid proline cannot: its backbone nitrogen, bonded both to the α-carbon and to its side-chain ring, carries no H, so its presence in a polypeptide chain breaks the pattern of intrachain H-bonds. It is worth noting that some polypeptides can adopt functionally different structures. For example, in one form (PrPC) the prion protein contains a high level of α-helix (42%) and essentially no β-sheet (3%), while an alternative form (PrPSc), associated with the disease scrapie, contains high levels of β-sheet (43%) and 30% α-helix (see below)247.
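The backbone H-bond pattern of an idealized α-helix can be sketched in a few lines: the C=O of residue i accepts an H-bond from the N–H of residue i + 4. Because proline's backbone nitrogen carries no H, any bond in which proline would be the donor is missing. This ignores geometry and helix ends entirely; the function name and example sequence are hypothetical.

```python
def helix_hbonds(sequence):
    """List (acceptor, donor) residue-index pairs for an ideal alpha-helix.

    In an alpha-helix the C=O of residue i accepts an H-bond from the N-H of
    residue i + 4. Proline ('P') has no H on its backbone nitrogen, so it
    cannot act as the donor; those bonds are omitted.
    """
    bonds = []
    for i in range(len(sequence) - 4):
        donor = i + 4
        if sequence[donor] == "P":
            continue  # no amide H on proline: the H-bond pattern breaks here
        bonds.append((i, donor))
    return bonds


if __name__ == "__main__":
    seq = "AELKKALEEAPKLLKEA"  # hypothetical sequence containing one proline
    found = helix_hbonds(seq)
    print(f"{len(found)} of {len(seq) - 4} possible i -> i+4 H-bonds can form")
```

In the example, the proline at index 10 removes the bond from residue 6, leaving 12 of the 13 possible H-bonds.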
Peptide bond rotation and proline: Although drawn as a single bond, the peptide bond behaves more like a double bond, or rather like a bond and a half. Around a true single bond there is free rotation in response to molecular collisions. In contrast, moving a peptide bond from the trans to the cis configuration (and back again) requires much more energy, because the rotation involves partially breaking the bond. In addition, in the cis configuration the R-groups of adjacent amino acids are on the same side of the polypeptide chain. If both R-groups are large, they can bump into each other, and if they get too close they repel each other. As a result, the polypeptide chain is usually in the trans arrangement. In both the α-helix and β-sheet configurations, the peptide bonds are in the trans configuration, because the cis configuration would disrupt their regular organization.
Peptide bonds involving a proline residue have a different problem. Proline's nitrogen is “locked” into a particular shape by its ring, which inherently destabilizes both α-helix and β-sheet structures (see above). In addition, peptide bonds involving prolines are found in the cis configuration ~100 times as often as those between other amino acids, and this cis configuration leads to a bend or kink in the polypeptide chain. The energy involved in rotation around a peptide bond involving a proline is much higher than that of a standard peptide bond; so high, in fact, that there are protein catalysts, peptidyl-prolyl isomerases, that facilitate the cis-trans rotation.
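The “~100 times as often” figure can be related to energies with a little arithmetic, assuming a simple two-state cis/trans equilibrium in which K = [cis]/[trans] = exp(-ΔG/RT). Note that this concerns equilibrium populations, not the height of the rotation barrier. The function names and the example ΔG value are hypothetical and are not taken from the text.

```python
import math

R = 8.314  # gas constant, J / (mol K)


def cis_fraction(delta_g_kj, temp_k=298.15):
    """Equilibrium fraction of cis bonds in a two-state model.

    delta_g_kj is G(cis) - G(trans) in kJ/mol; K = [cis]/[trans] = exp(-dG/RT).
    """
    k = math.exp(-delta_g_kj * 1000 / (R * temp_k))
    return k / (1 + k)


def ddg_for_ratio(ratio, temp_k=298.15):
    """Free-energy difference (kJ/mol) that shifts K by the given factor:
    ddG = RT ln(ratio)."""
    return R * temp_k * math.log(ratio) / 1000


if __name__ == "__main__":
    ddg = ddg_for_ratio(100)
    print(f"100-fold change in cis/trans ratio ~ {ddg:.1f} kJ/mol at 25 C")
    dg_other = 20.0  # hypothetical dG(cis - trans), for illustration only
    print(f"cis fraction {cis_fraction(dg_other):.5f} -> "
          f"{cis_fraction(dg_other - ddg):.5f} when dG drops by {ddg:.1f} kJ/mol")
```

`ddg_for_ratio(100)` comes out at roughly 11.4 kJ/mol at 25 °C, only about 4.6 RT: a modest change in relative stability is enough to make one configuration a hundred times more common.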
Hydrophobic R-groups: Many polypeptides and proteins exist primarily in an aqueous (water-based) environment. Yet a number of their amino acid R-groups are hydrophobic. Their interactions with water decrease the entropy of the system, because water molecules become organized around each hydrophobic group, a thermodynamically unfavorable situation. This is very much like the process that drives the assembly of lipids into micelles and bilayers. A typical polypeptide with hydrophobic R-groups along its length will, in aqueous solution, tend to collapse onto itself so as to minimize (although not always completely eliminate) the interactions of its hydrophobic residues with water. In practice, this means that the first step in the folding of a newly synthesized polypeptide is generally a collapse that places the majority of its hydrophobic R-groups internally, out of contact with water. In contrast, where there are no (or few) hydrophobic R-groups in the polypeptide, the polypeptide will tend to adopt an extended configuration. On the other hand, if a protein comes to be embedded within a membrane (we will consider how this occurs later on), then its hydrophobic R-groups will tend to be located on the surface of the folded polypeptide that interacts with the hydrophobic interior of the lipid bilayer. Hopefully this makes sense to you, thermodynamically.
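Which stretches of a sequence are hydrophobic enough to be buried, or to sit in a membrane, can be estimated with a hydropathy scan. The sketch below swaps in the published Kyte-Doolittle hydropathy scale, which the text does not mention, and averages it over a sliding window; the window size, the example sequence, and the function names are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of each window; entry k covers sequence[k:k + window]."""
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


def most_hydrophobic_window(sequence, window=7):
    """Return (start, end, mean_score) of the most hydrophobic stretch."""
    profile = hydropathy_profile(sequence, window)
    k = max(range(len(profile)), key=lambda i: profile[i])
    return k, k + window, profile[k]


if __name__ == "__main__":
    seq = "KDESKRLIVALLIVAFGKEDRSK"  # hypothetical: polar ends, hydrophobic middle
    start, end, score = most_hydrophobic_window(seq, window=7)
    print(seq[start:end], round(score, 2))
```

A high-scoring stretch is a candidate for the buried core of a soluble protein or, if long enough, for a segment that faces the lipid interior of a membrane.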
The path to the native (that is, most stable, functional) state is not necessarily a smooth or predetermined one. The folding polypeptide can get "stuck" in a local energy minimum, where there may not be enough energy (derived from thermal collisions) for it to get out again. If a polypeptide gets stuck in a non-native structure, there are active mechanisms to unfold it and let the process leading to the native state proceed again. This partial unfolding is carried out by proteins known as chaperones. An important point to recognize: chaperones do not determine the native state of a polypeptide. There are many types of protein chaperones. Some interact with specific polypeptides as they are synthesized and attempt to keep them from getting into trouble, that is, from folding in an unproductive way. Others recognize inappropriately folded polypeptides and, through coupling to ATP hydrolysis, catalyze their unfolding, giving the polypeptide a second (or third, or ...) chance to fold correctly. In the “simple” eukaryote, the yeast Saccharomyces cerevisiae, there are at least 63 distinct molecular chaperones248.
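The logic of “unfold and try again” can be caricatured on a toy landscape. In this sketch, folding is modeled as a zero-temperature downhill descent that can end in a trap; a hypothetical chaperone-like step checks whether the result still looks misfolded (here, crudely, an energy above a chosen threshold, standing in for exposed hydrophobic surface) and, if so, supplies energy to kick the chain to a random position before it refolds. The landscape, threshold, and function names are assumptions for illustration. Note that the retry step, like a real chaperone, does not know where the native state is.

```python
import random

# Hypothetical toy landscape (arbitrary energy units): position 3 is a
# shallow trap, position 11 the global minimum ("native" state).
LANDSCAPE = [9, 7, 5, 4, 5, 6, 7, 6, 4, 2, 1, 0, 1, 3, 6, 9]


def descend(landscape, x):
    """Zero-temperature folding: keep stepping downhill until no neighbor
    is lower, i.e. until the chain is trapped in some local minimum."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]
        lower = min(neighbors, key=lambda n: landscape[n])
        if landscape[lower] >= landscape[x]:
            return x
        x = lower


def fold_with_chaperone(landscape, start, threshold, max_attempts=20, seed=0):
    """Fold; while the result 'looks misfolded' (energy above a hypothetical
    threshold), unfold to a random position and refold.

    Returns (final_position, attempts_used).
    """
    rng = random.Random(seed)
    x = descend(landscape, start)
    attempts = 1
    while landscape[x] > threshold and attempts < max_attempts:
        # ATP-driven "unfolding": jump to a random state, then refold downhill.
        x = descend(landscape, rng.randrange(len(landscape)))
        attempts += 1
    return x, attempts


if __name__ == "__main__":
    print("without help:", descend(LANDSCAPE, 0))
    print("with retries:", fold_with_chaperone(LANDSCAPE, 0, threshold=2))
```

Starting from position 0, plain descent always stops in the trap at position 3, while repeated unfolding gives the chain further chances to land in the basin of the global minimum.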
By now you might be asking yourself: how do chaperones recognize unfolded or abnormally folded proteins? Well, unfolded proteins tend to have hydrophobic amino acid side chains exposed on their surface, and because of that they also tend to aggregate. Chaperones recognize, and interact with, these exposed hydrophobic regions.
Acidic and basic R-groups: Some amino acid R-groups contain carboxylic acid or amino groups and so act as weak acids and bases. Depending on the pH of their environment, these groups may be charged or uncharged: a carboxylic acid group is negatively charged once it has given up its H+, while an amino group is positively charged when it has picked one up. Whether a group is charged or uncharged can have a dramatic effect on the structure, and therefore the activity, of a protein. By regulating pH, an organism can modulate the activity of specific proteins. There are, in fact, compartments within eukaryotic cells that are maintained at low pH in part to regulate protein structure and activity. In particular, the interiors of vesicles associated with endocytosis commonly become acidic (through the ATP-dependent pumping of H+ across their membrane), which in turn activates a number of enzymes within the vesicle that are involved in the hydrolysis of proteins and nucleic acids.
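How strongly pH changes the charge on an R-group can be estimated with the Henderson-Hasselbalch relation, which gives the fraction of a group that carries its H+ at a given pH. The pKa values below are approximate values for free amino acids and are an assumption here; inside a folded protein, local environment can shift them considerably. The function names are hypothetical.

```python
# Approximate side-chain pKa values for free amino acids (assumed values;
# pKa values inside a folded protein can differ substantially).
SIDE_CHAIN_PKA = {"Asp": 3.9, "Glu": 4.1, "His": 6.0, "Lys": 10.5, "Arg": 12.5}
ACIDIC = {"Asp", "Glu"}  # charge -1 when deprotonated; the others are +1 when protonated


def protonated_fraction(pka, ph):
    """Henderson-Hasselbalch: fraction of the group that carries its H+."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def mean_charge(residue, ph):
    """Average side-chain charge of one residue at the given pH."""
    f = protonated_fraction(SIDE_CHAIN_PKA[residue], ph)
    return -(1.0 - f) if residue in ACIDIC else f


if __name__ == "__main__":
    # pH 7.4 vs an illustrative acidic compartment at pH 5.0
    for res in SIDE_CHAIN_PKA:
        print(res, round(mean_charge(res, 7.4), 2), round(mean_charge(res, 5.0), 2))
```

Under these assumed values, histidine changes most between the two conditions, going from about +0.04 at pH 7.4 to about +0.91 at pH 5.0, which illustrates how acidifying a compartment can switch the charge, and potentially the behavior, of particular groups.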
Subunits and prosthetic groups: Now you might be asking yourself: if most proteins are composed of multiple polypeptides, but polypeptides are synthesized individually, how are proteins assembled in a cytoplasm crowded with other proteins and molecules? This process often involves specific chaperone proteins that bind to a newly synthesized polypeptide and either stabilize its folding or hold it until it interacts with the other polypeptides to form the final, functional protein. The absence of appropriate chaperones can make it difficult to assemble multisubunit proteins into functional proteins in vitro.
Many functional proteins also contain non-amino acid-based components, known generically as co-factors. A protein without its co-factors is known as an apoprotein; together with its co-factors, it is known as a holoprotein. Generally, without its co-factors, a protein is inactive and often unstable. Co-factors range in complexity from a single metal ion to quite complex molecules, such as vitamin B12. The retinal group of bacteriorhodopsin and the heme group (with its central iron ion) are co-factors. In general, co-factors are synthesized by various anabolic pathways, and so they represent the activities of a number of genes. So a functional protein can be the direct product of a single gene, many genes, or (indirectly) entire metabolic pathways. At the same time, the formation of a functional protein can depend upon chaperones, which are themselves products of other genes.