The process of translation in biology is the decoding of an mRNA message into a polypeptide product. Put another way, a message written in the chemical language of nucleotides is "translated" into the chemical language of amino acids. Amino acids are linearly strung together via covalent bonds (called peptide bonds) between the amino and carboxyl groups of adjacent amino acids. The sequential polymerization of amino acids, in a strict order determined by the sequence of an mRNA, is catalyzed by a ribonucleoprotein complex called the ribosome, working with decoding "keys" termed charged tRNAs.
The resulting proteins are so important to the cell that their synthesis consumes more of a cell’s energy than any other metabolic process. Like DNA replication and transcription, translation is a complex molecular process that we can approach using both the Energy Story and Design Challenge rubrics. Describing the overall process, or steps in the process, requires the accounting of the matter and energy before the process and after the process and a description of how that matter is transformed and energy transferred during the process. From a Design Challenge standpoint, we can - even before digging any further into what is or is not understood about translation - try to infer some of the basic questions that we will need to answer regarding this process.
Let us start by considering the basic problem. We have a strand of RNA (called mRNA) and a bunch of amino acids and we need to somehow design a machine that will:
- decode the chemical language of nucleotides into the language of amino acids,
- join amino acids in a very specific sequence,
- complete this process with reasonable accuracy, and
- do this at a reasonable speed. Reasonable is, of course, defined by natural selection.
As before, we can identify subproblems:
- How does our molecular machine determine where and when to start working?
- How does the molecular machine coordinate decoding and bond formation?
- Where does the energy for this process come from and how much is used?
- How does the machine know where to stop?
Other questions and functional problems/challenges will certainly arise as we dig deeper.
The point, as always, is that even without knowing any specifics about translation we can use our imagination, curiosity and common sense to come up with some requirements for the process. Understanding these questions as the context for what follows is key.
A peptide bond links the carboxyl end of one amino acid with the amino end of another, expelling one water molecule. The R1 and R2 designations refer to the side chains of the two amino acids. Attribution: Marc T. Facciotti (original work).
Protein Synthesis Machinery
The components that go into the process
Many different molecules and macromolecules contribute to the process of translation. While the exact composition of "the players" in the process may vary from species to species - for instance, ribosomes may consist of different numbers of rRNAs (ribosomal RNAs) and polypeptides depending on the organism - the general functions of the protein synthesis machinery are comparable from bacteria to human cells. We focus on these similarities. At a minimum, translation requires an mRNA template, amino acids, ribosomes, tRNAs, an energy source, and various additional accessory enzymes and small molecules.
Reminder: Amino acids
Let us simply recall that the basic structure of an amino acid consists of a backbone made up of an amino group, a central carbon (called the α-carbon), and a carboxyl group. Attached to the α-carbon is a variable group that helps determine some of the chemical properties and reactivity of the amino acid.
A ribosome is a complex macromolecule composed of structural and catalytic rRNAs, and many distinct polypeptides. As we start to try thinking about energy accounting in the cell it is worth noting that ribosomes do not come "free". Even before an mRNA is translated, a cell must invest energy to build each of its ribosomes. In E. coli, there are between 10,000 and 70,000 ribosomes present in each cell at any given time. Each of these ribosomes will be able to produce many proteins.
Ribosomes exist in the cytoplasm in bacteria and archaea and in the cytoplasm and on the rough endoplasmic reticulum in eukaryotes. Mitochondria and chloroplasts also have their own ribosomes in the matrix and stroma, which look more similar to bacterial ribosomes (and have similar drug sensitivities) than to the other ribosomes found in the cytoplasm. Ribosomes dissociate into large and small subunits when they are not synthesizing proteins and reassociate during the initiation of translation. In E. coli, the small subunit is described as 30S (S, the Svedberg unit, measures sedimentation rate, which reflects a particle's size and shape), and the large subunit is 50S. Mammalian ribosomes have a small 40S subunit and a large 60S subunit. The small subunit is responsible for binding the mRNA template, whereas the large subunit sequentially binds tRNAs. Each mRNA molecule can be translated simultaneously by many ribosomes, all synthesizing protein in the same direction: reading the mRNA from 5' to 3' and synthesizing the polypeptide from the N terminus to the C terminus. The complete mRNA/poly-ribosome structure is called a polysome.
Question: The ribosome reads the mRNA from 5' to 3', producing a protein starting at the N terminus and ending at the C terminus. How about RNA and DNA polymerases? Do they also read their templates from 5' to 3'? In what direction are their products synthesized? Can you think of any reason why ribosomes have evolved to read mRNAs from 5' to 3', rather than the opposite direction?
tRNAs are structural RNA molecules that were transcribed from genes. Depending on the species, 40 to 60 types of tRNAs exist in the cytoplasm. Serving as adaptors, specific tRNAs bind to sequences on the mRNA template and add the corresponding amino acid to the polypeptide chain. Therefore, tRNAs are the molecules that act as individual keys for decoding of particular nucleic acid sequences.
Of the 64 possible mRNA codons (the triplet combinations of A, U, G, and C), three specify the termination of protein synthesis and 61 specify the addition of amino acids to the polypeptide chain. Of these 61, one codon (AUG) also encodes the initiation of translation. Each tRNA anticodon can base pair with one of the mRNA codons and add the corresponding amino acid, according to the genetic code; stop codons are not read by a tRNA. For instance, if the sequence CUA occurred on an mRNA template in the proper reading frame, it would bind a tRNA carrying the complementary anticodon, GAU (written here 3' to 5', so that it aligns antiparallel with the codon), which would be linked to the amino acid leucine.
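The codon/anticodon pairing in the CUA example can be sketched in a few lines of Python. This is a minimal illustration that assumes strict Watson-Crick pairing only (wobble pairing is ignored); the function names are ours, not standard terminology.

```python
# Watson-Crick pairing rules for RNA (wobble pairing is deliberately ignored).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_3_to_5(codon: str) -> str:
    """Anticodon written 3'->5', i.e. aligned base-by-base under the 5'->3' codon."""
    return "".join(RNA_COMPLEMENT[base] for base in codon)

def anticodon_5_to_3(codon: str) -> str:
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3_to_5(codon)[::-1]

print(anticodon_3_to_5("CUA"))  # GAU
print(anticodon_5_to_3("CUA"))  # UAG
```

The two outputs describe the same three bases; only the direction in which they are written differs, which is why the strand orientation should always be stated.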
Aminoacyl tRNA Synthetases
The process of pre-tRNA synthesis by RNA polymerase III only creates the RNA portion of the adaptor molecule. The corresponding amino acid must be added later, once the tRNA is processed and exported to the cytoplasm. Through the process of tRNA “charging,” each tRNA molecule is linked to its correct amino acid by a group of enzymes called aminoacyl tRNA synthetases. At least one type of aminoacyl tRNA synthetase exists for each of the 20 amino acids; the exact number of aminoacyl tRNA synthetases varies by species. These enzymes first bind and hydrolyze ATP to catalyze a high-energy bond between an amino acid and adenosine monophosphate (AMP); a pyrophosphate molecule is expelled in this reaction. The activated amino acid is then transferred to the tRNA, and AMP is released. tRNA synthetases are the only molecules in the cell that are required to simultaneously read a nucleic acid sequence and recognize a particular amino acid. In a way, decoding (unfortunately the word "translation" is already taken!) is actually happening at the stage of tRNA charging by aminoacyl tRNA synthetases.
The Mechanism of Protein Synthesis
Just as with mRNA synthesis, protein synthesis can be divided into three phases: initiation, elongation, and termination. The process of translation is similar in bacteria, archaea and eukaryotes.
In general, protein synthesis begins with the formation of an initiation complex. The small ribosomal subunit binds to the mRNA at the ribosomal binding site. Soon after, the methionine-tRNA binds to the AUG start codon (through complementary binding with its anticodon). This complex is then joined by the large ribosomal subunit. The completed initiation complex then recruits the second tRNA, and translation begins.
Bacterial vs Eukaryotic initiation
Protein synthesis begins at an AUG (Met) codon, but proteins may have many methionines, and mRNAs may have many AUGs. How does the ribosome know where to begin?
In prokaryotic mRNA, a sequence upstream of the first AUG codon, called the Shine-Dalgarno sequence (AGGAGG), base-pairs with an rRNA molecule within the small subunit of bacterial and archaeal ribosomes. This interaction anchors the 30S ribosomal subunit at a precise location on the mRNA template. Stop for a moment to appreciate the repetition of a mechanism you've encountered before. In this case, getting a protein complex to associate, in proper register, with a nucleic acid polymer is accomplished by aligning two antiparallel strands of complementary nucleotides with one another. This is quite different from the recognition of a promoter by the RNA polymerase complex, which requires that a protein, not a nucleic acid, bind to a particular double-stranded DNA sequence. We have, however, seen the alignment via base-pairing of a protein/RNA complex with a single-stranded DNA sequence before, when we discussed telomerase.
Instead of binding at the Shine-Dalgarno sequence, the eukaryotic initiation complex (a number of proteins in addition to the small subunit) recognizes the 7-methylguanosine cap at the 5' end of the mRNA. Once at the cap, the initiation complex tracks along the mRNA in the 5' to 3' direction, searching for the AUG start codon. Many eukaryotic mRNAs are translated from the first AUG, but this is not always the case. The nucleotides around the AUG affect the probability that it will be chosen as the start codon, and the consensus sequence varies between species. The "helper" proteins of the initiation complex fall off once the large subunit is loaded.
Note that in both cases the selection of an AUG establishes the reading frame (one of a possible three) for the entire protein. A very important difference between these modes of start-site selection is that a single prokaryotic transcript can potentially encode several sequential proteins, as ribosomes can bind at Shine-Dalgarno sequences anywhere along the length of the message. Often the several proteins involved in a single process (subunits of a holoenzyme, or sequential steps in a metabolic pathway) are encoded on a single message. In contrast, in eukaryotic nuclear genes, each transcript only encodes a single protein (as always, there are exceptions).
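The two modes of start-site selection can be contrasted with a small sketch. Everything here is illustrative: the mRNA is a made-up sequence, the function names are ours, and the allowed spacing between the Shine-Dalgarno sequence and the AUG (3 to 12 nucleotides) is an assumption chosen for the example, not a value given in this text. Real ribosomes also tolerate imperfect matches to AGGAGG.

```python
import re

SHINE_DALGARNO = "AGGAGG"  # consensus sequence quoted in the text
START = "AUG"

def bacterial_starts(mrna: str, min_gap: int = 3, max_gap: int = 12) -> list[int]:
    """Index of the AUG found a short distance downstream of each Shine-Dalgarno site.

    min_gap/max_gap (nucleotides between the site and the AUG) are assumed values.
    """
    starts = []
    for site in re.finditer(SHINE_DALGARNO, mrna):
        lo = site.end() + min_gap
        hi = site.end() + max_gap + len(START)
        pos = mrna.find(START, lo, hi)
        if pos != -1:
            starts.append(pos)
    return starts

def eukaryotic_start(mrna: str) -> int:
    """Scan 5'->3' from the cap and return the index of the first AUG (-1 if none)."""
    return mrna.find(START)

# Hypothetical two-gene message with a decoy AUG near the 5' end.
mrna = (
    "GGAUGCC"               # leader containing a decoy AUG
    "AGGAGG" "UUAACU"       # first Shine-Dalgarno site and spacer
    "AUGGCUUAA"             # first short open reading frame
    "CC"
    "AGGAGG" "ACAUCC"       # second Shine-Dalgarno site and spacer
    "AUGAAAUGA"             # second short open reading frame
)

print(bacterial_starts(mrna))  # [19, 42]
print(eukaryotic_start(mrna))  # 2
```

The bacterial-style search skips the decoy AUG and finds both downstream start codons on the same message, while the scanning model simply takes the first AUG it meets.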
During translation elongation, the mRNA template provides specificity. As the ribosome moves along the mRNA, each mRNA codon comes into 'view', and specific binding with the corresponding charged tRNA anticodon is ensured. If mRNA were not present in the elongation complex, the ribosome would bind tRNAs nonspecifically. Note again the use of base pairing between two antiparallel strands of complementary nucleotides to bring and keep our molecular machine in register and in this case also to accomplish the job of "translating" between the language of nucleotides and amino acids.
The large ribosomal subunit consists of three compartments: the A site binds incoming charged tRNAs (tRNAs with their attached specific amino acids); the P site binds charged tRNAs carrying amino acids that have formed bonds with the growing polypeptide chain but have not yet dissociated from their corresponding tRNA; and the E site releases dissociated tRNAs so they can be recharged with another free amino acid.
Elongation proceeds with charged tRNAs entering the A site and then shifting to the P site followed by the E site with each single-codon “step” of the ribosome. We will describe this process here, but we highly recommend that you watch any of the many animated versions of this process, particularly the unrealistically slow ones. Ribosomal steps are induced by conformational changes that advance the ribosome by three bases in the 3' direction. The energy for each step of the ribosome is donated by an elongation factor that hydrolyzes GTP. Peptide bonds form between the amino group of the amino acid attached to the A-site tRNA and the carboxyl group of the amino acid attached to the P-site tRNA. The formation of each peptide bond is catalyzed by peptidyl transferase, a catalytic RNA (surprise! not a protein) that is integrated into the 50S ribosomal subunit. The energy for each peptide bond formation comes from the high-energy bond that links the amino acid to its tRNA, which was paid for with ATP during tRNA charging; a separate elongation factor hydrolyzes GTP as it delivers each charged tRNA to the A site. The amino acid bound to the P-site tRNA is linked to the growing polypeptide chain. As the ribosome steps across the mRNA, the former P-site tRNA, now detached from its amino acid, enters the E site and is expelled (it will be recharged by an aminoacyl tRNA synthetase later). The ribosome moves along the mRNA, one codon at a time, catalyzing each process that occurs in the three sites. With each step, a charged tRNA enters the complex, the polypeptide becomes one amino acid longer, and an uncharged tRNA departs. This process occurs amazingly rapidly in the cell: the E. coli translation apparatus takes only 0.05 seconds to add each amino acid, meaning that a 200-amino acid polypeptide could be translated in just 10 seconds. This is particularly startling in that charged tRNAs must be diffusing to the A site at random, and the ribosome has to wait for the correct tRNA to arrive!
This velocity also raises the (perhaps childish) question of: in a race, who would win: RNA polymerase or the ribosome? The answer is that both machines move at about the same speed: about 60 nt/sec.
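The arithmetic behind these rates is easy to check. The sketch below uses only the figure quoted above (0.05 seconds, or 50 milliseconds, per amino acid in E. coli) and the fact that each codon is three nucleotides long; the function names are ours.

```python
MS_PER_AMINO_ACID = 50      # 0.05 s per amino acid, the E. coli figure quoted in the text
NUCLEOTIDES_PER_CODON = 3

def translation_time_s(n_amino_acids: int) -> float:
    """Seconds needed to polymerize a chain of n amino acids at the quoted rate."""
    return n_amino_acids * MS_PER_AMINO_ACID / 1000

def ribosome_speed_nt_per_s() -> float:
    """Nucleotides of mRNA read per second at the quoted rate."""
    return NUCLEOTIDES_PER_CODON * 1000 / MS_PER_AMINO_ACID

print(translation_time_s(200))    # 10.0
print(ribosome_speed_nt_per_s())  # 60.0
```

Twenty amino acids per second at three nucleotides per codon gives the 60 nt/sec figure, which is how the ribosome's pace can be compared directly with that of RNA polymerase.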
Many antibiotics inhibit bacterial protein synthesis. For example, tetracycline blocks the A site on the bacterial ribosome, and chloramphenicol blocks peptidyl transfer. What specific effect would you expect each of these antibiotics to have on protein synthesis?
The Genetic Code
To summarize what we know to this point, the cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template converts nucleotide-based genetic information into a protein product. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 letters. Each amino acid is defined by a three-nucleotide sequence called the triplet codon. The relationship between a nucleotide codon and its corresponding amino acid is called the genetic code. Given the different numbers of “letters” in the mRNA and protein “alphabets,” a single nucleotide cannot specify an amino acid. Reading the four nucleotides three at a time gives a total of 64 (4 × 4 × 4) possible codons; since only 20 amino acids need to be specified, most amino acids are encoded by more than one codon.
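The counting argument can be made concrete by enumerating every possible "word" of one, two, or three nucleotides and comparing the totals with the 20 amino acids. This is a small sketch; the function name is ours.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
N_AMINO_ACIDS = 20

def possible_words(length: int) -> list[str]:
    """All nucleotide 'words' of the given length."""
    return ["".join(letters) for letters in product(NUCLEOTIDES, repeat=length)]

for length in (1, 2, 3):
    count = len(possible_words(length))
    print(length, count, count >= N_AMINO_ACIDS)
# 1 4 False
# 2 16 False
# 3 64 True
```

Words of one or two nucleotides cannot cover 20 amino acids, while three-nucleotide words provide more than enough combinations, leaving room for redundancy and stop signals.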
Three of the 64 codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5' end of the mRNA. The genetic code is nearly universal. With a few exceptions, virtually all species use the same genetic code for protein synthesis, which is powerful evidence that all life on Earth shares a common origin.
Redundant, not Ambiguous
The information in the genetic code is redundant: multiple codons code for the same amino acid. For example, using a codon chart, you can find four different codons that code for valine and six that code for leucine. But the code is not ambiguous: given a codon, you know definitively which amino acid it codes for, because each codon specifies only one amino acid. For example, GUU will always code for valine, and AUG will always code for methionine.
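Redundancy without ambiguity can be checked directly against the standard genetic code. The sketch below builds the standard codon table from a compact 64-letter string (one-letter amino acid codes, with `*` marking stop codons, listed in U, C, A, G order for each codon position) and then inverts it. The variable names are ours.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code; position i corresponds to the i-th codon in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# One codon -> exactly one meaning: a dictionary cannot be ambiguous.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: one amino acid -> possibly many codons (redundancy).
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

print(codons_for["V"])       # ['GUU', 'GUC', 'GUA', 'GUG']
print(len(codons_for["L"]))  # 6
print(codons_for["M"])       # ['AUG']
print(CODON_TABLE["GUU"])    # V
```

Looking up a codon always returns a single answer, while looking up an amino acid can return several codons; methionine (M) and tryptophan (W) are the only amino acids with just one.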
Termination of translation occurs when a stop codon (UAA, UAG, or UGA) is encountered. When the ribosome encounters the stop codon, no tRNA enters the A site. Instead, a protein known as a release factor binds to the complex. This interaction destabilizes the translation machinery, causing the release of the polypeptide and the dissociation of the ribosome subunits from the mRNA. After many ribosomes have completed translation, the mRNA is degraded so the nucleotides can be reused in another transcription reaction.
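Initiation at AUG, codon-by-codon elongation, and termination at a stop codon can be put together in a toy translator. This is a simplified sketch, not a model of the real machinery: it starts at the first AUG it finds, uses the standard genetic code, and the example mRNA is made up. The names are ours.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")          # the start codon sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # one three-nucleotide step at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # stop codon: no tRNA is added, the chain is released
            break
        peptide.append(amino_acid)
    return "".join(peptide)

# Hypothetical message: 5' leader, AUG GCU CUA, stop codon UAA, 3' trailer.
print(translate("GGAUGGCUCUAUAAGG"))  # MAL
```

The output reads methionine, alanine, leucine in one-letter code; the nucleotides after UAA are never read, mirroring the release of the polypeptide at the stop codon.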
What are the benefits and drawbacks to translating a single mRNA multiple times?
Coupling between Transcription and Translation
As discussed previously, bacteria and archaea do not need to transport their RNA transcripts between a membrane-enclosed nucleus and the cytoplasm. Their RNA polymerase is therefore transcribing RNA directly into the cytoplasm. Here ribosomes can bind to the RNA and begin the process of translation, in some instances while transcription is still occurring. The coupling of these two processes, and even mRNA degradation, is facilitated not only because transcription and translation happen in the same compartment but also because both processes happen in the same direction: the RNA transcript is synthesized from 5' to 3' and the transcript is translated from 5' to 3'. This "coupling" of transcription with translation occurs in both bacteria and archaea and is in some instances essential for proper gene expression.
Protein Localization (a quick introduction)
In the context of a protein synthesis Design Challenge we can also raise the question/problem of how proteins get to where they are supposed to go. We know that some proteins are destined for the plasma membrane; others, in eukaryotic cells, need to be directed to various organelles; some proteins, like hormones or nutrient-scavenging proteins, are intended to be secreted by cells; and others may need to be directed to parts of the cytosol to serve structural roles. How does this happen?
Since various mechanisms have been uncovered, the details of this process are not easily summarized in a brief paragraph or two. However, some key common elements of all mechanisms can be mentioned. First is the need for a specific "tag" that can provide some molecular information about where the protein of interest is destined. This tag usually takes the form of a short string of amino acids, a so-called signal peptide, that can encode information about where the protein is intended to end up. The second required component of the protein sorting machinery must be a system to actually read and sort the proteins. In bacterial and archaeal systems this usually consists of proteins that can identify the signal peptide during translation, bind to it, and direct the synthesis of the nascent protein to the plasma membrane. In eukaryotic systems, localization is by necessity more complex. It can be broadly classified into processes requiring the endomembrane system and vesicle-mediated transport, and localization systems that simply rely on diffusion. Both systems rely on the recognition of various signal peptides encoded in the protein. In endomembrane-based localization these signals occur very close to the N-terminus (the beginning) of the protein, as they act to block further synthesis until the nascent protein and its ribosome encounter docking proteins in the rough endoplasmic reticulum. In some cases the signal peptide is cleaved once the protein arrives at its destination compartment.
Some of these specific mechanisms may be discussed by your instructor in class, and more details are available in the reading "Protein Translocation". The topic is mentioned here because it is sometimes coupled to translation (as just described above).
Post-translational Protein Modification
After translation, individual amino acids may be chemically modified. These modifications add chemical variation not present in the genetically encoded amino acids, and new properties that are rooted in the chemistries of the functional groups that are being added. Common modifications include the addition of phosphate, methyl, acetyl, and amide groups. Some proteins, typically targeted to membranes, will be lipidated (a lipid will be added). Other proteins will be glycosylated (a sugar will be added). Another common post-translational modification is cleavage or linking of parts of the protein itself. Signal peptides may be cleaved, parts may be excised from the middle of the protein, or new covalent linkages may be made between cysteine or other amino acid side chains. Nearly all modifications will be catalyzed by enzymes and all change the functional behavior of the protein.
mRNA is used to synthesize proteins by the process of translation. The genetic code is the correspondence between the three-nucleotide mRNA codon and an amino acid. The genetic code is “translated” by the tRNA molecules, which associate a specific codon with a specific amino acid. The genetic code is degenerate because 64 triplet codons in mRNA specify only 20 amino acids and three stop codons. This means that more than one codon corresponds to an amino acid. Almost every species on the planet uses the same genetic code; the "deviant codes" are not radically different, but change the meaning of one or two codons. More impressive exceptions are species that encode 21 or 22 amino acids, rather than the usual 20.
The players in translation include the mRNA template, ribosomes, tRNAs, and various enzymatic factors. The small ribosomal subunit binds to the mRNA template. Translation begins at the initiating AUG on the mRNA (this also establishes the reading frame). The formation of bonds occurs between sequential amino acids specified by the mRNA template according to the genetic code. The ribosome accepts charged tRNAs, and as it steps along the mRNA, it catalyzes bonding between the new amino acid and the end of the growing polypeptide. The entire mRNA is translated in three-nucleotide “steps” of the ribosome. When a stop codon is encountered, a release factor binds and dissociates the components and frees the new protein.