Biology LibreTexts

3.3: Transcription of RNA

    Biotech Focus

    The process of transcribing DNA into RNA is a complex system involving numerous enzyme complexes all working together in a highly coordinated fashion. In the 1990s, a research group led by Roger Kornberg at Stanford University successfully developed an in vitro transcription system from baker’s yeast that faithfully replicated the in vivo process found in cells. The group then used this system to isolate several dozen proteins required for transcription, including the Mediator protein complex required for efficient transcription in eukaryotes. Thanks to this groundbreaking research, we now know that the protein components of transcription are remarkably conserved across the spectrum of eukaryotes, from yeast to human cells. Based on this work, Roger Kornberg won the Nobel Prize in Chemistry in 2006.

    Read Roger Kornberg’s recollections of his research that culminated in this Nobel Prize.

    Introduction

    The goal of transcription is to make an RNA copy of a gene's DNA sequence. Transcription is the first step in gene expression, in which information from a gene is used to produce a functional product, such as RNA or a protein. This flow of genetic information is also known as the central dogma of molecular biology (Figure \(\PageIndex{1}\)), which states that a gene is transcribed into RNA, which can then be translated into a protein.

    For a protein-coding gene, the RNA copy, made through transcription, carries the information needed to build the protein. Transcription is relatively straightforward. RNA nucleotides, complementary to the DNA template, are linked together to form an RNA copy known as an RNA transcript. The translation of this transcript into a polypeptide sequence is more complex and will require the reading of RNA nucleotides in a specific manner. This process will be covered in Chapter 3.4: Translation.

    three circles connected to each other by arrows
    Figure \(\PageIndex{1}\): The Central Dogma of Molecular Biology. DNA is transcribed into RNA which is then translated into protein. (Central Dogma, from LibreText, CC BY-SA 4.0)
    Learning Objectives

    RNA is a nucleic acid that converts the genetic code found in DNA into the proteins that perform cellular functions. At the end of this section, you will be able to:

    • List and explain types of RNA found in cells
    • Describe the modifications needed to create mRNA, rRNA, and tRNA
    • Explain the structure of a protein-coding gene and its regulatory sequences (i.e., the transcription unit)
    • List the types of RNA polymerases involved in transcription
    • List and explain the stages of transcription
    • Explain the steps involved in mRNA processing
    • Explain the spliceosome and its role in splicing
    • Explain alternative splicing and its importance

    Types of RNA

    RNA, like DNA, is a nucleic acid made up of nucleotides. However, there are some important differences between RNA and DNA. RNA nucleotides have a pentose sugar, named ribose, that has a hydroxyl group (OH), rather than a hydrogen, at the 2' carbon. The bases of RNA are adenine, guanine, cytosine, and uracil. Finally, RNA is usually a single polynucleotide chain rather than a double helix. For more information about nucleic acids, refer to Chapter 2.7 Nucleic Acids.

    There are several types of RNA found in cells. These RNA types can be broken down into two categories: coding and non-coding. There is only one type of RNA that is classified as coding. This RNA is known as messenger RNA or mRNA and it is produced following the transcription of protein-coding genes in the nucleus.

    There are numerous types of non-coding RNA molecules, including:

    • ribosomal RNA or rRNA
    • transfer RNA or tRNA
    • microRNA or miRNA
    • small nuclear RNA or snRNA

    Non-coding RNA molecules, like coding RNA, are transcribed. However, they are not translated into proteins. More about protein translation can be found in Chapter 3.4: Translation.

    mRNA

    Messenger RNA, or mRNA, is produced upon transcription of protein-coding genes. This transcription takes place in the nucleoplasm, the fluid of the nucleus. Following transcription, the RNA transcript, which is often called pre-mRNA, is modified into mature mRNA within the nucleus and is then exported to the cytoplasm.

    Pre-mRNA processing into mRNA involves three events (Figure \(\PageIndex{2}\)):

    • addition of a modified guanine nucleotide, known as the 5'-methylated cap, to the 5' end of the pre-mRNA
    • addition of a long stretch of adenine nucleotides, known as the poly-A tail, to the 3' end of the pre-mRNA
    • splicing to remove all introns from the pre-mRNA
    Figure \(\PageIndex{2}\): mRNA processing. Mature mRNA is processed from the pre-mRNA molecule (i.e. RNA transcript) through the addition of a 5'-methylated cap to the 5' end and a poly-A tail to the 3' end. Introns are removed through splicing. (mRNA Maturation by Editorzcj, CC BY-SA 4.0)

    rRNA

    Ribosomal RNA, or rRNA, makes up the majority of RNA found in cells. rRNA is produced upon transcription of a class of genes called ribosomal genes. This transcription takes place in the nucleolus of the nucleus. Following transcription, the rRNA is folded into a series of stems and loops, resulting in a characteristic secondary structure (Figure \(\PageIndex{3}\)). This mature rRNA will become part of the ribosome, the "machine" for translation of proteins.

    rRNA molecules are named based on their size. rRNA size is quantified using a technique called sedimentation. In sedimentation, molecules are added to columns of sugar and then centrifuged. The rate of sedimentation through the column is reported in Svedberg units, denoted with a capital "S". Larger rRNA molecules will have a faster sedimentation rate and a larger "S" value in comparison to smaller rRNA molecules. The types of rRNA molecules made are dependent upon species. For example, humans have the following rRNA molecules: 5S, 5.8S, 18S, and 28S. In contrast, E. coli cells produce 5S, 16S, and 23S rRNA molecules. More about rRNA can be found in Chapter 3.4: Translation.

    Figure \(\PageIndex{3}\): rRNA molecules (A) The stem-loop configuration of rRNA is produced through complementary base pairing to create a stem, with the non-complementary bases producing a loop. (Stem-loop by Sakurambo, CC BY 2.5)
    (B) The 4 types of rRNA molecules found in human cells are 18S and a complex made of 28S, 5.8S, and 5S. (adapted from Petrov et al, CC BY-SA 4.0)

    tRNA

    Transfer RNA, or tRNA, is transcribed in the nucleoplasm of the nucleus. tRNA is a short, single-stranded RNA molecule approximately 80 nucleotides in length that is then folded into a characteristic secondary structure of stems and loops not unlike rRNA. This secondary structure is then folded further to create a unique tertiary structure (Figure \(\PageIndex{4}\)). More about tRNA can be found in Chapter 3.4: Translation.

    Figure \(\PageIndex{4}\): The tRNA molecule. (A) The secondary structure of the tRNA molecule shows numerous stem and loop regions. (B) The tertiary structure shows the proposed three-dimensional structure of a tRNA molecule. (tRNA-Phe by Yikrazuul, CC BY-SA 3.0)

    Genes are Transcribed

    Not every stretch of DNA in our genome is transcribed into RNA. A significant portion of our genome is considered to be "junk" DNA, sequences to which a function has not yet been assigned. Those stretches of DNA that are transcribed are known as genes. A gene is defined as a sequence of DNA that is transcribed into functional RNA. Protein-coding genes are transcribed into coding RNA (i.e. mRNA). Non-protein coding genes are transcribed into a specific type of non-coding RNA.

    The eukaryotic gene is part of a "transcription unit", which is comprised of the stretch of DNA that is transcribed, in addition to several regulatory sequences that will control transcription (Figure \(\PageIndex{5}\)).

    For a protein-coding gene, the gene itself is made of three regions:

    • the coding sequence (CDS), which is the majority of the transcribed DNA. The coding sequence is often called an Open Reading Frame (ORF). The CDS/ORF is made of exons, which are DNA sequences that will be translated into amino acids, and introns, which are intervening stretches of DNA that are not translated
    • the 5' untranslated region (5'UTR), a regulatory DNA sequence before (upstream from) the coding sequence
    • the 3' untranslated region (3'UTR), a regulatory DNA sequence after (downstream from) the coding sequence
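
    As a rough illustration of the three regions listed above, a protein-coding gene can be modelled as an ordered set of segments. The sketch below is in Python; the class name ProteinCodingGene, its field names, and the toy sequences are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinCodingGene:
    """Toy model of a gene: 5'UTR, then alternating exons and introns, then 3'UTR."""
    utr5: str            # 5' untranslated region (upstream of the coding sequence)
    exons: list[str]     # segments that will be translated into amino acids
    introns: list[str]   # introns[i] sits between exons[i] and exons[i + 1]
    utr3: str            # 3' untranslated region (downstream of the coding sequence)

    def coding_sequence(self) -> str:
        """Return the CDS as described in the text: exons with introns in between."""
        parts = []
        for i, exon in enumerate(self.exons):
            parts.append(exon)
            if i < len(self.introns):
                parts.append(self.introns[i])
        return "".join(parts)

    def transcribed_region(self) -> str:
        """Return the full transcribed DNA: 5'UTR + CDS + 3'UTR."""
        return self.utr5 + self.coding_sequence() + self.utr3


# Made-up sequences, for illustration only.
gene = ProteinCodingGene(
    utr5="GGCACC",
    exons=["ATGGCT", "GAATTC", "TGA"],
    introns=["GTAAGTCAG", "GTCCCTAG"],
    utr3="AATAAAGC",
)
print(gene.coding_sequence())
print(len(gene.transcribed_region()))
```

    Keeping exons and introns in separate lists makes it easy to see that the UTRs flank the coding sequence while the introns interrupt it.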

    Both prokaryotic and eukaryotic protein-coding genes will have UTRs flanking the coding sequence. However, these UTRs are significantly shorter in prokaryotes.

    Associated with the gene are regulatory sequences known as the promoter and the enhancer. The promoter is a DNA sequence located immediately upstream of the gene that "promotes" transcription. The majority of prokaryotic and eukaryotic genes will have some type of promoter. In both prokaryotes and eukaryotes, the promoter region of most genes typically contains a sequence of DNA known as a core promoter. The DNA sequence of this core promoter is very similar in many prokaryotic and eukaryotic genes (i.e., is conserved). The most well-known eukaryotic core promoter has the sequence "TATAAA" and is known as the TATA box. In prokaryotic cells, this core promoter sequence is "TATAAT" and is called the Pribnow box, after its discoverer. In eukaryotes, additional sequences can be found next to the core promoter. These sequences are called proximal promoter elements (PPEs) and collectively they are known as the proximal promoter. The sequences making up the proximal promoter will vary from gene to gene. If present, they function as regulators of transcription.
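
    To make the idea of a conserved core promoter concrete, the following minimal Python sketch scans a DNA string for an exact copy of the TATA box or Pribnow box sequence given above. The function name find_core_promoter and both upstream sequences are invented for illustration, and exact matching is a simplification: real promoters vary around these consensus sequences.

```python
def find_core_promoter(dna: str, motif: str = "TATAAA") -> list[int]:
    """Return 0-based start positions of every exact match of `motif` in `dna`."""
    dna = dna.upper()
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions


# Made-up upstream sequences, for illustration only.
eukaryotic_upstream = "GCGCCTATAAAAGGCTGCA"
prokaryotic_upstream = "TTGACATCGATTATAATGCG"

print(find_core_promoter(eukaryotic_upstream))             # TATA box search
print(find_core_promoter(prokaryotic_upstream, "TATAAT"))  # Pribnow box search
```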

    In addition to a promoter region, eukaryotic genes are also associated with a stretch of DNA known as the enhancer. As its name implies, the enhancer "enhances" the efficiency of transcription. The enhancer is usually found at a significant distance upstream of the promoter and gene, although some enhancers can be found downstream of the gene. The enhancer is a sequence that is unique to eukaryotic cells and is not found in prokaryotes.

    Figure \(\PageIndex{5}\): The transcription unit of a eukaryotic gene. The transcription unit includes a coding sequence/CDS (shown in red) and regulatory sequences known as an enhancer and a promoter (shown in yellow). The promoter region of a transcription unit often contains distinct sequences known as the core promoter and the proximal promoter. The coding sequence contains sequences that will be expressed as amino acids in the polypeptide (exons) and regions that will be spliced out (introns). The coding sequence is also flanked on either side by untranslated regions, or UTRs (shown in blue). (The Transcription Unit by Patricia Zuk, CC BY 4.0; adapted from Gene Structure by Thomas Shafee, CC BY-SA 4.0)

    Transcription by the RNA polymerase

    The enzyme complex responsible for transcription is the RNA polymerase. In prokaryotes, there is a single RNA polymerase that transcribes all genes. In eukaryotes, there are three RNA polymerases: RNA polymerase I, RNA polymerase II, and RNA polymerase III. Each of these is responsible for the transcription of specific RNA molecules in the eukaryotic cell. Table \(\PageIndex{1}\) summarizes these polymerases.

    Table \(\PageIndex{1}\): Comparison of RNA polymerases
    RNA polymerase     | Transcription location | RNAs transcribed
    RNA polymerase I   | nucleolus              | 5.8S, 18S, and 28S rRNA
    RNA polymerase II  | nucleoplasm            | mRNA
    RNA polymerase III | nucleoplasm            | 5S rRNA, tRNA
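
    The same table can be written as a small lookup structure, which may help when checking which polymerase produces a given RNA. This is only a sketch in Python: the dictionary layout and the helper name polymerase_for are assumptions made for illustration, while the entries are copied from the table.

```python
# Transcription location and products of each eukaryotic RNA polymerase,
# copied from the table above.
RNA_POLYMERASES = {
    "RNA polymerase I": {"location": "nucleolus", "rnas": ["5.8S rRNA", "18S rRNA", "28S rRNA"]},
    "RNA polymerase II": {"location": "nucleoplasm", "rnas": ["mRNA"]},
    "RNA polymerase III": {"location": "nucleoplasm", "rnas": ["5S rRNA", "tRNA"]},
}


def polymerase_for(rna: str) -> str:
    """Return the name of the polymerase that transcribes the given RNA type."""
    for name, info in RNA_POLYMERASES.items():
        if rna in info["rnas"]:
            return name
    raise KeyError(f"{rna!r} is not listed in the table")


print(polymerase_for("tRNA"))
print(polymerase_for("18S rRNA"))
```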

    Both prokaryotic and eukaryotic RNA polymerases are made of several subunits, with each subunit performing a specific task in transcription. For example, specific subunits in the polymerase act to recognize the promoter sequence, open the DNA helix (i.e., act like a helicase), or hold the two DNA strands apart (i.e. act like SSBs). Additional subunits bind to the DNA template and synthesize the RNA transcript by linking RNA nucleotides together.

    Transcription: from DNA to RNA

    In both prokaryotes and eukaryotes, transcription occurs in three stages: initiation, elongation, and termination.

    Initiation

    Transcription begins with the binding of a large complex of proteins, called the Transcription Initiation Complex, to the promoter of the transcription unit (Figure \(\PageIndex{6}\)). Contained within this complex is the RNA polymerase. The RNA polymerase recognizes the sequence of the core promoter and binds to the DNA at that region. This binding places the RNA polymerase in close proximity to the first nucleotide to be transcribed, otherwise known as the transcription start site (TSS). In eukaryotes, the TSS is found at the start of the 5' UTR. Polymerase binding in eukaryotes is made more efficient by the proximal promoter elements found next to the core promoter.

    In eukaryotes, the binding of the RNA polymerase to the promoter region is further enhanced by the presence of DNA-binding proteins called transcription factors. Transcription factors called general transcription factors will help position the RNA polymerase at the core promoter, while more specialized transcription factors, called specific transcription factors, will bind to the proximal promoter and help regulate RNA polymerase activity. Additional transcription factors, known as activators, bind the gene's enhancer, significantly enhancing the initiation of transcription by the RNA polymerase. In order to do this, a "DNA bending protein" bends the DNA so that the enhancer sequence with its activators is brought into proximity of the promoter. Additional proteins, forming a mediator complex, facilitate the interaction between the activators at the enhancer and the transcription factors at the promoter so that the initiation of transcription occurs efficiently and rapidly.

    Together with the RNA polymerase, the transcription factors, activators, and the mediator complex make up a significant portion of the Transcription Initiation Complex. Once this complex has finished forming, the RNA polymerase acts to partially unwind the DNA helix at the TSS, creating a "transcription bubble". Transcription then enters the elongation stage.

    Figure \(\PageIndex{6}\): The transcription initiation complex. The start of transcription in eukaryotes requires a transcription initiation complex containing several proteins, including transcription factors, activators, and a mediator protein complex that facilitates the interactions between these proteins. A DNA bending protein bends the DNA, bringing the enhancer region close to the promoter region. The transcription initiation complex forms to increase the efficiency of RNA polymerase (RNA pol) binding to the promoter and the initiation of transcription at the transcription start site (TSS). (The Transcription Initiation Complex by Patricia Zuk, CC BY 4.0; adapted from Regulation of transcription by Bernstein0275, CC BY-SA 4.0)

    Elongation

    Elongation in both prokaryotes and eukaryotes occurs when the RNA polymerase reads the DNA sequence and links the incoming RNA nucleotides together to produce the RNA transcript (Figure \(\PageIndex{7}\)). The RNA polymerase reads the DNA strand with the 3' to 5' orientation so that it can synthesize the RNA transcript in the 5' to 3' direction. This DNA strand is referred to as the template strand. The strand in the 5' to 3' direction is not read by the RNA polymerase and is called the non-template strand. During elongation, the RNA polymerase acts as a helicase to unwind the DNA helix, increasing the size of the transcription bubble. It also holds the two DNA strands away from each other, acting like single-stranded binding proteins (SSBs). The RNA polymerase does not require primers to bind to the DNA template strand. As such, a primase is not required for transcription. The RNA transcript that is made during elongation is complementary to the template strand, with the nucleotide uracil (U) used in place of thymine (T). This means that if the template sequence (read 3' to 5') is TAAGTG, the RNA transcript sequence (written 5' to 3') would be AUUCAC.
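
    The base-pairing rule used during elongation can be sketched in a few lines of Python. This is a minimal illustration, not a model of the polymerase itself; the names DNA_TO_RNA and transcribe are hypothetical, and the template is assumed to be written in the 3' to 5' direction so that the output reads 5' to 3'.

```python
# Base-pairing rules for RNA synthesis: each DNA template base pairs with an RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA transcript (5' to 3') for a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TAAGTG"))  # AUUCAC
```

    Because the transcript is complementary to the template strand, it has the same sequence as the non-template strand with U in place of T: the non-template strand for this example is ATTCAC, and the transcript is AUUCAC.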

    Figure \(\PageIndex{7}\): Transcription elongation. During transcription elongation, the DNA strand with the 3' to 5' orientation is used as the template for transcription (template DNA strand, red letters). The DNA strand with the 5' to 3' orientation (non-template strand, black letters) is not involved in transcription. In this figure, the template DNA sequence of TACGGCGTAAGTG is transcribed into an RNA strand with the sequence AUGCCGCAUUCAC. (Transcription Elongation by Patricia Zuk, CC BY 4.0)

    Termination

    Once the gene is transcribed, the RNA polymerase needs to dissociate from the template strand and release the newly made RNA transcript. This happens at a termination sequence located at the 3' end of the gene. The termination sequence can vary depending on the gene and on whether that gene is prokaryotic or eukaryotic. In eukaryotes, the termination sequence is found within the 3'UTR. The sequence is usually composed of repeated nucleotide sequences that result in the RNA polymerase stalling, detaching from the template, and freeing the RNA transcript.

    mRNA Processing in Eukaryotes

    Newly transcribed RNA molecules made from eukaryotic protein-coding genes undergo specific processing steps in order to turn them into mature mRNA molecules. As listed in the section on mRNA above, these modifications are: the addition of a "cap" to the 5' end of the RNA, the addition of a "poly-A tail" to the 3' end of the RNA, and the removal of introns within the coding sequence of the gene. These three modifications must be completed before the resulting mRNA can be transferred from the nucleus to the cytoplasm so that it can be translated into a protein. These modifications also increase the stability of the mRNA once in the cytoplasm. As a result, eukaryotic mRNAs last for several hours, whereas the typical prokaryotic mRNA, which lacks most of these modifications, lasts no more than five seconds before being degraded.

    Addition of the "Cap"

    As the RNA transcript is being transcribed, a modified guanine nucleotide, called 7-methylguanosine, is added as a "cap" to the 5' end of the growing RNA transcript through an unusual 5'-to-5' triphosphate linkage. This nucleotide is referred to as the 5' methylated cap owing to the presence of a methyl group attached to the guanine base of the nucleotide (Figure \(\PageIndex{8}\)). The 5' methylated cap plays a role in the export of the mRNA from the nucleus and protects the 5' end of the mRNA from degradation by RNases, which are RNA-degrading enzymes. As will be discussed in Chapter 3.4: Translation, the cap also helps to regulate the start of translation.

    the chemical structure of the 5 prime cap attached to the mRNA
    Figure \(\PageIndex{8}\): The 5' methylated cap of mRNA. mRNA is capped with a unique nucleotide called 7-methylguanosine (pink) that is attached to the 5' end of the mRNA (blue) through a unique 5' to 5' triphosphate bridge. (5' cap structure by Zephyris, CC BY-SA 3.0)

    Addition of the "Tail"

    Following the termination of transcription, a long chain of adenine nucleotides, called the poly-A tail, is added to the 3' end of the RNA transcript through a process called polyadenylation. A sequence in the 3'UTR, known as the polyadenylation signal (AAUAAA), determines the location of this poly-A tail. During polyadenylation, the majority of the 3'UTR is cleaved off and a specialized RNA polymerase, called poly-A polymerase, catalyzes the addition of about 250 adenine nucleotides. Like the cap, the poly-A tail plays a role in nuclear export, protects the RNA transcript from RNase activity (at the 3' end), and plays a role in regulating translation. Interestingly, the addition of a poly-A tail to prokaryotic transcripts enhances mRNA degradation.
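
    As a rough sketch of polyadenylation, the Python example below locates the AAUAAA signal, cleaves the transcript a short distance downstream, and appends a tail of adenines. The function name polyadenylate, the toy transcript, and especially the cleavage_offset of 15 nucleotides are assumptions made for illustration; the text above states only that the signal determines where the tail is added and that about 250 adenines are attached.

```python
POLY_A_SIGNAL = "AAUAAA"


def polyadenylate(transcript: str, cleavage_offset: int = 15, tail_length: int = 250) -> str:
    """Cleave downstream of the first AAUAAA signal and append a poly-A tail.

    `cleavage_offset` (nucleotides kept after the signal) is an assumed,
    illustrative value, not a figure taken from the text.
    """
    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no polyadenylation signal found")
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + cleavage_offset
    return transcript[:cleavage_site] + "A" * tail_length


# Made-up 3' end of a transcript, for illustration only.
rna = "GCCUGAGGCAAUAAAGUCUCGAUCCGUAGCUUAGGCUAGCUAGGAUC"
processed = polyadenylate(rna)
print(len(rna), len(processed))
print(processed.endswith("A" * 250))
```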

    Splicing

    The coding sequence of a eukaryotic protein-coding gene will contain regions that will be translated into the amino acids of a polypeptide chain. These regions are called exons because they are expressed. In addition, there are regions that will not be translated as amino acids. These intervening sequences are called introns. Introns must be removed or "spliced out" of the RNA transcript and the exons ligated or joined together during processing. The process of removing introns and reconnecting exons is called splicing. Splicing takes place in the nucleus and is carried out by a complex of RNA and proteins called a spliceosome. The spliceosome is made up of five snRNP subunits, with each snRNP composed of a stretch of snRNA (small nuclear RNA) and supportive proteins (Figure \(\PageIndex{9}\)).

    Figure \(\PageIndex{9}\): The spliceosome. Splicing is performed in eukaryotes by a spliceosome. The spliceosome is made of subunits called small nuclear ribonucleoproteins (snRNPs). Each snRNP is made of protein and a piece of small nuclear RNA (snRNA). Five snRNPs (U1, U2, U4, U5, U6) assemble to form the functional spliceosome. (The Spliceosome by Patricia Zuk, CC BY 4.0)

    The individual snRNPs (U1, U2, U4, U5, and U6) assemble in a specific order on the coding sequence of the RNA transcript. They assemble at sequences flanking the intron called splice sites. The sequence at the start of an intron is called the donor site and usually has the nucleotides "GU" somewhere within the sequence. The sequence at the end of the intron is called the acceptor site and contains the nucleotides "AG" within its sequence. These two sequences are recognized by specific parts of the spliceosome. Once the spliceosome forms, it cuts out the intron and ligates the flanking exons (Figure \(\PageIndex{10}\)). Following splicing, translation of the coding sequence produces the polypeptide that will become the functional protein.

    Figure \(\PageIndex{10}\): Intron Splicing in eukaryotes. DNA is transcribed into an RNA transcript containing introns and exons. The spliceosome (blue circles) assembles at an intron using splice site sequences found flanking the intron (shown in black). Each intron is spliced out and exons are ligated together by the spliceosome. Translation of the spliced mRNA produces a polypeptide which will become a functional protein. (Intron Splicing by Patricia Zuk, CC BY 4.0; adapted from Transcript and Splicing by Ganeshmanohar, CC BY-SA 4.0)
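
    The cut-and-ligate step can be sketched in Python as follows. This is a toy model, not a description of how the spliceosome finds introns: the function name splice, the made-up sequences, and the choice to pass intron positions in explicitly are all assumptions for illustration. The check that each intron begins with GU and ends with AG is a simplified stand-in for donor and acceptor site recognition.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each intron, given as (start, end) with end exclusive, and join the exons.

    Each intron is checked for the 'GU' donor and 'AG' acceptor dinucleotides
    at its boundaries, a simplification of splice-site recognition.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Made-up pre-mRNA: exon1 + intron1 + exon2 + intron2 + exon3.
exon1, intron1 = "AUGGCU", "GUAAGUCAG"
exon2, intron2 = "GAAUUC", "GUCCCUAG"
exon3 = "UAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

introns = [(6, 15), (21, 29)]
print(splice(pre_mrna, introns))  # AUGGCUGAAUUCUAA
```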

    The splicing activity of the spliceosome is actually performed by the snRNA of the spliceosome and not the proteins. An RNA with enzymatic activity is classified as a ribozyme. Therefore, the spliceosome is a ribozyme.

    While all introns must be spliced from eukaryotic mRNA, exons also can be spliced. The removal of exons by the spliceosome creates novel mRNA transcripts and new "versions" of polypeptides. In this way, multiple polypeptide isoforms can be made from one coding sequence, depending on the exons spliced out. The process of exon removal to create different polypeptide isoforms is known as alternative splicing (Figure \(\PageIndex{11}\)). Alternative splicing increases protein diversity, as more than one polypeptide can be made from a single gene.

    Figure \(\PageIndex{11}\): Alternative splicing of RNA in eukaryotes. The removal of different exons can produce distinct, alternatively spliced mRNA molecules that, when translated, produce different protein isoforms. In splice plan "A", exon 3 is spliced out of the RNA transcript along with all introns; exons 1, 2, and 4 are retained. In splice plan "B", exon 2 is spliced out of the RNA transcript along with all introns; exons 1, 3, and 4 are retained. As each exon produces a region of the polypeptide, different polypeptide isoforms can be produced. (Alternative Splicing by Patricia Zuk, CC BY 4.0; adapted from Splicing Overview by Agathman, CC BY-SA 3.0)
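
    The two splice plans in the figure caption can be mimicked with a short Python sketch. The exon sequences and the helper name mature_mrna are invented for illustration; only the choice of retained exons (1, 2, 4 for plan A and 1, 3, 4 for plan B) comes from the caption.

```python
def mature_mrna(exons: dict[int, str], retained: list[int]) -> str:
    """Join the retained exons, in order, into one mRNA sequence."""
    return "".join(exons[number] for number in sorted(retained))


# Made-up exon sequences, for illustration only.
exons = {1: "AUGGCU", 2: "GAAUUC", 3: "CCGGAU", 4: "UGCUAA"}

plan_a = mature_mrna(exons, retained=[1, 2, 4])  # exon 3 spliced out
plan_b = mature_mrna(exons, retained=[1, 3, 4])  # exon 2 spliced out

print(plan_a)
print(plan_b)
```

    The two mRNAs share their first and last exons but differ in the middle, which is why their translated products would be distinct isoforms.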

    Splicing must be performed precisely by the spliceosome. If the splicing process errs by even a single nucleotide, the reading frame of the rejoined exons would be shifted, and the resulting protein would most likely be nonfunctional following translation.
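
    The effect of a one-nucleotide splicing error can be seen by grouping the joined exons into three-nucleotide units, which is how the mRNA is read during translation (covered in Chapter 3.4: Translation). The Python sketch below uses made-up exon sequences and a hypothetical helper named codons; it is an illustration of the shift, not a model of the spliceosome.

```python
def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive three-nucleotide groups, starting at position 0."""
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


# Made-up exons, for illustration only.
exon1 = "AUGGCU"
exon2 = "GAAUUCUAA"

correct = exon1 + exon2          # precise splice
shifted = exon1 + exon2[1:]      # splice error: one nucleotide of exon 2 lost

print(codons(correct))  # ['AUG', 'GCU', 'GAA', 'UUC', 'UAA']
print(codons(shifted))  # ['AUG', 'GCU', 'AAU', 'UCU', 'AA']
```

    Every three-nucleotide group after the faulty junction differs from the correct version, so the error affects the rest of the message and not just one position.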

    Key Concepts

    Transcription is the first step in gene expression, in which the DNA coding sequence is copied to make an RNA molecule. Prokaryotes and eukaryotes carry out transcription in broadly similar ways, although distinctions do exist. Protein-coding genes in both prokaryotes and eukaryotes are transcribed into mRNA molecules. These mRNA molecules will form the starting point for the last step in gene expression: translation. Transcription is performed by an RNA polymerase, which links nucleotides to form an RNA transcript.

    The major concepts to remember are:

    • Prokaryotes and eukaryotes share many similarities when undergoing transcription
    • Only the DNA strand with the 3' to 5' orientation serves as a template for transcription
    • Transcription has three stages: initiation, elongation, and termination
    • Transcription of DNA into RNA is performed by an RNA polymerase
    • Protein-coding genes are transcribed into mRNA molecules
    • In eukaryotes, mRNA is made through the addition of a 5' cap and a 3' poly-A tail to the RNA transcript. All introns are cut out of the mRNA coding sequence through splicing.
    • In eukaryotes, most protein-coding genes require the formation of a transcription initiation complex to increase the efficiency of transcription

    Glossary

    5' methylated cap: a modified guanine (G) nucleotide found at the 5' end of the mRNA strand; prevents mRNA degradation

    3' poly-A tail: a long chain of adenine (A) nucleotides attached to the 3' end of the mRNA strand

    Activators: DNA binding proteins that bind to the enhancer regions of a DNA strand

    Alternative splicing: the process that allows a single gene to code for multiple proteins by varying the combination of exons included in the final mRNA

    Coding sequence: the section of the DNA/RNA sequence that is translated into a polypeptide strand; composed of exons and introns; has the same sequence as the transcribed RNA (with thymine replaced with uracil)

    Enhancer: a sequence of DNA that binds activators and enhances the process of transcription

    Exon: a section of the coding sequence that is translated into amino acids

    Intron: a section of the coding sequence that is not translated

    Mediator protein: a large complex of proteins that mediates the interaction between activator proteins and transcription factors during transcription initiation

    Messenger RNA (mRNA): a sequence of RNA produced through the process of transcription; specifies the sequence of amino acids following translation; comprised of a coding sequence and flanked by a 5' methylated cap and a 3' poly-A tail

    Pre-mRNA: unmodified RNA produced through transcription; must be modified through the addition of a 5' cap, a 3' poly-A tail, and splicing; also called an RNA transcript

    Polyadenylation: the addition of a long chain of adenine nucleotides (called the poly-A tail) to the 3' end of mRNA; enhances stability of the mRNA and translation

    Promoter sequence: a sequence of DNA where the RNA polymerase binds; determines the site of transcription initiation

    Ribosomal RNA (rRNA): a type of RNA that is a structural component of ribosomes; aids in protein synthesis

    RNA polymerase: an enzyme that reads the DNA template strand and synthesizes a complementary RNA strand using RNA nucleotides

    RNA processing: the modifications made to pre-mRNA, including capping, splicing, and polyadenylation, to produce mature mRNA

    RNA transcription: the process by which a DNA sequence is copied into a complementary RNA strand

    Spliceosome: a large complex of snRNPs that splices out introns (and exons) from the coding sequence of mRNA

    Small nuclear RNA (snRNA): a small sequence of RNA found as part of a small nuclear ribonucleoprotein (snRNP)

    Small nuclear ribonucleoprotein (snRNP): a combination of an snRNA molecule and a protein; also known as a ribonucleoprotein (RNP)

    Splicing: the process of removing introns (and exons) from the pre-mRNA transcript; also called RNA splicing

    TATA Box: a specific DNA sequence found in the promoter region that helps position RNA polymerase for transcription initiation

    Template strand: the DNA strand copied into a complementary RNA strand through the process of transcription; has the 3' to 5' orientation; also called the anti-sense strand

    Termination Sequence: a specific DNA sequence that signals the end of transcription

    Transcription factors: DNA binding proteins that bind specific sequences of DNA and assist in the initiation and regulation of transcription

    Transfer RNA (tRNA): an RNA molecule that helps decode mRNA sequences into proteins by carrying specific amino acids to the ribosome

    Untranslated region (UTR): a stretch of RNA that is not translated; contains important regulatory sequences for transcription and translation; found in front of the coding sequence (5' UTR) and behind the coding sequence (3' UTR); found in both DNA and RNA


    3.3: Transcription of RNA is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by LibreTexts.